Stream Interruption on Ant Media Server 2.14 due to sudden memory spike on Vultr Kubernetes Engine #7465
Unanswered · asked by oleul-bjit in Q&A
All RTMP streams (~268 cameras) stopped simultaneously on Ant Media Server 2.14 running in Kubernetes. Memory spiked, the heap dump API failed, the pod was OOMKilled, and all streams resumed only after the pod restarted.
Environment
Steps to reproduce
Expected behavior
RTMP streams should continue generating .ts files.
Memory usage should remain stable under the expected load (see the monitoring sketch after this list).
Heap dump API should work when triggered under memory pressure to allow analysis.
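For context, a minimal sketch of the kind of in-JVM heap monitoring that could run alongside the server to make such a spike visible in the logs; the class name, interval, and output format are illustrative and not part of Ant Media Server:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Logs heap usage every 10 seconds so a spike can be correlated with stream errors.
public class HeapUsageLogger {
    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            MemoryUsage heap = memory.getHeapMemoryUsage();
            long usedMb = heap.getUsed() / (1024 * 1024);
            long maxMb = heap.getMax() / (1024 * 1024); // -1 if the max is undefined
            System.out.printf("heap used=%d MB max=%d MB%n", usedMb, maxMb);
        }, 0, 10, TimeUnit.SECONDS);
    }
}
```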
Actual behavior
On 2025-08-18 between 22:11 JST and 22:21 JST, all RTMP streams stopped simultaneously.
Timeline (JST):
22:10:59 → Last successful stream recording observed.
22:11:06 → Stream stop notifications received.
22:11:08 → Multiple MongoInterruptedException from RTMPConnectionExecutor.
22:11:10 → Exception messages logged: Error executing call: Service: null Method: publish Num Params: 2 0: {streamId}?subscriberId=ru9JD0Dg&subscriberCode=758169 1: live.
22:13:41 → Memory usage spiked to ~50%; the heap dump API was triggered but did not work.
22:16:01 → Continuous MongoInterruptedException and publish error logs until this time.
22:20:33 → Pod terminated due to OOMKilled.
22:21:21 → Pod restarted; all streams resumed automatically.
Why did all streams stop simultaneously, and why did the server start logging these exceptions?
What does the following exception message mean, and when and why does it occur?
Error executing call: Service: null Method: publish Num Params: 2
0: {streamId}?subscriberId=ru9JD0Dg&subscriberCode=758169
1: live
Why did memory spike sharply during this exception period?
Why did GC and the heap dump API fail under memory pressure? Is there an alternative, safe way to collect a reliable heap dump in such cases?
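In case it is useful, a minimal sketch of one alternative, assuming the JVM is still responsive and the target path is writable (e.g. a persistent volume so the file survives an OOMKill): the dump can be requested directly from the JVM via HotSpotDiagnosticMXBean instead of the REST API. jcmd <pid> GC.heap_dump and the -XX:+HeapDumpOnOutOfMemoryError flag are standard out-of-process options for the same thing.

```java
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

// Sketch: trigger an .hprof heap dump from inside the JVM, bypassing the REST endpoint.
// The output path is illustrative; in Kubernetes it should point to a persistent volume.
public class HeapDumpTrigger {
    public static void main(String[] args) throws Exception {
        HotSpotDiagnosticMXBean diag = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        // live = true dumps only reachable objects, which typically forces a full GC first.
        diag.dumpHeap("/data/heapdump-" + System.currentTimeMillis() + ".hprof", true);
    }
}
```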
Logs
Logs and heap dump are available here:
https://drive.google.com/drive/folders/1HOIOEYU0-yV9JjufHdmZq1TYwYdddN0x?usp=sharing