Describe the bug
When uploading several documents at once (e.g., 3+), the Celery worker intermittently crashes or logs repeated asyncio errors. The error originates from LiteLLM’s async logging worker: an asyncio.Queue created on one event loop is later awaited from a different loop. This floods the logs with:
RuntimeError: <Queue at 0x... maxsize=50000> is bound to a different event loop
and can coincide with worker instability under load.
To Reproduce
Steps to reproduce the behavior:
1. Start the Aperag stack (API, celeryworker, redis, postgres, qdrant, es) with the default docker-compose.
2. Ensure the Celery worker is running with a multi-concurrency setup (e.g., --pool=threads --concurrency=16) and that LiteLLM logging is enabled (the default).
3. Upload 3 or more PDFs simultaneously so that the VECTOR / FULLTEXT / GRAPH / SUMMARY indexing tasks are triggered in parallel.
4. Check the celeryworker logs for the error and observe degraded stability or worker restarts.
Screenshots & Logs
[2025-10-28 06:00:03,463: INFO/MainProcess] aperag.tasks.document - Parsing document doc2068cf764beaae64
[2025-10-28 06:00:03,650: ERROR/MainProcess] asyncio - Task exception was never retrieved
future: <Task finished name='Task-16375' coro=<LoggingWorker._worker_loop() done, defined at /opt/venv/lib/python3.11/site-packages/litellm/litellm_core_utils/logging_worker.py:43> exception=RuntimeError('<Queue at 0x72ef9d72e990 maxsize=50000> is bound to a different event loop')>
Traceback (most recent call last):
  File "/opt/venv/lib/python3.11/site-packages/litellm/litellm_core_utils/logging_worker.py", line 51, in _worker_loop
    coroutine = await self._queue.get()
  File "/usr/local/lib/python3.11/asyncio/queues.py", line 155, in get
    getter = self._get_loop().create_future()
  File "/usr/local/lib/python3.11/asyncio/mixins.py", line 20, in _get_loop
    raise RuntimeError(f'{self!r} is bound to a different event loop')
RuntimeError: <Queue at 0x72ef9d72e990 maxsize=50000> is bound to a different event loop