Fixing RabbitMQ Connection Leaks with Thread‑Pool Asynchrony
Facing frequent timeouts in the high‑traffic eLong red‑envelope API, we used Linux netstat to trace the root cause to a connection leak in the RabbitMQ SDK, then worked around it by offloading the processing to a single‑threaded pool, which eliminated the blocking and sidestepped the SDK bug.
1. System Architecture & Incident
In the eLong travel app, users trigger a red‑envelope claim API after login. The service checks eligibility, publishes a message to RabbitMQ, and returns a response to the front‑end. A consumer service then asynchronously consumes the MQ message and credits the virtual amount to the user’s balance.
The flow looked simple, but the API began timing out intermittently, and the red‑envelope activity stalled until the service was manually restarted.
2. Diagnosing with Linux netstat
Log analysis showed that the failures occurred at the step that sends messages to RabbitMQ. To investigate, we used netstat to inspect open ports and connection states. Common commands include:
netstat -tlnp – list listening TCP ports, with the owning process
netstat -ulnp – list listening UDP ports, with the owning process
netstat -an – display all sockets, with numeric addresses
Filtering the output by process ID with grep revealed an unusually high number of active RabbitMQ connections, approaching the OS file‑handle limit.
3. RabbitMQ SDK Connection‑Leak Issue
The custom RabbitMQ utility class caches a single connection in longConnection: if the cached connection is still valid, it is reused; otherwise a new connection is created. Under high concurrency this design caused two problems:
The validity check on longConnection is not thread‑safe, so threads can end up using stale or expired connections.
Several threads can find the connection invalid at the same moment and each create a new one. Only the last assignment remains cached, so the others become unreferenced connections that are never closed.
Each leaked connection holds an open socket, and therefore a file descriptor, so the service eventually exhausted the OS's maximum file‑handle count and subsequent message sends failed with timeout exceptions.
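To make the race described above concrete, here is a minimal Java sketch, assuming a wrapper shaped roughly like the one described. FakeConnection, UnsafeConnectionHolder and LeakDemo are hypothetical names, not the real SDK: FakeConnection only counts how many connections were created and how many are still open, and sleeps briefly to mimic a slow handshake. Because only the last assignment to longConnection survives, every connection beyond the first stays open with nothing left to close it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a broker connection; tracks how many were created and are still open. */
class FakeConnection {
    static final AtomicInteger CREATED = new AtomicInteger();
    static final AtomicInteger OPEN = new AtomicInteger();
    private volatile boolean open = true;

    FakeConnection() {
        CREATED.incrementAndGet();
        OPEN.incrementAndGet();
        // Simulate a slow handshake so the check-then-create window is wide.
        try {
            Thread.sleep(5);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    boolean isOpen() { return open; }

    synchronized void close() {
        if (open) {
            open = false;
            OPEN.decrementAndGet();
        }
    }
}

/** The leaky pattern: an unsynchronized check-then-create on a shared cached field. */
class UnsafeConnectionHolder {
    private static FakeConnection longConnection; // shared, no lock, not volatile

    static FakeConnection get() {
        if (longConnection == null || !longConnection.isOpen()) { // check...
            longConnection = new FakeConnection();                 // ...then act: other threads may do the same
        }
        return longConnection;
    }
}

public class LeakDemo {
    /** Fires `threads` concurrent get() calls; returns {connections created, connections leaked}. */
    static int[] runRace(int threads) {
        int createdBefore = FakeConnection.CREATED.get();
        int openBefore = FakeConnection.OPEN.get();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await(); // line all threads up, like a traffic spike
                } catch (InterruptedException e) {
                    return;
                }
                UnsafeConnectionHolder.get();
            });
            workers[i].start();
        }
        start.countDown();
        try {
            for (Thread t : workers) t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        // The wrapper only knows about (and can close) the one connection it cached.
        UnsafeConnectionHolder.get().close();
        int created = FakeConnection.CREATED.get() - createdBefore;
        int leaked = FakeConnection.OPEN.get() - openBefore;
        return new int[] {created, leaked};
    }

    public static void main(String[] args) {
        int[] r = runRace(50);
        System.out.println("created=" + r[0] + ", still open after cleanup=" + r[1]);
    }
}
```

How many connections the race produces varies between runs and machines; what stays constant is that the leaked count equals the created count minus one.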
4. Thread‑Pool Asynchronous Solution
To avoid waiting for the RabbitMQ SDK and to bypass the connection‑leak bug, we introduced a single‑threaded executor. The API handler immediately submits the red‑envelope processing task to the thread pool and returns a quick success response to the front‑end.
The processing thread performs the following steps:
Acquire a RabbitMQ connection (now isolated to a single thread, eliminating concurrent creation).
Publish the message to the queue.
The consumer service then consumes the message and updates the user’s virtual balance, as before.
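The asynchronous hand-off can be sketched with Executors.newSingleThreadExecutor() from java.util.concurrent. This is a minimal illustration, not the service's actual code: AsyncClaimDemo, claim() and the published list are hypothetical, and the RabbitMQ publish is stubbed out by recording the message and the name of the thread that handled it. The simulated request threads in main stand in for Tomcat workers.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AsyncClaimDemo {
    // Demo bookkeeping: what was "published", and on which thread(s).
    final List<String> published = new CopyOnWriteArrayList<>();
    final Set<String> publisherThreads = ConcurrentHashMap.newKeySet();

    // A single worker thread: every broker interaction is serialized onto it.
    private final ExecutorService mqExecutor = Executors.newSingleThreadExecutor();

    /** Runs on a request thread and returns without waiting for the broker. */
    String claim(String userId) {
        // Eligibility checks omitted in this sketch.
        mqExecutor.submit(() -> {
            // On the worker thread: acquire the connection and publish.
            // Stubbed here by recording the message and the current thread's name.
            publisherThreads.add(Thread.currentThread().getName());
            published.add("red-envelope:" + userId);
        });
        return "OK"; // quick success response to the front-end
    }

    /** Stops accepting work and waits for queued publishes to finish. */
    boolean shutdownAndWait() {
        mqExecutor.shutdown();
        try {
            return mqExecutor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AsyncClaimDemo service = new AsyncClaimDemo();
        // Simulate several request threads hitting the claim API concurrently.
        ExecutorService requestThreads = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 100; i++) {
            String userId = "user-" + i;
            requestThreads.submit(() -> service.claim(userId));
        }
        requestThreads.shutdown();
        requestThreads.awaitTermination(5, TimeUnit.SECONDS);
        service.shutdownAndWait();
        System.out.println(service.published.size() + " messages published from "
                + service.publisherThreads.size() + " thread(s)");
    }
}
```

The pattern has trade-offs worth keeping in mind: newSingleThreadExecutor() is backed by an unbounded queue, so a slow broker lets pending tasks pile up in memory; the front-end receives success before the publish happens; and an exception thrown inside a task passed to submit() is captured in the returned Future, so the task should log its own failures. Queued tasks are also lost if the process dies.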
Benefits of this approach:
Reduces API blocking time, allowing Tomcat threads to serve other requests efficiently.
Confining RabbitMQ interactions to one thread removes the concurrent check-and-create race, so the connection leak is effectively sidestepped.
5. Summary & Recommendations
The hidden RabbitMQ SDK connection leak caused intermittent timeouts in the red‑envelope service and was identified via netstat. Two long‑term fixes are recommended:
Introduce proper locking when creating or validating the shared connection.
Replace the custom wrapper with a robust connection‑pool library such as Apache Commons Pool (commons‑pool).
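The first recommendation, locking around validation and creation, could look like the following double-checked locking sketch, assuming the same cached-field design. StubConnection, SafeConnectionHolder and LockedDemo are hypothetical; a real wrapper would open an actual RabbitMQ connection where the stub is constructed. The volatile field lets threads take a lock-free fast path once a healthy connection is cached, and the re-check inside synchronized ensures only one thread replaces a missing or dead connection.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a broker connection. */
class StubConnection {
    static final AtomicInteger CREATED = new AtomicInteger();
    private volatile boolean open = true;

    StubConnection() { CREATED.incrementAndGet(); }

    boolean isOpen() { return open; }

    void close() { open = false; }
}

/** Cached-connection wrapper with the check and the create done under one lock. */
class SafeConnectionHolder {
    private static final Object LOCK = new Object();
    // volatile: safe publication for the lock-free fast path
    private static volatile StubConnection longConnection;

    static StubConnection get() {
        StubConnection c = longConnection;
        if (c != null && c.isOpen()) {
            return c; // fast path: no locking once a healthy connection is cached
        }
        synchronized (LOCK) {
            c = longConnection; // re-check: another thread may have replaced it while we waited
            if (c == null || !c.isOpen()) {
                if (c != null) {
                    c.close(); // defensively release the stale one rather than orphaning it
                }
                c = new StubConnection();
                longConnection = c;
            }
            return c;
        }
    }
}

public class LockedDemo {
    /** Fires `threads` concurrent get() calls and returns how many connections were created. */
    static int runRace(int threads) {
        int before = StubConnection.CREATED.get();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    return;
                }
                SafeConnectionHolder.get();
            });
            workers[i].start();
        }
        start.countDown();
        try {
            for (Thread t : workers) t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        int created = StubConnection.CREATED.get() - before;
        SafeConnectionHolder.get().close(); // force the next run to reconnect
        return created;
    }

    public static void main(String[] args) {
        System.out.println("connections created: " + runRace(50)); // prints "connections created: 1"
    }
}
```

A pooling library such as Apache Commons Pool (its GenericObjectPool class) would go further and replace this hand-rolled holder with a pool that manages borrowing, returning, validation and eviction of connections.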
For an immediate fix, we adopted the asynchronous thread‑pool pattern, which restored service stability without waiting for SDK refactoring.
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
