Why Did Our HttpClient Crash the Server? Uncovering evictExpiredConnections and OOM
A detailed post explains how a mis‑configured HttpClient caused thread explosion and OOM on four servers, walks through the investigation using APM metrics, clarifies keep‑alive mechanics, and presents the fix of using a singleton HttpClient with proper connection eviction.
Problem Investigation
After a flood of APM alerts, the operations team reported that all four production machines had run out of memory (OOM) and the service was unavailable. The first step was to restart the machines, then confirm the cause of the OOM from the logs.
Because the machines had already been restarted, no memory dump could be taken, so the author examined the JVM metrics in the APM dashboard instead. The thread count had risen continuously from 16:00, reaching about 30k, far above the normal level of roughly 600 threads.
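The thread count that the APM dashboard reported can also be read from inside a running JVM. The following is a minimal sketch using only the standard java.lang.management API; the class name ThreadCountProbe, the name-prefix grouping, and the threshold of 1000 are illustrative assumptions, not part of the original incident tooling.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: samples the live thread count of the current JVM.
class ThreadCountProbe {

    /** Number of live threads (daemon and non-daemon) in this JVM. */
    static int liveThreads() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        return bean.getThreadCount();
    }

    /** Groups live threads by name with a trailing number stripped, e.g. "worker-12" -> "worker". */
    static Map<String, Integer> countByNamePrefix() {
        Map<String, Integer> counts = new TreeMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            String prefix = t.getName().replaceAll("[-_ #]*\\d+$", "");
            counts.merge(prefix, 1, Integer::sum);
        }
        return counts;
    }

    /** True when the live thread count is above the given (assumed) alert threshold. */
    static boolean exceeds(int threshold) {
        return liveThreads() > threshold;
    }

    public static void main(String[] args) {
        System.out.println("live threads: " + liveThreads());
        countByNamePrefix().forEach((name, n) -> System.out.println(name + " x" + n));
        System.out.println("above 1000? " + exceeds(1000));
    }
}
```

Grouping by name prefix is a cheap way to see which kind of thread is multiplying when a full dump is not available; a group that grows without bound points at the component that creates it.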
The spike corresponded to a code change that added the evictExpiredConnections configuration to HttpClient initialization.
Reconstructing the Incident
A recent run of NoHttpResponseException errors had prompted the addition of the evictExpiredConnections setting. To understand the root cause, the article reviews the HTTP keep‑alive mechanism and the TCP connection lifecycle.
In a typical TCP connection, a three‑way handshake establishes the connection and a four‑step FIN/ACK exchange (often called the four‑way handshake) closes it. Without keep‑alive, each HTTP request sets up and tears down its own TCP connection, which is costly when the request volume is high.
Keep‑alive reuses the same TCP connection for multiple HTTP requests, eliminating most handshake overhead and improving performance. However, idle persistent connections still consume resources, so a timeout is usually set to release them after inactivity.
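Some servers advertise their idle timeout to the client in a Keep-Alive response header, for example "timeout=5, max=100". As a rough illustration of how a client could turn that into a lifetime for a pooled connection, here is a small parser; the class and method names are hypothetical, and returning -1 for "no timeout advertised" is a convention chosen for this sketch.

```java
import java.util.Locale;

// Hypothetical helper: reads the idle timeout from a Keep-Alive header value.
class KeepAliveHeader {

    /**
     * Extracts the idle timeout in milliseconds from a value such as
     * "timeout=5, max=100". Returns -1 when no usable timeout is present.
     */
    static long timeoutMillis(String headerValue) {
        if (headerValue == null) {
            return -1;
        }
        for (String part : headerValue.split(",")) {
            String[] kv = part.trim().split("=", 2);
            if (kv.length == 2 && kv[0].trim().toLowerCase(Locale.ROOT).equals("timeout")) {
                try {
                    // The header carries seconds; convert to milliseconds.
                    return Long.parseLong(kv[1].trim()) * 1000L;
                } catch (NumberFormatException ignored) {
                    return -1;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(timeoutMillis("timeout=5, max=100")); // 5000
        System.out.println(timeoutMillis("max=100"));            // -1
    }
}
```

A client that knows the server's timeout can discard a pooled connection slightly before the server would, which narrows the window in which both sides disagree about whether the connection is still open.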
If the server closes an idle connection (sending a FIN) and the client reuses that connection before the FIN reaches it, the server answers the new request with an RST instead of an HTTP response. The client never receives a response, and HttpClient reports this as NoHttpResponseException.
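The effect can be reproduced with plain sockets. The sketch below is an approximation under stated assumptions: a local server closes the connection immediately after accepting it (standing in for an idle timeout), and the client, which like an idle pooled connection has not read from the socket in the meantime, then sends a request. It does not use HttpClient, so no NoHttpResponseException is thrown; the point is only that the request goes out and no response comes back. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: sending a request on a connection the server has already closed.
class StaleConnectionDemo {

    /**
     * Returns the number of response bytes read, or -1 if the stream ended
     * or failed before any response arrived.
     */
    static int requestOnClosedConnection() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
            // Server side: accept and close right away, standing in for an idle timeout.
            server.accept().close();
            Thread.sleep(100); // the connection sits "idle" in the client's pool

            // Client side: reuse the connection as if it were still alive.
            client.setSoTimeout(2000);
            OutputStream out = client.getOutputStream();
            out.write("GET / HTTP/1.1\r\nHost: localhost\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
            out.flush();
            InputStream in = client.getInputStream();
            return in.read(new byte[1024]); // -1: peer closed without sending a response
        } catch (IOException e) {
            return -1; // for example "Connection reset" once the RST arrives
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println("response bytes: " + requestOnClosedConnection());
    }
}
```

Depending on timing and platform, the client sees either an end of stream or a connection reset; in both cases the write itself succeeds, which is why the failure only surfaces when the client waits for the response.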
Solution
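The evictExpiredConnections option tackles this on the client side: a background thread periodically removes expired connections from the pool, so the client is less likely to reuse one that the server has already closed. However, the application created a new HttpClient for every request. Each instance started its own eviction thread, so the thread count grew with every request until memory was exhausted.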
The library documentation describes the mechanism as follows: "Makes this instance of HttpClient proactively evict idle connections from the connection pool using a background thread."
The fix was to make HttpClient a singleton, ensuring that only one cleanup thread runs. Additionally, a monitoring rule was added to alert when the thread count exceeds a threshold, allowing proactive action before an OOM occurs.
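The difference between the bug and the fix can be illustrated without the real library. In the sketch below, EvictingClient is a hypothetical stand-in that starts one background thread per instance, which is the behavior the incident attributes to a client built with evictExpiredConnections; it is not the Apache HttpClient API, and the counter exists only to make the leak visible.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a client that starts one eviction thread per instance.
class EvictingClient implements AutoCloseable {
    static final AtomicInteger LIVE_EVICTORS = new AtomicInteger();

    private final Thread evictor;

    EvictingClient() {
        evictor = new Thread(() -> {
            try {
                while (true) {
                    Thread.sleep(1000); // a real evictor would close expired connections here
                }
            } catch (InterruptedException e) {
                // close() interrupts the thread; fall through and exit
            } finally {
                LIVE_EVICTORS.decrementAndGet();
            }
        }, "evictor-sketch");
        evictor.setDaemon(true);
        LIVE_EVICTORS.incrementAndGet();
        evictor.start();
    }

    @Override
    public void close() {
        evictor.interrupt();
        try {
            evictor.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

class ClientUsage {
    // The fix: one shared instance for the whole application (holder idiom).
    private static final class Holder {
        static final EvictingClient SHARED = new EvictingClient();
    }

    static EvictingClient shared() {
        return Holder.SHARED;
    }

    /** The bug: a new client per request, never closed, so each call leaks one thread. */
    static int leakyRequests(int n) {
        int before = EvictingClient.LIVE_EVICTORS.get();
        for (int i = 0; i < n; i++) {
            new EvictingClient(); // intentionally not closed
        }
        return EvictingClient.LIVE_EVICTORS.get() - before;
    }

    /** The fix: every request uses the same instance, so no new threads appear. */
    static int sharedRequests(int n) {
        shared(); // make sure the singleton exists before measuring
        int before = EvictingClient.LIVE_EVICTORS.get();
        for (int i = 0; i < n; i++) {
            shared();
        }
        return EvictingClient.LIVE_EVICTORS.get() - before;
    }
}
```

With these definitions, leakyRequests(50) adds 50 background threads while sharedRequests(50) adds none. In the real application the same idea would presumably mean building the HttpClient once, with evictExpiredConnections enabled, and injecting that single instance wherever requests are made; closing a client that is no longer needed should also stop its background thread.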
Conclusion
The incident demonstrates the importance of understanding third‑party libraries, proper connection management, and comprehensive monitoring. A solid grasp of networking fundamentals, such as keep‑alive and TCP lifecycle, is essential for effective troubleshooting and performance tuning.
Java Backend Technology
Focus on Java-related technologies: SSM, Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading. Occasionally cover DevOps tools like Jenkins, Nexus, Docker, and ELK. Also share technical insights from time to time, committed to Java full-stack development!
