How a Misconfigured HttpClient Connection Pool Triggered a System Avalanche
An engineer recounts how a high-traffic promotion system first ran into local port exhaustion, and how the connection pool introduced to fix it left the per-host connection limit at its default of two. The result was thread pile-up, CPU spikes, killed instances, and a cascading failure. The article walks through the investigation and the preventive measures.
Event Background
I built and operated a high-traffic promotion live-stream system that calls a real-time service through HttpClient. The process frequently threw "Address already in use (Bind failed)" errors: a large number of sockets in TIME_WAIT, peaking at more than 60,000, were holding local ports, so new outbound connections could not obtain one.
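One way to watch the TIME_WAIT count from inside a JVM on Linux is to parse /proc/net/tcp, where the fourth column of each data row is the TCP state in hex and 06 means TIME_WAIT. This is a minimal sketch that assumes Linux and covers only IPv4 sockets (IPv6 sockets are listed separately in /proc/net/tcp6); the class name TimeWaitCounter is hypothetical. From a shell, ss -tan state time-wait reports the same sockets.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

/**
 * Counts IPv4 sockets in TIME_WAIT by parsing Linux's /proc/net/tcp (hypothetical helper).
 * Each data row's fourth whitespace-separated column is the TCP state in hex; "06" = TIME_WAIT.
 */
final class TimeWaitCounter {
    private static final String TIME_WAIT = "06";

    /** Counts TIME_WAIT rows in /proc/net/tcp content; the first line is the column header. */
    static long countTimeWait(List<String> procNetTcpLines) {
        return procNetTcpLines.stream()
                .skip(1)                                   // skip the header row
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .map(line -> line.split("\\s+"))
                .filter(cols -> cols.length > 3 && TIME_WAIT.equals(cols[3]))
                .count();
    }

    /** Reads the live table. Linux only, and only IPv4 sockets are visible here. */
    static long countTimeWaitOnThisHost() throws IOException {
        Path table = Paths.get("/proc/net/tcp");
        return countTimeWait(Files.readAllLines(table));
    }
}
```

Polling countTimeWaitOnThisHost() alongside normal metrics makes a climb toward the local port range visible before bind failures start.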
Problem Process
To cut the number of ports in use, I introduced a connection pool. To size it, I started from a peak of 12,000 PV per minute, which is 200 requests per second; with a 1.3 s response time, that works out to about 260 requests in flight at once (200 × 1.3). Logs showed each connection was held for about 1.1 s, so I added a 70% safety margin and set the maximum number of connections to roughly 500.
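The sizing step is Little's law: requests in flight equal the arrival rate multiplied by how long each request is held. A minimal sketch using integer arithmetic (per-minute rates, milliseconds) so that floating-point rounding cannot nudge a ceiling up by one; PoolSizing and its methods are hypothetical names, and the inputs are the figures quoted above.

```java
/**
 * Little's law pool sizing (hypothetical helper): in flight = arrival rate x hold time.
 * Integer math avoids results like 260.00000000000003 being rounded up to 261.
 */
final class PoolSizing {
    /** In-flight requests = requestsPerMinute x holdMillis / 60,000 ms, rounded up. */
    static long inFlight(long requestsPerMinute, long holdMillis) {
        return ceilDiv(requestsPerMinute * holdMillis, 60_000L);
    }

    /** Adds a percentage safety margin on top of an in-flight estimate, rounded up. */
    static long withMargin(long inFlight, long marginPercent) {
        return ceilDiv(inFlight * (100 + marginPercent), 100L);
    }

    /** Ceiling division for non-negative a and positive b. */
    private static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }
}
```

Here inFlight(12000, 1300) returns 260 and withMargin(260, 70) returns 442, in the same range as the roughly 500 connections chosen. Note that this estimate only bounds the pool's total; it says nothing about how the pool divides connections between hosts.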
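```java
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.commons.httpclient.params.HttpConnectionManagerParams;
private HttpClient client; // shared, thread-safe client backed by the pool
public void init() {
    MultiThreadedHttpConnectionManager connectionManager = new MultiThreadedHttpConnectionManager();
    HttpConnectionManagerParams managerParams = new HttpConnectionManagerParams();
    managerParams.setMaxTotalConnections(500); // cap on connections across all hosts
    // Per-host limit left unset: it stays at the default of 2 (the root cause found later).
    connectionManager.setParams(managerParams);
    client = new HttpClient(connectionManager);
}
```

Offline multithreaded tests confirmed that concurrency improved, so I rolled the change out first to the low-traffic Nanjing data center and then fully in the Beijing data center. The full rollout caused unexpected system failures.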
Case Review
After all traffic had been shifted, users reported that the live page would not open. Monitoring showed business traffic at normal levels, but NIC traffic spiked, response times rose, and several instances stopped responding.
Deep Investigation
CPU usage of the Java process rose to nearly ten times the normal level, and thread counts surged past the container limit of 2,000, causing the virtualization platform to kill the instances.
Comparing TCP connection snapshots taken before and after the rollback showed a sharp drop in concurrent connections once the change was reverted.
jstack thread dumps showed many threads blocked waiting for a connection from the pool. The queuing raised latency, which kept more requests in flight and created still more waiting threads, a vicious cycle that eventually pushed the thread count past the limit.
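A rough average-rate model shows why this queuing snowballs instead of settling: a pool of c connections, each held for w seconds, can serve at most c / w requests per second, and every arrival beyond that becomes one more blocked thread. The sketch is illustrative only; BacklogModel is a hypothetical name, and it ignores the fact that rising latency makes w itself grow, which only makes things worse.

```java
/**
 * Fluid (average-rate) model of a blocking pool whose capacity is below demand (hypothetical).
 * Capacity is connections / holdSeconds requests per second; excess arrivals pile up as waiters.
 */
final class BacklogModel {
    /** Net growth of blocked threads per second, or 0 if the pool keeps up with arrivals. */
    static double backlogGrowthPerSecond(double arrivalsPerSecond, int connections, double holdSeconds) {
        double serviceRate = connections / holdSeconds;
        return Math.max(0.0, arrivalsPerSecond - serviceRate);
    }

    /** Seconds until blocked threads reach threadLimit, or +infinity if no backlog builds. */
    static double secondsToThreadLimit(double arrivalsPerSecond, int connections,
                                       double holdSeconds, int threadLimit) {
        double growth = backlogGrowthPerSecond(arrivalsPerSecond, connections, holdSeconds);
        return growth == 0.0 ? Double.POSITIVE_INFINITY : threadLimit / growth;
    }
}
```

With a hypothetical 50 requests per second reaching one instance, two connections held for 1.1 s serve only about 1.8 requests per second, so blocked threads accumulate at roughly 48 per second and reach a 2,000-thread limit in about 41.5 seconds. With 500 connections the same load never builds a backlog.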
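Reading the source showed that MultiThreadedHttpConnectionManager enforces a per-host limit (maxHostConnections) in addition to the total limit. Because the code set only setMaxTotalConnections(500), the per-host limit stayed at its default (DefaultMaxConnectionsPerHost) of 2, so all calls to a given host could use at most two concurrent connections, no matter how large the total was.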
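The two-limit check can be modeled with one global semaphore plus one semaphore per host, handing out a connection only when both have a free permit. This is a simplified, hypothetical model: TwoLevelPool is not a Commons HttpClient class, and it uses non-blocking tryAcquire where the real manager makes callers wait. It does show why a total of 500 offers no headroom when the per-host cap is 2. In Commons HttpClient 3.x the per-host value is configured with HttpConnectionManagerParams.setDefaultMaxConnectionsPerHost(int), alongside setMaxTotalConnections(int).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

/** Hypothetical model of a pool that enforces both a global cap and a per-host cap. */
final class TwoLevelPool {
    private final Semaphore total;
    private final int perHostLimit;
    private final Map<String, Semaphore> perHost = new ConcurrentHashMap<>();

    TwoLevelPool(int maxTotal, int maxPerHost) {
        this.total = new Semaphore(maxTotal);
        this.perHostLimit = maxPerHost;
    }

    /** Succeeds only if both the host's budget and the global budget have a free permit. */
    boolean tryAcquire(String host) {
        Semaphore hostPermits = perHost.computeIfAbsent(host, h -> new Semaphore(perHostLimit));
        if (!hostPermits.tryAcquire()) {
            return false;             // per-host cap reached, regardless of global headroom
        }
        if (!total.tryAcquire()) {
            hostPermits.release();    // undo the host permit when the global cap is full
            return false;
        }
        return true;
    }

    /** Returns one permit for the host; the caller must hold one from tryAcquire(host). */
    void release(String host) {
        total.release();
        perHost.get(host).release();
    }
}
```

With new TwoLevelPool(500, 2), the third tryAcquire for the same host fails even though 498 global permits remain, while a request to a different host still succeeds.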
Case Summary
Connection‑pool parameters were mis‑configured, leaving the maximum connections per host at 2.
Numerous request threads queued for a connection, causing thread buildup and increased response times.
Thread explosion raised CPU and memory usage, further degrading performance and leading to a cascade of failures.
Instances hitting the thread limit were killed by the virtualization platform.
Failed instances caused traffic to shift to surviving nodes, amplifying the avalanche effect.
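The last two points can be illustrated with a toy calculation: spread a fixed load evenly over the surviving instances, and remove one more instance whenever the per-instance load exceeds its capacity. All numbers are hypothetical, and Redistribution is an illustrative name, not part of the original system.

```java
/** Toy model of load shifting onto surviving instances (hypothetical numbers, illustrative only). */
final class Redistribution {
    /** Per-instance load when traffic is spread evenly over the instances still alive. */
    static double loadPerSurvivor(double totalLoad, int instances, int failed) {
        int survivors = instances - failed;
        if (survivors <= 0) {
            throw new IllegalArgumentException("no surviving instances");
        }
        return totalLoad / survivors;
    }

    /** Removes one instance at a time while survivors are overloaded; returns how many remain. */
    static int survivorsAfterCascade(double totalLoad, int instances,
                                     double capacityPerInstance, int initiallyFailed) {
        int survivors = instances - initiallyFailed;
        while (survivors > 0 && totalLoad / survivors > capacityPerInstance) {
            survivors--; // an overloaded instance hits its limit and is killed
        }
        return Math.max(survivors, 0);
    }
}
```

With 200 requests per second across 10 instances that can each handle 25, losing one instance leaves 9 survivors at about 22 requests per second each, which holds. Losing three pushes the 7 survivors to about 29 each, and the loop removes instances one by one until none remain.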
Optimization Recommendations
Thoroughly read official documentation before upgrading or changing technology.
Reference high‑quality open‑source projects to see correct usage patterns.
Conduct offline load testing with controlled variables to expose issues early.
Stress Test Plan
Compare QPS and thread usage with and without a connection pool.
Evaluate the impact of setting DefaultMaxConnectionsPerHost versus leaving it at the default.
Test different thresholds for setMaxTotalConnections and setDefaultMaxConnectionsPerHost.
Monitor thread count, CPU utilization, TCP connections, port usage, and memory during tests.
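The plan above could start from a harness like the following, which replaces the HTTP pool with a fair Semaphore so that the per-host limit is the only variable: each simulated request thread waits for a permit, holds it for a fixed time, and releases it, and the JVM's peak thread count is read from ThreadMXBean. This is a sketch under those assumptions (PoolStressTest is a hypothetical name); a real test would drive the actual HttpClient against a test endpoint and also record CPU, TCP connections, port usage, and memory.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Offline harness: request threads compete for permits standing in for pooled connections. */
final class PoolStressTest {
    /** Runs `requests` calls, each holding one of `perHostLimit` permits for `holdMillis`; returns elapsed ms. */
    static long run(int requests, int perHostLimit, long holdMillis) {
        Semaphore connections = new Semaphore(perHostLimit, true); // fair: waiters served FIFO
        ExecutorService callers = Executors.newFixedThreadPool(requests); // one thread per request
        CountDownLatch done = new CountDownLatch(requests);
        long start = System.nanoTime();
        for (int i = 0; i < requests; i++) {
            callers.execute(() -> {
                try {
                    connections.acquire();          // blocks, like waiting for a free connection
                    try {
                        Thread.sleep(holdMillis);   // simulated remote call
                    } finally {
                        connections.release();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for simulated calls", e);
        } finally {
            callers.shutdown();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    /** Prints elapsed time and the JVM's peak live thread count for one configuration. */
    static void report(int requests, int perHostLimit, long holdMillis) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        threads.resetPeakThreadCount();
        long elapsed = run(requests, perHostLimit, holdMillis);
        System.out.printf("perHost=%d elapsed=%dms peakThreads=%d%n",
                perHostLimit, elapsed, threads.getPeakThreadCount());
    }
}
```

Calling report(40, 2, 10) and then report(40, 20, 10) contrasts the two settings: the two-permit run needs at least 200 ms, because each permit must serve 20 requests of 10 ms back to back, while the 20-permit run needs only two rounds.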
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
