Why Are CLOSE_WAIT Sockets Sticking? Uncovering HttpClient’s Hidden Connection Leak
This article investigates persistent CLOSE_WAIT sockets in a Tomcat‑Nginx architecture, identifies HttpClient’s connection‑manager as the root cause, and details the step‑by‑step analysis and configuration changes that finally eliminated the issue.
1. Overview
Internal architecture: Tomcat application → Nginx → other Tomcat applications. Internal Tomcat services call other services through Nginx.
HTTP library: HttpClient 4.2.3
Code that closes connections:
httpClient.getConnectionManager().closeIdleConnections(5, TimeUnit.SECONDS);
2. Explanation
The CLOSE_WAIT state is caused by the HttpClient connection manager, not by the server, Nginx, or other configurations.
3. Investigation Approach
Many fixes suggested online were tried first: kernel tweaks, Nginx configuration changes, and version upgrades. None of them helped, because the issue turned out to be unrelated to the server or to Nginx.
Finally, a detailed request analysis was performed.
4. Problem Investigation
First, identify the IP and port of the connections in CLOSE_WAIT state.
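One quick way to do this (a sketch; the exact `netstat` flags vary by platform, and on Linux column 5 of `netstat -ant` is the foreign address) is to group the CLOSE_WAIT sockets by remote endpoint:

```shell
# Group sockets stuck in CLOSE_WAIT by remote endpoint, most frequent first.
# Column 6 is the TCP state, column 5 the foreign (remote) address.
netstat -ant 2>/dev/null | awk '$6 == "CLOSE_WAIT" { print $5 }' | sort | uniq -c | sort -rn
```

If one peer dominates the list, such as the Nginx listener on port 81 here, that is the hop to investigate.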
The screenshot shows that connections to Nginx on port 81 cause CLOSE_WAIT.
Packet capture of a CLOSE_WAIT connection reveals:
Capture of a normally closed request:
Analysis shows that at 22:01:02 a request is made and data transmission ends within the same second; at 22:02:07 Nginx sends a FIN packet. Tomcat ACKs that FIN but never sends a FIN of its own, so the four-way close is never completed and the socket sits in CLOSE_WAIT on the Tomcat side.
All normal connections where Tomcat initiates the close do not produce CLOSE_WAIT; those where Nginx initiates the close do.
Further packet analysis shows Tomcat periodically sending close requests, but Nginx replies with an RST because it has already torn down the connection, so the CLOSE_WAIT socket persists for up to two hours (consistent with the Linux default TCP keepalive interval, tcp_keepalive_time = 7200 s) before the system forces it closed.
The code responsible for closing idle connections is:
httpClient.getConnectionManager().closeIdleConnections(5, TimeUnit.SECONDS);
This instructs HttpClient to close connections that have been idle for five seconds, but the observed behavior contradicts this setting.
Documentation indicates that connections unused for the specified time are closed in the pool, yet many CLOSE_WAIT connections remain idle for over 65 seconds (the Nginx keepalive timeout).
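The catch in HttpClient 4.x is that closeIdleConnections() is a one-shot sweep: it reaps idle connections only at the instant it is called, and nothing runs in the background afterwards. The usual remedy is a dedicated monitor thread that sweeps the pool periodically. The sketch below mirrors that pattern; to keep it compilable without the HttpClient jar, ConnectionReaper is our stand-in for HttpClient's ClientConnectionManager, copying its real closeExpiredConnections() and closeIdleConnections(long, TimeUnit) methods.

```java
import java.util.concurrent.TimeUnit;

// Stand-in for org.apache.http.conn.ClientConnectionManager, mirroring the
// two reaping methods this sketch needs (assumption for self-containment).
interface ConnectionReaper {
    void closeExpiredConnections();
    void closeIdleConnections(long idleTime, TimeUnit unit);
}

public class IdleConnectionMonitor extends Thread {
    private final ConnectionReaper reaper;
    private final long intervalMillis;
    private volatile boolean shutdown;

    public IdleConnectionMonitor(ConnectionReaper reaper, long intervalMillis) {
        this.reaper = reaper;
        this.intervalMillis = intervalMillis;
        setDaemon(true); // do not keep the JVM alive just for the monitor
    }

    @Override
    public void run() {
        try {
            while (!shutdown) {
                synchronized (this) {
                    wait(intervalMillis);
                }
                // Reap connections the pool already considers dead...
                reaper.closeExpiredConnections();
                // ...and anything idle longer than five seconds.
                reaper.closeIdleConnections(5, TimeUnit.SECONDS);
            }
        } catch (InterruptedException ignored) {
            // fall through and terminate
        }
    }

    public void shutdownMonitor() {
        shutdown = true;
        synchronized (this) {
            notifyAll();
        }
    }
}
```

With a sweep running every few seconds, a FIN from Nginx is followed promptly by the pool closing its half of the connection instead of leaving it in CLOSE_WAIT.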
Adjusting Nginx keepalive_timeout to 240 s, then 360 s, and finally 0 s showed that only when the timeout is zero does the number of CLOSE_WAIT connections decrease, confirming that HttpClient’s idle‑connection detection is at fault.
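For reference, the knob being varied is Nginx's keepalive_timeout directive; the values below are the ones from this experiment (65 s is this setup's configured production value, not a universal default):

```nginx
http {
    # How long Nginx keeps an idle connection open before sending its FIN.
    keepalive_timeout 65;    # production value: CLOSE_WAIT accumulates
    # keepalive_timeout 240; # tried: no improvement
    # keepalive_timeout 360; # tried: no improvement
    # keepalive_timeout 0;   # keep-alive disabled: CLOSE_WAIT count drops
}
```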
5. Further Analysis
Reviewing the HttpClient ClientConnectionManager documentation (http://hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/org/apache/http/conn/ClientConnectionManager.html) shows that closeIdleConnections closes connections that have been idle longer than the given period, but only at the moment it is invoked; the manager runs no background reaper of its own.
To resolve the issue, two server‑side kernel parameters were set to zero:
sysctl -w net.ipv4.tcp_tw_recycle=0
sysctl -w net.ipv4.tcp_timestamps=0
Additionally, the code was changed to explicitly release connections and shut down the client instead of relying on the idle-connection timer. After deploying the modified code to the test environment, no new CLOSE_WAIT sockets appeared.
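The shape of that code change can be sketched as follows. This is a self-contained illustration rather than the production code: PooledConnection is a hypothetical stand-in, while the real fix used the HttpClient 4.2 calls (consuming the response entity after each request, and httpClient.getConnectionManager().shutdown() at teardown).

```java
// Sketch of the release-then-shutdown discipline the fix adopted.
// PooledConnection stands in for a pooled HTTP connection (assumption);
// AutoCloseable makes the release path explicit and unconditional.
public class ReleaseDiscipline {
    static int openConnections = 0; // tracks leaks for illustration

    static class PooledConnection implements AutoCloseable {
        PooledConnection() { openConnections++; }
        String execute() { return "200 OK"; }
        @Override public void close() { openConnections--; } // always released
    }

    // try-with-resources guarantees the connection is released even if
    // execute() throws, so nothing is left for an idle reaper to miss.
    public static String call() {
        try (PooledConnection conn = new PooledConnection()) {
            return conn.execute();
        }
    }
}
```

The point of the discipline is that connection lifetime is tied to request scope, so the application no longer depends on an idle-connection timer ever firing.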
6. Summary
The CLOSE_WAIT state was caused by HttpClient’s connection‑manager pool. Although the code specified a five‑second idle timeout, the pool did not enforce it, leading to lingering connections. After adjusting kernel parameters and modifying the shutdown logic, the issue disappeared and remained resolved for months.
Efficient Ops
Efficient Ops is a public account maintained by Xiaotianguo and friends that regularly publishes original technical articles. We focus on operations transformation and hope to accompany you through your operations career.