Why Does TCP Keep Connections in TIME_WAIT? Uncovering the Hidden Bottleneck
This article explains the purpose of the TCP TIME_WAIT state, how it guards against a lost final ACK and stale segments from an old connection, and how its duration derives from the Maximum Segment Lifetime. It then analyzes why a high‑QPS load‑testing tool such as ab can appear to bypass TIME_WAIT, and what that means for server resources.
TCP State Transitions
TCP’s three‑way handshake and four‑way termination involve many states to handle unreliable networks. The full state machine divides into three parts: connection establishment, active close (the side that sends the first FIN), and passive close (the side that receives it).
Linux provides netstat and the more efficient ss (Socket Statistics) to list current socket states, allowing quick inspection of a server’s network condition.
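On the command line, ss -tan state time-wait lists the sockets currently in TIME_WAIT. The same information can be read programmatically. The following is a minimal sketch, assuming a Linux host where /proc/net/tcp exposes one IPv4 socket per line with the state as a hex code in the fourth column; the function name count_states is hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hex state codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(proc_net_tcp_text):
    """Count sockets per TCP state from the text of /proc/net/tcp."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated lines
        # fields: index, local address, remote address, state, ...
        counts[TCP_STATES.get(fields[3].upper(), "UNKNOWN")] += 1
    return counts
```

To use it on a live system, read /proc/net/tcp into a string and pass it in; a large TIME_WAIT count on one side of a connection indicates that side is the one closing first.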
TIME_WAIT
Definition
TIME_WAIT is the final state on the side that actively closes a TCP connection, that is, the side that sends the first FIN. After the four‑way termination finishes, both sides stop exchanging data, but the active closer, having sent the final ACK, keeps the connection in TIME_WAIT for a period before releasing it.
Reason
TIME_WAIT protects against two problems:
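If the final ACK of the four‑way close is lost, the remote side retransmits its FIN. Without TIME_WAIT, the local host would no longer hold any state for the connection and would reply with RST, so the remote side would see an error instead of a clean close.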
Delayed packets belonging to the old connection might arrive after the four‑tuple (src/dst IP/port) is reused. TIME_WAIT ensures those stray packets expire before the tuple is recycled.
Duration Determination
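The duration is tied to the Maximum Segment Lifetime (MSL), the longest time a segment is assumed to survive in the network. The active closer must keep the connection for 2 × MSL so that a retransmitted FIN can still be answered and any stray packets have time to disappear. RFC 793 specifies an MSL of 2 minutes, but Linux fixes TIME_WAIT at 60 seconds through the kernel constant TCP_TIMEWAIT_LEN, which corresponds to an MSL of 30 seconds. This value cannot be changed through sysctl. /proc/sys/net/ipv4/tcp_fin_timeout is often mistaken for it, but that setting controls how long an orphaned socket stays in FIN_WAIT_2.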
ab’s “Strange” Behavior
Hypothesis
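If the client closes first, each closed connection should remain in TIME_WAIT on the client for about 60 seconds, freezing its four‑tuple. With roughly 30 000 local ports available for connections to a single server IP and port, the client can sustain about 30 000 / 60 = 500 QPS before running out of ports.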
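The estimate is a simple division, sketched below. The function name max_sustained_rate is hypothetical, and the second figure assumes the common Linux default ephemeral range of 32768 to 60999 (see /proc/sys/net/ipv4/ip_local_port_range on the machine in question, since it may differ).

```python
def max_sustained_rate(ephemeral_ports, time_wait_seconds):
    """Upper bound on new connections per second to ONE destination
    (fixed server IP and port) when every closed connection pins a
    local port in TIME_WAIT for time_wait_seconds."""
    return ephemeral_ports / time_wait_seconds


# The article's round numbers: 30 000 ports, 60 s of TIME_WAIT.
print(max_sustained_rate(30_000, 60))  # 500.0

# Assumed Linux default port range 32768..60999 (inclusive).
default_ports = 60999 - 32768 + 1
print(max_sustained_rate(default_ports, 60))
```

The bound applies per destination, because the limit is on distinct four‑tuples, not on ports alone.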
However, when using ab to generate 4 000 QPS, no TIME_WAIT entries appeared on the client side.
Analysis
Investigation showed that the first FIN packet is sent by the server, not the client, meaning ab does not actively close the connections. Capturing traffic confirmed the server initiates the close, and the server accumulated many TIME_WAIT sockets, though port reuse mitigated the impact.
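The same pattern can be reproduced in miniature on the loopback interface. This is only an illustrative sketch, not a model of ab or of any particular HTTP server: the function name run_server_closes_first and the payload are made up, and the server here simply sends a few bytes and closes first, so it is the server side that performs the active close.

```python
import socket
import threading


def run_server_closes_first(payload=b"hello"):
    """Loopback demo in which the server sends the first FIN."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def serve():
        conn, _ = listener.accept()
        conn.sendall(payload)
        conn.close()  # active close: the server side ends in TIME_WAIT

    worker = threading.Thread(target=serve)
    worker.start()

    client = socket.create_connection(("127.0.0.1", port))
    chunks = []
    while True:
        data = client.recv(4096)
        if not data:  # empty read means the server's FIN has arrived
            break
        chunks.append(data)
    client.close()  # passive close: no TIME_WAIT on the client side

    worker.join()
    listener.close()
    return b"".join(chunks), port
```

After a run on Linux, inspecting the socket table for the returned port should show the TIME_WAIT entry with that port as the local (server) port, which matches what the packet capture showed for ab.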
Conclusion
While TIME_WAIT is essential for reliable TCP termination, an excess of TIME_WAIT sockets under high concurrency can consume system resources and become a performance bottleneck.
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
