Mastering TCP: Handshake, Flow Control, Congestion Control and More
This comprehensive guide explains TCP fundamentals—including the differences between TCP and UDP, the three‑way handshake, four‑way termination, half‑open queues, SYN‑Flood attacks, header fields, Fast Open, timestamps, retransmission timeout calculation, flow and congestion control, Nagle's algorithm, delayed ACKs, and keep‑alive mechanisms—providing clear examples and diagrams for each concept.
TCP vs. UDP
TCP is a connection‑oriented, reliable, byte‑stream transport protocol. UDP is connection‑less and provides no reliability guarantees.
Three core differences:
Connection‑oriented : TCP requires a three‑way handshake before data exchange.
Reliability : TCP maintains state, tracks sent and acknowledged data, and retransmits lost packets.
Byte‑stream : TCP presents data as a continuous stream rather than discrete datagrams.
Three‑Way Handshake
The handshake establishes a reliable connection by confirming both sending and receiving capabilities.
Client sends SYN → client enters SYN‑SENT.
Server replies with SYN + ACK → server enters SYN‑RCVD.
Client sends final ACK → both sides reach ESTABLISHED.
Each SYN consumes a sequence number; the ACK does not.
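The sequence-number bookkeeping in the steps above can be sketched as follows. This is a minimal illustration, not a protocol implementation: the `Segment` class, the `three_way_handshake` function, and the sample initial sequence numbers are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional

SEQ_MOD = 2 ** 32  # sequence numbers are 32-bit and wrap around


@dataclass
class Segment:
    flags: str
    seq: int
    ack: Optional[int] = None


def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the three handshake segments for the given initial sequence numbers."""
    syn = Segment("SYN", seq=client_isn)
    # The SYN consumes one sequence number, so the server acknowledges ISN + 1.
    syn_ack = Segment("SYN+ACK", seq=server_isn, ack=(client_isn + 1) % SEQ_MOD)
    # The final ACK carries no SYN and no data, so it consumes nothing:
    # the client's first data byte would reuse this same seq value.
    ack = Segment("ACK", seq=(client_isn + 1) % SEQ_MOD, ack=(server_isn + 1) % SEQ_MOD)
    return [syn, syn_ack, ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment)
```

Each side acknowledges the peer's initial sequence number plus one, which is what "each SYN consumes a sequence number" means in practice.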
Why three handshakes?
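Two handshakes never confirm to the server that the client can receive. The server would commit to a connection as soon as it answered a SYN, so a delayed duplicate SYN arriving after the client has closed would leave the server holding resources for a peer that is no longer there. A fourth handshake adds overhead without adding information, because the third already confirms that both sides can send and receive.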
Data during handshake
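In a standard handshake, only the third segment (the client's ACK) may safely carry data. If the first SYN could carry data, forged SYNs would force the server to buffer and process payloads from unverified peers, which makes flooding attacks cheaper. TCP Fast Open, covered later, relaxes this rule by using a cookie.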
Simultaneous Open
If both sides send SYN simultaneously, each enters SYN‑SENT. Upon receiving the peer's SYN, each transitions to SYN‑RCVD and replies with SYN + ACK; once each side receives the peer's SYN + ACK, both reach ESTABLISHED.
Four‑Way Termination
Client sends FIN → FIN‑WAIT‑1 (half‑close).
Server acknowledges with ACK → CLOSE‑WAIT.
Client receives ACK → FIN‑WAIT‑2.
Server sends its own FIN → LAST‑ACK.
Client receives FIN → TIME‑WAIT and sends final ACK, waiting 2 MSL before closing.
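Waiting 2 MSL (twice the maximum segment lifetime) serves two purposes. If the final ACK is lost, the client is still present to answer the server's retransmitted FIN. It also ensures delayed packets from the old connection expire before they can interfere with a new connection on the same address and port pair.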
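The termination sequence can be read as two small state machines, one for the side that closes first and one for its peer. The sketch below encodes only the transitions listed above; the table names, event labels, and the `walk` helper are hypothetical and chosen for illustration.

```python
# Transitions for the side that sends the first FIN (the client in the steps above).
ACTIVE_CLOSE = {
    ("ESTABLISHED", "send FIN"): "FIN-WAIT-1",
    ("FIN-WAIT-1", "recv ACK"): "FIN-WAIT-2",
    ("FIN-WAIT-2", "recv FIN"): "TIME-WAIT",   # the final ACK is sent on this transition
    ("TIME-WAIT", "2MSL timeout"): "CLOSED",
}

# Transitions for the side that receives the first FIN (the server in the steps above).
PASSIVE_CLOSE = {
    ("ESTABLISHED", "recv FIN"): "CLOSE-WAIT",  # the ACK is sent on this transition
    ("CLOSE-WAIT", "send FIN"): "LAST-ACK",
    ("LAST-ACK", "recv ACK"): "CLOSED",
}


def walk(table, events, state="ESTABLISHED"):
    """Apply events in order and return every state visited, starting state included."""
    path = [state]
    for event in events:
        state = table[(state, event)]  # KeyError means the event is not valid in this state
        path.append(state)
    return path


print(walk(ACTIVE_CLOSE, ["send FIN", "recv ACK", "recv FIN", "2MSL timeout"]))
print(walk(PASSIVE_CLOSE, ["recv FIN", "send FIN", "recv ACK"]))
```

The passive side stays in CLOSE‑WAIT until the application closes its socket, which is why the ACK and the FIN from the server are separate segments.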
Half‑Open Queue and SYN Flood
During the handshake, a listening server maintains two queues:
SYN queue (half‑open) : Holds connections that have received a SYN but not completed the handshake.
Accept queue (full‑open) : Holds fully established connections awaiting application acceptance.
A SYN Flood overwhelms the SYN queue by sending massive forged SYN packets, exhausting resources and causing denial‑of‑service.
Mitigation Strategies
Increase the size of the SYN queue.
Reduce the number of SYN+ACK retransmissions.
Deploy SYN cookies: the server encodes the connection details into a cryptographic cookie and uses it as the initial sequence number of the SYN‑ACK, without allocating resources; only after the client returns an ACK that acknowledges a valid cookie does the server allocate the connection.
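The SYN cookie idea can be sketched with a keyed hash. This is a simplified illustration under stated assumptions, not the encoding any particular kernel uses: real implementations also pack details such as the MSS into the cookie. The function names, the `SECRET` key, and the coarse time `counter` are hypothetical.

```python
import hashlib
import hmac
import struct

SEQ_MOD = 2 ** 32
SECRET = b"demo-only-secret"  # hypothetical key; a real server would generate and rotate its own


def syn_cookie(src_ip, src_port, dst_ip, dst_port, client_isn, counter, secret=SECRET):
    """Derive a 32-bit server ISN from the 4-tuple, the client ISN, and a coarse time counter."""
    msg = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{client_isn}|{counter}".encode()
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    return struct.unpack("!I", digest[:4])[0]


def ack_is_valid(src_ip, src_port, dst_ip, dst_port, client_isn, ack_number, counter, secret=SECRET):
    """Accept the final ACK if it acknowledges a cookie from the current or previous counter."""
    for c in (counter, counter - 1):
        cookie = syn_cookie(src_ip, src_port, dst_ip, dst_port, client_isn, c, secret)
        if ack_number == (cookie + 1) % SEQ_MOD:
            return True
    return False


conn = ("10.0.0.1", 40000, "10.0.0.2", 80, 1234)
cookie = syn_cookie(*conn, counter=7)
print(ack_is_valid(*conn, ack_number=(cookie + 1) % SEQ_MOD, counter=7))
```

Because the cookie can be recomputed from the returning ACK alone, the server keeps no per-connection state until the handshake completes, which is what defeats the flood.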
TCP Header Fields
A TCP header is at least 20 bytes and contains, in order: source port, destination port, sequence number, acknowledgment number, data offset, flags (URG, ACK, PSH, RST, SYN, FIN), window size (optionally scaled), checksum, and urgent pointer, followed by optional fields such as MSS, Window Scale, SACK, and Timestamp.
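To make the layout concrete, here is a minimal sketch that unpacks the fixed 20-byte part of a header with Python's `struct` module. The function name `parse_tcp_header` and the returned dictionary keys are hypothetical; options are not decoded.

```python
import struct

# Flag bits in the low byte of the offset/flags word.
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def parse_tcp_header(data: bytes) -> dict:
    """Decode the fixed 20-byte portion of a TCP header (network byte order)."""
    if len(data) < 20:
        raise ValueError("a TCP header is at least 20 bytes")
    src, dst, seq, ack, off_flags, window, checksum, urgent = struct.unpack("!HHIIHHHH", data[:20])
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "header_length": (off_flags >> 12) * 4,  # data offset counts 32-bit words
        "flags": [name for name, bit in FLAG_BITS.items() if off_flags & bit],
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent,
    }


# A hand-built SYN+ACK header: data offset 5 (20 bytes), flags 0x12.
raw = struct.pack("!HHIIHHHH", 443, 51000, 100, 200, (5 << 12) | 0x12, 65535, 0, 0)
print(parse_tcp_header(raw))
```

A header length above 20 means options follow the fixed part and would need their own kind/length parsing.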
TCP Fast Open (TFO)
TFO reduces latency by allowing data in the initial SYN exchange.
Client sends SYN with a Fast Open cookie request.
Server computes a Fast Open cookie (distinct from a SYN cookie), places it in the Fast Open option, and replies with SYN‑ACK.
Client caches the cookie; on the next handshake it sends SYN + cached cookie + application data (e.g., HTTP request).
Server validates the cookie and can immediately return a response before the three‑way handshake completes.
This saves one RTT for data transmission.
TCP Timestamps
The Timestamps option occupies 10 bytes: a 1‑byte kind, a 1‑byte length, and two 4‑byte values, Timestamp and Timestamp Echo.
Accurate RTT measurement : Sender records its send time (ta1) in Timestamp; receiver echoes it back in Timestamp Echo. RTT = ta2 – ta1, where ta2 is the time at which the sender receives the ACK carrying that echo.
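Sequence‑number wrap‑around detection : On fast links the 32‑bit sequence space can wrap within a connection's lifetime. Even if sequence numbers repeat, a segment whose timestamp is older than the most recently accepted one can be recognized as stale and discarded.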
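Both uses of the timestamp come down to 32-bit modular arithmetic. The sketch below shows one way to express them; the function names are hypothetical, timestamps are treated as abstract clock ticks, and the staleness test is a simplified version of the check known as PAWS in RFC 7323.

```python
TS_MOD = 2 ** 32  # timestamp values are 32-bit and wrap around


def rtt_from_echo(now_ticks: int, echoed_ticks: int) -> int:
    """RTT in clock ticks: the sender's current clock minus the value the peer echoed back."""
    return (now_ticks - echoed_ticks) % TS_MOD


def is_stale(segment_ts: int, last_accepted_ts: int) -> bool:
    """True if the segment's timestamp is older than the last accepted one, allowing for wrap."""
    diff = (segment_ts - last_accepted_ts) % TS_MOD
    # A difference in the upper half of the range means segment_ts is "behind".
    return diff >= TS_MOD // 2


print(rtt_from_echo(now_ticks=1050, echoed_ticks=1000))
print(is_stale(segment_ts=100, last_accepted_ts=200))
```

Comparing modulo 2^32 keeps both functions correct when the timestamp clock itself wraps.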
Retransmission Timeout (RTO) Calculation
Classic Method
Maintain a smoothed RTT (SRTT): SRTT = α·SRTT + (1‑α)·RTT, with α typically 0.8 to 0.9.
Compute RTO: RTO = min(ubound, max(lbound, β·SRTT)), with β typically 1.3 to 2.0; lbound and ubound are the lower and upper limits on the timeout.
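As a worked example of the classic method, the sketch below applies one RTT sample. The function name and the default values for α, β, lbound, and ubound are illustrative assumptions picked from within the ranges above, not values any particular stack mandates.

```python
def classic_rto(srtt, rtt_sample, alpha=0.875, beta=2.0, lbound=1.0, ubound=60.0):
    """Apply one RTT sample (seconds) and return the new (SRTT, RTO) pair."""
    srtt = alpha * srtt + (1 - alpha) * rtt_sample  # weight history heavily, new sample lightly
    rto = min(ubound, max(lbound, beta * srtt))     # clamp beta * SRTT into [lbound, ubound]
    return srtt, rto


# Previous SRTT 100 ms, new sample 200 ms, with a low floor so the formula is visible.
srtt, rto = classic_rto(0.100, 0.200, lbound=0.2)
print(f"SRTT = {srtt:.4f} s, RTO = {rto:.4f} s")
```

With these inputs SRTT moves only from 0.1 to 0.1125 seconds, which shows how slowly the classic estimator reacts; because it ignores variance, RTO is a fixed multiple of SRTT.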
Jacobson/Karels (Standard) Method
Update the RTT variance first, using the SRTT from before this sample (RTTVAR, β = 0.25): RTTVAR = (1‑β)·RTTVAR + β·|RTT‑SRTT|
Update SRTT (α = 1/8): SRTT = (1‑α)·SRTT + α·RTT
Compute RTO: RTO = μ·SRTT + δ·RTTVAR, with μ = 1 and δ = 4.
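The Jacobson/Karels update can be sketched as a small estimator class. Two details here are not in the text above and follow RFC 6298: the first sample initializes SRTT to the sample and RTTVAR to half of it, and RTTVAR is updated with the SRTT value from before the current sample. The class name `RtoEstimator` is hypothetical, and the minimum-RTO clamp that real stacks apply is left out.

```python
class RtoEstimator:
    """Tracks SRTT and RTTVAR and derives RTO = SRTT + 4 * RTTVAR (times in seconds)."""

    def __init__(self, alpha=1 / 8, beta=1 / 4, delta=4):
        self.alpha = alpha
        self.beta = beta
        self.delta = delta
        self.srtt = None
        self.rttvar = None

    def update(self, rtt: float) -> float:
        """Feed one RTT sample and return the resulting RTO."""
        if self.srtt is None:
            # First measurement: initialization as described in RFC 6298.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # RTTVAR uses the SRTT from before this sample, so it is updated first.
            self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(rtt - self.srtt)
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * rtt
        return self.srtt + self.delta * self.rttvar


estimator = RtoEstimator()
for sample in (0.100, 0.200):
    print(f"sample={sample:.3f} s -> RTO={estimator.update(sample):.4f} s")
```

Unlike the classic method, a jump in RTT widens RTTVAR immediately, so the timeout grows with jitter and not only with the average.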
Flow Control
TCP uses a sliding window advertised by the receiver. The effective sending window is min(rwnd, cwnd). When the receiver’s buffer fills, it reduces the advertised window, causing the sender to throttle transmission.
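A short sketch of the window arithmetic, assuming byte counts throughout. The helper names `advertised_window` and `usable_window` are hypothetical, and window scaling is ignored.

```python
def advertised_window(buffer_size: int, buffered_bytes: int) -> int:
    """Receiver side: advertise whatever space is left in the receive buffer."""
    return max(0, buffer_size - buffered_bytes)


def usable_window(rwnd: int, cwnd: int, bytes_in_flight: int) -> int:
    """Sender side: bytes that may still be sent without exceeding min(rwnd, cwnd)."""
    return max(0, min(rwnd, cwnd) - bytes_in_flight)


# The receiver's application is slow to read, so its buffer is nearly full.
rwnd = advertised_window(buffer_size=65535, buffered_bytes=64535)
print(rwnd, usable_window(rwnd, cwnd=14600, bytes_in_flight=0))
```

When the receiver's buffer fills completely the advertised window drops to zero and the sender must stop, regardless of how large cwnd is.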
Congestion Control
Two core variables per connection:
cwnd (congestion window) – limits the amount of data the sender may have in flight.
ssthresh (slow‑start threshold) – determines the transition from exponential to linear growth.
Algorithms
Slow Start : cwnd doubles each RTT until it reaches ssthresh.
Congestion Avoidance : cwnd increases by roughly one MSS per RTT (additive increase).
Fast Retransmit : After three duplicate ACKs, the missing segment is retransmitted immediately.
Selective Acknowledgment (SACK) : Receiver informs sender which blocks were received, allowing targeted retransmission.
Fast Recovery : After fast retransmit, set ssthresh = cwnd/2, reduce cwnd to ssthresh, then grow linearly.
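The interplay of slow start, congestion avoidance, and fast recovery can be seen in a round-by-round simulation. This is a deliberately simplified sketch in units of MSS: it updates cwnd once per RTT, caps slow start at ssthresh, and omits timeouts and the temporary window inflation that Reno-style fast recovery performs. All function names are hypothetical.

```python
def grow(cwnd: int, ssthresh: int) -> int:
    """One loss-free RTT: double in slow start, add one MSS in congestion avoidance."""
    if cwnd < ssthresh:
        return min(cwnd * 2, ssthresh)
    return cwnd + 1


def on_triple_duplicate_ack(cwnd: int):
    """Fast recovery: halve the window and continue linearly from there."""
    ssthresh = max(cwnd // 2, 2)
    return ssthresh, ssthresh  # (new cwnd, new ssthresh)


def simulate(rounds: int, loss_rounds, cwnd: int = 1, ssthresh: int = 8) -> list:
    """Return the cwnd in effect during each round; loss_rounds lists rounds that lose a segment."""
    trace = []
    for r in range(rounds):
        trace.append(cwnd)
        if r in loss_rounds:
            cwnd, ssthresh = on_triple_duplicate_ack(cwnd)
        else:
            cwnd = grow(cwnd, ssthresh)
    return trace


print(simulate(rounds=8, loss_rounds={5}))
```

The trace shows exponential growth up to ssthresh, linear growth after it, and a halving after the loss in round 5, which is the familiar sawtooth.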
Nagle Algorithm and Delayed ACK
Nagle coalesces small outgoing segments: after the first small segment, subsequent data is sent only when the accumulated data reaches the MSS or all previous data has been acknowledged.
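Delayed ACK holds back an acknowledgment briefly so that one ACK can cover several received segments, reducing ACK traffic. The delay is commonly in the range of 40 to 200 ms and must stay below 500 ms. ACKs are sent without delay for at least every second full‑sized segment, in quick‑ack mode, and for out‑of‑order packets.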
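Nagle's send decision reduces to a small predicate. The sketch below is an illustration of the rule as stated above, not kernel code; the function name and parameters are hypothetical, and the `nodelay` flag models the effect of the `TCP_NODELAY` socket option, which turns Nagle off.

```python
def nagle_can_send(pending_bytes: int, mss: int, unacked_bytes: int, nodelay: bool = False) -> bool:
    """Decide whether queued data may be transmitted now."""
    if pending_bytes <= 0:
        return False           # nothing to send
    if nodelay:
        return True            # Nagle disabled: send whatever is queued
    if pending_bytes >= mss:
        return True            # a full-sized segment is always allowed
    return unacked_bytes == 0  # a small segment only when nothing is outstanding


print(nagle_can_send(pending_bytes=10, mss=1460, unacked_bytes=0))    # first small write
print(nagle_can_send(pending_bytes=10, mss=1460, unacked_bytes=10))   # held until the ACK arrives
```

The second call is where delayed ACK hurts: the sender waits for an ACK that the receiver is deliberately holding back, which is why latency-sensitive applications often set `TCP_NODELAY`.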
TCP Keep‑Alive
Keep‑alive probes detect dead connections. Linux defaults (viewable via sysctl) are:
net.ipv4.tcp_keepalive_time = 7200 # idle seconds before the first probe is sent
net.ipv4.tcp_keepalive_probes = 9 # unanswered probes before the connection is dropped
net.ipv4.tcp_keepalive_intvl = 75 # seconds between successive probes
Many applications disable keep‑alive because the default idle time (2 hours) is too long for typical use cases.
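Applications that want faster detection can override these values per socket. The sketch below assumes a platform whose Python `socket` module exposes `TCP_KEEPIDLE`, `TCP_KEEPINTVL`, and `TCP_KEEPCNT` (Linux does); on other platforms only `SO_KEEPALIVE` is set. The function names and the chosen timings are illustrative assumptions.

```python
import socket


def enable_keepalive(sock: socket.socket, idle: int = 60, interval: int = 10, probes: int = 5) -> None:
    """Turn on keep-alive for one socket and, where supported, shorten its timers."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket timer options are platform-specific, so each is guarded.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


def detection_time(idle: int, interval: int, probes: int) -> int:
    """Approximate seconds of silence before a dead peer is declared gone."""
    return idle + interval * probes


print(detection_time(7200, 75, 9))  # with the system defaults listed above
print(detection_time(60, 10, 5))    # with the per-socket values used here
```

With the defaults, a dead peer goes unnoticed for roughly 7875 seconds, a little over two hours, which is the practical reason the defaults are rarely sufficient on their own.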
Code Ape Tech Column
Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn
