Understanding TCP: Handshakes, Flow & Congestion Control Explained

This article provides a comprehensive overview of TCP fundamentals, comparing it with UDP, detailing the three‑way handshake and four‑way termination, explaining SYN flood attacks, describing the TCP header fields, fast open, timestamps, retransmission timeout calculations, flow and congestion control mechanisms, as well as Nagle's algorithm, delayed ACKs, and keep‑alive behavior.

Liangxu Linux

TCP vs UDP

TCP is a connection‑oriented, reliable, byte‑stream transport protocol. UDP is connection‑less and provides no reliability guarantees. The three core differences are:

Connection orientation – TCP establishes a connection before data exchange; UDP does not.

Reliability – TCP maintains state, acknowledges received data and retransmits lost segments; UDP is stateless and does not retransmit.

Byte‑stream handling – TCP presents a continuous stream of bytes, while UDP preserves message boundaries (datagrams).
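The byte‑stream versus datagram distinction can be observed locally. The sketch below is only an illustration: it assumes a Unix‑like system and uses `socket.socketpair` with `AF_UNIX` as a stand‑in for TCP (`SOCK_STREAM`) and UDP (`SOCK_DGRAM`), so no network is needed. The helper name `recv_chunks` is hypothetical.

```python
import socket

def recv_chunks(sock_type, messages):
    """Send each message separately, then return the chunks as recv() delivers them."""
    a, b = socket.socketpair(socket.AF_UNIX, sock_type)
    try:
        for m in messages:
            a.send(m)  # small messages, so each send() completes in one call
        total = sum(len(m) for m in messages)
        chunks, received = [], 0
        while received < total:
            chunk = b.recv(4096)
            chunks.append(chunk)
            received += len(chunk)
        return chunks
    finally:
        a.close()
        b.close()

stream_chunks = recv_chunks(socket.SOCK_STREAM, [b"ab", b"cd"])
dgram_chunks = recv_chunks(socket.SOCK_DGRAM, [b"ab", b"cd"])
```

With the datagram socket each `recv` returns exactly one message. With the stream socket only the byte order is guaranteed; the two writes may arrive merged into a single chunk, so an application protocol must add its own framing.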

Three‑Way Handshake

The handshake creates a bidirectional communication channel and verifies both send and receive capabilities.

SYN : client sends a SYN segment, moves to SYN‑SENT.

SYN‑ACK : server replies with SYN+ACK, moves to SYN‑RECEIVED.

ACK : client acknowledges with ACK, both sides enter ESTABLISHED.
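The sequence and acknowledgment numbers exchanged in these three steps can be sketched as follows. This is a simplified model, not a protocol implementation: the `Segment` class and `three_way_handshake` function are hypothetical names, and the initial sequence numbers are passed in rather than generated randomly as a real stack would do.

```python
from dataclasses import dataclass
from typing import Optional

MOD = 2**32  # sequence numbers are 32-bit and wrap around

@dataclass
class Segment:
    flags: str
    seq: int
    ack: Optional[int] = None

def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments for the given initial sequence numbers."""
    # 1. Client sends SYN and moves to SYN-SENT.
    syn = Segment("SYN", seq=client_isn)
    # 2. Server replies SYN+ACK and moves to SYN-RECEIVED.
    #    A SYN consumes one sequence number, so the ACK is client_isn + 1.
    syn_ack = Segment("SYN+ACK", seq=server_isn, ack=(syn.seq + 1) % MOD)
    # 3. Client acknowledges the server's SYN; both sides reach ESTABLISHED.
    ack = Segment("ACK", seq=syn_ack.ack, ack=(syn_ack.seq + 1) % MOD)
    return [syn, syn_ack, ack]

segments = three_way_handshake(client_isn=1000, server_isn=5000)
```

Each side acknowledges the other's initial sequence number plus one, which is how both directions of the channel are confirmed.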

Why three handshakes?

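With only two segments, the server could not confirm that the client received its SYN‑ACK, and therefore its initial sequence number; a stale or duplicated SYN could also open a connection the client never intended. The third segment, the client’s ACK, closes that gap. A fourth segment would add overhead without confirming anything new, because the server’s SYN and ACK already travel together in one segment.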

Data transfer during handshake

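In a standard handshake, only the third segment (the ACK) normally carries payload data. The SYN and SYN‑ACK are sent empty in practice: a server that buffered or answered data from an unverified SYN would be easier to exhaust and could be used to amplify traffic toward a spoofed address. TCP Fast Open, covered in its own section, is the controlled exception.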

Simultaneous Open

If both peers send SYN at the same time, each enters SYN‑SENT. On receiving the peer’s SYN, each moves to SYN‑RECEIVED and replies with SYN+ACK; once those are received, both sides reach ESTABLISHED.

Four‑Way Termination

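The side that closes first (the active closer) passes through these states:

FIN‑WAIT‑1 – the initiator has sent its FIN.

FIN‑WAIT‑2 – its FIN has been acknowledged.

TIME‑WAIT – it has received the peer’s FIN and sent the final ACK; it then waits 2 MSL (twice the Maximum Segment Lifetime) so that a lost final ACK can be retransmitted and delayed segments from this connection expire.

CLOSED – final state.

The peer (the passive closer) enters CLOSE‑WAIT after acknowledging the first FIN, moves to LAST‑ACK after sending its own FIN, and reaches CLOSED when that FIN is acknowledged.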
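The termination states can be written as a small transition table. This is a sketch only: the event labels are made up for illustration, and the passive‑closer states CLOSE‑WAIT and LAST‑ACK come from the standard TCP state diagram.

```python
# (current state, event) -> next state. Event names are hypothetical labels.
CLOSE_TRANSITIONS = {
    # Active closer: the side that sends the first FIN.
    ("ESTABLISHED", "close/send FIN"): "FIN-WAIT-1",
    ("FIN-WAIT-1", "recv ACK"): "FIN-WAIT-2",
    ("FIN-WAIT-2", "recv FIN/send ACK"): "TIME-WAIT",
    ("TIME-WAIT", "2MSL timeout"): "CLOSED",
    # Passive closer: the side that receives the first FIN.
    ("ESTABLISHED", "recv FIN/send ACK"): "CLOSE-WAIT",
    ("CLOSE-WAIT", "close/send FIN"): "LAST-ACK",
    ("LAST-ACK", "recv ACK"): "CLOSED",
}

def run(state, events):
    """Apply events in order and return every state visited."""
    visited = [state]
    for event in events:
        state = CLOSE_TRANSITIONS[(state, event)]
        visited.append(state)
    return visited

active = run("ESTABLISHED",
             ["close/send FIN", "recv ACK", "recv FIN/send ACK", "2MSL timeout"])
passive = run("ESTABLISHED",
              ["recv FIN/send ACK", "close/send FIN", "recv ACK"])
```

Only the active closer passes through TIME‑WAIT, which is why busy servers that close connections first tend to accumulate sockets in that state.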

SYN Flood and Half‑Open Queue

To handle incoming handshakes, a listening server maintains two queues:

SYN (half‑open) queue – stores connections in SYN‑RECEIVED after the server replies with SYN‑ACK.

ACCEPT (full‑open) queue – holds fully established connections awaiting accept().

A SYN flood fills the half‑open queue with bogus SYNs, typically from spoofed source addresses. The server allocates state for each one and keeps retransmitting SYN‑ACKs to hosts that never answer, exhausting memory and CPU and crowding out legitimate connections. Mitigations include enlarging the SYN queue, reducing the SYN‑ACK retry count, and deploying SYN cookies.
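The idea behind SYN cookies is that the server stores nothing for a half‑open connection and instead encodes a verifiable value in its initial sequence number. The sketch below is a deliberately simplified illustration, not the layout used by any real kernel: real implementations also encode the MSS and use their own hash and time format. The names `SECRET`, `syn_cookie`, and `ack_is_valid` are hypothetical.

```python
import hashlib
import hmac
import struct

SECRET = b"demo-secret"  # hypothetical server-side key

def syn_cookie(src_ip, src_port, dst_ip, dst_port, counter, secret=SECRET):
    """Derive a 32-bit ISN from the connection 4-tuple and a coarse time counter."""
    msg = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}|{counter}".encode()
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    return struct.unpack("!I", digest[:4])[0]

def ack_is_valid(ack_number, src_ip, src_port, dst_ip, dst_port, counter,
                 secret=SECRET):
    """Accept the final ACK if it acknowledges a cookie from this or the previous time slot."""
    for c in (counter, counter - 1):
        expected = (syn_cookie(src_ip, src_port, dst_ip, dst_port, c, secret) + 1) % 2**32
        if ack_number == expected:
            return True
    return False

isn = syn_cookie("198.51.100.7", 40000, "203.0.113.1", 443, counter=10)
client_ack = (isn + 1) % 2**32
```

Because the server can recompute the cookie from the ACK alone, spoofed SYNs that never complete the handshake consume no queue entries.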

TCP Header Fields

Typical TCP header (field sizes in bytes unless marked as bits):

Source Port (2) | Destination Port (2) | Sequence Number (4) | Acknowledgment Number (4) | Data Offset (4 bits) + Reserved (3 bits) + Flags (9 bits) | Window Size (2) | Checksum (2) | Urgent Pointer (2) | Options (variable) | Data (variable)

Key flags: SYN, ACK, FIN, RST, PSH. Options commonly used: Timestamp, MSS, SACK, Window Scale.
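As a rough illustration of this layout, the fixed 20‑byte part of the header can be unpacked with Python's `struct` module. The function name `parse_tcp_header` and the returned dictionary keys are hypothetical; options and checksum validation are left out.

```python
import struct

# Flag names from least-significant bit upward in the 9-bit flags field.
FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG", "ECE", "CWR", "NS"]

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte TCP header (network byte order)."""
    (src_port, dst_port, seq, ack,
     offset_and_flags, window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    data_offset = offset_and_flags >> 12    # header length in 32-bit words
    flag_bits = offset_and_flags & 0x01FF   # low 9 bits hold the flags
    flags = [name for bit, name in enumerate(FLAG_NAMES) if flag_bits & (1 << bit)]
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "header_len": data_offset * 4,      # bytes; values above 20 mean options follow
        "flags": flags, "window": window,
        "checksum": checksum, "urgent": urgent,
    }

# A hand-built SYN segment: data offset 5 words, only the SYN bit set.
raw = struct.pack("!HHIIHHHH", 12345, 80, 1000, 0, (5 << 12) | 0x02, 65535, 0, 0)
header = parse_tcp_header(raw)
```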

TCP Fast Open (TFO)

TFO reduces latency by letting a client that has connected to the server before carry application data in the SYN itself.

Client sends SYN with a Fast‑Open cookie request.

Server generates a Fast Open cookie (distinct from a SYN cookie), places it in the Fast Open option of the SYN‑ACK, and returns it.

Client caches the cookie. In the next connection attempt it sends SYN + cached cookie + application data (e.g., HTTP request).

Server validates the cookie; if it is valid, the server passes the data to the application and can send its response before the final ACK of the handshake arrives.

The advantage is that on repeat connections the request travels with the SYN, saving one full RTT compared with waiting for the three‑way handshake to finish before sending data.
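The latency saving can be checked with simple arithmetic. The sketch below is an idealized model that ignores server processing time and packet loss; the function name `time_to_first_response` is hypothetical.

```python
def time_to_first_response(rtt_ms, fast_open):
    """Time from the client's SYN until the first response byte arrives."""
    # Without TFO the client waits one RTT for the SYN-ACK before it may send
    # its request; with a valid cookie the request rides on the SYN.
    handshake_wait = 0 if fast_open else 1
    request_response = 1  # one RTT for the request to go out and the reply to return
    return (handshake_wait + request_response) * rtt_ms

regular = time_to_first_response(100, fast_open=False)
with_tfo = time_to_first_response(100, fast_open=True)
```

On a 100 ms path the first response arrives after about 200 ms without TFO and about 100 ms with it.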

TCP Timestamp Option

The 10‑byte option format:

Kind (1) = 8 | Length (1) = 10 | TSval (4) | TSecr (4)

Uses:

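Accurate RTT measurement: RTT = ta2 – ta1, where ta1 is the sender’s timestamp carried as TSval in the data segment (and echoed back in the TSecr field of the ACK) and ta2 is the sender’s own clock reading when that ACK arrives. Both values come from the same clock, so no synchronization with the receiver is needed.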

Protection against wrapped sequence numbers (PAWS): timestamps never decrease within a connection, so a segment whose timestamp is older than the most recent one accepted can be discarded as a stale duplicate, even if its sequence number falls inside the current window after wrap‑around.
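Both uses reduce to 32‑bit modular arithmetic on the timestamp values. A minimal sketch, assuming timestamps are plain tick counters; the function names `rtt_from_echo` and `paws_reject` are hypothetical.

```python
MOD = 2**32  # TSval and TSecr are 32-bit counters that wrap around

def rtt_from_echo(now_ticks, tsecr):
    """Sender-side RTT sample: current clock minus the echoed timestamp."""
    return (now_ticks - tsecr) % MOD

def paws_reject(seg_tsval, ts_recent):
    """True if the segment's timestamp is older than the latest accepted one."""
    diff = (seg_tsval - ts_recent) % MOD
    # A difference in the upper half of the range means "negative"
    # in 32-bit signed arithmetic, i.e. the segment is stale.
    return diff >= 2**31

sample = rtt_from_echo(now_ticks=1050, tsecr=1000)
```

The modular comparison keeps both checks correct when the timestamp counter itself wraps.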

Retransmission Timeout (RTO) Calculation

Classic method

SRTT = α·SRTT + (1‑α)·RTT

Typical α = 0.8–0.9.

RTO = min(ubound, max(lbound, β·SRTT))

β ≈ 1.3–2.0; lbound and ubound are the lower and upper limits on the timeout.
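The classic method fits in a few lines. In this sketch the default values (α = 0.875, β = 2.0, bounds of 1 s and 60 s) are illustrative choices within the ranges given above, not values any particular stack is known to use, and `classic_rto` is a hypothetical name.

```python
def classic_rto(srtt, rtt_sample, alpha=0.875, beta=2.0, lbound=1.0, ubound=60.0):
    """Return (new_srtt, rto) in seconds using the classic smoothing formula."""
    srtt = alpha * srtt + (1 - alpha) * rtt_sample   # alpha weights the old estimate
    rto = min(ubound, max(lbound, beta * srtt))      # clamp to [lbound, ubound]
    return srtt, rto

srtt, rto = classic_rto(srtt=1.0, rtt_sample=2.0)
```

Because the RTO is a fixed multiple of SRTT, it cannot adapt to paths where the RTT varies widely around its mean.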

Jacobson/Karels (standard) method

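// Step 1 – RTT variation (uses the SRTT from before this sample)
RTTVAR = (1‑β)·RTTVAR + β·|SRTT‑RTT|   // β = 1/4 = 0.25
// Step 2 – Smoothed RTT (here α weights the new sample, unlike the classic method)
SRTT = (1‑α)·SRTT + α·RTT   // α = 1/8 = 0.125
// Step 3 – RTO
RTO = μ·SRTT + δ·RTTVAR   // μ = 1, δ = 4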

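Because the RTO tracks RTT variation as well as the mean, it widens quickly when delay becomes erratic and tightens when the path is stable, which reduces spurious retransmissions compared with the classic method.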
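A minimal estimator along these lines might look as follows. It follows RFC 6298's ordering, in which RTTVAR is updated from the previous SRTT before SRTT itself changes. The first‑sample rule (SRTT = RTT, RTTVAR = RTT/2) and the 1‑second minimum also come from that RFC rather than from the formulas above, and the class name `RtoEstimator` is hypothetical.

```python
class RtoEstimator:
    """Jacobson/Karels-style RTO estimator (times in seconds)."""

    def __init__(self, alpha=1 / 8, beta=1 / 4, delta=4, min_rto=1.0):
        self.alpha, self.beta, self.delta = alpha, beta, delta
        self.min_rto = min_rto
        self.srtt = None
        self.rttvar = None

    def update(self, rtt):
        """Feed one RTT sample and return the new RTO."""
        if self.srtt is None:
            # First measurement: initialise both estimators from the sample.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # RTTVAR first, using the SRTT from before this sample.
            self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(self.srtt - rtt)
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * rtt
        return max(self.min_rto, self.srtt + self.delta * self.rttvar)

est = RtoEstimator(min_rto=0.0)   # lower bound disabled to expose the raw formula
first = est.update(1.0)
second = est.update(2.0)
```

After samples of 1.0 s and 2.0 s, SRTT has moved only to 1.125 s, but the RTO has grown from 3.0 s to 3.625 s because the variation term reacts to the jump.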

Flow Control

Flow control limits the sender based on the receiver’s advertised window (rwnd). The effective sending window is:

send_window = min(rwnd, cwnd)

Here cwnd is the congestion window covered in the next section. The receiver advertises rwnd in each ACK, reflecting the amount of free space in its receive buffer.
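In practice the sender also subtracts data that is already in flight. A small sketch of both sides of the calculation, with hypothetical function names and byte counts chosen only for illustration:

```python
def advertised_window(buffer_size, unread_bytes):
    """Receiver side: free space left in the receive buffer."""
    return max(0, buffer_size - unread_bytes)

def usable_window(rwnd, cwnd, bytes_in_flight):
    """Sender side: how many more bytes may be sent right now."""
    return max(0, min(rwnd, cwnd) - bytes_in_flight)

rwnd = advertised_window(buffer_size=65535, unread_bytes=20000)
can_send = usable_window(rwnd, cwnd=14600, bytes_in_flight=10000)
```

When the application stops reading, `unread_bytes` grows until rwnd reaches zero and the sender must pause.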

Congestion Control

TCP maintains two variables per connection:

cwnd (congestion window) – amount of data the sender may have in flight, limited by network congestion.

ssthresh (slow‑start threshold) – boundary between slow start and congestion avoidance.

Algorithms

Slow Start : on each ACK, cwnd += 1 MSS; thus cwnd doubles each RTT until it reaches ssthresh.

Congestion Avoidance : once cwnd reaches ssthresh, cwnd grows by about 1 MSS per RTT. Measured in bytes, that is cwnd += MSS·MSS/cwnd per ACK; measured in segments, it is cwnd += 1/cwnd per ACK.

Fast Retransmit : upon receiving three duplicate ACKs, the sender retransmits the missing segment immediately without waiting for RTO.

Selective Acknowledgment (SACK) : the receiver includes blocks of successfully received data in the ACK, allowing the sender to retransmit only missing segments.

Fast Recovery : after fast retransmit, set ssthresh = cwnd/2 and cwnd = ssthresh + 3·MSS, then add 1 MSS to cwnd for each additional duplicate ACK. When an ACK for new data arrives, set cwnd = ssthresh and continue in congestion avoidance.
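The window arithmetic of these algorithms can be sketched with cwnd measured in segments (units of MSS). This is a simplified model with hypothetical function names: it assumes one ACK per segment, ignores SACK and timeouts, and the step that resets cwnd to ssthresh on leaving recovery follows RFC 5681.

```python
def on_new_ack(cwnd, ssthresh):
    """Window growth for one ACK of new data, outside fast recovery."""
    if cwnd < ssthresh:
        return cwnd + 1.0        # slow start: +1 MSS per ACK
    return cwnd + 1.0 / cwnd     # congestion avoidance: about +1 MSS per RTT

def one_rtt(cwnd, ssthresh):
    """Apply one round trip's worth of ACKs (one per segment in flight)."""
    for _ in range(int(cwnd)):
        cwnd = on_new_ack(cwnd, ssthresh)
    return cwnd

def on_triple_duplicate_ack(cwnd):
    """Fast retransmit and entry into fast recovery: returns (cwnd, ssthresh)."""
    ssthresh = cwnd / 2.0
    return ssthresh + 3.0, ssthresh

def on_recovery_exit(ssthresh):
    """An ACK for new data ends fast recovery; deflate the window."""
    return ssthresh

history = [1.0]
for _ in range(3):
    history.append(one_rtt(history[-1], ssthresh=8.0))
after_avoidance = one_rtt(history[-1], ssthresh=8.0)
```

The window doubles each RTT during slow start (1, 2, 4, 8), then grows by slightly less than one segment per RTT once it reaches ssthresh.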

Nagle Algorithm and Delayed ACK

The Nagle algorithm coalesces small outgoing segments: while previously sent data is still unacknowledged, the sender buffers further small writes until either that data is acknowledged or enough accumulates to fill an MSS.

Delayed ACK postpones an ACK briefly so that it can be piggybacked on outgoing data or cover several incoming segments. Implementations commonly delay by up to about 200 ms, and the delay must stay below 500 ms. The ACK is sent immediately when a second full‑sized segment arrives, when the socket is in TCP_QUICKACK mode, or when out‑of‑order segments are detected.
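The Nagle decision itself is a short predicate. The sketch below is a simplified model with a hypothetical function name; the `nodelay` parameter stands for the `TCP_NODELAY` socket option, which disables the algorithm.

```python
def nagle_can_send(pending_bytes, mss, unacked_bytes, nodelay=False):
    """Decide whether buffered data may be sent now."""
    if pending_bytes <= 0:
        return False
    if nodelay:
        return True               # TCP_NODELAY set: send small segments at once
    if pending_bytes >= mss:
        return True               # a full segment is always allowed
    return unacked_bytes == 0     # small segment only if nothing is in flight

first_write = nagle_can_send(pending_bytes=10, mss=1460, unacked_bytes=0)
second_write = nagle_can_send(pending_bytes=10, mss=1460, unacked_bytes=10)
```

The second small write is held back until the first is acknowledged. If the peer is also delaying its ACK, that wait can approach the delayed‑ACK timer, which is why latency‑sensitive applications often set `TCP_NODELAY`.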

TCP Keep‑Alive

Keep‑alive probes detect dead peers on idle connections. They are sent only on sockets that enable SO_KEEPALIVE. Linux defaults (adjustable via sysctl) are:

net.ipv4.tcp_keepalive_time = 7200   # idle time before first probe (seconds)
net.ipv4.tcp_keepalive_intvl = 75    # interval between probes (seconds)
net.ipv4.tcp_keepalive_probes = 9   # number of unanswered probes before declaring the connection dead

If none of the configured probes is answered, the connection is considered dead and the socket is closed, with an error reported to the application.
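An application can enable keep‑alive per socket and override these timers without touching sysctl. This is a sketch: `TCP_KEEPIDLE`, `TCP_KEEPINTVL`, and `TCP_KEEPCNT` are the Linux option names and may be missing on other platforms, so the code checks for them first. The function names are hypothetical.

```python
import socket

def enable_keepalive(sock, idle=7200, interval=75, probes=9):
    """Turn on keep-alive and, where supported, set per-socket timers."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", probes)):
        option = getattr(socket, name, None)   # not defined on every platform
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)

def detection_time(idle=7200, interval=75, probes=9):
    """Approximate seconds before a silent peer is declared dead."""
    return idle + interval * probes

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock, idle=60, interval=10, probes=5)
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
sock.close()
```

With the defaults, a dead peer is detected after roughly 7200 + 75 × 9 = 7875 seconds, a little over two hours, so services that need faster detection usually lower these values per socket.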
