
Mastering TCP Congestion Control: State Machines and Core Algorithms Explained

This article gives an in-depth overview of TCP congestion control. It covers sliding-window flow control, the five states of the congestion control state machine, and the four core algorithms (slow start, congestion avoidance, fast retransmit, and fast recovery), along with practical insights for middleware design.

Programmer DD

Apr 1, 2019


I recently spent time studying TCP/IP because my understanding stopped at the three-way handshake and the four-way connection teardown, and I wanted a deeper grasp of how the protocol works. Ideas from TCP/IP also carry over to middleware design; for example, TCP's congestion control algorithms can inform rate limiting.

TCP includes two major control mechanisms: flow control, which uses a sliding window to match the sender’s rate to the receiver’s capacity, and congestion control, which prevents the network from becoming overloaded by limiting the total amount of data in flight.
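The two mechanisms combine into a single limit: the sender may have at most min(cwnd, rwnd) bytes unacknowledged at any time, where rwnd is the window advertised by the receiver. A minimal Python sketch of that rule follows; the function name sendable_bytes and its parameters are illustrative, not part of any real TCP stack.

```python
def sendable_bytes(cwnd: int, rwnd: int, in_flight: int) -> int:
    """Bytes the sender may still put on the wire right now.

    cwnd      -- congestion window (sender's estimate of network capacity)
    rwnd      -- receive window advertised by the peer (flow control)
    in_flight -- bytes sent but not yet acknowledged
    """
    window = min(cwnd, rwnd)            # the tighter of the two limits wins
    return max(0, window - in_flight)   # never negative


print(sendable_bytes(cwnd=10_000, rwnd=4_000, in_flight=1_000))  # receiver-limited
print(sendable_bytes(cwnd=2_000, rwnd=65_535, in_flight=3_000))  # network-limited
```

In the first call the receiver is the bottleneck; in the second the congestion window is already exceeded, so nothing more may be sent until ACKs arrive.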

Congestion Control State Machine

Like TCP itself, congestion control operates through a state machine. When the sender receives an ACK, Linux TCP decides whether to reduce, maintain, or increase the congestion window (cwnd) based on the current state.

Open State

Open is the default state. When an ACK arrives, the sender checks whether cwnd is smaller or larger than the slow‑start threshold (ssthresh) and applies either the slow‑start or congestion‑avoidance algorithm accordingly.
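The per-ACK decision in the Open state can be sketched as follows. This is a simplified model that counts cwnd in segments; the function name on_ack_open is hypothetical and the code is not taken from the Linux implementation.

```python
def on_ack_open(cwnd: float, ssthresh: float) -> float:
    """Return the new cwnd (in segments) after one ACK in the Open state."""
    if cwnd < ssthresh:
        # Slow start: one extra segment per ACK.
        return cwnd + 1.0
    # Congestion avoidance: a fraction of a segment per ACK.
    return cwnd + 1.0 / cwnd


print(on_ack_open(4.0, ssthresh=8.0))   # below the threshold: slow start
print(on_ack_open(8.0, ssthresh=8.0))   # at the threshold: congestion avoidance
```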

Disorder State

If the sender detects duplicate ACKs (dupACKs) or selective ACKs (SACK), it transitions to Disorder. In this state the sender follows the packet conservation principle: a new packet is sent only after an older packet has left the network, which the sender learns from an incoming ACK.

CWR State

When the sender receives a congestion notification (for example, an Explicit Congestion Notification mark), it does not cut cwnd immediately. Instead, it reduces cwnd by one segment for every second ACK until the window has been halved. This CWR (Congestion Window Reduced) state can be interrupted by a transition to Recovery or Loss.

Recovery State

After receiving enough duplicate ACKs (typically three), the sender enters Recovery. In this state cwnd is reduced by one segment for every two ACKs until it equals ssthresh (half of the cwnd at entry). Recovery ends when all data sent during the state are acknowledged, after which the sender returns to Open. A retransmission timeout can interrupt Recovery and move the sender to Loss.
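The gradual reduction used in CWR and Recovery (one segment for every second ACK, stopping at half the original window) can be pictured with a small sketch. The function name rate_halving and the use of whole segments are assumptions made for illustration.

```python
def rate_halving(cwnd: int, acks: int) -> int:
    """Return cwnd (in segments) after `acks` ACKs of gradual reduction."""
    target = max(cwnd // 2, 1)          # stop once the window is halved
    for n in range(1, acks + 1):
        if n % 2 == 0 and cwnd > target:
            cwnd -= 1                   # shrink by one segment every second ACK
    return cwnd


print(rate_halving(10, acks=4))     # two reductions so far
print(rate_halving(10, acks=100))   # reduction has stopped at half the window
```

Spreading the reduction over many ACKs keeps the sender transmitting at a declining rate instead of going silent until half a window has drained.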

Loss State

When a retransmission timeout (RTO) expires, the sender enters Loss. All in‑flight data are marked lost, cwnd is set to one segment, and the sender restarts slow start. Unlike Recovery, cwnd can only increase in Loss, and the state persists until all packets sent during Loss are successfully acknowledged, after which the sender returns to Open.
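The five states and the transitions named above can be summarized as a lookup table. The sketch below is deliberately limited to the transitions this section mentions; the enum and event names are illustrative assumptions, and a real stack handles more cases.

```python
from enum import Enum, auto


class CaState(Enum):
    OPEN = auto()
    DISORDER = auto()
    CWR = auto()
    RECOVERY = auto()
    LOSS = auto()


class Event(Enum):
    DUP_ACK = auto()                  # a duplicate ACK or SACK block arrives
    DUP_ACK_THRESHOLD = auto()        # enough duplicate ACKs (typically three)
    CONGESTION_NOTIFICATION = auto()  # e.g. an explicit congestion signal
    RTO = auto()                      # retransmission timer expired
    ALL_ACKED = auto()                # data sent during the state is acknowledged


# Only transitions described in the text; any other pair keeps the state.
TRANSITIONS = {
    (CaState.OPEN, Event.DUP_ACK): CaState.DISORDER,
    (CaState.OPEN, Event.CONGESTION_NOTIFICATION): CaState.CWR,
    (CaState.DISORDER, Event.DUP_ACK_THRESHOLD): CaState.RECOVERY,
    (CaState.CWR, Event.DUP_ACK_THRESHOLD): CaState.RECOVERY,
    (CaState.RECOVERY, Event.ALL_ACKED): CaState.OPEN,
    (CaState.LOSS, Event.ALL_ACKED): CaState.OPEN,
}


def next_state(state: CaState, event: Event) -> CaState:
    if event is Event.RTO:
        return CaState.LOSS           # a timeout leads to Loss from any state
    return TRANSITIONS.get((state, event), state)


state = CaState.OPEN
for ev in (Event.DUP_ACK, Event.DUP_ACK_THRESHOLD, Event.ALL_ACKED):
    state = next_state(state, ev)
    print(ev.name, "->", state.name)
```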

Four Core Algorithms

Slow Start

In the classic description, cwnd starts at one MSS when the connection is established. Each ACK increases cwnd by one MSS. Because a full window of ACKs arrives in each round-trip time (RTT), cwnd doubles every RTT, so growth is exponential despite the name. This continues until cwnd reaches ssthresh, at which point the sender switches to Congestion Avoidance.
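A short trace makes the per-RTT doubling visible. The sketch assumes one ACK per segment (no delayed ACKs) and no loss, and counts cwnd in segments; slow_start_rounds is a hypothetical helper name.

```python
from typing import List


def slow_start_rounds(ssthresh: int, cwnd: int = 1) -> List[int]:
    """cwnd (in segments) at the start of each RTT until ssthresh is reached."""
    trace = [cwnd]
    while cwnd < ssthresh:
        acks = cwnd                          # one ACK per segment sent this round
        cwnd = min(cwnd + acks, ssthresh)    # +1 segment per ACK, capped at ssthresh
        trace.append(cwnd)
    return trace


print(slow_start_rounds(ssthresh=16))   # [1, 2, 4, 8, 16]
```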

Congestion Avoidance

When cwnd ≥ ssthresh, the sender grows cwnd more cautiously. On each ACK, cwnd = cwnd + 1/cwnd (in units of MSS), so a full window of ACKs adds about one MSS per RTT. This linear growth avoids the rapid window increase that could cause congestion.
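To see why the per-ACK rule amounts to roughly one segment per RTT, the sketch below applies it once for every ACK in a round. It assumes one ACK per segment and counts cwnd in segments; the function name is illustrative.

```python
def congestion_avoidance_round(cwnd: float) -> float:
    """Return cwnd (in segments) after one RTT of congestion avoidance."""
    acks = int(cwnd)            # one ACK per segment sent in this round
    for _ in range(acks):
        cwnd += 1.0 / cwnd      # per-ACK increase
    return cwnd


after = congestion_avoidance_round(10.0)
print(round(after - 10.0, 3))   # growth over one RTT: slightly under one segment
```

The gain falls slightly short of a full segment because cwnd rises during the round, which makes each later 1/cwnd step a little smaller.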

Congestion Occurrence (Fast Retransmit)

TCP treats packet loss as a sign of congestion. Loss can be detected either by an RTO expiration or by receiving three duplicate ACKs. The latter triggers Fast Retransmit, which retransmits the missing segment without waiting for a timeout and then switches to Fast Recovery.

On a retransmission timeout, ssthresh is set to cwnd/2.

cwnd is reset to 1.

The sender re‑enters Slow Start.

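Early TCP Tahoe applied this response to every loss, including three duplicate ACKs, but resetting cwnd to 1 after any loss hurts throughput. TCP Reno improves on this for the duplicate-ACK case: it sets ssthresh to cwnd/2, sets cwnd to the new ssthresh, and enters Fast Recovery instead of Slow Start.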
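The difference between the two reactions can be sketched in a few lines. The function on_loss and its return shape are assumptions for illustration, and the two-segment floor on ssthresh follows the usual max(cwnd/2, 2 segments) rule rather than anything stated above.

```python
def on_loss(cwnd: int, variant: str, timeout: bool):
    """Return (new_cwnd, new_ssthresh, next_phase), with cwnd in segments.

    variant -- "tahoe" or "reno"
    timeout -- True for an RTO, False for three duplicate ACKs
    """
    ssthresh = max(cwnd // 2, 2)        # assumed floor of two segments
    if timeout or variant == "tahoe":
        return 1, ssthresh, "slow_start"        # start over from one segment
    return ssthresh, ssthresh, "fast_recovery"  # Reno: halve and keep sending


print(on_loss(20, "tahoe", timeout=False))  # (1, 10, 'slow_start')
print(on_loss(20, "reno", timeout=False))   # (10, 10, 'fast_recovery')
print(on_loss(20, "reno", timeout=True))    # (1, 10, 'slow_start')
```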

Fast Recovery

On entering Fast Recovery, ssthresh has already been set to half of the old cwnd, and cwnd is set to ssthresh + 3 MSS, because the three duplicate ACKs show that three segments have left the network. The sender retransmits the lost segment. Each additional duplicate ACK grows cwnd by one MSS. When a new ACK arrives, confirming that the retransmission succeeded, cwnd is deflated to ssthresh and the sender returns to Congestion Avoidance. In short:

cwnd = ssthresh + 3 MSS.

Retransmit the indicated segment(s).

Increase cwnd by one MSS for each further duplicate ACK.

On new ACK, set cwnd = ssthresh and switch to Congestion Avoidance.
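The steps above can be sketched as a small class. It follows the Reno-style rule in which the window is inflated from the halved threshold (cwnd = ssthresh + 3 on entry) and counts cwnd in segments. The class name RenoFastRecovery and the two-segment floor on ssthresh are assumptions for illustration; retransmission itself is left out.

```python
class RenoFastRecovery:
    """Tracks cwnd (in segments) through one Fast Recovery episode."""

    def __init__(self, cwnd: int):
        self.ssthresh = max(cwnd // 2, 2)   # halve the window (assumed floor of 2)
        self.cwnd = self.ssthresh + 3       # inflate for the three duplicate ACKs
        self.in_recovery = True

    def on_dup_ack(self) -> None:
        if self.in_recovery:
            self.cwnd += 1                  # another segment has left the network

    def on_new_ack(self) -> None:
        if self.in_recovery:
            self.cwnd = self.ssthresh       # deflate the window
            self.in_recovery = False        # back to congestion avoidance


fr = RenoFastRecovery(cwnd=20)
print(fr.cwnd)        # 13
fr.on_dup_ack()
fr.on_dup_ack()
print(fr.cwnd)        # 15
fr.on_new_ack()
print(fr.cwnd)        # 10
```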

Postscript

The mechanisms described above still have limitations, and the industry continues to develop newer algorithms such as Google’s BBR. Future articles will explore these advancements.



