
Mastering TCP: Handshakes, Flow & Congestion Control, Fast Open and More

This comprehensive guide explains TCP vs UDP, the three‑way handshake, four‑way termination, half‑open queues, SYN‑Flood attacks, header fields, timestamps, Fast Open, retransmission timeout calculations, flow control, congestion control, Nagle’s algorithm, delayed ACKs and keep‑alive mechanisms, providing essential knowledge for networking interviews and system design.

Open Source Linux

001. What are the differences between TCP and UDP?

First, the basic differences:

TCP is a connection‑oriented, reliable, byte‑stream transport‑layer protocol.

In contrast, UDP is simply a connection‑less transport‑layer protocol and nothing more.

Compared with UDP, TCP has three core features:

Connection‑oriented. Before any data is exchanged, TCP performs a three‑way handshake to establish a connection; UDP has no such step.

Reliability. TCP goes to great lengths to be reliable, which shows in two properties: it is stateful and it is controllable.

Byte‑stream orientation. UDP sends discrete datagrams and inherits the message‑based nature of IP, whereas TCP turns the IP packets it receives into a continuous byte stream, which it needs in order to track state.

TCP records exactly which data has been sent, which has been received, and which has been lost, and it guarantees in‑order delivery. This is what stateful means.

When TCP detects packet loss or poor network conditions, it adjusts its behavior by slowing down or retransmitting. This is what controllable means.

Conversely, UDP is stateless and uncontrollable.
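The difference between a byte stream and datagrams can be observed on the loopback interface. The following is a minimal sketch, assuming a Linux host where 127.0.0.1 is reachable; the function names are made up for this illustration.

```python
import socket

def tcp_stream_demo():
    """Two TCP writes arrive as one undivided byte stream."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))        # port 0: the OS picks a free port
        listener.listen(1)
        listener.settimeout(5)
        with socket.create_connection(listener.getsockname(), timeout=5) as client:
            server, _ = listener.accept()
            with server:
                server.settimeout(5)
                client.sendall(b"hello")
                client.sendall(b"world")
                data = b""
                while len(data) < 10:          # read in pieces of at most 3 bytes
                    chunk = server.recv(3)
                    if not chunk:
                        break
                    data += chunk
                return data

def udp_datagram_demo():
    """Each UDP send is its own datagram; a short read truncates it."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(5)
        address = receiver.getsockname()
        sender.sendto(b"hello", address)
        sender.sendto(b"world", address)
        first, _ = receiver.recvfrom(3)        # the rest of "hello" is discarded
        second, _ = receiver.recvfrom(1024)
        return first, second
```

With TCP, the reader sees ten bytes with no trace of where one write ended and the next began. With UDP, each read returns at most one datagram, and whatever does not fit in the buffer is lost.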

002. How does the TCP three‑way handshake work and why three times?

Love‑simulation analogy

Think of the handshake as a couple confirming that each of them is able both to love and to be loved. The steps are:

First:

Man: I love you.

The woman receives this, which proves that the man is able to love.

Second:

Woman: I received your love, and I love you too.

The man receives this, which proves that the woman is able both to love and to be loved.

Third:

Man: I received your love.

The woman receives this, which proves that the man is able to be loved.

Both sides have now shown that they can love and be loved, and a happy relationship begins.

Real handshake

The real three‑way handshake confirms the same two abilities on each side: the ability to send and the ability to receive. The process:

Both sides start in CLOSED. The server then starts listening on a port and enters LISTEN.

The client sends a SYN and enters SYN‑SENT.

The server receives it, replies with SYN + ACK, and enters SYN‑RCVD.

The client receives that, sends an ACK, and enters ESTABLISHED; the server enters ESTABLISHED once it receives this ACK.

Any packet that requires the peer’s acknowledgment consumes a TCP sequence number.

Thus SYN consumes a sequence number, while ACK does not.
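How the sequence and acknowledgment numbers line up during the handshake can be sketched as follows. This is only an illustration of the numbering rule; the function name and the tuple layout are assumptions, not part of any real stack.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments as (sender, flags, seq, ack)."""
    mod = 2 ** 32                      # sequence numbers are 32 bits wide and wrap
    syn = ("client", "SYN", client_isn, None)
    # The SYN consumed one sequence number, so the server acknowledges client_isn + 1.
    syn_ack = ("server", "SYN+ACK", server_isn, (client_isn + 1) % mod)
    # The server's SYN also consumed one number, hence server_isn + 1 in the ACK.
    ack = ("client", "ACK", (client_isn + 1) % mod, (server_isn + 1) % mod)
    return [syn, syn_ack, ack]
```

For example, with a client ISN of 100 and a server ISN of 300, the SYN + ACK carries ack = 101 and the final ACK carries seq = 101 and ack = 301. A pure ACK consumes no sequence number, so the first data byte from the client also uses seq = 101.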

Why not two?

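With only two handshakes, the server could never confirm that the client is able to receive. Consider a SYN that is delayed in the network: the client retransmits it, does its work over the resulting connection, and closes it. If the stale SYN reaches the server afterwards, a two‑way handshake would make the server open a connection immediately, even though the client no longer wants one, and the server's resources would be wasted. The third handshake lets the server wait for proof that the client really received its reply.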

Why not four?

Four handshakes are unnecessary; three are sufficient to confirm both sides’ capabilities.

Can data be carried during the handshake?

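In a standard handshake, only the third segment can carry data; the first two cannot. If the first segment could carry data, an attacker could stuff large payloads into SYN packets and force the server to spend time and memory on them, which enlarges the attack surface. By the third segment the client is already in ESTABLISHED and knows that the server can both send and receive, so carrying data is safe.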

Simultaneous open

If both sides send SYN simultaneously, the state transitions are:

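Each side sends a SYN and enters SYN‑SENT. On receiving the peer's SYN, each side replies with SYN + ACK and moves to SYN‑RCVD. When that SYN + ACK arrives, both sides enter ESTABLISHED.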

003. What is the TCP four‑way termination process?

Step‑by‑step

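Both sides start in ESTABLISHED. The client, which wants to close, sends a FIN and enters FIN‑WAIT‑1. From this point the client is half‑closed: it can no longer send data, only receive.

The server receives the FIN, acknowledges it, and enters CLOSE‑WAIT. The client receives the ACK and enters FIN‑WAIT‑2.

When the server has finished sending, it sends its own FIN and enters LAST‑ACK. The client receives this FIN, replies with an ACK, and enters TIME‑WAIT. The server enters CLOSED when that ACK arrives; the client waits 2 MSL and, if no retransmitted FIN shows up in that time, enters CLOSED as well.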
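The steps above can be written down as a small transition table. This is a sketch for following the states on paper, not a model of a real TCP implementation; the event labels are made up for readability.

```python
# (current state, event) -> next state
TRANSITIONS = {
    # Active closer (the client in the walkthrough).
    ("ESTABLISHED", "send FIN"): "FIN-WAIT-1",
    ("FIN-WAIT-1", "recv ACK"): "FIN-WAIT-2",
    ("FIN-WAIT-2", "recv FIN, send ACK"): "TIME-WAIT",
    ("TIME-WAIT", "2 MSL timeout"): "CLOSED",
    # Passive closer (the server in the walkthrough).
    ("ESTABLISHED", "recv FIN, send ACK"): "CLOSE-WAIT",
    ("CLOSE-WAIT", "send FIN"): "LAST-ACK",
    ("LAST-ACK", "recv ACK"): "CLOSED",
}

def run(events, state="ESTABLISHED"):
    """Apply the events in order and return every state that was visited."""
    path = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]   # KeyError means an illegal step
        path.append(state)
    return path
```

Calling run(["recv FIN, send ACK", "send FIN", "recv ACK"]) walks the server side from ESTABLISHED through CLOSE-WAIT and LAST-ACK to CLOSED.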

Why wait 2 MSL?

If the client closed immediately, segments from the server that are still in flight could be delivered to a new application that reuses the same port, causing confusion. Waiting two Maximum Segment Lifetimes (2 MSL) ensures that every stray segment of the old connection has expired before the port is reused.

The first MSL gives the client's final ACK time to reach the server.

The second MSL gives a FIN that the server retransmits (because that ACK was lost) time to reach the client, which can then resend the ACK.

This is why the 2 MSL wait is needed.

Why four instead of three?

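The server usually cannot send its FIN the moment it receives the client's FIN, because it may still have data to send. It therefore acknowledges the client's FIN first and sends its own FIN later, once all pending data is out, which gives four steps. If the server instead held back the ACK so that it could be combined with the FIN, the delay could make the client assume its FIN was lost and retransmit it.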

Simultaneous close

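If both sides send a FIN at the same time, both enter FIN‑WAIT‑1. Each side then receives the peer's FIN, acknowledges it, and moves to CLOSING. When the ACK for its own FIN arrives, each side enters TIME‑WAIT and, after 2 MSL, CLOSED.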

004. How do half‑open queues relate to SYN‑Flood attacks?

Before any handshake takes place, the server moves from CLOSED to LISTEN and creates two internal queues: the half‑open queue (the SYN queue) and the fully established queue (the accept queue).

Half‑open queue

When a client sends a SYN, the server replies with SYN + ACK and the connection enters SYN‑RCVD. Until the handshake completes, the connection sits in the SYN queue, that is, the half‑open queue.

Fully established queue

When the client's final ACK arrives, the handshake is complete and the connection moves to the accept queue, where it waits for the application to pick it up.

SYN‑Flood attack

The attacker forges a large number of non‑existent source IP addresses and floods the server with SYN packets. This hurts the server in two ways:

A large number of connections get stuck in SYN‑RCVD and fill the half‑open queue, so legitimate SYNs can no longer be handled.

Because the forged addresses never answer with an ACK, the server keeps retransmitting SYN + ACK until it gives up, which wastes further resources.

Mitigations

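Increase the size of the SYN queue (on Linux, net.ipv4.tcp_max_syn_backlog).

Reduce the number of SYN + ACK retransmissions (on Linux, net.ipv4.tcp_synack_retries).

Use SYN cookies (on Linux, net.ipv4.tcp_syncookies): the server allocates nothing when the SYN arrives. Instead it encodes the connection details into a cookie, sends that cookie as the sequence number of its SYN + ACK, and allocates resources only when the client returns a valid cookie in the ACK.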
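The idea behind SYN cookies can be sketched as follows. This is a deliberately simplified illustration, not the encoding that Linux or any other kernel uses: real implementations also pack the MSS and a timestamp into the 32 bits. The secret, the counter parameter, and the function names are all assumptions.

```python
import hashlib
import hmac

SECRET = b"demo-secret"   # hypothetical server key; a real server rotates its keys

def make_cookie(src_ip, src_port, dst_ip, dst_port, client_isn, counter):
    """Derive the server's ISN from the connection details instead of storing them."""
    msg = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}|{client_isn}|{counter}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")     # fits the 32-bit sequence field

def ack_is_valid(src_ip, src_port, dst_ip, dst_port, client_isn, counter, ack_number):
    """The final ACK must acknowledge cookie + 1, since the SYN consumed one number."""
    cookie = make_cookie(src_ip, src_port, dst_ip, dst_port, client_isn, counter)
    return ack_number == (cookie + 1) % 2 ** 32
```

When the final ACK arrives, the server can recover client_isn from the ACK's own sequence number minus one, recompute the cookie, and create the connection only if the numbers match. A forged SYN therefore costs the server a hash computation but no queue entry.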

005. What are the fields of a TCP segment header?

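The fixed part of the header is 20 bytes long: source port (2 bytes), destination port (2), sequence number (4), acknowledgment number (4), data offset, reserved bits and flags (2), window size (2), checksum (2), and urgent pointer (2). Up to 40 bytes of options may follow.

This layout is worth memorizing.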

Source and destination ports

A TCP connection is uniquely identified by the four‑tuple: source IP, source port, destination IP, destination port. IP addresses are handled at the IP layer; TCP only records the ports.

Sequence number

The sequence number is the number of the first byte of data in the segment. It is a 32‑bit unsigned integer, so after reaching 2³² − 1 it wraps around to 0.

Roles:

Exchange initial sequence numbers during the SYN exchange.

Ensure correct ordering of received data.
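Because the sequence number wraps, ordering has to be decided with modular arithmetic rather than a plain comparison. A minimal sketch, with helper names chosen for this example:

```python
MOD = 2 ** 32

def seq_add(seq, n):
    """Advance a sequence number by n bytes, wrapping at 2**32."""
    return (seq + n) % MOD

def seq_before(a, b):
    """True if a precedes b, assuming the two are less than 2**31 apart."""
    return a != b and (b - a) % MOD < 2 ** 31
```

With these helpers, seq_before(2**32 - 10, 5) is true even though 5 is numerically smaller, because 5 lies 15 bytes ahead once the counter has wrapped.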

Initial Sequence Number (ISN)

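The ISN is not a fixed value. In the original TCP specification it comes from a counter that increments roughly every 4 microseconds, and modern stacks additionally mix in a keyed hash of the connection's addresses and ports, which makes the value hard to predict. A predictable ISN would let an attacker forge segments that the receiver accepts, for example a RST that terminates the connection.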

Acknowledgment number

The ACK field tells the peer the next expected sequence number; all bytes below this number have been received.

Flags

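Common flags: SYN, ACK, FIN, RST, PSH.

SYN: synchronize sequence numbers in order to open a connection.

ACK: the acknowledgment number field is valid.

FIN: the sender wishes to close the connection.

RST: forcefully reset the connection.

PSH: push the data to the receiving application immediately instead of letting it wait in the buffer.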

Window size

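The window field is 16 bits wide, which limits it to 65,535 bytes. Because that is too small for fast networks, the Window Scale option supplies a shift count n between 0 and 14, and the real window is the field value multiplied by 2ⁿ.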
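The scaling rule amounts to a left shift. A small sketch, with a hypothetical helper name:

```python
def effective_window(window_field, shift):
    """Window in bytes for a 16-bit field value and a Window Scale shift count."""
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("the window field is 16 bits wide")
    if not 0 <= shift <= 14:
        raise ValueError("the shift count must be between 0 and 14")
    return window_field << shift
```

The largest possible window is therefore effective_window(65535, 14), which is 1,073,725,440 bytes, or just under 1 GiB.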

Checksum

The 16‑bit checksum protects against corruption in transit. A segment that fails the check is silently discarded by the receiver, and the sender retransmits it once no acknowledgment arrives.
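The arithmetic behind the checksum is a 16-bit one's complement sum. The sketch below shows only that arithmetic; a real TCP checksum is computed over a pseudo-header (addresses, protocol, length) followed by the whole segment, which is left out here.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                          # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]    # big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16) # fold the carries back in
    return ~total & 0xFFFF
```

A receiver can verify a block by running the same function over the data with the checksum appended: an undamaged block yields 0.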

Options

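Option format: most options consist of a one‑byte kind, a one‑byte length that covers the entire option, and a value whose size depends on the kind.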

Common options:

Timestamp – used for RTT measurement and PAWS.

MSS – maximum segment size the peer can receive.

SACK – selective acknowledgment.

Window Scale – expands the window field.
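To make the fields concrete, the fixed 20-byte header can be unpacked with Python's struct module. This is a sketch for experimenting with captured bytes; the function name and the dictionary keys are choices made for this example, and the options are returned as raw bytes without being decoded.

```python
import struct

FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10, "URG": 0x20}

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed part of a TCP header from raw bytes."""
    if len(segment) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    header_len = (offset_flags >> 12) * 4        # data offset counts 32-bit words
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": header_len,
        "flags": [name for name, bit in FLAG_BITS.items() if offset_flags & bit],
        "window": window,
        "checksum": checksum,
        "urgent": urgent,
        "options": segment[20:header_len],       # raw, undecoded option bytes
    }
```

The "!" in the format string selects network byte order, which is how all multi-byte header fields are transmitted.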

006. How does TCP Fast Open (TFO) work?

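A standard three‑way handshake costs a full round trip before any data can be sent. TFO removes that cost on repeat connections by letting the client put data in the SYN itself, so the server can start responding before the handshake has completed.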

TFO process

Initial three‑way handshake

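The client sends a SYN with a Fast Open option that requests a cookie. The server generates a cookie (a TFO cookie, not to be confused with the SYN cookies used against SYN floods), places it in the Fast Open option of its SYN + ACK, and returns the packet.

The client caches the cookie and completes the handshake as usual.

Subsequent handshakes

The client sends the cached cookie, the SYN, and the HTTP request together in one segment. If the server validates the cookie, it returns SYN + ACK and can already send the HTTP response before the client's final ACK arrives. If the cookie is invalid, the server discards the data and falls back to a normal three‑way handshake.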

Advantages of TFO

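Because the server validates the cookie as soon as the SYN arrives, it can start sending the response right away instead of waiting for the third handshake. This saves one RTT per connection, which matters most for short transactions.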
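The cookie exchange can be modeled in a few lines. The sketch below assumes the cookie is an HMAC of the client's IP address under a server secret; the class, its methods, and the 8-byte cookie length are choices made for this illustration, not the scheme of any particular kernel.

```python
import hashlib
import hmac

class TfoServer:
    """Toy model of the server side of TCP Fast Open."""

    def __init__(self, key: bytes):
        self._key = key                      # hypothetical server secret

    def cookie_for(self, client_ip: str) -> bytes:
        return hmac.new(self._key, client_ip.encode(), hashlib.sha256).digest()[:8]

    def on_syn(self, client_ip, cookie=None, data=b""):
        """Return (cookie_to_send, data_accepted) for an incoming SYN."""
        if cookie is not None and hmac.compare_digest(cookie, self.cookie_for(client_ip)):
            return None, data                # valid cookie: hand the data to the application
        # No cookie or a bad one: issue a cookie and ignore any data in the SYN.
        return self.cookie_for(client_ip), b""
```

On the first connection the client gets a cookie back and no data is accepted. On later connections the same cookie lets the request in the SYN through, while a cookie presented from a different address is rejected.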

007. What is the purpose of the TCP timestamp option?

The timestamp option occupies 10 bytes: kind=8, length=10, followed by timestamp and timestamp echo (4 bytes each).

RTT calculation

When a segment s1 is sent, the sender records its kernel time ta1 in the timestamp field. The receiver replies with timestamp=tb and timestamp echo=ta1. Upon receiving the reply, the sender computes RTT as ta2 - ta1, where ta2 is the current kernel time.

Preventing sequence‑number wrap‑around

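When sequence numbers wrap around, an old segment that was delayed in the network can carry the same sequence number as a new one. Because timestamps only increase over the life of a connection, the receiver can tell the two apart: a segment whose timestamp is older than the most recent one it has accepted is discarded. This mechanism is known as PAWS (Protection Against Wrapped Sequence numbers).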
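Both uses of the timestamp reduce to simple arithmetic on 32-bit values. A minimal sketch, with function names invented for this example and without the extra conditions a real stack applies (such as handling long idle periods):

```python
def rtt_from_echo(now, echoed_timestamp):
    """RTT = current clock value minus the timestamp the peer echoed back."""
    return (now - echoed_timestamp) % 2 ** 32

def paws_accept(ts_recent, ts_val):
    """Accept a segment unless its timestamp is older than the latest accepted one."""
    return (ts_val - ts_recent) % 2 ** 32 < 2 ** 31
```

Here ts_recent stands for the newest timestamp the receiver has accepted so far and ts_val for the timestamp of the incoming segment. The modulo keeps both functions correct when the timestamp clock itself wraps.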

008. How is TCP retransmission timeout (RTO) calculated?

RTO is derived from RTT measurements. Two methods exist:

Classic method

Maintain a smoothed RTT (SRTT): SRTT = α·SRTT + (1‑α)·RTT (α≈0.8‑0.9). Then compute RTO = min(ubound, max(lbound, β·SRTT)) where β≈1.3‑2.0.

Standard (Jacobson/Karels) method

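Update SRTT with α = 1/8: SRTT = (1‑α)·SRTT + α·RTT.

Track how far the samples deviate from it with β = 1/4: RTTVAR = (1‑β)·RTTVAR + β·|RTT‑SRTT|, where SRTT is the value from before the update above.

Finally, RTO = μ·SRTT + δ·RTTVAR with μ = 1 and δ = 4. Because it takes the variation of the samples into account, this method reacts faster to RTT changes.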
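The standard method translates directly into code. The sketch below uses the constants given above; the handling of the very first sample (SRTT = RTT, RTTVAR = RTT/2) is taken from RFC 6298 rather than from this article, and the lower and upper bounds that real stacks apply to the result are omitted.

```python
class RtoEstimator:
    """Jacobson/Karels estimator; all times share one unit (milliseconds here)."""

    ALPHA = 1 / 8
    BETA = 1 / 4

    def __init__(self):
        self.srtt = None
        self.rttvar = None

    def update(self, rtt):
        """Feed one RTT sample and return the new RTO."""
        if self.srtt is None:
            # First sample (initialisation assumed from RFC 6298).
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # RTTVAR is updated first so that it uses the previous SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(rtt - self.srtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        return self.srtt + 4 * self.rttvar
```

Feeding samples of 100 ms and then 200 ms gives an RTO of 300 ms followed by 362.5 ms: the jump in RTT raises RTTVAR from 50 to 62.5, and that variation term accounts for most of the increase.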

009. How does TCP flow control work?

Flow control uses the receiver’s advertised window to limit the sender’s transmission rate. The sender’s effective window is min(rwnd, cwnd), where rwnd is the receiver’s window and cwnd is the congestion window.

Sliding window

The sender maintains variables: SND.UNA (oldest unacknowledged byte), SND.NXT (next byte to send), SND.WND (sender window size). The receiver maintains RCV.NXT (next expected byte) and RCV.WND (available buffer).

When the receiver’s buffer fills, it reduces RCV.WND in ACKs, causing the sender to shrink its sending window accordingly.
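How much the sender may still transmit follows from the three sender variables. A small sketch, with a hypothetical helper name:

```python
def usable_window(snd_una, snd_nxt, snd_wnd):
    """Bytes the sender may still transmit before it has to wait for an ACK."""
    in_flight = (snd_nxt - snd_una) % 2 ** 32    # sent but not yet acknowledged
    return max(0, snd_wnd - in_flight)
```

For example, with SND.UNA = 1000, SND.NXT = 1400 and SND.WND = 1000, there are 400 bytes in flight and 600 more may be sent. If the receiver then advertises a window of 300, the result drops to 0 and the sender must pause until data is acknowledged or the window reopens.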

010. How does TCP congestion control work?

Congestion control limits the amount of data the sender injects into the network, independent of the receiver's window. It uses two state variables: cwnd, the congestion window, and ssthresh, the slow‑start threshold.

Slow start

At first cwnd grows exponentially: every ACK increases cwnd by one MSS, so cwnd doubles each RTT until it reaches ssthresh.

Congestion avoidance

Once cwnd reaches ssthresh, growth becomes linear: every ACK increases cwnd by 1/cwnd (with cwnd counted in segments), which adds up to roughly one MSS per RTT.

Fast retransmit and fast recovery

When three duplicate ACKs are received, the sender assumes a packet loss and retransmits immediately (fast retransmit) without waiting for RTO.

Selective acknowledgment (SACK) can be used to indicate which blocks have been received, so only missing segments are retransmitted.

During fast recovery, ssthresh is set to half of the current cwnd, cwnd is set to ssthresh, and then grows linearly.
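The three phases can be traced with a per-RTT simulation. This is a coarse sketch that follows the simplified rules described above, counts cwnd in whole segments, and caps slow start at ssthresh; the function and parameter names are made up for the example.

```python
def simulate_cwnd(rounds, ssthresh, loss_rounds=()):
    """Return cwnd (in MSS units) at the start of each RTT.

    loss_rounds lists the RTTs in which three duplicate ACKs are seen.
    """
    cwnd = 1
    history = []
    for rtt in range(rounds):
        history.append(cwnd)
        if rtt in loss_rounds:
            ssthresh = max(cwnd // 2, 2)     # fast recovery: halve the threshold
            cwnd = ssthresh                  # and continue from there
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)   # slow start: double per RTT
        else:
            cwnd += 1                        # congestion avoidance: +1 MSS per RTT
    return history
```

With simulate_cwnd(8, ssthresh=8, loss_rounds={5}), cwnd runs 1, 2, 4, 8 in slow start, 9, 10 in congestion avoidance, and then drops to 5 after the loss before growing linearly again.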

011. What are Nagle’s algorithm and delayed ACK?

Nagle’s algorithm

To avoid sending many tiny packets, Nagle buffers data until either:

The buffered data reaches the MSS, or

All previously sent data have been acknowledged.

The first small segment is sent immediately; subsequent small writes are coalesced.
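The send decision can be expressed as a small predicate. This is a sketch of the rule as stated above, with invented names, and it ignores details such as the receiver's window:

```python
def nagle_may_send(buffered_bytes, mss, unacked_bytes):
    """Decide whether buffered data may be sent now under Nagle's rule."""
    if buffered_bytes == 0:
        return False                   # nothing to send
    if buffered_bytes >= mss:
        return True                    # a full segment is ready
    return unacked_bytes == 0          # small data goes out only when nothing is in flight
```

The first small write finds nothing in flight and is sent at once. Further small writes return False until the earlier data is acknowledged, which is what causes them to be coalesced.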

Delayed ACK

The receiver may wait briefly (≤ 500 ms, typically ≤ 200 ms) before sending an ACK, allowing it to acknowledge multiple received segments with a single ACK.

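In some cases the ACK is sent without delay: when a large segment has been received and the window needs to be updated, when the connection is in quick‑ack mode, and when an out‑of‑order segment arrives.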

Interaction

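Nagle delays sending and delayed ACK delays acknowledging, so using both compounds the latency: the sender holds back a small write until the previous data is acknowledged, while the receiver holds back that very acknowledgment. This can hurt performance noticeably.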
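One common way around this interaction, which goes slightly beyond what is described above, is to switch Nagle's algorithm off for latency-sensitive sockets with the TCP_NODELAY option. A minimal sketch; the function name is made up:

```python
import socket

def make_low_latency_socket():
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY = 1 sends small writes immediately instead of coalescing them.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

The trade-off is more small packets on the wire, so this suits interactive traffic better than bulk transfers.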

012. How does TCP keep‑alive work?

TCP keep‑alive probes detect dead connections when no data is exchanged. The default Linux settings are:

net.ipv4.tcp_keepalive_time = 7200   # seconds of idle time before first probe
net.ipv4.tcp_keepalive_intvl = 75    # interval between probes
net.ipv4.tcp_keepalive_probes = 9    # number of probes before declaring the connection dead

For this reason many applications leave keep‑alive disabled. The default idle time of two hours is too long to be useful, yet shortening it drastically no longer matches the feature's original purpose, which is to detect connections that have been dead for a long time.
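The system-wide defaults can also be overridden per socket. The sketch below assumes Linux, where the three TCP_KEEP* options are available; the function names and the example values are illustrative only.

```python
import socket

def enable_keepalive(sock, idle=60, interval=10, probes=5):
    """Turn on keep-alive for one socket and override the system defaults."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These options exist on Linux; other platforms may lack some of them.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", probes)):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)

def detection_time(idle, interval, probes):
    """Rough time in seconds before an idle, dead peer is noticed."""
    return idle + interval * probes
```

With the defaults listed above, detection_time(7200, 75, 9) comes to 7875 seconds, a little over two hours and eleven minutes.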
