Understanding TCP: Basics, Handshakes, Flags, and Performance Tuning
This article explains TCP fundamentals—including connection-oriented design, byte‑stream transmission, reliability, header structure, flag meanings, the three‑way handshake, four‑way termination, SYN‑Flood attacks, time‑wait handling, retransmission strategies, kernel tuning parameters, Nagle’s algorithm, and congestion control mechanisms such as slow start and congestion avoidance.
TCP Fundamentals
TCP (Transmission Control Protocol) is a connection‑oriented, byte‑stream, reliable transport‑layer protocol. It delivers the sender's data to the receiving application in order, without loss or duplication, and reports an error to the application if delivery cannot be completed.
TCP Header and Flags
The header contains fields such as sequence number, acknowledgment number, and several flags that control connection behavior:
URG : indicates that the urgent pointer is valid.
ACK : indicates that the acknowledgment number field is valid; it is set on virtually every segment after the initial SYN.
PSH : tells the receiver to push the data to the application immediately.
RST : resets the connection.
SYN : initiates a new connection.
FIN : gracefully closes a connection.
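As a rough illustration of the list above, the six flags occupy the low six bits of the header's flags byte. The sketch below decodes such a byte; the helper name decode_flags is made up for this example and is not part of any library.

```python
# Bit values of the six classic flags in the TCP header's flags byte.
TCP_FLAGS = {
    "FIN": 0x01,
    "SYN": 0x02,
    "RST": 0x04,
    "PSH": 0x08,
    "ACK": 0x10,
    "URG": 0x20,
}


def decode_flags(flag_byte: int) -> list:
    """Return the names of the flags set in flag_byte, lowest bit first.

    decode_flags is a hypothetical helper written for this example.
    """
    return [name for name, bit in TCP_FLAGS.items() if flag_byte & bit]


print(decode_flags(0x12))  # ['SYN', 'ACK'], the second segment of the handshake
```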
Three‑Way Handshake (Connection Establishment)
The client and server start in the CLOSED state. The server listens on a port (LISTEN). The client sends a SYN with an initial sequence number; the server replies with SYN‑ACK containing its own sequence number and the client’s sequence+1 as acknowledgment; the client sends an ACK, and both sides enter the ESTABLISHED state.
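The sequence and acknowledgment numbers in the handshake can be traced with a small sketch. It only models the numbers carried by the three segments, not real packets; the function name and the dictionary layout are assumptions made for illustration.

```python
SEQ_MOD = 2 ** 32  # sequence numbers are 32-bit and wrap around


def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the three handshake segments as dicts (illustrative only)."""
    syn = {"flags": "SYN", "seq": client_isn, "ack": None}
    syn_ack = {
        "flags": "SYN,ACK",
        "seq": server_isn,
        "ack": (client_isn + 1) % SEQ_MOD,  # the SYN consumes one sequence number
    }
    ack = {
        "flags": "ACK",
        "seq": (client_isn + 1) % SEQ_MOD,
        "ack": (server_isn + 1) % SEQ_MOD,
    }
    return [syn, syn_ack, ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment)
```

Real stacks pick the initial sequence numbers unpredictably; the fixed values here just make the +1 arithmetic easy to follow.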
SYN‑Flood Attack and Countermeasures
When a server receives a SYN and sends SYN‑ACK but never gets the final ACK, the connection remains half‑open. By default Linux retransmits the SYN‑ACK up to five times with exponential back‑off, waiting 1 s, 2 s, 4 s, 8 s, and 16 s before the successive retries, and then waits a further 32 s after the last one, so 63 s pass before the half‑open entry is dropped. Attackers exploit this by sending many SYNs and then disappearing, exhausting the SYN backlog.
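The 63 s figure can be checked with a few lines of arithmetic. This is a minimal sketch assuming a 1 s initial timeout that doubles after every retry, with one final wait after the last retry; the function name is invented for the example.

```python
def synack_timeline(retries: int = 5, initial_timeout: float = 1.0):
    """Return (retry_times, give_up_time) in seconds after the first SYN-ACK."""
    retry_times = []
    elapsed = 0.0
    timeout = initial_timeout
    for _ in range(retries):
        elapsed += timeout           # wait for the current timeout to expire
        retry_times.append(elapsed)  # then retransmit the SYN-ACK
        timeout *= 2                 # exponential back-off
    give_up = elapsed + timeout      # final wait after the last retry
    return retry_times, give_up


times, total = synack_timeline()
print(times)  # [1.0, 3.0, 7.0, 15.0, 31.0]
print(total)  # 63.0
```

With a single retry the same model gives 1 s + 2 s = 3 s, which is why lowering the retry count shortens the lifetime of half-open entries so sharply.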
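Mitigation options include enabling SYN cookies (tcp_syncookies), reducing tcp_synack_retries so that half‑open entries expire sooner, and raising tcp_max_syn_backlog to enlarge the backlog. The related tcp_abort_on_overflow setting acts on the accept queue of completed connections rather than on half‑open ones: when it is set to 1, the kernel sends an RST if that queue is full instead of silently ignoring the final ACK.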
Four‑Way Handshake (Connection Termination)
Either side can close the connection by sending a FIN. The active closer moves through FIN_WAIT_1 → FIN_WAIT_2 → TIME_WAIT → CLOSED, while the passive side moves through CLOSE_WAIT → LAST_ACK → CLOSED. TIME_WAIT lasts for 2 MSL (Maximum Segment Lifetime) so that the final ACK can be resent if the peer retransmits its FIN, and so that delayed packets from the old connection do not interfere with new connections.
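The two state sequences can be written down as a small transition table. This sketch covers only the ordinary close described above and leaves out simultaneous close; the event names (close, recv_ack, and so on) are labels invented for the example.

```python
# (current state, event) -> next state, for the ordinary close sequence only.
TRANSITIONS = {
    ("ESTABLISHED", "close"): "FIN_WAIT_1",    # active closer sends FIN
    ("FIN_WAIT_1", "recv_ack"): "FIN_WAIT_2",  # peer acknowledged our FIN
    ("FIN_WAIT_2", "recv_fin"): "TIME_WAIT",   # peer's FIN arrives, we ACK it
    ("TIME_WAIT", "2msl_timeout"): "CLOSED",
    ("ESTABLISHED", "recv_fin"): "CLOSE_WAIT",  # passive side ACKs the FIN
    ("CLOSE_WAIT", "close"): "LAST_ACK",        # passive side sends its own FIN
    ("LAST_ACK", "recv_ack"): "CLOSED",
}


def run(state: str, events: list) -> list:
    """Apply events in order and return every state visited."""
    path = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]
        path.append(state)
    return path


print(run("ESTABLISHED", ["close", "recv_ack", "recv_fin", "2msl_timeout"]))
print(run("ESTABLISHED", ["recv_fin", "close", "recv_ack"]))
```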
TIME_WAIT Management
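Excessive TIME_WAIT sockets consume resources, especially under high‑concurrency short‑lived connections. Linux provides tunables such as tcp_tw_reuse (lets new outgoing connections reuse a TIME_WAIT socket when TCP timestamps are enabled), tcp_max_tw_buckets (caps the number of TIME_WAIT entries), and tcp_tw_recycle. The last of these breaks clients behind NAT and was removed in Linux 4.12, so it should not be relied on with current kernels.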
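Before tuning, it helps to measure how many TIME_WAIT sockets exist. On Linux, /proc/net/tcp lists each IPv4 socket with a hex state code in its fourth column, and 06 stands for TIME_WAIT. The sketch below counts states from lines in that format; the function name is hypothetical and the sample rows are made up and trimmed to the first few columns.

```python
from collections import Counter

TIME_WAIT_CODE = "06"  # hex state code for TIME_WAIT in /proc/net/tcp


def count_states(lines) -> Counter:
    """Count sockets per hex state code from /proc/net/tcp-style lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Skip blank lines and the header row, whose first field is "sl".
        if len(fields) < 4 or fields[0] == "sl":
            continue
        counts[fields[3]] += 1
    return counts


# Made-up sample rows for illustration, not real socket data.
sample = [
    "  sl  local_address rem_address   st tx_queue rx_queue",
    "   0: 0100007F:1F90 00000000:0000 0A 00000000:00000000",
    "   1: 0100007F:C350 0100007F:1F90 06 00000000:00000000",
    "   2: 0100007F:C351 0100007F:1F90 06 00000000:00000000",
]
print(count_states(sample)[TIME_WAIT_CODE])  # 2
```

On a Linux host the same function can be fed the real file, for example with open("/proc/net/tcp") as f: count_states(f).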
TCP Retransmission Mechanisms
TCP guarantees delivery by retransmitting lost segments. Two mechanisms are used:
Timeout retransmission : If an ACK is not received before the retransmission timeout, the sender resends the missing segment (and possibly subsequent ones).
Fast retransmission : When the sender receives three duplicate ACKs carrying the same acknowledgment number, it immediately retransmits the presumed lost segment without waiting for the timeout.
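The duplicate-ACK rule can be sketched as a simple counter over the stream of acknowledgment numbers a sender receives. This is a simplified model: it ignores details that real stacks check (for example whether the duplicate ACK carries data or a window update), and the function name is an assumption for illustration.

```python
DUP_ACK_THRESHOLD = 3  # duplicates needed to trigger fast retransmission


def fast_retransmit_points(acks: list) -> list:
    """Return the ack numbers that trigger fast retransmission."""
    triggered = []
    last_ack = None
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == DUP_ACK_THRESHOLD:
                # The segment starting at `ack` is presumed lost: resend it now.
                triggered.append(ack)
        else:
            last_ack = ack
            dup_count = 0
    return triggered


# One original ACK for 2000 followed by three duplicates.
print(fast_retransmit_points([1000, 2000, 2000, 2000, 2000, 3000]))  # [2000]
```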
TCP Kernel Tuning Parameters (Linux)
# Set the maximum number of TIME_WAIT sockets (the default varies with kernel version and memory; 180000 on older kernels)
net.ipv4.tcp_max_tw_buckets = 20000
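# Enable fast recycling of TIME_WAIT sockets (breaks clients behind NAT; this
# parameter was removed in Linux 4.12, so omit it on current kernels)
net.ipv4.tcp_tw_recycle = 1
# Allow reuse of TIME_WAIT sockets for new outgoing connections
net.ipv4.tcp_tw_reuse = 1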
# Idle time before the first keep‑alive probe is sent (seconds; default 7200)
net.ipv4.tcp_keepalive_time = 100
# Number of keep‑alive probes before dropping
net.ipv4.tcp_keepalive_probes = 9
# Interval between keep‑alive probes (seconds)
net.ipv4.tcp_keepalive_intvl = 75
# Number of SYN‑ACK retries before giving up
net.ipv4.tcp_synack_retries = 1
# Maximum size of the SYN backlog queue
net.ipv4.tcp_max_syn_backlog = 8192
# Reset connections when the accept queue overflows (0 = disabled, the default)
net.ipv4.tcp_abort_on_overflow = 0
TCP Algorithms: Nagle and Congestion Control
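Nagle's algorithm holds back small segments while previously sent data is still unacknowledged and coalesces them into fewer, larger segments, which improves bandwidth utilization at the cost of some latency. It can be disabled by setting the socket option TCP_NODELAY, where value is an int set to 1, via
setsockopt(sock_fd, IPPROTO_TCP, TCP_NODELAY, (char*)&value, sizeof(int)).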
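For comparison, here is the same option set from Python's standard socket module. It is a minimal sketch: the helper name make_nodelay_socket is invented for this example, and the socket is never connected, it is only created, configured, and inspected.

```python
import socket


def make_nodelay_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # A non-zero value turns TCP_NODELAY on, which disables Nagle's algorithm.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_nodelay_socket()
# Reading the option back returns a non-zero value when it is enabled.
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
sock.close()
```

Disabling Nagle's algorithm suits latency-sensitive traffic made of small writes; bulk transfers usually gain nothing from it.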
Congestion control relies on the congestion window (cwnd) and operates in two phases:
Slow start : cwnd starts at a small initial window (1 MSS in the classic description) and grows by 1 MSS for every ACK received, which doubles it roughly every RTT (cwnd = cwnd × 2), until it reaches the slow‑start threshold (ssthresh).
Congestion avoidance : once cwnd ≥ ssthresh, cwnd increases linearly (cwnd = cwnd + 1/cwnd per ACK, or cwnd + 1 per RTT), preventing excessive growth that could cause network congestion.
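The two phases can be seen side by side in a loss-free simulation. This is a sketch under simplifying assumptions: cwnd is counted in whole MSS units, updated once per RTT, and capped at ssthresh when slow start would overshoot it; loss reactions are not modeled. The function name is made up for the example.

```python
def simulate_cwnd(rtts: int, ssthresh: int, initial_cwnd: int = 1) -> list:
    """Return cwnd (in MSS) at the start of each RTT, assuming no loss."""
    cwnd = initial_cwnd
    history = []
    for _ in range(rtts):
        history.append(cwnd)
        if cwnd < ssthresh:
            # Slow start: double per RTT, capped at ssthresh (an assumption
            # of this sketch to keep the phase boundary clean).
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: grow by one MSS per RTT.
            cwnd += 1
    return history


print(simulate_cwnd(8, ssthresh=16))  # [1, 2, 4, 8, 16, 17, 18, 19]
```

The output shows exponential growth for the first four RTTs and linear growth once cwnd has reached ssthresh.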
