Deep Dive into the TCP Three‑Way Handshake from the Linux Kernel Perspective
The article explains how the Linux kernel implements the TCP three‑way handshake: queue allocation in the server's listen(), source‑port selection and SYN transmission in the client's connect(), SYN‑ACK processing, state transitions, request‑socket management, full‑connection queue handling, and the final accept() call, plus practical tuning notes.
This article provides an in‑depth, Linux‑kernel‑level explanation of the TCP three‑way handshake. It covers the server's listen(), the client's connect(), the exchange of SYN, SYN‑ACK and ACK packets, and the final accept() call. It goes beyond the textbook state diagram to show how the kernel handles port selection, the half‑ and full‑connection queues, memory allocation, timers, and socket state transitions.
1. Server listen
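The server creates a socket, binds a port, calls listen(fd, 128), and then waits for incoming connections. Internally, listen() reaches reqsk_queue_alloc(). This function sizes the half‑connection queue and allocates its memory (vzalloc() if it exceeds a page, otherwise kzalloc()). It also initialises the head of the full‑connection queue:
int reqsk_queue_alloc(struct request_sock_queue *queue, unsigned int nr_table_entries) {
    size_t lopt_size = sizeof(struct listen_sock);
    struct listen_sock *lopt;
    // calculate half‑queue length: cap at tcp_max_syn_backlog, raise to at least 8,
    // then round nr_table_entries + 1 up to a power of two
    nr_table_entries = min_t(u32, nr_table_entries, sysctl_max_syn_backlog);
    nr_table_entries = max_t(u32, nr_table_entries, 8);
    nr_table_entries = roundup_pow_of_two(nr_table_entries + 1);
    // allocate memory for half‑queue
    lopt_size += nr_table_entries * sizeof(struct request_sock *);
    if (lopt_size > PAGE_SIZE)
        lopt = vzalloc(lopt_size);
    else
        lopt = kzalloc(lopt_size, GFP_KERNEL);
    if (lopt == NULL)
        return -ENOMEM;
    ...
    queue->rskq_accept_head = NULL; // full‑queue head
    lopt->nr_table_entries = nr_table_entries;
    queue->listen_opt = lopt;
    ...
}
The half‑queue is a hash table that stores pending SYN requests (request_sock entries). The full‑queue is a linked list of established child sockets waiting for accept(). Its maximum length is not allocated here. It is the listen() backlog capped by net.core.somaxconn, stored in sk_max_ack_backlog.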
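A user‑space version of the sizing steps shows what a given backlog actually produces. The sketch below is a hypothetical re‑implementation, and the helper names are not kernel functions. It assumes the older (pre‑4.4) logic shown above. It also assumes two details from older kernel sources that do not appear in the excerpt. First, listen() caps the backlog at net.core.somaxconn. Second, the half‑queue counts as full once its length reaches 2^max_qlen_log, where max_qlen_log is the smallest value of at least 3 with 2^max_qlen_log ≥ nr_table_entries.

```c
#include <assert.h>

/* Hypothetical helpers mirroring the sizing steps of reqsk_queue_alloc()
 * in older (pre-4.4) kernels. User-space illustration, not kernel code. */
static unsigned int min_u(unsigned int a, unsigned int b) { return a < b ? a : b; }
static unsigned int max_u(unsigned int a, unsigned int b) { return a > b ? a : b; }

/* Smallest power of two >= n, like the kernel's roundup_pow_of_two(). */
static unsigned int roundup_pow2(unsigned int n)
{
    assert(n >= 1);
    unsigned int p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Number of hash slots in the half-connection queue for a listen() backlog. */
unsigned int syn_table_entries(unsigned int backlog, unsigned int somaxconn,
                               unsigned int max_syn_backlog)
{
    unsigned int n = min_u(backlog, somaxconn); /* cap applied by listen() */
    n = min_u(n, max_syn_backlog);              /* min_t(..., sysctl_max_syn_backlog) */
    n = max_u(n, 8);                            /* max_t(..., 8) */
    return roundup_pow2(n + 1);                 /* roundup_pow_of_two(n + 1) */
}

/* max_qlen_log: smallest value >= 3 with (1 << value) >= entries. */
unsigned int syn_max_qlen_log(unsigned int entries)
{
    unsigned int log = 3;
    while ((1u << log) < entries)
        log++;
    return log;
}

/* The half queue counts as full once qlen reaches 1 << max_qlen_log. */
int syn_queue_is_full(unsigned int qlen, unsigned int max_qlen_log)
{
    return (qlen >> max_qlen_log) != 0;
}
```

As an example, take listen(fd, 128) with net.core.somaxconn = 128 and net.ipv4.tcp_max_syn_backlog = 256 (illustrative values only). This gives 256 slots, max_qlen_log = 8, and a half‑queue that counts as full once 256 requests are pending. The "+ 1" before rounding means that a backlog which is already a power of two doubles the table size.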
2. Client connect
The client creates a socket and calls connect(fd, …). The kernel runs tcp_v4_connect(), which sets the socket state to TCP_SYN_SENT, selects an available source port, builds a SYN packet, and starts a retransmission timer:
int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len) {
    int err;
    ...
    tcp_set_state(sk, TCP_SYN_SENT);
    err = inet_hash_connect(&tcp_death_row, sk); // port selection
    ...
    err = tcp_connect(sk); // build and send SYN
    ...
}
The actual packet construction happens in tcp_connect():
int tcp_connect(struct sock *sk) {
    struct sk_buff *buff;
    int err;
    tcp_connect_init(sk);
    // allocate skb and build SYN packet into buff
    ...
    tcp_connect_queue_skb(sk, buff);
    err = tcp_transmit_skb(sk, buff, 1, sk->sk_allocation);
    ...
    // arm the retransmission timer for the SYN
    inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, inet_csk(sk)->icsk_rto, TCP_RTO_MAX);
    return 0;
}
At this point the client is waiting for a SYN‑ACK.
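The source‑port choice made by inet_hash_connect() can be pictured as a scan over the local port range. The sketch below is a simplification with hypothetical names. The kernel derives the starting offset from a keyed hash of the addresses and destination port, and it consults its bind and established hash tables. Here both inputs are collapsed into a caller‑supplied offset and a "used" bitmap. Exact details also differ across kernel versions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of ephemeral source-port selection in the spirit of
 * inet_hash_connect(): scan the local port range starting at a
 * per-destination offset and take the first free port.
 *   low, high : the port range (net.ipv4.ip_local_port_range), low > 0
 *   offset    : starting offset; the kernel hashes addresses and the
 *               destination port, here the caller passes it in
 *   used      : hypothetical bitmap, used[p - low] is true if port p cannot
 *               be used toward this destination (taken or reserved)
 * Returns the chosen port, or 0 if every port is taken (the kernel then
 * fails connect() with EADDRNOTAVAIL). */
uint16_t pick_source_port(uint16_t low, uint16_t high, uint32_t offset,
                          const bool *used)
{
    assert(low > 0 && low <= high);
    uint32_t remaining = (uint32_t)(high - low) + 1;
    for (uint32_t i = 0; i < remaining; i++) {
        uint16_t port = (uint16_t)(low + (offset + i) % remaining);
        if (!used[port - low])
            return port;
    }
    return 0;
}
```

The per‑destination offset spreads different destinations across the range. Because the kernel checks the full 4‑tuple, the same local port can still be reused toward different destinations. The single bitmap in this sketch hides that detail.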
3. Server processes the SYN (first handshake)
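The NIC receives incoming packets, the NET_RX soft‑irq processes them, and they are delivered to tcp_v4_rcv(), which forwards them to tcp_v4_do_rcv(). Because the socket is in TCP_LISTEN state, the kernel first calls tcp_v4_hnd_req() to look for a matching request in the half‑connection queue. A new SYN has no matching entry, so tcp_v4_hnd_req() returns the listening socket itself. The packet then continues into tcp_rcv_state_process(), whose TCP_LISTEN case hands the SYN to the conn_request handler.
The core logic for handling a SYN request lives in tcp_v4_conn_request():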
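int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb) {
    // half-queue full: fall back to syncookies if enabled, otherwise drop
    if (inet_csk_reqsk_queue_is_full(sk) && !isn) {
        want_cookie = tcp_syn_flood_action(sk, skb, "TCP");
        if (!want_cookie) goto drop;
    }
    // full-queue full and more than one young request pending: drop
    if (sk_acceptq_is_full(sk) && inet_csk_reqsk_queue_young(sk) > 1) {
        NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS);
        goto drop;
    }
    // allocate request_sock
    req = inet_reqsk_alloc(&tcp_request_sock_ops);
    // build SYN‑ACK
    skb_synack = tcp_make_synack(sk, dst, req, NULL);
    err = ip_build_and_send_pkt(skb_synack, sk, ireq->loc_addr, ireq->rmt_addr, ireq->opt);
    inet_csk_reqsk_queue_hash_add(sk, req, TCP_TIMEOUT_INIT);
    ...
}
If the half‑queue is full and tcp_syncookies is disabled, the SYN is dropped. The SYN is also dropped when the full‑connection queue is already full and more than one request in the half‑queue is still "young" (its SYN‑ACK has not yet been retransmitted). The kernel counts this case as a listen overflow. Otherwise a SYN‑ACK is sent. Unless a syncookie was used, the request is also added to the half‑queue with an initial timeout of TCP_TIMEOUT_INIT.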
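The two admission checks can be isolated as a pure decision function. The sketch below is hypothetical. Each field of listener_state stands in for the kernel helper named in its comment. It ignores the isn condition and everything after the checks. One assumption comes from older kernel sources rather than the excerpt: when a cookie is used, the SYN‑ACK is sent but no request_sock is kept in the half‑queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum syn_verdict { SYN_DROP, SYN_SEND_SYNACK, SYN_SEND_COOKIE };

/* Hypothetical snapshot of what tcp_v4_conn_request() consults. */
struct listener_state {
    bool syn_queue_full;     /* inet_csk_reqsk_queue_is_full(sk) */
    bool accept_queue_full;  /* sk_acceptq_is_full(sk) */
    int  young_requests;     /* inet_csk_reqsk_queue_young(sk) */
    bool syncookies;         /* net.ipv4.tcp_syncookies enabled */
};

/* Mirrors the two admission checks in the excerpt above. */
enum syn_verdict classify_syn(const struct listener_state *ls)
{
    assert(ls != NULL);
    bool want_cookie = false;
    if (ls->syn_queue_full) {
        want_cookie = ls->syncookies;   /* role of tcp_syn_flood_action() */
        if (!want_cookie)
            return SYN_DROP;
    }
    if (ls->accept_queue_full && ls->young_requests > 1)
        return SYN_DROP;                /* counted as LISTENOVERFLOWS */
    /* A cookie SYN-ACK is sent without queuing a request. */
    return want_cookie ? SYN_SEND_COOKIE : SYN_SEND_SYNACK;
}
```

Syncookies only help with the first check. A full accept queue drops the SYN whether or not cookies are enabled.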
4. Client processes the SYN‑ACK (second handshake)
The client receives the SYN‑ACK, which again goes through tcp_v4_rcv(). Because the client socket is in TCP_SYN_SENT, the code path enters the TCP_SYN_SENT case of tcp_rcv_state_process() and calls tcp_rcv_synsent_state_process():
int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb, const struct tcphdr *th, unsigned int len) {
    struct inet_connection_sock *icsk = inet_csk(sk);
    ...
    tcp_ack(sk, skb, FLAG_SLOWPATH);
    tcp_finish_connect(sk, skb);
    if (sk->sk_write_pending || icsk->icsk_accept_queue.rskq_defer_accept || icsk->icsk_ack.pingpong) {
        // delayed ACK: schedule the ACK instead of sending it right away
        ...
    } else {
        tcp_send_ack(sk);
    }
    ...
}
During this step tcp_ack() acknowledges the SYN, which clears the retransmission timer armed in tcp_connect(). tcp_finish_connect() then changes the socket state to TCP_ESTABLISHED, initialises congestion control, and starts the keep‑alive timer if SO_KEEPALIVE is set:
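void tcp_finish_connect(struct sock *sk, struct sk_buff *skb) {
    struct tcp_sock *tp = tcp_sk(sk);
    tcp_set_state(sk, TCP_ESTABLISHED);
    ...
    tcp_init_congestion_control(sk);
    if (sock_flag(sk, SOCK_KEEPOPEN))
        inet_csk_reset_keepalive_timer(sk, keepalive_time_when(tp));
}
Unless one of the conditions in tcp_rcv_synsent_state_process() defers it, the client then immediately sends the third packet of the handshake, the final ACK.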
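The client's side of the handshake reduces to a small state machine. The sketch below is a hypothetical, heavily simplified model, and none of its names exist in the kernel. The delay_ack flag stands in for the sk_write_pending / defer_accept / pingpong test shown above. It also assumes that connect() reports an error such as ETIMEDOUT once SYN retries are exhausted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of the client side; not kernel code. */
enum cstate  { C_CLOSED, C_SYN_SENT, C_ESTABLISHED };
enum cevent  { E_CONNECT, E_SYNACK, E_SYN_RETRIES_EXHAUSTED };
enum caction { A_NONE, A_SEND_SYN, A_SEND_ACK, A_DELAY_ACK, A_REPORT_ERROR };

/* delay_ack stands in for the sk_write_pending / rskq_defer_accept /
 * pingpong test in tcp_rcv_synsent_state_process(). */
enum cstate client_step(enum cstate s, enum cevent ev, bool delay_ack,
                        enum caction *act)
{
    assert(act != NULL);
    *act = A_NONE;
    switch (s) {
    case C_CLOSED:
        if (ev == E_CONNECT) {               /* tcp_v4_connect() */
            *act = A_SEND_SYN;               /* and arm the retransmission timer */
            return C_SYN_SENT;
        }
        break;
    case C_SYN_SENT:
        if (ev == E_SYNACK) {                /* tcp_ack() + tcp_finish_connect() */
            *act = delay_ack ? A_DELAY_ACK : A_SEND_ACK;
            return C_ESTABLISHED;
        }
        if (ev == E_SYN_RETRIES_EXHAUSTED) { /* connect() fails, e.g. ETIMEDOUT */
            *act = A_REPORT_ERROR;
            return C_CLOSED;
        }
        break;
    case C_ESTABLISHED:
        break;
    }
    return s;                                /* other events: no change here */
}
```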
5. Server processes the ACK (third handshake)
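The ACK arrives at the server and again passes through tcp_v4_do_rcv(). The listening socket is still in TCP_LISTEN. This time, however, tcp_v4_hnd_req() finds the matching request_sock in the half‑connection queue and passes the packet to tcp_check_req(), which creates a child socket in TCP_SYN_RECV. tcp_v4_do_rcv() then calls tcp_child_process(), which runs tcp_rcv_state_process() on the child and moves it to TCP_ESTABLISHED: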
case TCP_SYN_RECV:
    tcp_set_state(sk, TCP_ESTABLISHED);
    ...
Handling this ACK creates the child socket via tcp_v4_syn_recv_sock(), removes the request_sock from the half‑queue, and adds the child to the full‑connection queue. The child is created only if the full‑connection queue has room:
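struct sock *tcp_v4_syn_recv_sock(struct sock *sk, struct sk_buff *skb, struct request_sock *req, struct dst_entry *dst) {
    struct sock *newsk;
    if (sk_acceptq_is_full(sk))
        goto exit_overflow;
    newsk = tcp_create_openreq_child(sk, req, skb);
    ...
    return newsk;
exit_overflow:
    NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS);
    ...
    return NULL;
}
Once the child exists, the request_sock is unlinked from the half‑queue and linked, together with the child, into the full‑connection queue: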
static void inet_csk_reqsk_queue_unlink(struct sock *sk, struct request_sock *req, struct request_sock **prev) {
    reqsk_queue_unlink(&inet_csk(sk)->icsk_accept_queue, req, prev);
}
static void inet_csk_reqsk_queue_add(struct sock *sk, struct request_sock *req, struct sock *child) {
    reqsk_queue_add(&inet_csk(sk)->icsk_accept_queue, req, sk, child);
}
6. accept()
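The application calls accept(). The kernel removes the first entry from the full‑connection queue and returns the child socket to user space. If the queue is empty, a blocking accept() sleeps in inet_csk_wait_for_connect() until a connection arrives, while a non‑blocking one fails with EAGAIN: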
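struct sock *inet_csk_accept(struct sock *sk, int flags, int *err) {
    struct inet_connection_sock *icsk = inet_csk(sk);
    struct request_sock_queue *queue = &icsk->icsk_accept_queue;
    struct request_sock *req;
    struct sock *newsk;
    ... // if the queue is empty: wait, or fail with -EAGAIN
    req = reqsk_queue_remove(queue);
    newsk = req->sk;
    ...
    return newsk;
}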
static inline struct request_sock *reqsk_queue_remove(struct request_sock_queue *queue) {
    struct request_sock *req = queue->rskq_accept_head;
    queue->rskq_accept_head = req->dl_next;
    if (queue->rskq_accept_head == NULL)
        queue->rskq_accept_tail = NULL;
    return req;
}
7. Summary and practical notes
During listen() the kernel sizes and allocates the half‑connection queue and initialises the full‑connection queue, whose length is capped by the backlog.
During connect() the client sets TCP_SYN_SENT, selects a source port, sends a SYN, and starts a retransmission timer.
The server's SYN handling checks queue limits, may fall back to syncookies, builds a SYN‑ACK, and enqueues the request.
The client's SYN‑ACK handling clears the retransmission timer, moves to TCP_ESTABLISHED, and sends the final ACK.
The server's ACK handling creates a child socket, removes the request from the half‑connection queue, adds the child to the full‑connection queue, and moves it to TCP_ESTABLISHED.
accept() simply dequeues the first established socket.
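The queue movements summarised above can be tied together in a toy user‑space model. Everything in the sketch is hypothetical: the struct, the function names, and the fixed capacity MAXQ. Two behaviours are assumed from older kernel sources. First, the accept‑queue test mirrors sk_acceptq_is_full(), which compares "current > max", so one extra entry fits. Second, with net.ipv4.tcp_abort_on_overflow left at 0, an ACK that finds the accept queue full is dropped and the request stays in the half‑queue. Syncookies, timers, and SYN‑ACK retransmission are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXQ 16   /* storage size of this toy model (hypothetical) */

/* Toy listener: an unordered "half queue" of pending request ids and a
 * FIFO "accept queue" of established connection ids. Not kernel structs. */
struct listener {
    int syn_q[MAXQ];
    int syn_len, syn_max;            /* syn_max: half-queue limit */
    int acc_q[MAXQ];
    int acc_head, acc_len, acc_max;  /* acc_max: like sk_max_ack_backlog */
};

void listener_init(struct listener *l, int syn_max, int acc_max)
{
    assert(syn_max <= MAXQ && acc_max + 1 <= MAXQ);
    l->syn_len = 0;
    l->syn_max = syn_max;
    l->acc_head = 0;
    l->acc_len = 0;
    l->acc_max = acc_max;
}

/* First handshake: queue the request, or drop the SYN if the half queue is full. */
bool on_syn(struct listener *l, int id)
{
    if (l->syn_len >= l->syn_max)
        return false;
    l->syn_q[l->syn_len++] = id;
    return true;
}

/* Third handshake: move the request to the accept queue. On overflow the
 * ACK is dropped and the request stays in the half queue. */
bool on_ack(struct listener *l, int id)
{
    int i = 0;
    while (i < l->syn_len && l->syn_q[i] != id)
        i++;
    if (i == l->syn_len || l->acc_len > l->acc_max)  /* unknown id, or full */
        return false;
    l->syn_len--;
    l->syn_q[i] = l->syn_q[l->syn_len];              /* unlink from half queue */
    l->acc_q[(l->acc_head + l->acc_len) % MAXQ] = id; /* append at FIFO tail */
    l->acc_len++;
    return true;
}

/* accept(): pop the oldest established connection, or -1 if none
 * (a real blocking accept() would sleep instead). */
int do_accept(struct listener *l)
{
    if (l->acc_len == 0)
        return -1;
    int id = l->acc_q[l->acc_head];
    l->acc_head = (l->acc_head + 1) % MAXQ;
    l->acc_len--;
    return id;
}
```

With acc_max = 1, two connections complete. A third ACK is rejected until accept() drains one entry, and accept() then returns connections in the order their handshakes finished.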
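If the half‑connection queue overflows and syncookies are disabled, the SYN is dropped. If the full‑connection queue overflows, the SYN or the final ACK is dropped whether or not syncookies are enabled. In both cases the peer only recovers through retransmission. The client resends its SYN, limited by net.ipv4.tcp_syn_retries, or the server resends its SYN‑ACK, limited by net.ipv4.tcp_synack_retries. The retransmission timeout starts at 1 s in modern kernels and doubles on each retry (1 s, 2 s, 4 s, …). Older kernels started at 3 s. These retransmissions therefore add whole seconds to connection setup.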
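The cost of these retransmissions follows directly from the doubling. The sketch below uses a hypothetical helper that computes when the k‑th retransmission fires, measured from the first transmission. It assumes the timeout starts at init_rto, doubles after each expiry, and is capped at max_rto (TCP_RTO_MAX, 120 s). This is a simplified model that ignores the finer details of how the kernel decides to give up.

```c
#include <assert.h>

/* Hypothetical helper: seconds from the first transmission until the k-th
 * retransmission timer expiry, with the timeout starting at init_rto,
 * doubling after each expiry, and capped at max_rto. With k = retries + 1
 * this approximates when the kernel gives up. */
unsigned int retransmit_time(unsigned int init_rto, unsigned int max_rto,
                             unsigned int k)
{
    assert(init_rto > 0 && init_rto <= max_rto);
    unsigned int t = 0, rto = init_rto;
    for (unsigned int i = 0; i < k; i++) {
        t += rto;                                    /* wait one timeout */
        rto = (rto * 2 > max_rto) ? max_rto : rto * 2; /* exponential backoff */
    }
    return t;
}
```

With a 1 s initial timeout, the retransmissions fire at 1, 3, 7, 15, 31 and 63 s. With tcp_syn_retries = 6 (an example value), the client therefore gives up at about 127 s. A single lost SYN already turns a sub‑millisecond LAN handshake into a delay of more than one second.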
Understanding these kernel‑level details helps diagnose occasional latency spikes caused by handshake retransmissions and informs performance‑tuning of server socket parameters.
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.