Deep Dive into the TCP Three‑Way Handshake: Kernel Queues, Syncookies and Code Walkthrough
This article explains the complete kernel‑level implementation of the TCP three‑way handshake, covering server listen queue allocation, client connect state handling, SYN/SYN‑ACK processing, syncookie protection, timer management, socket creation, and the accept path, with detailed code examples.
In backend interviews the TCP three-way handshake is a frequent topic, but most answers only describe the high-level state transitions. This article takes a kernel-level view instead, covering the half- and full-connection queues, syncookies, retransmission timers, and the other operations that separate a thorough answer from a superficial one.
1. Server listen
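When listen is called, the kernel calculates the length of the half-connection (SYN) queue, allocates memory for it, and initializes both the half- and full-connection queues. The listen_sock structure holds the half-connection queue's metadata and hash table; it is attached to the socket's request_sock_queue, which also carries the head of the full-connection (accept) queue.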
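As a rough illustration of the sizing step, the sketch below reproduces the arithmetic in user space. Only the clamp against sysctl_max_syn_backlog is visible in the kernel excerpt below; the clamp against somaxconn, the minimum of 8, and the round-up to a power of two are assumptions about kernels of the same era as the quoted code and may differ in other versions. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round v up to the next power of two (returns v if it already is one). */
static uint32_t roundup_pow2(uint32_t v)
{
    uint32_t p = 1;

    assert(v >= 1 && v <= 0x80000000u);  /* keeps the shift from overflowing */
    while (p < v)
        p <<= 1;
    return p;
}

/* Number of hash-table slots in the half-connection queue.
 * ASSUMPTION: the somaxconn clamp, the floor of 8 and the power-of-two
 * round-up are not shown in the article's excerpt. */
static uint32_t syn_queue_len(uint32_t backlog, uint32_t somaxconn,
                              uint32_t max_syn_backlog)
{
    uint32_t n = backlog < somaxconn ? backlog : somaxconn;

    if (n > max_syn_backlog)
        n = max_syn_backlog;             /* the min_t() step from the excerpt */
    if (n < 8)
        n = 8;
    return roundup_pow2(n + 1);
}
```

Under these assumptions, a listen backlog of 128 with somaxconn at 128 and tcp_max_syn_backlog at 512 yields a 256-slot table.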
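int reqsk_queue_alloc(struct request_sock_queue *queue, unsigned int nr_table_entries)
{
    size_t lopt_size = sizeof(struct listen_sock);
    struct listen_sock *lopt;
    /* Cap the half-connection queue length at the tcp_max_syn_backlog sysctl. */
    nr_table_entries = min_t(u32, nr_table_entries, sysctl_max_syn_backlog);
    /* One request_sock pointer per hash-table slot, stored after the header. */
    lopt_size += nr_table_entries * sizeof(struct request_sock *);
    /* Large tables come from vmalloc space, small ones from the slab allocator. */
    if (lopt_size > PAGE_SIZE)
        lopt = vzalloc(lopt_size);
    else
        lopt = kzalloc(lopt_size, GFP_KERNEL);
    /* The full-connection (accept) queue starts out empty. */
    queue->rskq_accept_head = NULL;
    lopt->nr_table_entries = nr_table_entries;
    /* Attach the half-connection table to the listening socket's queue. */
    queue->listen_opt = lopt;
    ...
}
2. Client connect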
When the client calls connect on its socket, the kernel sets the socket state to TCP_SYN_SENT, selects an available local port via inet_hash_connect, builds a SYN packet and sends it, then starts a retransmission timer in case no SYN-ACK comes back.
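The retransmission timer backs off exponentially when the SYN goes unanswered. The sketch below computes the resulting schedule. The 1-second initial RTO, the doubling per retry, and the retry count of 6 used in the note afterwards are assumptions based on common Linux defaults, not values taken from the excerpt; only the cap, which plays the role of TCP_RTO_MAX, appears in the quoted code. The function names are hypothetical.

```c
#include <assert.h>

/* RTO (in seconds) that applies after `attempt` earlier transmissions of
 * the SYN: starts at init_rto, doubles each time, never exceeds rto_max. */
static unsigned int syn_rto(unsigned int attempt, unsigned int init_rto,
                            unsigned int rto_max)
{
    unsigned int rto = init_rto;

    assert(init_rto > 0 && rto_max >= init_rto);
    while (attempt-- > 0 && rto < rto_max)
        rto = (rto > rto_max / 2) ? rto_max : rto * 2;
    return rto;
}

/* Total seconds spent waiting before the connect attempt is abandoned:
 * the first SYN plus `retries` retransmissions each wait one RTO. */
static unsigned int syn_total_wait(unsigned int retries, unsigned int init_rto,
                                   unsigned int rto_max)
{
    unsigned int total = 0;

    for (unsigned int i = 0; i <= retries; i++)
        total += syn_rto(i, init_rto, rto_max);
    return total;
}
```

With an assumed initial RTO of 1 second, a cap of 120 seconds and 6 retries, the waits are 1, 2, 4, 8, 16, 32 and 64 seconds, so a connect to an unresponsive peer would give up after roughly 127 seconds.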
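int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
{
    /* Excerpt: route lookup and error handling are omitted. */
    tcp_set_state(sk, TCP_SYN_SENT);
    /* Pick a free local port and hash the socket so replies can find it. */
    err = inet_hash_connect(&tcp_death_row, sk);
    /* Build and send the SYN. */
    err = tcp_connect(sk);
}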
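The port search behind inet_hash_connect can be pictured as a circular scan of the ephemeral port range. This is a simplified user-space sketch and not the kernel's algorithm: the in_use callback, the starting offset parameter, and the names toy_pick_port and toy_busy_list are all hypothetical stand-ins for the kernel's hash-bucket and connection-uniqueness checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical predicate: true if the port cannot be used for this connection. */
typedef bool (*port_in_use_fn)(uint16_t port, void *ctx);

/* Scan [low, high] circularly, starting at low + (offset % range).
 * Returns the first free port, or 0 if every port in the range is taken. */
static uint16_t toy_pick_port(uint16_t low, uint16_t high, uint32_t offset,
                              port_in_use_fn in_use, void *ctx)
{
    uint32_t range, start;

    assert(low > 0 && low <= high);
    range = (uint32_t)high - low + 1;
    start = offset % range;
    for (uint32_t i = 0; i < range; i++) {
        uint16_t port = (uint16_t)(low + (start + i) % range);

        if (!in_use(port, ctx))
            return port;
    }
    return 0;
}

/* Example predicate for the sketch: ctx is a zero-terminated list of busy ports. */
static bool toy_busy_list(uint16_t port, void *ctx)
{
    const uint16_t *busy = ctx;

    for (; *busy != 0; busy++)
        if (*busy == port)
            return true;
    return false;
}
```

Starting the scan at a varying offset, rather than always at the bottom of the range, spreads connections across the range instead of repeatedly probing the same busy ports.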
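int tcp_connect(struct sock *sk)
{
    tcp_connect_init(sk);
    /* allocate skb and build SYN */
    ...
    /* Keep the SYN on the write queue so it can be retransmitted. */
    tcp_connect_queue_skb(sk, buff);
    /* Send the SYN, with data if TCP Fast Open was requested. */
    err = tp->fastopen_req ? tcp_send_syn_data(sk, buff) :
          tcp_transmit_skb(sk, buff, 1, sk->sk_allocation);
    /* Arm the retransmission timer in case no SYN-ACK arrives. */
    inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS,
                              inet_csk(sk)->icsk_rto, TCP_RTO_MAX);
}
3. Server response to SYN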
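Incoming packets are processed by tcp_v4_do_rcv . When a listening socket receives a segment, tcp_v4_hnd_req looks for a matching entry in the half-connection queue. A first SYN has no entry yet, so processing reaches tcp_v4_conn_request . There, if the half-connection queue is full, the kernel either falls back to syncookies or drops the SYN; if the full-connection queue is full and more than one young request is pending, the SYN is dropped as well. Otherwise the kernel allocates a request_sock , sends the SYN-ACK, and adds the request to the half-connection queue together with a retransmission timeout.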
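The two checks at the top of tcp_v4_conn_request can be condensed into a small decision function. The sketch below models only the conditions visible in the excerpt that follows and ignores the isn term of the first check. The enum, struct, and function names are hypothetical, the boolean syncookies_enabled stands in for whatever tcp_syn_flood_action decides, and the exact comparison operators used for "full" are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

enum syn_verdict { SYN_DROP, SYN_QUEUE, SYN_COOKIE };

struct toy_listen_state {
    unsigned int syn_qlen, syn_qmax;        /* half-connection queue */
    unsigned int accept_qlen, accept_qmax;  /* full-connection queue */
    unsigned int young;                     /* requests not yet retransmitted */
    bool syncookies_enabled;
};

static enum syn_verdict toy_syn_verdict(const struct toy_listen_state *s)
{
    bool want_cookie = false;

    /* Half-connection queue full (ASSUMPTION: ">=" threshold). */
    if (s->syn_qlen >= s->syn_qmax) {
        if (!s->syncookies_enabled)
            return SYN_DROP;
        want_cookie = true;
    }
    /* Full-connection queue full (ASSUMPTION: ">" threshold) and more
     * than one young request already waiting. */
    if (s->accept_qlen > s->accept_qmax && s->young > 1)
        return SYN_DROP;
    return want_cookie ? SYN_COOKIE : SYN_QUEUE;
}
```

Note that in this model a full accept queue alone does not cause a drop: the SYN is only rejected when more than one young request is already pending.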
int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
{
    /* Half-connection queue full: fall back to syncookies or drop. */
    if (inet_csk_reqsk_queue_is_full(sk) && !isn) {
        want_cookie = tcp_syn_flood_action(sk, skb, "TCP");
        if (!want_cookie)
            goto drop;
    }
    /* Full-connection queue full and more than one young request pending: drop. */
    if (sk_acceptq_is_full(sk) && inet_csk_reqsk_queue_young(sk) > 1) {
        NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS);
        goto drop;
    }
    /* Allocate the request_sock that represents this half-open connection. */
    req = inet_reqsk_alloc(&tcp_request_sock_ops);
    /* Build and send the SYN-ACK. */
    skb_synack = tcp_make_synack(sk, dst, req,
                     fastopen_cookie_present(&valid_foc) ? &valid_foc : NULL);
    err = ip_build_and_send_pkt(skb_synack, sk, ireq->loc_addr,
                                ireq->rmt_addr, ireq->opt);
    /* Queue the request and start its SYN-ACK retransmission timer. */
    inet_csk_reqsk_queue_hash_add(sk, req, TCP_TIMEOUT_INIT);
    ...
}
4. Client response to SYN‑ACK
When the client receives the SYN-ACK while in TCP_SYN_SENT , tcp_rcv_synsent_state_process handles it: it processes the acknowledgment the segment carries, which clears the retransmission timer, sets the socket state to TCP_ESTABLISHED , initializes congestion control, and sends the final ACK immediately unless the delayed-ACK conditions apply.
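Before the client can move to TCP_ESTABLISHED , the SYN-ACK has to acknowledge the client's SYN. A minimal sketch of that arithmetic follows, based on the sequence-number rules of the TCP specification rather than on the kernel's exact code path. It assumes no data was sent with the SYN (no Fast Open), and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sequence and acknowledgment numbers of the third handshake segment. */
struct toy_final_ack {
    uint32_t seq;
    uint32_t ack;
};

/* iss:        client's initial sequence number, sent in the SYN
 * synack_seq: server's initial sequence number, carried in the SYN-ACK
 * synack_ack: acknowledgment number carried in the SYN-ACK
 * Returns false if the SYN-ACK does not acknowledge exactly our SYN. */
static bool toy_process_synack(uint32_t iss, uint32_t synack_seq,
                               uint32_t synack_ack, struct toy_final_ack *out)
{
    /* A SYN consumes one sequence number; uint32_t wraps modulo 2^32. */
    if (synack_ack != (uint32_t)(iss + 1))
        return false;
    out->seq = iss + 1;         /* next byte the client will send */
    out->ack = synack_seq + 1;  /* acknowledges the server's SYN */
    return true;
}
```

Because the arithmetic is unsigned and 32 bits wide, the check still holds when the initial sequence number sits at the top of the sequence space and the acknowledgment wraps to zero.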
static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
                                         const struct tcphdr *th, unsigned int len)
{
    ...
    /* Process the ACK carried by the SYN-ACK; this retires the queued SYN. */
    tcp_ack(sk, skb, FLAG_SLOWPATH);
    /* Move to ESTABLISHED and finish connection setup. */
    tcp_finish_connect(sk, skb);
    if (sk->sk_write_pending || icsk->icsk_ack.pingpong) {
        /* delayed ACK handling */
    } else {
        /* Send the third handshake segment right away. */
        tcp_send_ack(sk);
    }
}
void tcp_finish_connect(struct sock *sk, struct sk_buff *skb)
{
    tcp_set_state(sk, TCP_ESTABLISHED);
    tcp_init_congestion_control(sk);
    /* Start the keepalive timer if SO_KEEPALIVE is set. */
    if (sock_flag(sk, SOCK_KEEPOPEN))
        inet_csk_reset_keepalive_timer(sk, keepalive_time_when(tp));
}
5. Server response to ACK (third handshake)
The ACK arrives at the listening socket, which this time finds the matching request_sock in the half-connection queue. The kernel creates a child socket, removes the request from the half-connection queue, and adds it to the full-connection queue. The child socket, not the listener, is then moved from TCP_SYN_RECV to TCP_ESTABLISHED ; the listening socket stays in the listen state.
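The hand-over between the two queues can be modelled with a small hash table and a FIFO. The following is a toy user-space model, not kernel code: all names are hypothetical, the table size and hash function are arbitrary, and a single link field serves both the hash chain and the FIFO purely to keep the model short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SYN_TABLE_SIZE 8  /* hypothetical; must be a power of two */

struct toy_req {
    uint32_t peer_addr;
    uint16_t peer_port;
    struct toy_req *next;  /* hash-chain link, later the FIFO link */
};

struct toy_listener {
    struct toy_req *syn_table[SYN_TABLE_SIZE];  /* half-connection queue */
    struct toy_req *accept_head, *accept_tail;  /* full-connection queue */
    unsigned int syn_len, accept_len;
};

static unsigned int toy_hash(uint32_t addr, uint16_t port)
{
    return (addr ^ port) & (SYN_TABLE_SIZE - 1);
}

/* SYN received: remember the request in the half-connection table. */
static void toy_syn_received(struct toy_listener *l, struct toy_req *r)
{
    unsigned int h = toy_hash(r->peer_addr, r->peer_port);

    r->next = l->syn_table[h];
    l->syn_table[h] = r;
    l->syn_len++;
}

/* Final ACK received: unlink the matching request and append it to the
 * accept FIFO. Returns 1 on success, 0 if no request matches. */
static int toy_ack_received(struct toy_listener *l, uint32_t addr, uint16_t port)
{
    struct toy_req **pp = &l->syn_table[toy_hash(addr, port)];

    for (; *pp != NULL; pp = &(*pp)->next) {
        struct toy_req *r = *pp;

        if (r->peer_addr != addr || r->peer_port != port)
            continue;
        *pp = r->next;              /* remove from the half-connection table */
        l->syn_len--;
        r->next = NULL;
        if (l->accept_tail != NULL)
            l->accept_tail->next = r;
        else
            l->accept_head = r;
        l->accept_tail = r;         /* append to the full-connection queue */
        l->accept_len++;
        return 1;
    }
    return 0;
}
```

The table gives fast lookup by peer address when the ACK arrives, while the FIFO preserves the order in which connections completed, which is the order accept hands them out.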
struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
                           struct request_sock *req, struct request_sock **prev, bool fastopen)
{
    /* Create the child socket for the new connection. */
    child = inet_csk(sk)->icsk_af_ops->syn_recv_sock(sk, skb, req, NULL);
    /* Remove the request from the half-connection queue... */
    inet_csk_reqsk_queue_unlink(sk, req, prev);
    inet_csk_reqsk_queue_removed(sk, req);
    /* ...and append it to the full-connection queue. */
    inet_csk_reqsk_queue_add(sk, req, child);
    return child;
}
int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
                          const struct tcphdr *th, unsigned int len)
{
    switch (sk->sk_state) {
    case TCP_SYN_RECV:
        /* Here sk is the child socket created above. */
        tcp_set_state(sk, TCP_ESTABLISHED);
        ...
    }
}
6. accept path
The accept system call extracts the first socket from the full‑connection queue and returns it to user space.
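That accept only dequeues an already completed connection can be observed from user space. The sketch below uses standard POSIX socket calls on the loopback interface and completes a blocking connect before accept is ever called. The function name handshake_before_accept is hypothetical, and error handling is kept minimal.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 0 if connect() succeeded before accept() was called and
 * accept() then returned that connection; -1 on any error. */
static int handshake_before_accept(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int lfd, cfd = -1, afd = -1, rc = -1;

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  /* let the kernel choose a free port */

    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 8) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
        goto out;
    assert(addr.sin_port != 0);  /* the kernel assigned a real port */

    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0)
        goto out;
    /* Blocking connect: the server has not called accept() yet. */
    if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;

    /* Only now take the connection off the listener's accept queue. */
    afd = accept(lfd, NULL, NULL);
    if (afd >= 0)
        rc = 0;
out:
    if (afd >= 0)
        close(afd);
    if (cfd >= 0)
        close(cfd);
    close(lfd);
    return rc;
}
```

The client's connect returns successfully even though the server process has done nothing beyond listen, which is consistent with the kernel completing the handshake on the listener's behalf and parking the result in the full-connection queue.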
struct sock *inet_csk_accept(struct sock *sk, int flags, int *err)
{
    /* Excerpt: locking and waiting on an empty queue are omitted. */
    struct request_sock_queue *queue = &icsk->icsk_accept_queue;
    /* Take the oldest request off the full-connection queue. */
    req = reqsk_queue_remove(queue);
    /* Its child socket is what user space receives. */
    newsk = req->sk;
    return newsk;
}
static inline struct request_sock *reqsk_queue_remove(struct request_sock_queue *queue)
{
    /* Pop the head of the singly linked FIFO. */
    struct request_sock *req = queue->rskq_accept_head;
    queue->rskq_accept_head = req->dl_next;
    /* The queue became empty, so clear the tail pointer as well. */
    if (queue->rskq_accept_head == NULL)
        queue->rskq_accept_tail = NULL;
    return req;
}
In summary, the three‑way handshake involves more than simple state changes: the server allocates and manages half‑ and full‑connection queues, the client handles port selection and retransmission timers, syncookies protect against SYN floods, and the final accept pulls a fully established socket from the queue.
Refining Core Development Skills
Fei has over 10 years of development experience at Tencent and Sogou. Through this account, he shares his deep insights on performance.