Why Does TCP Send‑Q Exceed SO_SNDBUF? Inside the Linux Kernel’s Buffer Mechanics
This article explains why the TCP Send‑Q can grow beyond the user‑set SO_SNDBUF value by examining the kernel’s double‑buffer accounting, sk_wmem_queued handling, and the interaction of tcp_sendmsg with GSO, supported by code excerpts and diagrams.
Recently I ran into a problem that can be reduced to a simple model: a client creates a TCP socket, sets the SO_SNDBUF option to 4096 bytes, connects to a server that never calls recv(), and then writes 1024 bytes every second. The expected behavior falls into three phases.
Phase 1
The server’s receive buffer is not full, so despite not calling recv(), it still acknowledges the client’s packets.
Phase 2
The server’s receive buffer becomes full, it advertises a zero window, and the client’s data starts to accumulate in its send buffer.
Phase 3
The client’s send buffer fills up and the user process blocks on send().
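The client side of this model can be sketched roughly as follows. This is a minimal illustration, not the original test program: the function name run_slow_sender and its parameters are hypothetical, and the server (any listener that accepts and never reads) is assumed to exist elsewhere.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connect to ip:port with SO_SNDBUF = 4096 and write
 * `rounds` chunks of 1024 bytes, one per second.
 * Returns 0 on success, -1 on any socket or address error. */
int run_slow_sender(const char *ip, unsigned short port, int rounds)
{
    assert(rounds >= 0);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Request the small send buffer before connecting, as in the model. */
    int sndbuf = 4096;
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf)) < 0) {
        close(fd);
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1 ||
        connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }

    char chunk[1024];
    memset(chunk, 'x', sizeof(chunk));
    for (int i = 0; i < rounds; i++) {
        /* A blocking send() stalls here once the send buffer is full (Phase 3). */
        ssize_t n = send(fd, chunk, sizeof(chunk), MSG_NOSIGNAL);
        if (n < 0) {
            close(fd);
            return -1;
        }
        printf("round %d: send() accepted %zd bytes\n", i, n);
        sleep(1);
    }
    close(fd);
    return 0;
}
```

Running it against a listener that never reads, while watching ss -nt in another terminal, should reproduce the three phases.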
In practice the phenomenon matches the expectation, but monitoring the TCP connection with ss -nt shows the Send‑Q growing from 0 to 14480, far exceeding the configured SO_SNDBUF of 4096.
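For an established connection, Send‑Q is the number of bytes the application has written that the peer has not yet acknowledged: everything from the left edge of the sliding window (snd_una) to the end of the write queue, including data that has not been sent yet.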
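The same quantity can be read from inside the process. The sketch below assumes Linux, where the TIOCOUTQ ioctl (also exposed as SIOCOUTQ) on a TCP socket is expected to report the same unacknowledged byte count that ss shows as Send‑Q. The helper name send_queue_bytes is made up for illustration.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <sys/socket.h>

/* Returns the number of bytes sitting in the socket's send queue that the
 * peer has not acknowledged yet, or -1 if the ioctl fails. */
int send_queue_bytes(int fd)
{
    int queued = 0;

    assert(fd >= 0);
    if (ioctl(fd, TIOCOUTQ, &queued) < 0)
        return -1;
    return queued;
}
```

Calling this after each send() in the test client gives a per-write view of how the queue grows, without polling ss.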
Double SO_SNDBUF
When a user sets the socket’s send buffer via SO_SNDBUF, the kernel records the value in sk->sk_sndbuf as val * 2. Thus a user‑set 4096 bytes becomes 8192 bytes internally.
    case SO_SNDBUF:
        sk->sk_sndbuf = max(val * 2, SOCK_MIN_SNDBUF);
The kernel doubles the value to account for overhead such as sk_buff structures, skb_shared_info, and L2/L3/L4 headers, ensuring that even with half the memory used for overhead the user data still fits.
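The doubling is visible from user space, because getsockopt() reports the stored value rather than the requested one. The following is a small sketch; effective_sndbuf is a hypothetical helper name, and the exact result depends on the kernel and its sysctl limits.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Sets SO_SNDBUF to `requested` on a fresh TCP socket and returns the value
 * that getsockopt() reports afterwards, or -1 on error. */
int effective_sndbuf(int requested)
{
    assert(requested > 0);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int actual = 0;
    socklen_t len = sizeof(actual);
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &requested, sizeof(requested)) < 0 ||
        getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &actual, &len) < 0)
        actual = -1;

    close(fd);
    return actual;
}
```

On a Linux system with default limits, effective_sndbuf(4096) is expected to return 8192, matching the kernel excerpt above.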
However, even an 8192‑byte buffer cannot explain a Send‑Q of 14480 bytes.
sk_wmem_queued
The kernel tracks the memory currently charged to the send buffer in sk->sk_wmem_queued. This counter includes both user data and overhead, so it is always larger than the payload-only Send‑Q.
    bool sk_stream_memory_free(const struct sock *sk) {
        if (sk->sk_wmem_queued >= sk->sk_sndbuf)
            return false;
        ...
    }
When a packet is queued, sk_wmem_queued increases by skb->truesize; when the packet is ACKed, it decreases by the same amount.
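The accounting can be mimicked with a toy model. Everything here is a simplified stand-in: struct toy_sock and the toy_* functions are hypothetical names, not kernel APIs, and 4372 is the truesize the article later reports for a 1024-byte segment.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the two struct sock fields discussed above. */
struct toy_sock {
    int wmem_queued; /* models sk->sk_wmem_queued */
    int sndbuf;      /* models sk->sk_sndbuf */
};

/* Mirrors the first check in sk_stream_memory_free(). */
bool toy_memory_free(const struct toy_sock *sk)
{
    return sk->wmem_queued < sk->sndbuf;
}

/* Queuing an skb charges its full truesize, not just its payload. */
void toy_queue_skb(struct toy_sock *sk, int truesize)
{
    sk->wmem_queued += truesize;
}

/* An ACK releases the same amount that was charged. */
void toy_ack_skb(struct toy_sock *sk, int truesize)
{
    assert(sk->wmem_queued >= truesize);
    sk->wmem_queued -= truesize;
}
```

With sndbuf = 8192, one queued skb of truesize 4372 still leaves room, while a second one pushes the counter to 8744 and the check reports no free memory until one of them is ACKed.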
tcp_sendmsg
The kernel decides whether to create a new sk_buff or append data to the last one based on the write queue state.
    int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size) {
        /* size_goal is the most payload a single sk_buff should carry. */
        mss_now = tcp_send_mss(sk, &size_goal, flags);
        while (msg_data_left(msg)) {
            int copy = 0;
            int max = size_goal;
            skb = tcp_write_queue_tail(sk);
            /* Unsent data is queued: try to append to the tail sk_buff. */
            if (tcp_send_head(sk)) {
                copy = max - skb->len;
            }
            if (copy <= 0) {
                /* allocate new skb; in this excerpt, the only path that checks sk_sndbuf */
                if (!sk_stream_memory_free(sk))
                    goto wait_for_sndbuf;
                skb = sk_stream_alloc_skb(...);
            }
            ...
        }
    }
In Phase 1 each write creates a new sk_buff: the previous one has already been sent, so tcp_send_head(sk) is NULL and copy stays 0. In Phase 2 the client has unsent sk_buffs queued, so the kernel tries to append data to the last one. The amount that can be appended is calculated as copy = size_goal - skb->len.
The size_goal is derived from the MSS and whether Generic Segmentation Offload (GSO) is enabled:
GSO enabled: size_goal = tp->gso_segs * mss_now
GSO disabled: size_goal = mss_now
In the test environment the effective MSS is 1448 bytes, and size_goal becomes 14480 bytes (10 × MSS). Consequently, when the client enters Phase 2, copy = 14480 - 1024 = 13456 bytes.
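The arithmetic above can be written out as two small functions. This is only an illustration of the formulas quoted in the text; toy_size_goal and toy_append_room are hypothetical names, and the real kernel computation involves more state than these parameters.

```c
#include <assert.h>
#include <stdbool.h>

/* size_goal as described above: one MSS without GSO, gso_segs MSS with it. */
int toy_size_goal(int mss_now, int gso_segs, bool gso_enabled)
{
    assert(mss_now > 0 && gso_segs > 0);
    return gso_enabled ? gso_segs * mss_now : mss_now;
}

/* Bytes that can still be appended to the tail skb.
 * A result of 0 means a new skb has to be allocated. */
int toy_append_room(int size_goal, int skb_len)
{
    int copy = size_goal - skb_len;
    return copy > 0 ? copy : 0;
}
```

For the test environment, toy_size_goal(1448, 10, true) gives 14480 and toy_append_room(14480, 1024) gives 13456, the same figures as in the text.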
Although the first sk_buff has skb->len = 1024 and skb->truesize = 4372, the kernel can expand the buffer via sk_wmem_schedule before copying data:
    if (!sk_wmem_schedule(sk, copy))
        goto wait_for_memory;
sk_wmem_schedule ultimately calls __sk_mem_schedule, which can increase sk->sk_forward_alloc and allow sk_wmem_queued to exceed sk_sndbuf.
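Because the sk_sndbuf check runs only when a new sk_buff is allocated, appending to an existing sk_buff lets Send‑Q grow well beyond the user‑set SO_SNDBUF, so the option does not act as a hard cap on queued data.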
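To see how the pieces combine, here is a toy simulation of the Phase 2 behavior under strong simplifying assumptions: nothing is ever sent or ACKed, each new skb costs a fixed overhead, and appended bytes are charged one for one. The struct and function names are hypothetical, and the overhead of 3348 bytes is chosen so that a 1024-byte skb has the truesize of 4372 mentioned above.

```c
#include <assert.h>

struct toy_tx {
    int sndbuf;      /* models sk->sk_sndbuf */
    int wmem_queued; /* models sk->sk_wmem_queued */
    int send_q;      /* payload bytes queued and not yet ACKed */
    int tail_len;    /* skb->len of the tail skb, 0 if the queue is empty */
};

/* Queues up to `len` bytes and returns how many were accepted before the
 * caller would have to block waiting for send-buffer space. */
int toy_sendmsg(struct toy_tx *tx, int len, int size_goal, int skb_overhead)
{
    int done = 0;

    assert(len >= 0 && size_goal > 0);
    while (done < len) {
        int room = tx->tail_len > 0 ? size_goal - tx->tail_len : 0;

        if (room <= 0) {
            /* New skb needed: the sndbuf limit is consulted only here. */
            if (tx->wmem_queued >= tx->sndbuf)
                break; /* corresponds to wait_for_sndbuf */
            tx->wmem_queued += skb_overhead;
            tx->tail_len = 0;
            room = size_goal;
        }

        /* Appending charges memory but never re-checks sndbuf. */
        int copy = len - done < room ? len - done : room;
        tx->tail_len += copy;
        tx->wmem_queued += copy;
        tx->send_q += copy;
        done += copy;
    }
    return done;
}
```

Starting from an empty queue with sndbuf = 8192 and size_goal = 14480, repeated 1024-byte calls are accepted in full until the tail skb reaches 14480 bytes. At that point the model's send_q is 14480 and wmem_queued is 17828, more than twice the limit, and only then does the allocation path refuse further data.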
Possible fixes
Disable GSO on the network interface.
Modify the kernel code to move the send‑buffer‑limit check to the beginning of the while loop in tcp_sendmsg.
These changes prevent the kernel from allowing the send buffer to expand beyond the configured limit.
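The effect of the second fix can be estimated with a toy model in which the limit check runs at the top of every loop iteration. This is a sketch under the same simplifications as the rest of the discussion (nothing is sent or ACKed, fixed per-skb overhead, appended bytes charged one for one); it is not the actual kernel patch, and the function name is hypothetical.

```c
#include <assert.h>

/* Returns the payload bytes queued before the sender would block when the
 * sndbuf check is hoisted to the start of the copy loop. */
int toy_send_q_with_early_check(int sndbuf, int size_goal,
                                int skb_overhead, int write_size)
{
    int wmem_queued = 0, send_q = 0, tail_len = 0;

    assert(sndbuf > 0 && size_goal > 0 && write_size > 0);
    for (;;) {
        int left = write_size; /* one send() call */

        while (left > 0) {
            /* Hoisted check: applies to appends as well as new skbs. */
            if (wmem_queued >= sndbuf)
                return send_q;

            int room = tail_len > 0 ? size_goal - tail_len : 0;
            if (room <= 0) {
                wmem_queued += skb_overhead; /* allocate a new skb */
                tail_len = 0;
                room = size_goal;
            }

            int copy = left < room ? left : room;
            tail_len += copy;
            wmem_queued += copy;
            send_q += copy;
            left -= copy;
        }
    }
}
```

With sndbuf = 8192, size_goal = 14480, an overhead of 3348 and 1024-byte writes, the model blocks with 5120 bytes queued, well under the internal limit.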
Open Source Linux
Focused on sharing Linux/Unix content, covering fundamentals, system development, network programming, automation/operations, cloud computing, and related professional knowledge.
