Why Does TCP Send‑Q Grow Beyond SO_SNDBUF? Inside Linux Kernel Buffer Mechanics
This article explains why a TCP connection's Send‑Q can exceed the SO_SNDBUF value set by the user. It covers how the kernel doubles SO_SNDBUF, sk_wmem_queued accounting, the tcp_sendmsg logic, the influence of GSO, and possible ways to limit the growth.
Problem Overview
A client creates a TCP socket, sets SO_SNDBUF to 4096 bytes, and sends 1024 bytes every second, while the server never calls recv(). The expected behavior has three phases. First, the server ACKs the data while its receive buffer fills. Next, once the receive buffer is full, it advertises a zero window. Finally, data accumulates on the client until its send buffer is full and send() blocks.
Observed Anomaly
Monitoring the connection with ss -nt shows the client's Send‑Q growing from 0 to 14480 bytes, far beyond the configured SO_SNDBUF of 4096.
Why SO_SNDBUF Is Doubled
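When the user sets SO_SNDBUF, the kernel clamps the value to net.core.wmem_max and then stores sk->sk_sndbuf = max(val * 2, SOCK_MIN_SNDBUF). The doubling leaves room for internal overhead such as sk_buff structures and protocol headers, and getsockopt(SO_SNDBUF) reports the doubled value. In this experiment the effective limit is therefore 8192 bytes, which still does not explain a Send‑Q of 14480.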
sk_wmem_queued
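The kernel tracks the memory actually consumed by the send queue in sk->sk_wmem_queued. This field sums the truesize of every queued sk_buff, meaning user data plus overhead such as the sk_buff structure itself, so it is normally larger than the Send‑Q shown by ss, which counts only payload bytes that have not been acknowledged. The send‑buffer limit compares sk_wmem_queued, not Send‑Q, against sk_sndbuf.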
tcp_sendmsg Logic
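tcp_sendmsg copies user data in a loop. In each iteration it does one of two things. If the tail sk_buff of the write queue has not been sent yet and holds fewer than size_goal bytes, the kernel appends data to it (Case 2). Otherwise it allocates a new sk_buff (Case 1). Only Case 1 checks the send‑buffer limit, via sk_stream_memory_free(), which compares sk_wmem_queued with sk_sndbuf.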
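```c
/* Simplified excerpt of tcp_sendmsg() (Linux 4.x era); declarations,
 * error handling and most of the copy path are omitted. */
int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
{
    mss_now = tcp_send_mss(sk, &size_goal, flags);
    while (msg_data_left(msg)) {
        int copy = 0;
        int max = size_goal;

        skb = tcp_write_queue_tail(sk);
        if (tcp_send_head(sk)) {
            /* Case 2: the tail skb is still unsent; room left = max - len. */
            copy = max - skb->len;
        }

        if (copy <= 0) {
            /* Case 1: a new skb is needed. This is the only sk_sndbuf check. */
            if (!sk_stream_memory_free(sk))
                goto wait_for_sndbuf;
            skb = sk_stream_alloc_skb(sk, select_size(sk, sg), sk->sk_allocation,
                                      skb_queue_empty(&sk->sk_write_queue));
            skb_entail(sk, skb);    /* queue it; truesize charged to sk_wmem_queued */
            copy = size_goal;
            max = size_goal;
        }
        if (copy > msg_data_left(msg))
            copy = msg_data_left(msg);

        /* Charges protocol-wide TCP memory (tcp_mem), not sk_sndbuf. */
        if (!sk_wmem_schedule(sk, copy))
            goto wait_for_memory;
        /* Appends copy bytes: skb->len and sk_wmem_queued both grow by copy. */
        err = skb_copy_to_page_nocache(sk, &msg->msg_iter, skb,
                                       pfrag->page, pfrag->offset, copy);
    }
}
```
size_goal Calculation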
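size_goal is computed by tcp_xmit_size_goal() and depends on whether Generic Segmentation Offload (GSO) is usable on the socket's route. With GSO, size_goal = tp->gso_segs * mss_now, where gso_segs is derived from the device's GSO size limit and capped so that size_goal does not exceed half of the largest window the peer has advertised. Without GSO, size_goal = mss_now. In the author's environment mss_now is 1448 bytes and size_goal is 14480 bytes (10 × mss_now), exactly the Send‑Q value observed with ss.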
Why Send‑Q Can Exceed SO_SNDBUF
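The send‑buffer limit is enforced only when a new sk_buff must be allocated (Case 1). When data is appended to an existing, still‑unsent tail sk_buff (Case 2), the only check is sk_wmem_schedule(). That function charges the protocol-wide TCP memory pool (tcp_mem), and as long as the system is not under TCP memory pressure it succeeds regardless of sk_sndbuf. Once the peer's window is closed, nothing leaves the queue, so each 1024‑byte write is appended to the same tail sk_buff until that sk_buff reaches size_goal (14480 bytes). Only then does the next write need a new sk_buff, find sk_wmem_queued above sk_sndbuf (8192), and block. This is why Send‑Q settles at 14480, well past the user‑set limit.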
Possible Mitigations
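Disable GSO on the network interface, so that size_goal falls back to mss_now and the tail sk_buff can push the queue past sk_sndbuf by at most about one MSS.
Patch tcp_sendmsg so that the sk_stream_memory_free() check runs at the top of every iteration of the while loop rather than only before a new sk_buff is allocated; appends to the tail sk_buff then also respect sk_sndbuf.
Either change makes SO_SNDBUF a much tighter bound on Send‑Q.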
This article has been distilled and summarized from source material, then republished for learning and reference.
Open Source Linux
