Why Does TCP Send‑Q Exceed SO_SNDBUF? Inside Linux Kernel Buffer Mechanics
A Linux client sets SO_SNDBUF to 4096 bytes and sends 1 KB packets while the server never receives them, yet the TCP Send‑Q grows to 14480 bytes, prompting an in‑depth analysis of kernel buffer accounting, double‑sized send buffers, and tcp_sendmsg behavior.
We encountered a scenario where a client creates a TCP socket, sets the send buffer size (SO_SNDBUF) to 4096 bytes, and sends a 1024‑byte segment every second while the server never calls recv(). The expected behavior is described in three phases, but monitoring with ss -nt shows the Send‑Q growing from 0 to 14480, far exceeding the configured buffer.
What Send‑Q Represents
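Send‑Q is the number of bytes between the left edge of the sliding window (the oldest unacknowledged byte) and the end of the data the application has written. In other words, it counts everything queued that the peer has not yet acknowledged, whether already transmitted or still unsent.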
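As a minimal sketch of that definition, the helper below computes the queue length from two sequence numbers. The function name toy_send_q is hypothetical; the parameter names are assumed to mirror the write_seq and snd_una fields of the kernel's TCP socket state, and unsigned 32-bit arithmetic is used so that sequence-number wraparound is handled.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel function.
 * write_seq: sequence number just past the last byte the application queued.
 * snd_una:   oldest sequence number not yet acknowledged by the peer.
 * Unsigned subtraction wraps modulo 2^32, matching TCP sequence arithmetic.
 */
uint32_t toy_send_q(uint32_t write_seq, uint32_t snd_una)
{
    return write_seq - snd_una;
}
```

For example, toy_send_q(15480, 1000) yields 14480, the value observed in the scenario above.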
Double SO_SNDBUF
When the user sets SO_SNDBUF, the kernel records val * 2 in sk->sk_sndbuf. Thus a user‑specified 4096 bytes becomes 8192 bytes internally, allowing room for additional overhead such as sk_buff structures and protocol headers.
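The doubling can be observed from user space by setting the option and reading it back. The sketch below assumes a Linux system with default sysctl limits; the helper name effective_sndbuf is hypothetical, while socket, setsockopt, getsockopt, and close are the standard POSIX calls.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/*
 * Hypothetical helper: request a send-buffer size on a fresh TCP socket
 * and return the value the kernel reports back, or -1 on any error.
 */
int effective_sndbuf(int requested)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int val = requested;
    socklen_t len = sizeof(val);

    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &val, sizeof(val)) < 0 ||
        getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &val, &len) < 0) {
        close(fd);
        return -1;
    }

    close(fd);
    return val; /* on Linux this reflects sk->sk_sndbuf, not the requested value */
}
```

Calling effective_sndbuf(4096) on Linux is expected to report the doubled value described above; other operating systems may simply return the requested size.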
sk_wmem_queued
The kernel tracks the memory currently charged to the send queue in sk->sk_wmem_queued. This counter includes both user data and the extra overhead, so it is typically larger than the raw Send‑Q value. sk_stream_memory_free compares it against sk->sk_sndbuf:
bool sk_stream_memory_free(const struct sock *sk) {
    if (sk->sk_wmem_queued >= sk->sk_sndbuf)
        return false;
    ...
}
tcp_sendmsg
The function decides whether to create a new sk_buff or append data to the last one based on the write queue state.
int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size) {
    mss_now = tcp_send_mss(sk, &size_goal, flags);
    while (msg_data_left(msg)) {
        int copy = 0;
        int max = size_goal;
        skb = tcp_write_queue_tail(sk);
        if (tcp_send_head(sk)) {
            copy = max - skb->len;
        }
        if (copy <= 0) {
            /* allocate new skb */
            if (!sk_stream_memory_free(sk))
                goto wait_for_sndbuf;
            skb = sk_stream_alloc_skb(sk, select_size(sk, sg),
                                      sk->sk_allocation,
                                      skb_queue_empty(&sk->sk_write_queue));
        }
        ...
    }
}
Case 1: Creating a New sk_buff
During Phase 1 the client creates a new sk_buff for each packet; the kernel checks the send‑buffer limit before allocation, which passes because the limit is not yet reached.
Case 2: Appending to the Last sk_buff
In Phase 2 the client's send buffer already holds accumulated sk_buffs, so the kernel attempts to append data to the last one. The amount that can be appended is copy = size_goal - skb->len, where size_goal depends on the MSS and GSO settings.
When GSO is enabled, size_goal = tp->gso_segs * mss_now; otherwise it equals mss_now. In the observed environment the effective MSS is 1448 bytes, yielding a size_goal of 14480 bytes (10 × MSS).
Thus, when the client enters Phase 2, tcp_sendmsg computes copy = 14480 - 1024 = 13456 bytes. However, the existing sk_buff has len = 1024 and truesize = 4372, insufficient to hold the full 14480 bytes.
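The arithmetic above can be checked with two small helpers. This is only an illustration: the names toy_size_goal and toy_append_room are hypothetical, and the inputs (MSS 1448, 10 segments, skb length 1024) are the figures quoted in the text.

```c
/* Illustrative helpers; the names are hypothetical, not kernel functions. */

/* Target size of one queued skb: several MSS with GSO, a single MSS without. */
int toy_size_goal(int mss_now, int gso_segs, int gso_enabled)
{
    return gso_enabled ? gso_segs * mss_now : mss_now;
}

/*
 * Bytes that may still be appended to the tail skb.
 * A result <= 0 corresponds to the "allocate a new skb" branch.
 */
int toy_append_room(int size_goal, int skb_len)
{
    return size_goal - skb_len;
}
```

With these inputs, toy_size_goal(1448, 10, 1) gives 14480 and toy_append_room(14480, 1024) gives 13456, matching the values in the text.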
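Before copying data, the kernel calls sk_wmem_schedule, which reserves memory against the socket's memory accounting but does not compare sk->sk_wmem_queued with sk->sk_sndbuf. The copy then grows the existing sk_buff and increases sk->sk_wmem_queued, so the send‑queue memory can exceed sk->sk_sndbuf.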
This behavior effectively makes the user‑set SO_SNDBUF less restrictive, prompting suggestions such as disabling GSO on the NIC or moving the send‑buffer check earlier in the tcp_sendmsg loop.
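The whole effect can be reproduced in a toy user-space model, sketched here under loud assumptions: nothing is ever acknowledged (the peer's window is closed), every new skb costs a fixed overhead of 3348 bytes (the article's truesize of 4372 minus its len of 1024), and appended bytes are charged one for one. All names (toy_sock, toy_sendmsg, toy_run, TOY_SKB_OVERHEAD) are hypothetical; only the position of the limit check mirrors the tcp_sendmsg excerpt above.

```c
/* Toy model of send-side accounting; NOT kernel code. */

/* Assumed per-skb overhead: the article's truesize (4372) minus its len (1024). */
#define TOY_SKB_OVERHEAD 3348

struct toy_sock {
    int sndbuf;       /* models sk->sk_sndbuf */
    int wmem_queued;  /* models sk->sk_wmem_queued */
    int tail_len;     /* payload bytes in the tail skb; 0 means the queue is empty */
    int send_q;       /* queued payload bytes (nothing is ever ACKed in this model) */
};

/* Queue up to len bytes; returns how many were accepted before "blocking". */
int toy_sendmsg(struct toy_sock *sk, int size_goal, int len)
{
    int queued = 0;

    while (len > 0) {
        int copy = 0;

        if (sk->tail_len > 0)                  /* an unsent tail skb exists */
            copy = size_goal - sk->tail_len;

        if (copy <= 0) {
            /* New-skb path: the only place the sndbuf limit is checked. */
            if (sk->wmem_queued >= sk->sndbuf)
                break;                         /* would wait for sndbuf */
            sk->wmem_queued += TOY_SKB_OVERHEAD;
            sk->tail_len = 0;
            copy = size_goal;
        }

        if (copy > len)
            copy = len;

        /* Append path: memory is charged without consulting sndbuf. */
        sk->tail_len += copy;
        sk->wmem_queued += copy;
        sk->send_q += copy;
        len -= copy;
        queued += copy;
    }
    return queued;
}

/* Perform `writes` sends of `write_len` bytes each and return the final state. */
struct toy_sock toy_run(int sndbuf, int size_goal, int write_len, int writes)
{
    struct toy_sock sk = { sndbuf, 0, 0, 0 };

    for (int i = 0; i < writes; i++)
        toy_sendmsg(&sk, size_goal, write_len);
    return sk;
}
```

In this model, toy_run(8192, 14480, 1024, 100) stops accepting data with send_q at 14480 and wmem_queued at 17828, well above the 8192-byte limit. With a size_goal of 1448, as when GSO is disabled, the same run stops at a send_q of 2896.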
Open Source Linux
Focused on sharing Linux/Unix content, covering fundamentals, system development, network programming, automation/operations, cloud computing, and related professional knowledge.
