
Why Does TCP Send‑Q Grow Beyond SO_SNDBUF? Inside Linux Kernel Buffer Mechanics

This article explains why a TCP connection's Send‑Q can exceed the user‑set SO_SNDBUF value, detailing the kernel's double‑buffer trick, sk_wmem_queued accounting, tcp_sendmsg behavior, GSO influence, and possible ways to limit the buffer growth.

Open Source Linux

Sep 6, 2024

Problem Overview

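A client creates a TCP socket with SO_SNDBUF set to 4096 bytes and writes 1024 bytes every second, while the server never calls recv(). The expected behavior has three phases: the server ACKs the first segments while its receive buffer still has room; once that buffer fills, the server advertises a zero window; from then on the data accumulates in the client's send buffer until send() blocks.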

Observed Anomaly

Monitoring with ss -nt shows the Send‑Q growing from 0 to 14480, far exceeding the configured SO_SNDBUF of 4096.
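The same counter can be read from inside the sending program. The sketch below uses the Linux SIOCOUTQ ioctl, which for a TCP socket reports the bytes written but not yet acknowledged; to my understanding this is the figure ss shows as Send‑Q for an established connection. The helper name send_queue_bytes is an illustrative assumption, not something from the original article.

```c
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <linux/sockios.h>   /* SIOCOUTQ (Linux-specific) */

/* Returns the number of bytes sitting in the socket's send queue
 * (written by the application, not yet acknowledged by the peer),
 * or -1 if the ioctl fails (errno is set by ioctl). */
int send_queue_bytes(int fd)
{
    int pending = 0;

    if (ioctl(fd, SIOCOUTQ, &pending) < 0)
        return -1;
    return pending;
}
```

Calling send_queue_bytes(fd) after each send() in the client would make it possible to log the growth of the queue without running ss alongside the program.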

Why SO_SNDBUF Is Doubled

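When the user sets SO_SNDBUF with setsockopt(), the kernel stores sk->sk_sndbuf = max(val*2, SOCK_MIN_SNDBUF), doubling the value to account for internal overhead such as sk_buff structures and protocol headers. The 4096 bytes requested here therefore become a limit of 8192 bytes, which is still well below the observed Send‑Q of 14480.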
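The doubling is easy to observe from user space, because getsockopt() returns the stored value rather than the requested one. The following is a minimal sketch; the function name request_sndbuf is an assumption made for illustration.

```c
#include <sys/socket.h>

/* Requests a send-buffer size and returns the value the kernel actually
 * stored, as reported back by getsockopt(). Returns -1 on failure. */
int request_sndbuf(int fd, int requested)
{
    int stored = 0;
    socklen_t len = sizeof(stored);

    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &requested, sizeof(requested)) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &stored, &len) < 0)
        return -1;
    return stored;
}
```

On a Linux host where net.core.wmem_max is at least 4096, request_sndbuf(fd, 4096) would be expected to report 8192; other systems may report the requested value unchanged.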

sk_wmem_queued

The kernel tracks the actual memory used by the send buffer in sk->sk_wmem_queued, which includes both user data and overhead, so it is typically larger than the visible Send‑Q.

tcp_sendmsg Logic

During tcp_sendmsg, the kernel decides whether to allocate a new sk_buff (Case 1) or append data to the last sk_buff (Case 2). The decision depends on the write queue state and the calculated size_goal.

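/* Abridged from tcp_sendmsg(): declarations, locking, error paths and the
 * wait_for_sndbuf / wait_for_memory labels are omitted. */
int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size) {
    /* size_goal: the most payload a single sk_buff should carry */
    mss_now = tcp_send_mss(sk, &size_goal, flags);
    while (msg_data_left(msg)) {
        int copy = 0;
        int max = size_goal;
        skb = tcp_write_queue_tail(sk);
        /* Unsent data is queued: compute the room left in the tail sk_buff */
        if (tcp_send_head(sk)) {
            copy = max - skb->len;
        }
        /* Case 1: no room in the tail, so a new sk_buff is needed */
        if (copy <= 0) {
            /* In this excerpt, the only check of sk_wmem_queued against sk_sndbuf */
            if (!sk_stream_memory_free(sk))
                goto wait_for_sndbuf;
            skb = sk_stream_alloc_skb(sk, select_size(sk, sg), sk->sk_allocation, skb_queue_empty(&sk->sk_write_queue));
        }
        /* Case 2 (and the remainder of Case 1): only memory accounting is checked */
        if (!sk_wmem_schedule(sk, copy))
            goto wait_for_memory;
        /* Copy user data into the sk_buff's page fragment */
        err = skb_copy_to_page_nocache(sk, &msg->msg_iter, skb, pfrag->page, pfrag->offset, copy);
    }
}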

size_goal Calculation

size_goal is derived from tcp_xmit_size_goal and depends on whether Generic Segmentation Offload (GSO) is enabled: size_goal = tp->gso_segs * mss_now when GSO is on, otherwise it equals mss_now. In the author's environment, mss_now is 1448 bytes, leading to a size_goal of 14480 bytes (10 × mss_now).
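The arithmetic can be written out as a small function. This is only a restatement of the formula above, not the kernel's tcp_xmit_size_goal; the function name and its parameters are assumptions chosen for illustration.

```c
/* size_goal as described in the text: a whole number of MSS-sized
 * segments when GSO is on, a single MSS otherwise. */
unsigned int size_goal_bytes(unsigned int mss_now, unsigned int gso_segs,
                             int gso_enabled)
{
    return gso_enabled ? gso_segs * mss_now : mss_now;
}
```

With the article's numbers, size_goal_bytes(1448, 10, 1) gives 14480 and size_goal_bytes(1448, 10, 0) gives 1448.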

Why Send‑Q Can Exceed SO_SNDBUF

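The sk->sk_sndbuf limit is enforced by sk_stream_memory_free, and in the loop above that check runs only when a new sk_buff has to be allocated (Case 1). When data is appended to the tail sk_buff (Case 2), the only gate is sk_wmem_schedule, which succeeds as long as enough system memory is available. A tail sk_buff can therefore keep growing until it reaches size_goal (14480 bytes here), pushing sk->sk_wmem_queued, and with it the observed Send‑Q, past the user‑set limit.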
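A simplified user-space model makes the effect concrete. It assumes the peer's window is already closed so nothing leaves the queue, applies the send-buffer check only when a new buffer is needed, and charges a per-buffer overhead. The function name, the overhead parameter and the model itself are illustrative assumptions, not kernel code.

```c
#include <stddef.h>

/* Models repeated send() calls of `chunk` bytes against a stalled peer and
 * returns the payload bytes queued when the sender would first block.
 * skb_overhead is a HYPOTHETICAL per-buffer cost standing in for the
 * sk_buff bookkeeping that the kernel charges to sk_wmem_queued. */
size_t queued_before_block(size_t sndbuf, size_t size_goal,
                           size_t chunk, size_t skb_overhead)
{
    size_t wmem_queued = 0;  /* models sk->sk_wmem_queued */
    size_t send_q = 0;       /* payload bytes queued so far */
    size_t tail_len = 0;     /* payload held by the tail buffer */
    int have_tail = 0;

    if (chunk == 0 || size_goal == 0)
        return 0;

    for (;;) {                          /* one iteration per send() call */
        size_t left = chunk;

        while (left > 0) {
            if (!have_tail || tail_len >= size_goal) {
                /* Case 1: a new buffer is needed; the limit is checked here only */
                if (wmem_queued >= sndbuf)
                    return send_q;      /* the sender would block */
                wmem_queued += skb_overhead;
                tail_len = 0;
                have_tail = 1;
            }
            /* Case 2: append to the tail up to size_goal, no limit check */
            size_t copy = size_goal - tail_len;
            if (copy > left)
                copy = left;
            tail_len += copy;
            send_q += copy;
            wmem_queued += copy;
            left -= copy;
        }
    }
}
```

With a limit of 8192, a size_goal of 14480 and 1024-byte writes, the model stops at 14480 queued bytes regardless of the assumed overhead, because the first buffer is filled to size_goal before the limit is consulted again. With size_goal set to 1448 it stops at a smaller figure that depends on the assumed overhead.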

Possible Mitigations

Disable GSO on the network interface, so that size_goal falls back to mss_now and each sk_buff holds at most one segment.

Modify the kernel code to move the send‑buffer limit check to the beginning of the while loop in tcp_sendmsg, preventing the buffer from being over‑allocated.

These changes would make the SO_SNDBUF setting more effective.
