Understanding and Tuning Linux TCP Queue and Buffer Parameters
This article explains the Linux TCP connection‑establishment, packet‑receive, and packet‑send paths, categorizes related kernel parameters such as backlog, SYN‑cookie, ring buffers, and socket buffers, and provides practical commands and guidelines for optimizing network performance on servers.
When optimizing network application performance on Linux, adjusting TCP-related kernel parameters, especially those governing queues and buffers, is essential. This article walks through these parameters from the perspective of the protocol stack to aid understanding and recall.
1. Connection Establishment
The client sends a SYN packet; the server replies with SYN-ACK and places the connection in the half-open queue, whose length is controlled by net.ipv4.tcp_max_syn_backlog. After the client's ACK arrives, the connection moves to the accept queue, whose length is min(net.core.somaxconn, backlog), where backlog is the value passed to listen() (declared in <sys/socket.h> as int listen(int sockfd, int backlog)). If backlog exceeds net.core.somaxconn, the accept queue is capped at net.core.somaxconn. SYN-flood attacks are mitigated by enabling SYN cookies (net.ipv4.tcp_syncookies=1).
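The accept-queue capping rule above can be sketched in a few lines of Python. The helper name effective_backlog is mine, not a kernel API; it simply reads the somaxconn sysctl and applies the min() described above.

```python
import socket

def effective_backlog(requested, somaxconn_path="/proc/sys/net/core/somaxconn"):
    """Return the accept-queue length the kernel will actually use:
    min(net.core.somaxconn, backlog). Falls back to the requested
    value if the sysctl file is unavailable (e.g. on non-Linux)."""
    try:
        with open(somaxconn_path) as f:
            somaxconn = int(f.read().strip())
    except (OSError, ValueError):
        return requested
    return min(somaxconn, requested)

# A listening socket whose accept queue is capped as described above.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))   # port 0: let the kernel pick a free port
srv.listen(128)              # backlog passed to listen(2)
print("effective accept-queue length:", effective_backlog(128))
srv.close()
```

If the application passes a large backlog but connections still overflow, raising net.core.somaxconn is usually the missing half of the fix.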
2. Packet Reception
Incoming packets travel from the NIC driver through the kernel to user space. The NIC uses DMA to write incoming packets into a fixed-size ring buffer (RingBuffer); when the ring is full, new packets are dropped. A hardware interrupt then prompts the kernel to pass packets up through the IP layer and the TCP layer into the socket's receive buffer, where they wait for the application to read them.
Key parameters include:
Bonding mode: combines multiple NICs into one virtual interface for redundancy or aggregate throughput; status is viewable via /proc/net/bonding/bond0.
Multi-queue and IRQ affinity: enable multiple NIC queues and bind each to a CPU core; check with lspci -vvv and cat /proc/interrupts, and set affinity by writing a CPU bitmask, e.g. echo "1" > /proc/irq/99/smp_affinity.
RingBuffer: the FIFO queue between the NIC and the IP layer; view its size with ethtool -g eth0. Counters such as RX errors, RX dropped, and RX overruns (from ifconfig or ip -s link) indicate buffer pressure.
Input packet queue: the per-CPU backlog between the driver and the protocol stack; its length is set by net.core.netdev_max_backlog.
Receive buffer (recvBuffer): critical for TCP throughput; it should be larger than the Bandwidth-Delay Product (BDP). Automatic tuning is enabled when net.ipv4.tcp_moderate_rcvbuf=1, using the three-value array net.ipv4.tcp_rmem (min, default, max). When disabled, the default comes from net.core.rmem_default and can be overridden with setsockopt(SO_RCVBUF). The fraction of the buffer reserved for bookkeeping overhead, rather than advertised window, is controlled by net.ipv4.tcp_adv_win_scale.
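To make the BDP guideline concrete, here is a small worked calculation (the function name bdp_bytes is illustrative, not a standard API):

```python
def bdp_bytes(bandwidth_bits_per_s, rtt_s):
    """Bandwidth-Delay Product: the number of bytes that must be in
    flight to keep the link fully utilized."""
    return bandwidth_bits_per_s / 8 * rtt_s

# Example: a 1 Gbit/s link with a 10 ms round-trip time.
bdp = bdp_bytes(1e9, 0.010)
print(f"BDP = {bdp:.0f} bytes (~{bdp / 2**20:.2f} MiB)")
# prints: BDP = 1250000 bytes (~1.19 MiB)
# The receive buffer (the max value of net.ipv4.tcp_rmem) should
# exceed this figure, or throughput will be window-limited.
```

On high-bandwidth, high-latency paths (e.g. cross-region links), the BDP grows quickly, which is why the tcp_rmem maximum often needs to be raised well beyond the distribution default.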
3. Packet Transmission
Outgoing packets flow from the application's send buffer through the kernel to the NIC. The send buffer is automatically tuned between the limits defined by net.ipv4.tcp_wmem (with net.core.wmem_default / net.core.wmem_max applying otherwise). If SO_SNDBUF is set explicitly, automatic tuning is disabled for that socket.
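A quick sketch of what setting SO_SNDBUF does in practice. The behavior shown is Linux-specific: the kernel doubles the requested value internally to leave room for bookkeeping overhead, and caps it at net.core.wmem_max.

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Before any setsockopt, the kernel auto-tunes within tcp_wmem.
print("default send buffer:", s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))

# Setting SO_SNDBUF pins the buffer size and disables autotuning for
# this socket. On Linux, getsockopt typically reports about twice the
# requested value (capped by net.core.wmem_max).
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 64 * 1024)
print("after setsockopt:", s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))

s.close()
```

This is why a well-meaning setsockopt(SO_SNDBUF) call can hurt throughput: it freezes the buffer at a fixed size that autotuning might otherwise have grown past the path's BDP.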
After the IP layer, packets enter the queueing discipline (QDisc), whose length is controlled by txqueuelen (viewable via ifconfig or ip link). The QDisc then places packet descriptors into the RingBuffer's transmit queue. Offloading features such as TCP segmentation offload (TSO) and checksum offload can be inspected with ethtool -k eth0 and toggled with ethtool -K (e.g., sudo ethtool -K eth0 tso off).
By understanding these queues and their associated kernel parameters, administrators can better tune Linux for high‑throughput, low‑latency network workloads.