
Mastering Linux TCP Performance: Tuning Queues, Buffers, and Offloading

This guide explains how to optimize Linux TCP performance by classifying and tuning kernel parameters related to connection establishment, packet reception, and packet transmission, covering queues, buffers, NIC bonding, multi‑queue IRQ affinity, RingBuffer settings, and offloading features.

Liangxu Linux

1. Connection Establishment

When a client initiates a TCP connection, the server keeps the half-open connection (state SYN_RECV) in the SYN queue, whose length is bounded by net.ipv4.tcp_max_syn_backlog. After the three-way handshake completes, the connection moves to the accept queue, whose size is min(net.core.somaxconn, backlog), where backlog is the value passed to listen(); it waits there until the application calls accept(). To mitigate SYN-flood attacks, set net.ipv4.tcp_syncookies=1: when the SYN queue overflows, the kernel encodes the connection state in the initial sequence number of its SYN-ACK (a SYN cookie) instead of storing it in the queue.
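A minimal Python sketch for checking these limits on a running system, assuming a Linux host where each sysctl is exposed as a file under /proc/sys (net.core.somaxconn maps to /proc/sys/net/core/somaxconn). The helper names sysctl_path, read_sysctl_int, and effective_accept_queue are hypothetical; the last one simply applies the min(somaxconn, backlog) rule.

```python
from pathlib import Path


def sysctl_path(name: str, root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl name such as net.core.somaxconn to its file under /proc/sys."""
    return Path(root, *name.split("."))


def read_sysctl_int(name: str, root: str = "/proc/sys") -> int:
    """Read a sysctl holding a single integer (raises OSError if it does not exist)."""
    return int(sysctl_path(name, root).read_text().split()[0])


def effective_accept_queue(somaxconn: int, backlog: int) -> int:
    """Accept-queue limit for a listener: the listen() backlog, capped by somaxconn."""
    return min(somaxconn, backlog)


if __name__ == "__main__":
    try:
        somaxconn = read_sysctl_int("net.core.somaxconn")
        syn_backlog = read_sysctl_int("net.ipv4.tcp_max_syn_backlog")
        syncookies = read_sysctl_int("net.ipv4.tcp_syncookies")
    except OSError as exc:
        print(f"cannot read sysctls (not Linux?): {exc}")
    else:
        for backlog in (128, 511, 4096):
            print(f"listen(fd, {backlog}) -> accept queue {effective_accept_queue(somaxconn, backlog)}")
        print(f"tcp_max_syn_backlog={syn_backlog} tcp_syncookies={syncookies}")
```

Running it shows the accept-queue size each listen() backlog actually gets, which makes a backlog silently truncated by a low somaxconn easy to spot.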

2. Packet Reception

Incoming packets travel from the NIC driver through kernel space to user space. The NIC uses DMA to write each packet into memory referenced by a descriptor in a fixed-size RingBuffer; when the RingBuffer is full, new packets are dropped. An interrupt notifies the kernel, which passes the packet up to the IP layer, then to TCP, and finally into the socket receive buffer, from which the application reads it. The kernel sends the TCP ACK once the segment is queued in the receive buffer, not when the application reads it, so an ACK only confirms delivery to the peer's kernel; an application that needs end-to-end confirmation must implement its own acknowledgment.
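The drop-on-full behavior can be illustrated with a toy model. This is a hypothetical Python simulation, not kernel code: RxRing and simulate are made-up names, the "NIC" appends packets to a fixed-size ring, and the "kernel" drains at most budget_per_tick packets per tick, loosely mirroring the budgeted polling (NAPI) Linux uses.

```python
from collections import deque


class RxRing:
    """Toy fixed-size receive ring; real rings hold DMA descriptors, not Python objects."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.slots: deque = deque()
        self.dropped = 0

    def nic_receive(self, packet) -> bool:
        """Called for each arriving packet; drops it when every slot is occupied."""
        if len(self.slots) >= self.size:
            self.dropped += 1
            return False
        self.slots.append(packet)
        return True

    def kernel_poll(self, budget: int) -> list:
        """Drain up to `budget` packets, as one polling pass would."""
        batch = []
        while self.slots and len(batch) < budget:
            batch.append(self.slots.popleft())
        return batch


def simulate(ring_size: int, arrivals_per_tick: int, budget_per_tick: int, ticks: int) -> int:
    """Return how many packets were dropped over `ticks` rounds of arrivals then polling."""
    ring = RxRing(ring_size)
    seq = 0
    for _ in range(ticks):
        for _ in range(arrivals_per_tick):
            ring.nic_receive(seq)
            seq += 1
        ring.kernel_poll(budget_per_tick)
    return ring.dropped


if __name__ == "__main__":
    print(simulate(8, 6, 4, 10))   # 16
    print(simulate(64, 6, 4, 10))  # 0
```

With simulate(8, 6, 4, 10) the ring fills after two ticks and then drops two packets per tick, while simulate(64, 6, 4, 10) drops nothing yet: a larger ring absorbs a burst, but an arrival rate that stays above the processing rate overflows any ring eventually.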

NIC Bonding Mode: Multiple physical NICs can be bonded into a single virtual interface for redundancy or aggregate bandwidth. Linux supports seven bonding modes, numbered 0 to 6: balance-rr, active-backup, balance-xor, broadcast, 802.3ad, balance-tlb, and balance-alb. The active mode and member interfaces can be inspected with cat /proc/net/bonding/bond0.
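A sketch that reads the bonding status file, assuming the "Bonding Mode:" and "Slave Interface:" lines the Linux bonding driver prints there. The SAMPLE text is abridged and illustrative, and parse_bonding_status is a hypothetical helper; BONDING_MODES records the numeric mode values 0 to 6 for the seven modes.

```python
from pathlib import Path

BONDING_MODES = {
    0: "balance-rr",
    1: "active-backup",
    2: "balance-xor",
    3: "broadcast",
    4: "802.3ad",
    5: "balance-tlb",
    6: "balance-alb",
}


def parse_bonding_status(text: str) -> dict:
    """Pull the mode description and member interfaces out of /proc/net/bonding/<bond>."""
    info = {"mode": None, "slaves": []}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank lines and lines without a "key: value" shape
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            info["mode"] = value
        elif key == "Slave Interface":
            info["slaves"].append(value)
    return info


SAMPLE = """Ethernet Channel Bonding Driver: v5.15.0

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up

Slave Interface: eth0
MII Status: up

Slave Interface: eth1
MII Status: up
"""

if __name__ == "__main__":
    path = Path("/proc/net/bonding/bond0")
    print(parse_bonding_status(path.read_text() if path.exists() else SAMPLE))
```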

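NIC Multi-Queue and IRQ Affinity: Modern NICs expose multiple receive and transmit queues. Run lspci -vv and check that the NIC lists the MSI-X capability with a vector count (Count=) greater than 1. Confirm that multi-queue is active in /proc/interrupts, which then shows one entry per queue (names depend on the driver, for example eth0-TxRx-0). Each queue's IRQ can be bound to a specific CPU core by writing a hexadecimal CPU mask to /proc/irq/IRQ_NUM/smp_affinity; for example, echo 1 > /proc/irq/99/smp_affinity pins IRQ 99 to CPU 0. If the irqbalance daemon is running, it may overwrite such manual settings.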
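The mask arithmetic and a simple queue-to-CPU assignment can be sketched in Python. cpu_mask sets one bit per CPU and formats it as 32-bit hexadecimal words separated by commas (the layout the kernel uses for smp_affinity on machines with more than 32 CPUs); queue_irqs picks queue IRQs out of /proc/interrupts by name prefix; spread_affinity is a hypothetical round-robin policy. The SAMPLE interrupt table is made up, and real queue names depend on the driver.

```python
def cpu_mask(cpus) -> str:
    """Hex CPU mask as comma-separated 32-bit words, most significant word first."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    if mask == 0:
        raise ValueError("at least one CPU is required")
    digits = f"{mask:x}"
    groups = []
    while digits:  # peel off 8 hex digits (32 bits) at a time, least significant first
        groups.append(digits[-8:])
        digits = digits[:-8]
    return ",".join(reversed(groups))


def queue_irqs(proc_interrupts: str, prefix: str) -> list[tuple[int, str]]:
    """(irq, name) pairs from /proc/interrupts whose name starts with prefix."""
    found = []
    for line in proc_interrupts.splitlines()[1:]:  # skip the "CPU0 CPU1 ..." header
        fields = line.split()
        if len(fields) < 2 or not fields[0].endswith(":"):
            continue
        irq = fields[0][:-1]
        if irq.isdigit() and fields[-1].startswith(prefix):
            found.append((int(irq), fields[-1]))
    return found


def spread_affinity(irqs: list[tuple[int, str]], ncpus: int) -> dict[int, str]:
    """Hypothetical round-robin policy: the i-th queue IRQ goes to CPU i % ncpus."""
    return {irq: cpu_mask([i % ncpus]) for i, (irq, _name) in enumerate(irqs)}


SAMPLE = """           CPU0       CPU1
  98:        120          0  PCI-MSI 524288-edge      eth0
  99:      50312          0  PCI-MSI 524289-edge      eth0-TxRx-0
 100:          0      48876  PCI-MSI 524290-edge      eth0-TxRx-1
 NMI:          0          0  Non-maskable interrupts
"""

if __name__ == "__main__":
    for irq, mask in spread_affinity(queue_irqs(SAMPLE, "eth0-TxRx-"), ncpus=2).items():
        print(f"echo {mask} > /proc/irq/{irq}/smp_affinity")
```

Run as a script, it prints echo 1 > /proc/irq/99/smp_affinity and echo 2 > /proc/irq/100/smp_affinity, pinning queue 0 to CPU 0 and queue 1 to CPU 1. The commands are only printed; applying them requires root.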

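RingBuffer: The RingBuffer sits between the NIC and the IP layer. ethtool -g eth0 reports both the pre-set maximums and the current hardware settings for the receive (RX) and transmit (TX) rings. Sizes vary by driver and NIC; 4096 receive descriptors and 256 transmit descriptors is one example. The current size can be raised up to the reported maximum with ethtool -G, e.g. ethtool -G eth0 rx 4096.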
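A Python sketch that parses ethtool -g output and reports how far each ring can still grow. It assumes the usual two-section layout ("Pre-set maximums" and "Current hardware settings"); the SAMPLE values are illustrative, and parse_ethtool_g, ring_headroom, and grow_command are hypothetical helpers. Newer ethtool versions print extra fields, which the parser skips when their values are not plain integers.

```python
from typing import Optional

SECTIONS = {"Pre-set maximums": "max", "Current hardware settings": "current"}


def parse_ethtool_g(text: str) -> dict[str, dict[str, int]]:
    """Split `ethtool -g` output into {'max': {...}, 'current': {...}} of integer ring sizes."""
    result: dict[str, dict[str, int]] = {"max": {}, "current": {}}
    section: Optional[str] = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key in SECTIONS:
            section = SECTIONS[key]
        elif section and sep and value.isdigit():  # skips "n/a" and other non-numeric fields
            result[section][key] = int(value)
    return result


def ring_headroom(parsed: dict[str, dict[str, int]]) -> dict[str, int]:
    """Descriptors each ring could still gain before reaching its pre-set maximum."""
    return {ring: parsed["max"][ring] - parsed["current"][ring]
            for ring in ("RX", "TX")
            if ring in parsed["max"] and ring in parsed["current"]}


def grow_command(iface: str, parsed: dict[str, dict[str, int]]) -> str:
    """ethtool command that sets both rings to their maximums (running it needs root)."""
    return f"ethtool -G {iface} rx {parsed['max']['RX']} tx {parsed['max']['TX']}"


SAMPLE = """Ring parameters for eth0:
Pre-set maximums:
RX:\t\t4096
RX Mini:\t0
RX Jumbo:\t0
TX:\t\t4096
Current hardware settings:
RX:\t\t512
RX Mini:\t0
RX Jumbo:\t0
TX:\t\t256
"""

if __name__ == "__main__":
    parsed = parse_ethtool_g(SAMPLE)
    print(ring_headroom(parsed))
    print(grow_command("eth0", parsed))
```

A larger ring tolerates longer bursts at the cost of more memory per queue, so raising it to the maximum is a judgment call rather than a universal default.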

Input Packet Queue: When packets arrive faster than the kernel can process them, they wait in a per-CPU input queue whose maximum length is set by net.core.netdev_max_backlog; packets that arrive while it is full are dropped.
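Drops caused by a full input queue are counted per CPU in /proc/net/softnet_stat. A minimal parser, assuming the long-standing layout in which the first three hexadecimal fields are packets processed, packets dropped, and time_squeeze; the SAMPLE rows are invented and softnet_stats is a hypothetical name.

```python
from pathlib import Path


def softnet_stats(text: str) -> list[dict[str, int]]:
    """Parse /proc/net/softnet_stat (one row per CPU, hexadecimal fields).

    Field 0: packets processed; field 1: packets dropped because the input queue
    (net.core.netdev_max_backlog) was full; field 2: time_squeeze, the number of
    times the receive softirq ran out of budget or time with work still pending.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            processed, dropped, squeezed = (int(f, 16) for f in fields[:3])
            rows.append({"processed": processed, "dropped": dropped, "time_squeeze": squeezed})
    return rows


SAMPLE = """0001e240 00000000 00000003 00000000 00000000
0001d4c0 0000012c 00000011 00000000 00000000
"""

if __name__ == "__main__":
    path = Path("/proc/net/softnet_stat")
    text = path.read_text() if path.exists() else SAMPLE
    for row_no, row in enumerate(softnet_stats(text)):
        print(row_no, row)
```

The counters are cumulative since boot, so compare two readings taken a few seconds apart; a growing dropped value suggests raising net.core.netdev_max_backlog.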

recvBuffer: The receive buffer limits the window a TCP receiver can advertise, and therefore the throughput a single connection can reach. It should be at least the Bandwidth-Delay Product (BDP): BDP = bandwidth × RTT; for example, 100 Mbps × 100 ms = 10,000,000 bits = 1.25 MB (about 1.19 MiB). When net.ipv4.tcp_moderate_rcvbuf=1, Linux auto-tunes each socket's receive buffer between the min and max values of net.ipv4.tcp_rmem. For TCP sockets, the initial size is the middle (default) value of tcp_rmem, which takes precedence over net.core.rmem_default; setting SO_RCVBUF on a socket disables auto-tuning for that socket, and the requested value is capped by net.core.rmem_max. Part of the buffer is reserved for socket overhead rather than advertised as window, as controlled by net.ipv4.tcp_adv_win_scale: 1 reserves ½ of the buffer, 2 reserves ¼.
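The BDP arithmetic and the tcp_adv_win_scale overhead rule can be checked with a few lines of Python. bdp_bytes and buffer_for_window are hypothetical helpers; the second assumes the classic rule that the usable window is the buffer minus buffer/2^tcp_adv_win_scale, and newer kernels may compute this ratio differently. RTT is taken in whole milliseconds to keep the math in integers.

```python
def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth-Delay Product in bytes: bits/s x RTT seconds / 8 (RTT given in ms)."""
    return bandwidth_bps * rtt_ms // (8 * 1000)


def buffer_for_window(window: int, adv_win_scale: int) -> int:
    """Smallest buffer whose usable share covers `window` bytes.

    Assumes the classic rule usable = space - space / 2**adv_win_scale,
    so 1 reserves 1/2 of the buffer for overhead and 2 reserves 1/4.
    """
    if adv_win_scale <= 0:
        raise ValueError("this sketch only models positive tcp_adv_win_scale values")
    denom = 2 ** adv_win_scale
    return -(-window * denom // (denom - 1))  # ceiling division


if __name__ == "__main__":
    bdp = bdp_bytes(100_000_000, 100)   # 100 Mbps x 100 ms
    print(bdp, round(bdp / 2**20, 2))   # 1250000 1.19
    for scale in (1, 2):
        print(scale, buffer_for_window(bdp, scale))  # "1 2500000", then "2 1666667"
```

For the 100 Mbps × 100 ms example, a tcp_adv_win_scale of 1 means the maximum in net.ipv4.tcp_rmem would need to reach about 2.5 MB for the window alone to cover the BDP.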

3. Packet Transmission

Outgoing packets travel from the application's send buffer through the kernel to the NIC. The application writes data into the TCP send buffer; the kernel builds sk_buff structures, passes them through the IP layer to the network device's queueing discipline (QDisc), and from there into the NIC's RingBuffer for DMA transmission.

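sendBuffer: For TCP sockets, the initial size is the middle (default) value of net.ipv4.tcp_wmem, which takes precedence over net.core.wmem_default, and the kernel auto-tunes the buffer between the min and max values of tcp_wmem. Setting the socket option SO_SNDBUF disables auto-tuning for that socket; the requested value is capped at net.core.wmem_max, and Linux doubles it to allow for bookkeeping overhead.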
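The effect of SO_SNDBUF can be observed with Python's standard socket module. set_sndbuf and expected_linux_sndbuf are hypothetical helpers; the second models the Linux behavior documented in socket(7), where the requested value is capped at net.core.wmem_max and then doubled, and it ignores the small minimum the kernel also enforces. Other operating systems generally report the value as set, without doubling.

```python
import socket


def expected_linux_sndbuf(requested: int, wmem_max: int) -> int:
    """What getsockopt(SO_SNDBUF) typically reports on Linux: min(requested, wmem_max) doubled.

    Simplified model; it ignores the small minimum the kernel enforces.
    """
    return 2 * min(requested, wmem_max)


def set_sndbuf(requested: int) -> int:
    """Set SO_SNDBUF on a new TCP socket and return the size the kernel actually applied.

    On Linux, setting SO_SNDBUF also turns off send-buffer auto-tuning for this socket.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested)
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


if __name__ == "__main__":
    print("applied SO_SNDBUF:", set_sndbuf(65536))
    # 212992 is an assumed wmem_max used only for illustration.
    print("Linux model:", expected_linux_sndbuf(65536, 212992))
```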

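QDisc: The queueing discipline sits between the IP layer and the NIC's RingBuffer and provides traffic shaping, classification, and prioritization; tc qdisc show dev eth0 lists the active one. For FIFO-style qdiscs such as pfifo_fast, the queue length is the interface's txqueuelen, shown as qlen in the output of ip link show eth0. It can be changed with ip link set dev eth0 txqueuelen 2000 or, with the legacy net-tools, ifconfig eth0 txqueuelen 2000.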
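On Linux, txqueuelen is also exposed in sysfs as /sys/class/net/<iface>/tx_queue_len. A sketch that reads it and builds the iproute2 command to change it; read_txqueuelen and txqueuelen_command are hypothetical names, and the sysfs_net parameter exists only so the function can be pointed at a test directory.

```python
from pathlib import Path


def read_txqueuelen(iface: str, sysfs_net: str = "/sys/class/net") -> int:
    """Current txqueuelen of an interface (the value `ip link show` prints as qlen)."""
    return int(Path(sysfs_net, iface, "tx_queue_len").read_text().strip())


def txqueuelen_command(iface: str, qlen: int) -> str:
    """iproute2 command that changes txqueuelen; it must be run as root."""
    return f"ip link set dev {iface} txqueuelen {qlen}"


if __name__ == "__main__":
    net = Path("/sys/class/net")
    devices = sorted(p.name for p in net.iterdir()) if net.is_dir() else []
    for dev in devices:
        try:
            print(dev, read_txqueuelen(dev))
        except OSError:
            pass  # e.g. bonding_masters is a file, not an interface directory
    print(txqueuelen_command("eth0", 2000))
```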

RingBuffer (TX queue): The transmit ring holds descriptors for packets waiting to be sent by the NIC. Its size is reported by ethtool -g eth0 alongside the receive ring and can be changed the same way, with ethtool -G eth0 tx N, up to the reported maximum.

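TCP Segmentation and Checksum Offloading: Modern NICs can take over TCP segmentation (TCP Segmentation Offload, TSO) and checksum calculation from the CPU. With TSO, the kernel hands the NIC one large buffer and the NIC splits it into MSS-sized segments. For example, with an MTU of 1500 bytes and 40 bytes of IPv4 and TCP headers (no options), the MSS is 1460 bytes, so a 7300-byte payload is split into five 1460-byte segments. Offload status can be checked with ethtool -k eth0, and individual features can be toggled, e.g. sudo ethtool -K eth0 tso off.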
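A Python sketch of the segmentation arithmetic, assuming IPv4 and TCP headers of 20 bytes each with no options unless tcp_options is given (12 bytes is what the TCP timestamp option occupies). A small parser for ethtool -k output is included; the SAMPLE_K lines are illustrative (real output lists many more features), and all function names are hypothetical.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def mss_for_mtu(mtu: int, tcp_options: int = 0) -> int:
    """Largest TCP payload per packet: MTU minus IPv4 and TCP headers (plus any options)."""
    return mtu - IPV4_HEADER - TCP_HEADER - tcp_options


def segment_sizes(payload_len: int, mtu: int, tcp_options: int = 0) -> list[int]:
    """Payload size of each segment when payload_len bytes are cut at the MSS."""
    mss = mss_for_mtu(mtu, tcp_options)
    full, rest = divmod(payload_len, mss)
    return [mss] * full + ([rest] if rest else [])


def parse_ethtool_k(text: str) -> dict[str, bool]:
    """Map each feature in `ethtool -k` output to True (on) or False (off)."""
    features = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        words = value.split()  # e.g. ["off", "[fixed]"]
        if sep and words and words[0] in ("on", "off"):
            features[key.strip()] = words[0] == "on"
    return features


SAMPLE_K = """Features for eth0:
rx-checksumming: on
tx-checksumming: on
\ttx-checksum-ipv4: off [fixed]
tcp-segmentation-offload: on
"""

if __name__ == "__main__":
    print(segment_sizes(7300, 1500))                  # [1460, 1460, 1460, 1460, 1460]
    print(segment_sizes(7300, 1500, tcp_options=12))  # [1448, 1448, 1448, 1448, 1448, 60]
    print(parse_ethtool_k(SAMPLE_K)["tcp-segmentation-offload"])  # True
```

The second call shows why the same payload can need six segments once TCP timestamps are enabled: each option byte comes out of the MSS.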

By understanding each layer’s queues and the associated kernel parameters, administrators can systematically tune Linux networking for higher throughput and lower latency.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Liangxu, a self-taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge (fundamentals, applications, tools) plus Git, databases, Raspberry Pi, and more.
