How Linux Sends Network Packets: From send() to the NIC Explained

This article walks through the complete Linux kernel path for sending a network packet, covering the send() system call, TCP processing, IP routing, queueing, driver interaction, DMA mapping, and the role of hard and soft interrupts, while answering common performance questions.

Liangxu Linux

The article begins by revisiting a minimal server program that calls send() and promises a deep dive into how the Linux kernel transforms that call into an on‑wire packet, using a Linux 3.10 kernel and an Intel igb NIC as concrete examples.

1. Overview of the Linux Send Path

A high‑level flowchart shows user data copied into kernel space, processed by the TCP/IP stack, placed into a RingBuffer, and finally handed to the NIC. A second diagram adds the source‑code perspective, highlighting where memory is freed after transmission.

2. NIC Initialization and RingBuffer Allocation

Modern NICs support multiple transmit queues, each represented by a RingBuffer. When the driver opens the device (__igb_open()), it allocates two arrays for each queue: igb_tx_buffer (kernel-side bookkeeping) via vzalloc(), and e1000_adv_tx_desc (hardware-visible descriptors) via dma_alloc_coherent(). The RingBuffer is then linked to the device and all queues are started with netif_tx_start_all_queues().
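The relationship between the two arrays and the ring indices can be pictured with a small user-space model. This is a sketch in Python rather than driver code: the class TxRing and its methods are hypothetical, the index names next_to_use and next_to_clean are used for illustration only, and real descriptors hold DMA addresses rather than lengths.

```python
from typing import List, Optional


class TxRing:
    """Toy model of one transmit queue: two parallel arrays plus two indices."""

    def __init__(self, count: int) -> None:
        self.count = count
        # Stands in for the igb_tx_buffer array (driver-side bookkeeping).
        self.tx_buffer_info: List[Optional[bytes]] = [None] * count
        # Stands in for the e1000_adv_tx_desc array (what the NIC would read).
        self.desc: List[Optional[int]] = [None] * count
        self.next_to_use = 0    # slot where the next packet will be posted
        self.next_to_clean = 0  # oldest slot not yet reclaimed

    def unused(self) -> int:
        """Free slots. One slot stays empty so 'full' differs from 'empty'."""
        if self.next_to_clean > self.next_to_use:
            return self.next_to_clean - self.next_to_use - 1
        return self.count + self.next_to_clean - self.next_to_use - 1

    def post(self, payload: bytes) -> bool:
        """Place a packet in the ring; False means the ring is full."""
        if self.unused() == 0:
            return False
        i = self.next_to_use
        self.tx_buffer_info[i] = payload
        self.desc[i] = len(payload)
        self.next_to_use = (i + 1) % self.count
        return True

    def clean_one(self) -> Optional[bytes]:
        """Reclaim the oldest outstanding entry, as a completion handler would."""
        if self.next_to_clean == self.next_to_use:
            return None  # nothing outstanding
        i = self.next_to_clean
        payload = self.tx_buffer_info[i]
        self.tx_buffer_info[i] = None
        self.desc[i] = None
        self.next_to_clean = (i + 1) % self.count
        return payload
```

In this model a ring created with TxRing(4) accepts three packets before post() reports it full, because one slot is always kept empty.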

3. Socket Creation (accept)

Before sending, a connected socket is created by accept(). The article shows the kernel structures that bind the new socket to the process’s file descriptor table, but skips the detailed source code as it is not central to the send path.

4. The Send Path in Detail

4.1 send() System Call

SYSCALL_DEFINE4(send, ...) simply forwards to sys_sendto(). That function looks up the socket object from the file descriptor, builds a struct msghdr describing the user buffer, and calls sock_sendmsg(), which dispatches to the protocol-specific sendmsg handler (for IPv4 TCP, inet_sendmsg() and then tcp_sendmsg()).
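From user space, the entry point to this path is an ordinary send() on a connected socket. The following is a minimal sketch in Python, whose socket.send() wraps the same system call; the helper name send_over_loopback is hypothetical, and the example assumes a loopback interface is available and that the payload is small enough to fit in the socket buffers.

```python
import socket


def send_over_loopback(payload: bytes) -> bytes:
    """Accept one TCP connection on loopback, send payload, return what the peer read."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
        listener.listen(1)
        with socket.create_connection(listener.getsockname()) as client:
            conn, _ = listener.accept()  # the connected socket used for sending
            with conn:
                sent = 0
                while sent < len(payload):
                    # send() may accept fewer bytes than requested, so loop.
                    sent += conn.send(payload[sent:])
            # conn is closed here, so the client sees EOF after the data.
            chunks = []
            while True:
                chunk = client.recv(4096)
                if not chunk:
                    break
                chunks.append(chunk)
    return b"".join(chunks)
```

The return value of send() is the number of bytes the kernel accepted into the socket's send buffer, not a confirmation that anything has left the NIC.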

4.2 TCP Layer

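tcp_sendmsg() takes the skb at the tail of the socket's write queue (tcp_write_queue_tail()), allocates a new skb and appends it with skb_entail() if the tail has no room, and copies the user data into it. It then decides whether to transmit right away: it pushes if forced_push() reports that more than half of the peer's maximum window has accumulated since the last push, or if the skb is the first unsent one (tcp_send_head()); otherwise the data stays buffered for a later push.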
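The push decision can be written out as a small function. This is a simplified model under stated assumptions, not the kernel's code: the function names and return strings are hypothetical, sequence numbers are plain integers with no wraparound handling, and the threshold of half the maximum window is modelled with a right shift.

```python
def forced_push(write_seq: int, pushed_seq: int, max_window: int) -> bool:
    """True once more than half of max_window has been written since the last push."""
    return write_seq > pushed_seq + (max_window >> 1)


def push_decision(write_seq: int, pushed_seq: int, max_window: int,
                  skb_is_send_head: bool) -> str:
    """Return which of the three outcomes applies to the skb just filled."""
    if forced_push(write_seq, pushed_seq, max_window):
        return "push_pending_frames"  # flush everything that is queued
    if skb_is_send_head:
        return "push_one"             # nothing ahead of it, send this skb now
    return "buffer"                   # leave it on the write queue for later
```

For example, with a maximum window of 65535 bytes, writing 70000 bytes since the last push crosses the threshold of 32767 and forces a push regardless of queue position.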

4.3 Transmission to the NIC

If sending is triggered, tcp_write_xmit() checks the congestion window, segments the skb if necessary (tso_fragment()), and finally calls tcp_transmit_skb(). That function clones the skb so that the original stays on the write queue for possible retransmission, fills in the TCP header, and hands the packet to the IP layer via icsk->icsk_af_ops->queue_xmit(), which resolves to ip_queue_xmit().
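Two ideas in this step, the congestion-window limit and the clone that shares its payload with the original, can be shown in a toy model. The Skb class and write_xmit function below are hypothetical Python stand-ins; counting one skb as one packet in flight is a simplifying assumption.

```python
from typing import List


class Skb:
    """Toy packet descriptor: the descriptor is separate from the payload it points at."""

    def __init__(self, data: bytearray) -> None:
        self.data = data  # reference to the payload, not a copy

    def clone(self) -> "Skb":
        # New descriptor, same payload object: a shallow copy.
        return Skb(self.data)


def write_xmit(write_queue: List[Skb], cwnd: int, in_flight: int) -> List[Skb]:
    """Hand clones to the lower layer while the congestion window has room."""
    handed_down = []
    for skb in write_queue:
        if in_flight >= cwnd:
            break  # window full: the rest waits for ACKs
        handed_down.append(skb.clone())  # original stays queued for retransmission
        in_flight += 1
    return handed_down
```

With five queued skbs, cwnd of 10 and 7 packets already in flight, only three clones are handed down, and each clone's data attribute is the very same object as the original's.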

4.4 IP Layer

ip_queue_xmit() looks up the route (reusing the one cached on the socket when it is still valid), sets the IP header fields (protocol, TTL, fragment offset), and calls ip_local_out(). After netfilter processing, dst_output() invokes the route's output method, which is ip_output(). That function runs the post-routing netfilter hook and calls ip_finish_output(), which fragments the packet if it exceeds the MTU and is not a GSO packet, and otherwise calls ip_finish_output2() to hand the packet to the neighbour subsystem.
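The arithmetic behind IP fragmentation can be worked through in a few lines. This sketch is an illustration of the IPv4 rules, not the kernel's implementation: the function name fragment_plan is hypothetical, and it assumes a 20-byte IPv4 header with no options. In practice TCP sizes its segments to the MSS, so this path is more often seen with large UDP datagrams.

```python
from typing import List, Tuple

IP_HEADER_LEN = 20  # assumption: IPv4 header without options


def fragment_plan(payload_len: int, mtu: int) -> List[Tuple[int, int, bool]]:
    """Return (offset in 8-byte units, payload length, more-fragments flag) per fragment."""
    if IP_HEADER_LEN + payload_len <= mtu:
        return [(0, payload_len, False)]  # fits: no fragmentation needed
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (mtu - IP_HEADER_LEN) & ~7
    if per_fragment <= 0:
        raise ValueError("MTU too small to carry any payload")
    plan = []
    offset = 0
    while offset < payload_len:
        length = min(per_fragment, payload_len - offset)
        more = offset + length < payload_len
        plan.append((offset // 8, length, more))
        offset += length
    return plan
```

For a 4000-byte IP payload and an MTU of 1500, fragment_plan(4000, 1500) yields three fragments: (0, 1480, True), (185, 1480, True) and (370, 1040, False).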

4.5 Neighbour Subsystem

The neighbour code resolves the next‑hop MAC address (ARP for IPv4). If the entry is missing, __neigh_create() creates one and may send an ARP request. Once the MAC address is known, dev_hard_header() builds the Ethernet header and dev_queue_xmit() queues the packet for the device.
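The resolve-then-send behaviour described above can be modelled as a cache with a pending queue. The NeighbourCache class below is a hypothetical Python sketch, not the kernel's neighbour table: it assumes packets are parked per next hop while resolution is outstanding and that one request is recorded per unresolved address.

```python
import struct
from typing import Dict, List, Optional

ETH_P_IP = 0x0800  # EtherType for IPv4


class NeighbourCache:
    """Toy next-hop resolver: IP address to MAC, with packets parked while unresolved."""

    def __init__(self, src_mac: bytes) -> None:
        self.src_mac = src_mac
        self.mac_by_ip: Dict[str, bytes] = {}
        self.pending: Dict[str, List[bytes]] = {}
        self.arp_requests: List[str] = []  # addresses we would send ARP requests for

    def _frame(self, dst_mac: bytes, packet: bytes) -> bytes:
        # Ethernet header: destination MAC, source MAC, EtherType.
        return struct.pack("!6s6sH", dst_mac, self.src_mac, ETH_P_IP) + packet

    def output(self, next_hop: str, packet: bytes) -> Optional[bytes]:
        """Return a framed packet, or None if it had to wait for resolution."""
        mac = self.mac_by_ip.get(next_hop)
        if mac is None:
            if next_hop not in self.pending:
                self.arp_requests.append(next_hop)  # first miss triggers a request
            self.pending.setdefault(next_hop, []).append(packet)
            return None
        return self._frame(mac, packet)

    def arp_reply(self, ip: str, mac: bytes) -> List[bytes]:
        """Record the answer and release any packets that were waiting for it."""
        self.mac_by_ip[ip] = mac
        return [self._frame(mac, p) for p in self.pending.pop(ip, [])]
```

Once an address is resolved, later calls to output() return the framed packet immediately, which corresponds to the common case where the neighbour entry is already valid.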

4.6 Network Device Queueing

The packet is placed on a per-device transmit queue selected by netdev_pick_tx() (using XPS or a hash). If the queue's qdisc is empty and allows bypass (TCQ_F_CAN_BYPASS), the skb is sent directly; otherwise it is enqueued and the qdisc is run. The scheduler's __qdisc_run() eventually calls sch_direct_xmit(), which goes through dev_hard_start_xmit() to the driver's ndo_start_xmit() (for igb this is igb_xmit_frame()).
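Both the hash-based queue choice and the quota-limited dequeue loop are easy to model. The sketch below is a simplification under stated assumptions: the function names are hypothetical, the hash is taken to be a 32-bit value scaled onto the queue count by multiply-and-shift, the quota of 64 packets per run is an assumed default, and preemption checks are ignored.

```python
from collections import deque
from typing import List, Tuple

DEFAULT_QUOTA = 64  # assumption: packets one run may send before deferring


def pick_tx_queue(flow_hash: int, num_queues: int) -> int:
    """Scale a 32-bit flow hash onto [0, num_queues) without a modulo."""
    return ((flow_hash & 0xFFFFFFFF) * num_queues) >> 32


def qdisc_run(queue: deque, quota: int = DEFAULT_QUOTA) -> Tuple[List[object], bool]:
    """Dequeue and 'send' packets; return (sent, deferred_to_softirq)."""
    sent = []
    while queue:
        sent.append(queue.popleft())  # stands in for handing one skb to the driver
        if not queue:
            break                     # drained: nothing left to defer
        quota -= 1
        if quota <= 0:
            return sent, True         # quota used up: the rest is left for the softirq
    return sent, False
```

With 65 packets queued, one run sends 64 and defers the last one; with 64 or fewer, the whole queue drains in the caller's context.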

4.7 igb Driver Transmission

The driver maps the skb data to DMA addresses with dma_map_single() and fills the hardware descriptors (e1000_adv_tx_desc). It then issues a memory barrier (wmb()) so that all descriptor writes are visible to the device, and only afterwards writes the tail pointer register, which tells the NIC that new descriptors are ready to fetch.

4.8 Completion and Interrupts

When the NIC finishes transmitting, it raises a hardware interrupt. The driver's interrupt handler schedules a soft interrupt (NET_RX_SOFTIRQ) via ____napi_schedule(). The NAPI poll function igb_poll() calls igb_clean_tx_irq(), which frees the transmitted skb (the clone; the original stays on the TCP write queue until it is acknowledged), unmaps the DMA mapping, and clears the RingBuffer entry. Because the same soft-IRQ type is used for both receive and transmit completions, /proc/softirqs typically shows NET_RX far larger than NET_TX.

5. Answering Common Questions

CPU accounting for send(): Most of the work happens in the calling process's kernel mode (shown as sy). Soft-IRQ time (si) is charged only occasionally, when the transmit quota is exhausted and the remaining packets are deferred to NET_TX_SOFTIRQ.

Why NET_RX > NET_TX in /proc/softirqs: Receive processing always uses NET_RX_SOFTIRQ. Transmit completion also uses NET_RX_SOFTIRQ, and the bulk of the send work occurs in process context rather than in a soft-IRQ.

Memory copies involved in sending: (1) User buffer → skb data area. (2) The skb clone made when TCP hands the packet to the IP layer; this is a shallow copy that duplicates only the skb descriptor and shares the payload. (3) An optional copy during IP fragmentation, when the packet exceeds the MTU.
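The NET_RX versus NET_TX comparison can be checked on a running system by summing the per-CPU columns of /proc/softirqs. The helper names below are hypothetical; the parser assumes the usual layout of a CPU header line followed by one "NAME: count count ..." row per soft-IRQ type.

```python
from typing import Dict


def softirq_totals(text: str) -> Dict[str, int]:
    """Sum the per-CPU counters for each row of /proc/softirqs-formatted text."""
    totals = {}
    for line in text.splitlines()[1:]:  # first line is the CPU header
        name, sep, counts = line.partition(":")
        if not sep:
            continue  # skip anything that is not a "NAME: ..." row
        totals[name.strip()] = sum(int(c) for c in counts.split())
    return totals


def read_softirq_totals(path: str = "/proc/softirqs") -> Dict[str, int]:
    """Read and total the counters from a live Linux system."""
    with open(path) as f:
        return softirq_totals(f.read())
```

On a Linux host, comparing read_softirq_totals()["NET_RX"] with read_softirq_totals()["NET_TX"] gives the two figures discussed above.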

The article concludes that understanding each layer—from the system call down to the NIC driver—provides concrete places to look when optimizing network performance.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Liangxu Linux

Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge—fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc. (Reply “Linux” to receive essential resources.)
