
How Does a Linux Packet Travel from NIC to Your App? Unveiling Zero‑Copy Secrets

This article walks through every stage a network packet undergoes inside the Linux kernel—from hardware interrupt and driver processing, through the sk_buff structures of the TCP/IP stack, to user‑space delivery—while explaining zero‑copy mechanisms like sendfile, splice, mmap and io_uring, and offering concrete tuning commands for optimal performance.

Liangxu Linux

Introduction

As a systems or operations engineer you constantly face network‑related performance problems, yet the exact path a packet takes from the network card to a user‑space application is rarely well understood. This guide dissects the Linux network stack, explains why zero‑copy can substantially raise throughput while cutting CPU usage, and provides practical tuning steps.

1. The Packet’s Journey Through the Kernel

1.1 NIC Reception – the first hardware interrupt

When a frame arrives, the NIC uses DMA to write it into a memory buffer referenced by the receive ring (RX ring) and then raises a hardware interrupt (hard IRQ).

# Show the NIC's interrupt counters per CPU
grep eth0 /proc/interrupts

# Show the NIC's queue and driver statistics
ethtool -S eth0 | head -20

1.2 Driver handling

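The NIC driver's hard‑IRQ handler does as little as possible: it acknowledges the interrupt, masks further RX interrupts for that queue, and schedules NAPI polling with napi_schedule(), which raises NET_RX_SOFTIRQ. No packet processing happens in this context.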

1.3 Soft‑IRQ (NET_RX_SOFTIRQ)

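The kernel then runs NET_RX_SOFTIRQ (net_rx_action()), which calls the driver's NAPI poll function. The poll function pulls frames off the RX ring within a fixed budget, wraps each one in an sk_buff, and hands it to the network stack.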
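The ring and the budgeted poll can be modeled in user space. The following is a minimal sketch, not driver code: the names rx_ring, ring_produce and ring_poll and the ring size are hypothetical, and real rings hold DMA descriptors rather than plain lengths.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8          /* hypothetical; real rings hold hundreds of slots */

struct rx_ring {
    uint16_t len[RING_SIZE]; /* frame length per slot; payload omitted */
    unsigned head;           /* next slot the "NIC" fills */
    unsigned tail;           /* next slot the poll loop drains */
    unsigned dropped;        /* frames lost because the ring was full */
};

/* "NIC" side: place a frame in the ring, or drop it if no slot is free. */
int ring_produce(struct rx_ring *r, uint16_t frame_len)
{
    if (r->head - r->tail == RING_SIZE) {
        r->dropped++;
        return -1;
    }
    r->len[r->head % RING_SIZE] = frame_len;
    r->head++;
    return 0;
}

/* "NAPI poll" side: drain at most `budget` frames, add their lengths to
 * *bytes, and return how many were handled. A return value below the
 * budget means the ring is empty. */
int ring_poll(struct rx_ring *r, int budget, size_t *bytes)
{
    int done = 0;

    while (done < budget && r->tail != r->head) {
        *bytes += r->len[r->tail % RING_SIZE];
        r->tail++;
        done++;
    }
    return done;
}
```

The budget is what keeps one busy queue from monopolizing a CPU: when the poll uses its whole budget, more work is pending and the softirq is rescheduled; when it returns less, the driver can re-enable interrupts. A full ring is also where drops at this layer come from.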

1.4 Kernel network stack processing

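napi_schedule() → NET_RX_SOFTIRQ (net_rx_action()) → driver poll() → napi_gro_receive() → __netif_receive_skb() → ip_rcv() → tcp_v4_rcv() → socket receive queue

Older non‑NAPI drivers enter the same path through netif_rx() instead.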

1.5 sk_buff structure

The core data container is struct sk_buff, which holds pointers into the packet buffer, per‑packet metadata, and the list pointers used to queue it.

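struct sk_buff {
    struct sk_buff *next;      // queue linkage
    struct sk_buff *prev;
    struct net_device *dev;    // device the packet arrived on or leaves by
    unsigned char *head;       // start of the allocated buffer
    unsigned char *data;       // start of the current layer's data
    unsigned char *tail;       // end of the current data
    unsigned char *end;        // end of the allocated buffer
    // ... more fields
};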
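How the four pointers move as headers are added and stripped is easier to see in a user‑space model. This is a sketch under stated assumptions: mini_skb and its helpers are hypothetical names that mirror the roles of skb_reserve(), skb_put(), skb_push() and skb_pull(), not the kernel implementations.

```c
#include <stddef.h>
#include <stdlib.h>

struct mini_skb {
    unsigned char *head, *data, *tail, *end;
};

/* Allocate an empty buffer: data and tail both start at head. */
int mini_skb_alloc(struct mini_skb *skb, size_t size)
{
    skb->head = malloc(size);
    if (!skb->head)
        return -1;
    skb->data = skb->tail = skb->head;
    skb->end = skb->head + size;
    return 0;
}

/* Leave headroom for headers that will be prepended later. */
void mini_skb_reserve(struct mini_skb *skb, size_t len)
{
    skb->data += len;
    skb->tail += len;
}

/* Append len bytes at the tail; returns the start of the new area. */
unsigned char *mini_skb_put(struct mini_skb *skb, size_t len)
{
    unsigned char *old_tail = skb->tail;

    if ((size_t)(skb->end - skb->tail) < len)
        return NULL;
    skb->tail += len;
    return old_tail;
}

/* Prepend a header of len bytes (transmit path). */
unsigned char *mini_skb_push(struct mini_skb *skb, size_t len)
{
    if ((size_t)(skb->data - skb->head) < len)
        return NULL;
    skb->data -= len;
    return skb->data;
}

/* Strip a header of len bytes (receive path). */
unsigned char *mini_skb_pull(struct mini_skb *skb, size_t len)
{
    if ((size_t)(skb->tail - skb->data) < len)
        return NULL;
    skb->data += len;
    return skb->data;
}

size_t mini_skb_len(const struct mini_skb *skb)
{
    return (size_t)(skb->tail - skb->data);
}
```

Because each layer only moves data forward or backward, headers are added and removed without copying the payload.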

2. TCP/IP Stack Implementation in Linux

2.1 Layered processing

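Linux implements the TCP/IP model, which maps only loosely onto the OSI layers. A received packet climbs from the device (physical) through data‑link (L2), network (L3) and transport (L4) processing to the socket.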

# Show per-interface packet, error and drop counters
cat /proc/net/dev
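The layer‑by‑layer hand‑off can be illustrated by classifying a raw Ethernet frame. This is a simplified sketch: classify_frame and the l4_proto enum are hypothetical names, only untagged IPv4 frames are handled, and VLAN tags and IPv6 are ignored.

```c
#include <stddef.h>
#include <stdint.h>

enum l4_proto { L4_UNKNOWN = 0, L4_TCP, L4_UDP };

/* Walk L2 -> L3 on a raw Ethernet frame. Returns the transport protocol
 * and stores the offset of the L4 header in *l4_off. */
enum l4_proto classify_frame(const uint8_t *frame, size_t len, size_t *l4_off)
{
    const size_t eth_hlen = 14;        /* dst MAC + src MAC + EtherType */
    const uint8_t *ip;
    uint16_t ethertype;
    size_t ihl;

    if (len < eth_hlen + 20)
        return L4_UNKNOWN;

    /* L2: the EtherType selects the L3 handler (0x0800 = IPv4). */
    ethertype = (uint16_t)(frame[12] << 8 | frame[13]);
    if (ethertype != 0x0800)
        return L4_UNKNOWN;

    /* L3: check the version and compute the header length. */
    ip = frame + eth_hlen;
    if ((ip[0] >> 4) != 4)
        return L4_UNKNOWN;
    ihl = (size_t)(ip[0] & 0x0f) * 4;
    if (ihl < 20 || len < eth_hlen + ihl)
        return L4_UNKNOWN;
    *l4_off = eth_hlen + ihl;

    /* The IP protocol field selects the L4 handler. */
    switch (ip[9]) {
    case 6:  return L4_TCP;
    case 17: return L4_UDP;
    default: return L4_UNKNOWN;
    }
}
```

The kernel does the same demultiplexing with registered handlers rather than a switch: the EtherType picks ip_rcv(), and the IP protocol number picks the transport receive function.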

2.2 IP layer

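// Simplified sketch: in recent kernels validation lives in ip_rcv_core()
// and the routing lookup runs later in ip_rcv_finish(), after netfilter.
int ip_rcv(struct sk_buff *skb, struct net_device *dev,
           struct packet_type *pt, struct net_device *orig_dev) {
    const struct iphdr *iph;

    // IP header validation: a full header must be in the linear area
    if (!pskb_may_pull(skb, sizeof(struct iphdr)))
        goto drop;
    iph = ip_hdr(skb);
    if (iph->ihl < 5 || iph->version != 4)
        goto drop;
    // Routing lookup: attaches a dst entry to the skb
    if (ip_route_input(skb, iph->daddr, iph->saddr, iph->tos, dev))
        goto drop;
    // Dispatch via the route: local delivery (up to L4) or forwarding
    return dst_input(skb);
drop:
    kfree_skb(skb);
    return NET_RX_DROP;
}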
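Header validation also covers the checksum. A minimal user‑space sketch of these checks follows; ipv4_checksum and ipv4_header_ok are hypothetical names, and the kernel uses an optimized routine (ip_fast_csum()) rather than this byte loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones'-complement sum over the header (RFC 1071). For a valid header,
 * summing all 16-bit words including the checksum field yields 0xffff,
 * so the complemented result is 0. */
uint16_t ipv4_checksum(const uint8_t *hdr, size_t hlen)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < hlen; i += 2)
        sum += (uint32_t)(hdr[i] << 8 | hdr[i + 1]);
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Returns 1 if the buffer starts with a plausible IPv4 header. */
int ipv4_header_ok(const uint8_t *hdr, size_t len)
{
    size_t ihl, tot_len;

    if (len < 20)
        return 0;
    if ((hdr[0] >> 4) != 4)                 /* version */
        return 0;
    ihl = (size_t)(hdr[0] & 0x0f) * 4;      /* header length in bytes */
    if (ihl < 20 || ihl > len)
        return 0;
    tot_len = (size_t)(hdr[2] << 8 | hdr[3]);
    if (tot_len < ihl)
        return 0;
    return ipv4_checksum(hdr, ihl) == 0;
}
```

A packet failing any of these checks is dropped before routing, which is what the InHdrErrors counter in /proc/net/snmp reflects.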

2.3 TCP state machine

enum tcp_states {
    TCP_ESTABLISHED = 1,
    TCP_SYN_SENT,
    TCP_SYN_RECV,
    TCP_FIN_WAIT1,
    TCP_FIN_WAIT2,
    TCP_TIME_WAIT,
    TCP_CLOSE,
    // ... more states
};
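The states are connected by transitions triggered by application calls and incoming segments. The sketch below models only a client's open plus both close paths; the ST_ and EV_ names and tcp_next() are hypothetical, and listening sockets, simultaneous open/close, resets and timers other than TIME_WAIT are left out.

```c
enum state {
    ST_CLOSED, ST_SYN_SENT, ST_ESTABLISHED,
    ST_FIN_WAIT1, ST_FIN_WAIT2, ST_TIME_WAIT,
    ST_CLOSE_WAIT, ST_LAST_ACK
};

enum event {
    EV_CONNECT,      /* application calls connect() */
    EV_RCV_SYNACK,   /* SYN+ACK received */
    EV_CLOSE,        /* application calls close() */
    EV_RCV_ACK,      /* ACK of our FIN received */
    EV_RCV_FIN,      /* peer's FIN received */
    EV_TIMEOUT       /* TIME_WAIT timer expired */
};

/* Return the next state; events not modeled here leave the state unchanged. */
enum state tcp_next(enum state s, enum event e)
{
    switch (s) {
    case ST_CLOSED:
        return e == EV_CONNECT ? ST_SYN_SENT : s;
    case ST_SYN_SENT:
        return e == EV_RCV_SYNACK ? ST_ESTABLISHED : s;
    case ST_ESTABLISHED:
        if (e == EV_CLOSE)   return ST_FIN_WAIT1;   /* active close */
        if (e == EV_RCV_FIN) return ST_CLOSE_WAIT;  /* passive close */
        return s;
    case ST_FIN_WAIT1:
        return e == EV_RCV_ACK ? ST_FIN_WAIT2 : s;
    case ST_FIN_WAIT2:
        return e == EV_RCV_FIN ? ST_TIME_WAIT : s;
    case ST_TIME_WAIT:
        return e == EV_TIMEOUT ? ST_CLOSED : s;
    case ST_CLOSE_WAIT:
        return e == EV_CLOSE ? ST_LAST_ACK : s;
    case ST_LAST_ACK:
        return e == EV_RCV_ACK ? ST_CLOSED : s;
    }
    return s;
}
```

The side that closes first is the one that ends up in TIME_WAIT, which is why busy servers that initiate the close accumulate TIME_WAIT sockets.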

2.4 TCP tuning parameters

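# TCP buffer and congestion-control tuning (run as root)
echo 'net.core.rmem_max = 134217728' >> /etc/sysctl.conf
echo 'net.core.wmem_max = 134217728' >> /etc/sysctl.conf
echo 'net.ipv4.tcp_rmem = 4096 87380 134217728' >> /etc/sysctl.conf
# BBR needs the tcp_bbr module (kernel 4.9 or later)
echo 'net.ipv4.tcp_congestion_control = bbr' >> /etc/sysctl.conf
# Apply the new settings without rebooting
sysctl -p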
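These limits can be observed from an application. The sketch below asks for a receive buffer and reads back what the kernel granted; granted_rcvbuf is a hypothetical helper name. On Linux the kernel doubles the requested value for bookkeeping overhead and caps the request at net.core.rmem_max.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Request a receive buffer of `want` bytes on a fresh TCP socket and
 * return the size the kernel reports afterwards, or -1 on error. */
int granted_rcvbuf(int want)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int got = 0;
    socklen_t len = sizeof got;

    if (fd < 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &want, sizeof want) != 0 ||
        getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &got, &len) != 0)
        got = -1;
    close(fd);
    return got;
}
```

Setting SO_RCVBUF explicitly turns off receive‑buffer autotuning for that socket, so in most cases it is preferable to raise the tcp_rmem limits and let the kernel size buffers itself.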

3. Zero‑Copy Techniques

3.1 Traditional I/O bottlenecks

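A conventional read() + write() file transfer involves multiple copies and context switches:

Disk → kernel buffer (page cache) → user buffer → socket buffer → NIC

That is four copies (two by DMA, two by the CPU) and four user↔kernel transitions (entry and return for each of the two system calls). The two CPU copies through the user buffer are the avoidable part.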
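For reference, the conventional path looks like this in code. It is a minimal sketch with a hypothetical function name (copy_read_write) and an arbitrary 64 KiB buffer; it works on any pair of descriptors, including a file and a socket.

```c
#include <errno.h>
#include <unistd.h>

/* Copy everything from in_fd to out_fd through a user-space buffer.
 * Returns the number of bytes copied, or -1 on error. */
ssize_t copy_read_write(int in_fd, int out_fd)
{
    char buf[64 * 1024];
    ssize_t total = 0;

    for (;;) {
        /* CPU copy 1: kernel buffer -> user buffer */
        ssize_t n = read(in_fd, buf, sizeof buf);

        if (n == 0)
            return total;                   /* end of input */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        for (ssize_t off = 0; off < n; ) {
            /* CPU copy 2: user buffer -> socket (or other kernel) buffer */
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));

            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```

Every iteration costs two system calls and moves each byte through the CPU twice, even though the application never inspects the data.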

3.2 sendfile()

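#include <errno.h>
#include <sys/sendfile.h>
#include <sys/types.h>

// Prototype, as declared in <sys/sendfile.h>:
// ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);

int send_file_zero_copy(int socket_fd, int file_fd, size_t file_size) {
    off_t offset = 0;

    while ((size_t)offset < file_size) {
        // sendfile() advances offset itself; it must not be incremented again
        ssize_t sent = sendfile(socket_fd, file_fd, &offset,
                                file_size - (size_t)offset);
        if (sent > 0)
            continue;
        if (sent == 0)
            return -1;  // file is shorter than file_size
        // EAGAIN busy-retries here; a real server would wait in poll()/epoll
        if (errno == EINTR || errno == EAGAIN)
            continue;
        return -1;
    }
    return 0;
}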

3.3 splice() and tee()

#define _GNU_SOURCE   /* must precede all includes; exposes splice() and tee() */
#include <fcntl.h>

ssize_t splice(int fd_in, loff_t *off_in, int fd_out,
               loff_t *off_out, size_t len, unsigned int flags);
ssize_t tee(int fd_in, int fd_out, size_t len, unsigned int flags);
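splice() moves data between two descriptors when at least one of them is a pipe, so sending a file to a socket takes two hops: file → pipe → socket. The following is a minimal sketch with a hypothetical helper name (splice_copy); the flags are hints only, and error handling is kept to the essentials.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Move up to len bytes from in_fd to out_fd through a pipe, without
 * passing the data through user space. Returns bytes moved, or -1. */
ssize_t splice_copy(int in_fd, int out_fd, size_t len)
{
    int p[2];
    ssize_t total = 0;

    if (pipe(p) < 0)
        return -1;

    while ((size_t)total < len) {
        /* Hop 1: source -> pipe (limited by the pipe's capacity). */
        ssize_t n = splice(in_fd, NULL, p[1], NULL, len - (size_t)total,
                           SPLICE_F_MOVE | SPLICE_F_MORE);
        if (n == 0)
            break;                          /* end of input */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            total = -1;
            break;
        }
        /* Hop 2: pipe -> destination, until the pipe is drained. */
        while (n > 0) {
            ssize_t m = splice(p[0], NULL, out_fd, NULL, (size_t)n,
                               SPLICE_F_MOVE | SPLICE_F_MORE);
            if (m < 0 && errno == EINTR)
                continue;
            if (m <= 0) {
                total = -1;
                goto out;
            }
            n -= m;
            total += m;
        }
    }
out:
    close(p[0]);
    close(p[1]);
    return total;
}
```

tee() complements this by duplicating data from one pipe into another without consuming it, which is useful when the same stream must go to two destinations.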

3.4 mmap() based zero‑copy

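Mapping the file removes the user‑space read buffer, but send() still copies the data from the page cache into socket buffers, so this saves one copy rather than all of them. The fragment assumes fd is an open file and st was filled by fstat(fd, &st).

void *mapped = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
if (mapped == MAP_FAILED)
    return -1;  /* assumes an enclosing function that returns int */
ssize_t sent = send(socket_fd, mapped, st.st_size, 0);
munmap(mapped, st.st_size);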

3.5 Modern async I/O – io_uring

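#include <liburing.h>
#include <unistd.h>

// liburing has no sendfile helper; two linked splice requests through a
// pipe (file → pipe → socket) use the same kernel-side data path.
struct io_uring ring;
struct io_uring_sqe *sqe;
struct io_uring_cqe *cqe;
int p[2];

pipe(p);
io_uring_queue_init(QUEUE_DEPTH, &ring, 0);
sqe = io_uring_get_sqe(&ring);
io_uring_prep_splice(sqe, file_fd, 0, p[1], -1, chunk, 0);  // chunk (assumed name) must fit the pipe, 64 KiB by default
sqe->flags |= IOSQE_IO_LINK;                                // next SQE runs only if this one succeeds
sqe = io_uring_get_sqe(&ring);
io_uring_prep_splice(sqe, p[0], -1, socket_fd, -1, chunk, 0);
io_uring_submit(&ring);
io_uring_wait_cqe(&ring, &cqe);                             // one CQE per SQE; cqe->res < 0 is -errno
io_uring_cqe_seen(&ring, cqe);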

4. Performance Monitoring & Tuning

4.1 Real‑time interrupt and soft‑irq stats

# Network interrupt distribution
cat /proc/interrupts | grep -E "(CPU|eth)"

# Soft‑irq statistics
watch -n 1 'cat /proc/softirqs | head -2 && cat /proc/softirqs | grep NET'

4.2 Queue depth and RPS

for i in /sys/class/net/*/queues/rx-*/rps_cpus; do
    echo "$i: $(cat "$i")"
done
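The value stored in rps_cpus is a hexadecimal CPU bitmask, with bit N standing for CPU N. A small sketch for building one follows; rps_mask is a hypothetical helper that, for brevity, handles only CPUs 0 to 31 (larger systems use comma‑separated 32‑bit groups).

```c
#include <stddef.h>
#include <stdio.h>

/* Write the hex bitmask for CPUs first..last (inclusive) into out,
 * in the form expected by rps_cpus. Returns 0 on success, -1 on error. */
int rps_mask(unsigned first, unsigned last, char *out, size_t outlen)
{
    unsigned long mask = 0;
    int n;

    if (first > last || last >= 32)
        return -1;
    for (unsigned c = first; c <= last; c++)
        mask |= 1UL << c;                   /* bit N selects CPU N */

    n = snprintf(out, outlen, "%lx", mask);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

For example, CPUs 4 to 7 give the mask f0; writing that string to a queue's rps_cpus file lets those four CPUs share the protocol processing for that queue.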

4.3 eBPF tracing example (TCP send latency)

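The probe below only records the entry timestamp per process; computing a latency also needs a matching kretprobe on tcp_sendmsg that looks the timestamp up and subtracts it. The map definition is an assumed minimal one (libbpf style, with types from vmlinux.h).

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 10240);
    __type(key, u32);
    __type(value, u64);
} timestamps SEC(".maps");

SEC("kprobe/tcp_sendmsg")
int trace_tcp_sendmsg(struct pt_regs *ctx) {
    u64 ts = bpf_ktime_get_ns();
    u32 pid = bpf_get_current_pid_tgid() >> 32;  // upper 32 bits: process ID (TGID)
    bpf_map_update_elem(&timestamps, &pid, &ts, BPF_ANY);
    return 0;
}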

4.4 Production checklist

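Enable multi‑queue receive: RSS on the NIC, or RPS in software when the NIC has too few hardware queues.

Bind NIC interrupts to specific CPUs (via /proc/irq/<n>/smp_affinity).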

Tune kernel network parameters (net.core.*, net.ipv4.tcp_*).

Adopt zero‑copy system calls (sendfile, splice, io_uring).

Continuously monitor latency, drops, and retransmissions.

5. Real‑World Impact

5.1 Web server benchmark

Using a 1 GB file:

Traditional read/write copy: ~2.3 s, CPU ≈ 85 %.

Zero‑copy (sendfile): ~0.8 s, CPU ≈ 12 %, which is roughly 2.9× faster at about one‑seventh of the CPU usage.

5.2 Typical zero‑copy use‑cases

Static file serving (Nginx, Caddy).

Reverse proxies (HAProxy, Envoy).

Message brokers (Kafka, Pulsar).

Database file transfer (MySQL, PostgreSQL).

Conclusion

By tracing a packet from the NIC through the Linux kernel’s layered processing, understanding the sk_buff data path, and applying zero‑copy system calls, engineers can dramatically reduce latency and CPU overhead. Combined with modern tools such as eBPF and io_uring, these techniques enable high‑throughput, low‑latency services in production environments.

Written by

Liangxu Linux

Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge—fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc. (Reply “Linux” to receive essential resources.)
