Inside Linux: How the Kernel Handles Network I/O from Packets to Applications

This article explores the Linux kernel’s network stack architecture, detailing each layer from the application’s socket interface through transport, network, and link layers, and explains the complete inbound and outbound data flow with code examples illustrating packet reception, processing, routing, and transmission.

Deepin Linux

In the digital age, networks are deeply integrated into daily life and work. The Linux kernel, valued for being open source, efficient, and stable, manages network I/O on everything from embedded devices and servers to supercomputers.

Every system on the TOP500 supercomputer list has run Linux since November 2017, and major internet companies rely on its networking capabilities. This article follows the kernel's code paths to explain how it handles network data, from reception and protocol parsing through to transmission.

Part 1 Linux Network Stack Architecture Overview

The Linux network stack uses a layered architecture similar to a skyscraper, with each layer having distinct responsibilities. From bottom to top, the layers are the link layer, network layer, transport layer, and application layer, working together to achieve efficient data transmission.

1.1 Application Layer: User Interaction Interface

The application layer provides the interface for users and network applications, handling protocols such as HTTP, SMTP, and FTP. Applications use the Linux socket API to communicate with the kernel’s network stack.

(1) Socket

Application programs interact with the kernel via the Linux socket API, which originated from BSD sockets. Sockets sit above the transport layer and give programs a single interface regardless of the protocol underneath.

Sockets hide differences between network protocols.

Sockets are the entry point for network programming, offering system calls such as socket(), bind(), connect(), send(), and recv().

In Linux, each socket is exposed to the process as a file descriptor, so generic calls such as read(), write(), and close() work on it, making network I/O as convenient as file I/O.

(2) Application Layer Processing Flow

Applications call socket() to create a socket; the system call invokes the kernel's sock_create() and returns a file descriptor. Each socket is represented in the kernel by a struct socket and a struct sock, and the struct sock holds the receive (rx), write (tx), and error (err) queues.

For TCP sockets, connect() initiates the three-way handshake, which establishes a virtual connection and lets each side announce its MSS. UDP has no handshake; connect() on a UDP socket only records the default peer address.

Applications use send() or write() to transmit a message. The system call describes the data and any control information in a struct msghdr and passes it to sock_sendmsg().

Depending on the protocol, tcp_sendmsg() or udp_sendmsg() is called to send the packet.
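The flow above can be exercised end to end from user space. This is a hedged sketch, assuming a POSIX system with a working loopback interface; tcp_loopback_roundtrip() is a hypothetical helper name. It runs client and server in one process, which works because the kernel completes the handshake for a listening socket before accept() is called. The comments mapping calls to kernel functions follow the article's description.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Sends msg over a TCP connection on 127.0.0.1 and reads it back on the
 * accepting side. Returns the number of bytes received, or -1 on error.
 */
ssize_t tcp_loopback_roundtrip(const char *msg, char *out, size_t out_len)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    int lfd = -1, cfd = -1, sfd = -1;
    ssize_t n = -1;

    assert(msg != NULL && out != NULL);

    lfd = socket(AF_INET, SOCK_STREAM, 0);      /* kernel: sock_create() */
    if (lfd < 0)
        goto done;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                          /* let the kernel pick a port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) != 0 ||
        listen(lfd, 1) != 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &alen) != 0)
        goto done;

    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0)
        goto done;

    /* Blocks until the three-way handshake has completed. */
    if (connect(cfd, (struct sockaddr *)&addr, sizeof addr) != 0)
        goto done;

    /* For a TCP socket this ends up in tcp_sendmsg(). */
    if (send(cfd, msg, strlen(msg), 0) != (ssize_t)strlen(msg))
        goto done;

    sfd = accept(lfd, NULL, NULL);
    if (sfd < 0)
        goto done;
    n = recv(sfd, out, out_len, 0);

done:
    if (sfd >= 0)
        close(sfd);
    if (cfd >= 0)
        close(cfd);
    if (lfd >= 0)
        close(lfd);
    return n;
}
```

Binding to port 0 avoids hard-coding a port number; getsockname() then reports the port the kernel chose so the client knows where to connect.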

1.2 Transport Layer: Reliable Data Delivery

The transport layer provides end-to-end communication between processes. Its two main protocols are TCP, which guarantees reliable, ordered delivery, and UDP, which does not.

① Connection Establishment (Three‑Way Handshake)

Client sends SYN with initial sequence number x.

Server replies with SYN-ACK, choosing its own sequence number y and acknowledging x + 1.

Client sends ACK acknowledging y + 1, completing the handshake; both sides enter ESTABLISHED.

② Data Transmission

Application data is copied from user space to the kernel’s send buffer.

TCP constructs a segment, adds headers, and computes checksums.

The segment is passed to the IP layer, which adds an IP header, then to the link layer for Ethernet framing and ARP resolution.

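The receiver decapsulates the segment, reorders out-of-order data, sends acknowledgments, and advertises its receive window for flow control; congestion control is applied on the sending side.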

③ Connection Termination (Four‑Way Handshake)

Client sends FIN, entering FIN‑WAIT‑1.

Server ACKs, entering CLOSE‑WAIT; client moves to FIN‑WAIT‑2.

Server sends its FIN, entering LAST‑ACK.

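Client ACKs and enters TIME-WAIT. The server closes when that ACK arrives; the client closes after the TIME-WAIT timer (twice the maximum segment lifetime) expires.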
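The teardown steps above amount to a small state machine. The sketch below is a simplified model with hypothetical names (tcp_close_step(), tcp_close_run(), and the ST_ and EV_ constants); it covers only the ordinary active-close and passive-close paths and leaves out simultaneous close (the CLOSING state) and resets.

```c
#include <assert.h>
#include <stddef.h>

enum tcp_state {
    ST_ESTABLISHED, ST_FIN_WAIT_1, ST_FIN_WAIT_2, ST_TIME_WAIT,
    ST_CLOSE_WAIT, ST_LAST_ACK, ST_CLOSED
};

enum tcp_event {
    EV_APP_CLOSE,        /* local application calls close(): a FIN is sent */
    EV_RECV_FIN,         /* peer's FIN arrives: an ACK is sent */
    EV_RECV_ACK_OF_FIN,  /* peer acknowledges our FIN */
    EV_2MSL_TIMEOUT      /* TIME-WAIT timer expires */
};

/* Returns the next state; events that do not apply leave the state unchanged. */
enum tcp_state tcp_close_step(enum tcp_state s, enum tcp_event e)
{
    switch (s) {
    case ST_ESTABLISHED:
        if (e == EV_APP_CLOSE) return ST_FIN_WAIT_1;
        if (e == EV_RECV_FIN)  return ST_CLOSE_WAIT;
        break;
    case ST_FIN_WAIT_1:
        if (e == EV_RECV_ACK_OF_FIN) return ST_FIN_WAIT_2;
        break;
    case ST_FIN_WAIT_2:
        if (e == EV_RECV_FIN) return ST_TIME_WAIT;
        break;
    case ST_CLOSE_WAIT:
        if (e == EV_APP_CLOSE) return ST_LAST_ACK;
        break;
    case ST_LAST_ACK:
        if (e == EV_RECV_ACK_OF_FIN) return ST_CLOSED;
        break;
    case ST_TIME_WAIT:
        if (e == EV_2MSL_TIMEOUT) return ST_CLOSED;
        break;
    case ST_CLOSED:
        break;
    }
    return s;
}

/* Applies n events in order and returns the final state. */
enum tcp_state tcp_close_run(enum tcp_state s, const enum tcp_event *ev, size_t n)
{
    assert(ev != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        s = tcp_close_step(s, ev[i]);
    return s;
}
```

Feeding the client-side events (close, ACK of FIN, peer FIN) leaves the model in TIME-WAIT until the timeout event arrives, whereas the server side reaches CLOSED as soon as its FIN is acknowledged.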

UDP, being connection‑less, skips reliability mechanisms and is used for latency‑sensitive applications like video streaming.

1.3 Network Layer: Packet Routing

The network layer routes packets using the IP protocol, identifying hosts by IP address and choosing the best path from the routing tables. It also handles ICMP for diagnostics and IGMP for multicast.

Key tasks include routing decisions, adding IP headers, checksum verification, possible fragmentation, and forwarding to the appropriate link layer.

IP Stack Processing Steps

Link layer receives a frame and passes the packet to the network layer. ip_rcv() validates the packet, checks destination, checksum, and invokes netfilter hooks. ip_rcv_finish() performs routing lookup; if the destination is local, ip_local_deliver() is called, otherwise ip_forward() forwards the packet.

Local delivery may involve reassembly of fragmented packets before passing to the transport layer.

1.4 Link Layer: Foundation of Network Communication

The link layer interfaces with physical network devices, encapsulating network‑layer packets into frames (e.g., Ethernet) and handling MAC addressing, error detection, and flow control.

Linux abstracts network devices via the Network Device layer (implemented in net/core/dev.c), with drivers providing hardware‑specific operations.

Part 2 Data Input: Journey from NIC to Kernel

2.1 NIC Reception and Interrupt Handling

The NIC receives packets, filters them based on destination MAC, and uses DMA to write data directly into memory, then raises a hardware interrupt to notify the CPU.
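The destination-MAC filter is normally done in NIC hardware, but its logic is easy to express in software. The following is a minimal sketch under stated assumptions: it handles untagged Ethernet II frames only, ignores multicast and promiscuous mode, and uses hypothetical names (eth_type(), mac_filter_accepts(), ETH_ADDR_LEN, ETH_HDR_LEN).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ADDR_LEN 6
#define ETH_HDR_LEN  14   /* destination MAC + source MAC + EtherType */

/* Returns the EtherType in host byte order, or -1 if the frame is too short. */
int eth_type(const uint8_t *frame, size_t len)
{
    assert(frame != NULL);
    if (len < ETH_HDR_LEN)
        return -1;
    return (frame[12] << 8) | frame[13];
}

/*
 * Returns 1 if a NIC whose address is own_mac would accept the frame:
 * the destination is either the NIC's own address or the broadcast address.
 */
int mac_filter_accepts(const uint8_t *frame, size_t len,
                       const uint8_t own_mac[ETH_ADDR_LEN])
{
    static const uint8_t bcast[ETH_ADDR_LEN] =
        { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };

    assert(frame != NULL && own_mac != NULL);
    if (len < ETH_HDR_LEN)
        return 0;
    /* The destination MAC occupies the first six bytes of the frame. */
    return memcmp(frame, own_mac, ETH_ADDR_LEN) == 0 ||
           memcmp(frame, bcast, ETH_ADDR_LEN) == 0;
}
```

An EtherType of 0x0800 marks an IPv4 payload, which is what tells the stack to hand the packet to the IP receive path.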

2.2 Network Driver Hand‑off

The CPU's interrupt handler invokes the NIC driver (e.g., e1000_intr()), which disables further NIC interrupts and schedules a soft interrupt; received packets are wrapped in struct sk_buff (skb) and handed to the protocol stack for further processing.
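The interplay between the interrupt and the deferred work can be modelled in user space. This is a toy simulation, not kernel code: struct rx_ring, ring_produce(), ring_irq(), and ring_poll() are hypothetical names, and the budget rule (keep interrupts off while a poll uses its whole budget) is a simplification of how NAPI-style polling behaves.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 8

/* Toy receive ring: the "NIC" writes at head, the "driver" reads at tail. */
struct rx_ring {
    int slots[RING_SIZE];
    size_t head;
    size_t tail;
    int irq_enabled;
};

/* NIC side: store a packet id. Returns 0 if the ring is full (packet dropped). */
int ring_produce(struct rx_ring *r, int pkt)
{
    assert(r != NULL);
    if (r->head - r->tail == RING_SIZE)
        return 0;
    r->slots[r->head % RING_SIZE] = pkt;
    r->head++;
    return 1;
}

/* Hard-interrupt side: mask further interrupts and leave the work to the poll. */
void ring_irq(struct rx_ring *r)
{
    assert(r != NULL);
    r->irq_enabled = 0;
}

/*
 * Deferred side: consume at most budget packets. Interrupts are re-enabled
 * only when the poll finished under budget, meaning the ring was drained.
 */
int ring_poll(struct rx_ring *r, int budget)
{
    int done = 0;

    assert(r != NULL);
    while (done < budget && r->tail != r->head) {
        r->tail++;   /* a real driver would build an skb and pass it up here */
        done++;
    }
    if (done < budget)
        r->irq_enabled = 1;
    return done;
}
```

With five packets queued and a budget of three, the first poll returns 3 and leaves interrupts masked; the second returns 2 and turns them back on. Bounding each poll this way keeps one busy device from monopolizing the CPU.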

2.3 Initial Network‑Layer Validation

The skb enters ip_rcv(), which checks the IP version, header length, and header checksum, then passes the packet through netfilter's NF_IP_PRE_ROUTING hook (named NF_INET_PRE_ROUTING in current kernel sources) for filtering.
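The header checks described above can be sketched in portable C. This is an approximation for illustration, not the kernel's implementation: inet_checksum() follows the RFC 1071 one's-complement algorithm, and ipv4_header_ok() is a hypothetical helper that mirrors the kind of sanity checks ip_rcv() performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * RFC 1071 Internet checksum: one's-complement sum of 16-bit big-endian
 * words, folded to 16 bits and inverted. Intended for short buffers
 * such as protocol headers.
 */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    assert(data != NULL);
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((data[i] << 8) | data[i + 1]);
    if (len & 1)                        /* odd trailing byte, padded with zero */
        sum += (uint32_t)(data[len - 1] << 8);
    while (sum >> 16)                   /* fold the carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Returns 1 if pkt starts with a plausible IPv4 header, 0 otherwise. */
int ipv4_header_ok(const uint8_t *pkt, size_t len)
{
    assert(pkt != NULL);
    if (len < 20)
        return 0;
    if ((pkt[0] >> 4) != 4)             /* version field must be 4 */
        return 0;

    size_t ihl = (size_t)(pkt[0] & 0x0f) * 4;   /* header length in bytes */
    if (ihl < 20 || ihl > len)
        return 0;

    size_t tot_len = ((size_t)pkt[2] << 8) | pkt[3];
    if (tot_len < ihl || tot_len > len) /* truncated or inconsistent packet */
        return 0;

    /* Summing a header that includes a correct checksum field yields 0. */
    return inet_checksum(pkt, ihl) == 0;
}
```

Verification works without special-casing the checksum field: a header whose stored checksum is correct sums to 0xffff, which inverts to zero.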

2.4 Routing Decision and Subsequent Path

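The route is found either in the routing cache (on kernels before 3.6, which removed the IPv4 routing cache) or by ip_route_input_slow(), which consults the forwarding tables. Packets destined for the local host are delivered locally; others are forwarded to the next hop.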
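The core of a routing decision is a longest-prefix match on the destination address. The sketch below uses a linear scan over a small table purely for clarity; the kernel uses a trie-based structure instead. struct route, prefix_mask(), and route_lookup() are hypothetical names, and addresses are held in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct route {
    uint32_t prefix;      /* network address, host byte order */
    int prefix_len;       /* number of leading bits that must match, 0..32 */
    int is_local;         /* 1: deliver to this host, 0: forward */
    const char *dev;      /* egress device name, for illustration only */
};

/* Builds a netmask with len leading one bits; shifting by 32 is avoided. */
static uint32_t prefix_mask(int len)
{
    return len == 0 ? 0 : 0xffffffffu << (32 - len);
}

/* Returns the matching route with the longest prefix, or NULL if none matches. */
const struct route *route_lookup(const struct route *tbl, size_t n, uint32_t dst)
{
    const struct route *best = NULL;

    assert(tbl != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        assert(tbl[i].prefix_len >= 0 && tbl[i].prefix_len <= 32);
        uint32_t mask = prefix_mask(tbl[i].prefix_len);

        if ((dst & mask) == (tbl[i].prefix & mask) &&
            (best == NULL || tbl[i].prefix_len > best->prefix_len))
            best = &tbl[i];
    }
    return best;
}
```

A /32 entry marked is_local plays the role of the local-delivery decision, and a /0 entry acts as the default route that catches everything else.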

2.5 Deep Dive into Local Delivery

ip_local_deliver() reassembles fragmented packets, passes the packet through the NF_IP_LOCAL_IN hook, and finally hands it to the appropriate transport protocol (TCP/UDP).

Part 3 Data Output: Journey from Kernel to NIC

3.1 Transport‑Layer Initiation

Applications invoke send() or sendto(), which passes data to TCP or UDP. TCP adds headers, performs congestion and flow control; UDP simply adds its header.

3.2 IP‑Layer Pre‑Processing

Before leaving the host, packets traverse the NF_IP_LOCAL_OUT netfilter hook for optional filtering or NAT.
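Conceptually, a netfilter hook point is a list of callbacks that each return a verdict for the packet. The following toy model illustrates that idea only; it is not the netfilter API. The names struct pkt, hook_fn, run_hooks(), drop_telnet(), and rewrite_dst() are hypothetical, and hook priorities and the other verdict types are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum verdict { VERDICT_DROP = 0, VERDICT_ACCEPT = 1 };

/* Just enough packet metadata for the example hooks. */
struct pkt {
    uint32_t saddr;   /* source address, host byte order */
    uint32_t daddr;   /* destination address, host byte order */
    uint16_t dport;   /* destination port */
};

typedef enum verdict (*hook_fn)(struct pkt *p);

/* Runs the hooks in order; the first DROP ends processing. */
enum verdict run_hooks(const hook_fn *hooks, size_t n, struct pkt *p)
{
    assert(p != NULL && (hooks != NULL || n == 0));
    for (size_t i = 0; i < n; i++)
        if (hooks[i](p) == VERDICT_DROP)
            return VERDICT_DROP;
    return VERDICT_ACCEPT;
}

/* Filtering example: refuse outgoing telnet (port 23). */
enum verdict drop_telnet(struct pkt *p)
{
    return p->dport == 23 ? VERDICT_DROP : VERDICT_ACCEPT;
}

/* NAT-style example: rewrite the destination to 203.0.113.1. */
enum verdict rewrite_dst(struct pkt *p)
{
    p->daddr = 0xCB007101u;
    return VERDICT_ACCEPT;
}
```

Because a dropped packet never reaches later hooks, the order of the chain matters: here a telnet packet is discarded before the rewrite can touch it.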

3.3 Routing and Protocol Settings

By the time ip_output() runs, the route has normally been chosen already (for TCP, in ip_queue_xmit()). ip_output() sets the egress device and protocol on the skb. Afterwards, the NF_IP_POST_ROUTING hook can apply source NAT or other transformations.

3.4 Fragmentation and Final Transmission

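If the packet exceeds the MTU, ip_finish_output() has it fragmented: every fragment carries the original packet's identification value plus its own offset, and all but the last set the more-fragments flag. The neighbor subsystem then resolves the next hop's MAC address via ARP before the frame is handed to the NIC.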
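The arithmetic behind fragmentation is simple enough to show directly. This sketch assumes a 20-byte IPv4 header without options; struct frag and ip_fragment_plan() are hypothetical names. It computes only where each fragment starts and how long it is, and does not build real packets.

```c
#include <assert.h>
#include <stddef.h>

struct frag {
    size_t offset;        /* byte offset into the original payload */
    size_t len;           /* payload bytes carried by this fragment */
    int more_fragments;   /* 1 on every fragment except the last */
};

/*
 * Plans the fragmentation of payload_len bytes for the given MTU.
 * Fills out[] with at most max entries and returns the fragment count,
 * or 0 if the MTU is too small or out[] has too few entries.
 */
size_t ip_fragment_plan(size_t payload_len, size_t mtu,
                        struct frag *out, size_t max)
{
    const size_t hdr = 20;   /* assumed header size, no IP options */
    size_t n = 0, off = 0;

    assert(out != NULL);
    if (max == 0 || mtu < hdr + 8)
        return 0;

    if (payload_len + hdr <= mtu) {      /* fits: no fragmentation needed */
        out[0].offset = 0;
        out[0].len = payload_len;
        out[0].more_fragments = 0;
        return 1;
    }

    /* Offsets are stored in 8-byte units, so chunks must be multiples of 8. */
    size_t chunk = (mtu - hdr) & ~(size_t)7;

    while (off < payload_len) {
        size_t len = payload_len - off < chunk ? payload_len - off : chunk;

        if (n == max)
            return 0;
        out[n].offset = off;
        out[n].len = len;
        out[n].more_fragments = (off + len < payload_len);
        n++;
        off += len;
    }
    return n;
}
```

For a 4000-byte payload and an MTU of 1500, the plan is three fragments of 1480, 1480, and 1040 bytes; the header's fragment-offset field would hold offset / 8, that is 0, 185, and 370.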

Part 4 Case Study and Code Analysis

4.1 Typical Scenario

A web client requests a page from a server: DNS resolves the domain, a TCP three‑way handshake establishes a connection, the client sends an HTTP GET, the server processes the request, sends an HTTP response, and finally the connection is torn down with a four‑way handshake.

4.2 Key Code Snippets

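Network-input example (simplified e1000-style interrupt handler and poll routine):

/*
 * Simplified sketch of the receive path, not the actual e1000 source.
 * e1000_disable_irq(), e1000_enable_irq(), e1000_clean_rx_irq(),
 * e1000_rx_skb() and e1000_poll() are illustrative names.
 */

/* Hard-IRQ handler: do as little as possible, then defer to NAPI. */
irqreturn_t e1000_intr(int irq, void *data) {
    struct e1000_adapter *adapter = data;

    e1000_disable_irq(adapter);       /* mask NIC interrupts while polling */
    napi_schedule(&adapter->napi);    /* raise the receive softirq */
    return IRQ_HANDLED;
}

/* NAPI poll callback: runs in softirq context, bounded by budget. */
int e1000_poll(struct napi_struct *napi, int budget) {
    struct e1000_adapter *adapter =
        container_of(napi, struct e1000_adapter, napi);
    struct e1000_ring *rx_ring = adapter->rx_ring;
    struct sk_buff *skb;
    int work_done = 0;

    while (work_done < budget && e1000_clean_rx_irq(adapter, rx_ring)) {
        skb = e1000_rx_skb(adapter, rx_ring);   /* wrap the DMA buffer in an skb */
        if (skb) {
            netif_receive_skb(skb);             /* hand off to the protocol stack */
            work_done++;
        }
    }

    if (work_done < budget) {         /* ring drained: leave polling mode */
        napi_complete_done(napi, work_done);
        e1000_enable_irq(adapter);
    }
    return work_done;
}

In this sketch the hard-interrupt handler only disables NIC interrupts and schedules a soft interrupt. The poll routine then runs in softirq context, converts received packets to skbs, passes them to the protocol stack, and re-enables interrupts once the ring is drained; if the budget runs out first, interrupts stay off and the routine is polled again.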

Network‑output example (simplified tcp_sendmsg):

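/*
 * Heavily simplified sketch, not the real tcp_sendmsg(). The real function
 * splits data by MSS, appends skbs to the socket's write queue, and sends
 * them only as the congestion and receive windows allow.
 * TCP_HEADER_LEN and tcp_init_skb() are illustrative names.
 */
int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, size_t size) {
    struct inet_sock *inet = inet_sk(sk);
    struct sk_buff *skb;
    int err;

    /* Allocate a buffer with room for the payload plus the TCP header. */
    skb = alloc_skb(size + TCP_HEADER_LEN, GFP_KERNEL);
    if (!skb)
        return -ENOMEM;

    /* Reserve headroom so the header can later be pushed in front of the data. */
    skb_reserve(skb, TCP_HEADER_LEN);

    /* Copy the payload from user space; skb_put() extends skb->len by size. */
    if (copy_from_user(skb_put(skb, size), msg->msg_iov->iov_base, size)) {
        kfree_skb(skb);
        return -EFAULT;
    }

    tcp_init_skb(skb, sk);    /* build the TCP header (illustrative helper) */

    /* The IP layer takes ownership of the skb, so it is not freed here on error. */
    err = ip_queue_xmit(skb, &inet->cork.fl);
    return err ? err : size;
}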

This function allocates an skb, copies user data, prepares TCP headers, and hands the packet to the IP layer for transmission.
