How Does Linux Handle Network Packets? Inside the Kernel’s Interrupt and SoftIRQ Path
This article explains the layered architecture of the Linux network subsystem, the TCP/IP model implementation, how hardware and soft interrupts are registered and processed, the role of ksoftirqd and NAPI, and the complete send/receive packet flow from NIC to protocol stack.
Linux Network Subsystem Layering
The Linux network subsystem has to provide:
Support for different protocol families (INET, INET6, UNIX, NETLINK...)
Support for different network devices
Support for a unified BSD socket API
A layered structure to hide differences between protocols, hardware, and platforms
System calls are the only way for user applications to access the kernel. The protocol‑independent interface is provided by the socket layer, which offers generic functions for various protocols. The network protocol layer supplies concrete protocol interfaces (proto{}) that implement protocol details. Device‑independent interfaces provide generic functions for low‑level network drivers, while each driver defines a net_device structure and initializes it.
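The split between a protocol-independent socket layer and per-protocol interfaces can be pictured as a table of function pointers. The following is a minimal userspace sketch, not kernel code: every name in it (toy_proto_ops, toy_register, toy_sock_sendmsg, the TOY_AF_* constants) is hypothetical and only mirrors the idea of dispatching through a per-protocol operations structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical protocol families; stand-ins for INET, UNIX, and so on. */
enum { TOY_AF_INET, TOY_AF_UNIX, TOY_AF_MAX };

/* Per-protocol operations, analogous in spirit to a proto{} interface. */
struct toy_proto_ops {
    const char *name;
    int (*sendmsg)(const void *buf, size_t len); /* bytes sent or -1 */
};

/* One slot per family; filled in when a protocol registers itself. */
static const struct toy_proto_ops *toy_families[TOY_AF_MAX];

static int toy_register(unsigned int family, const struct toy_proto_ops *ops)
{
    assert(ops != NULL && ops->sendmsg != NULL);
    if (family >= (unsigned int)TOY_AF_MAX || toy_families[family] != NULL)
        return -1; /* unknown family or already registered */
    toy_families[family] = ops;
    return 0;
}

/* Generic entry point: knows nothing about any concrete protocol. */
static int toy_sock_sendmsg(unsigned int family, const void *buf, size_t len)
{
    if (family >= (unsigned int)TOY_AF_MAX || toy_families[family] == NULL)
        return -1;
    return toy_families[family]->sendmsg(buf, len);
}

/* Two toy protocols that only count the bytes they were handed. */
static size_t toy_inet_bytes, toy_unix_bytes;

static int toy_inet_sendmsg(const void *buf, size_t len)
{
    (void)buf;
    toy_inet_bytes += len;
    return (int)len;
}

static int toy_unix_sendmsg(const void *buf, size_t len)
{
    (void)buf;
    toy_unix_bytes += len;
    return (int)len;
}

static const struct toy_proto_ops toy_inet_ops = { "inet", toy_inet_sendmsg };
static const struct toy_proto_ops toy_unix_ops = { "unix", toy_unix_sendmsg };
```

A caller would register toy_inet_ops and toy_unix_ops once and then use only toy_sock_sendmsg, which is the point of the layering: adding a protocol means adding a table entry, not changing the generic layer.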
TCP/IP Layer Model
In the TCP/IP model the stack is divided into physical, link, network, transport, and application layers. Linux implements the link, network, and transport layers. The link‑layer protocols are handled by NIC drivers, while the kernel implements the network and transport layers and exposes a socket interface to user space.
Responsibilities of each layer:
Link layer : frames data, defines MAC addresses, transmits bits.
Network layer : defines IP addresses, performs routing and MAC resolution.
Transport layer : defines ports, identifies applications, delivers packets.
Application layer : defines data formats and parses them.
When a URL is entered, the application layer formats the request, the transport layer adds ports, the network layer adds IP addresses, and the link layer adds MAC addresses before the packet is transmitted.
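The encapsulation order described above can be sketched as prepending headers into reserved headroom, which is roughly how a packet buffer grows toward the front as it moves down the stack. This is an illustrative userspace model only: the toy_* names are hypothetical, the header structs are deliberately simplified (they are not the real TCP, IP, or Ethernet wire formats), and fields are left in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_HEADROOM 64   /* space reserved in front of the payload */
#define TOY_BUF_SIZE 256

/* Simplified headers: only the fields the text mentions. */
struct toy_l4 { uint16_t src_port, dst_port; };      /* transport: ports */
struct toy_l3 { uint32_t src_ip, dst_ip; };          /* network: IP addresses */
struct toy_l2 { uint8_t dst_mac[6], src_mac[6]; };   /* link: MAC addresses */

struct toy_pkt {
    uint8_t buf[TOY_BUF_SIZE];
    size_t data; /* offset of the first valid byte */
    size_t len;  /* number of valid bytes */
};

/* Application layer: place the payload after the headroom. */
static void toy_pkt_init(struct toy_pkt *p, const void *payload, size_t n)
{
    assert(n <= TOY_BUF_SIZE - TOY_HEADROOM);
    p->data = TOY_HEADROOM;
    p->len = n;
    memcpy(p->buf + p->data, payload, n);
}

/* Prepend a header by moving the data pointer backwards into the headroom. */
static void toy_pkt_push(struct toy_pkt *p, const void *hdr, size_t n)
{
    assert(n <= p->data); /* enough headroom left */
    p->data -= n;
    p->len += n;
    memcpy(p->buf + p->data, hdr, n);
}

/* Walk down the stack: transport, then network, then link header. */
static size_t toy_encapsulate(struct toy_pkt *p, const void *payload, size_t n,
                              const struct toy_l4 *l4, const struct toy_l3 *l3,
                              const struct toy_l2 *l2)
{
    toy_pkt_init(p, payload, n);
    toy_pkt_push(p, l4, sizeof(*l4));
    toy_pkt_push(p, l3, sizeof(*l3));
    toy_pkt_push(p, l2, sizeof(*l2));
    return p->len;
}
```

After toy_encapsulate returns, the link header sits at the front of the valid region and the payload at the end, so the outermost header is the last one added and the first one the receiver strips.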
Linux Network Protocol Stack
The send/receive path across the application, transport, network, and link layers is illustrated in the diagram below.
The article focuses on the link‑layer processing from NIC interrupt to network‑layer reception.
Interrupt Handling When NIC Receives Packets
An interrupt is a hardware signal that causes the CPU to stop its current work and execute an interrupt handler. Soft interrupts (softirqs) allow time‑consuming work to be deferred to a later context.
Linux splits interrupt handling into an upper half (quick response) and a lower half (deferred work). The upper half acknowledges the hardware interrupt and does only the minimal urgent work (the packet itself has typically already been placed in memory by DMA); the lower half processes the packet.
Softirqs are one mechanism for the lower half, along with tasklets and work queues. Network receive uses NET_RX_SOFTIRQ.
enum {
    HI_SOFTIRQ=0,
    TIMER_SOFTIRQ,
    NET_TX_SOFTIRQ,
    NET_RX_SOFTIRQ,
    BLOCK_SOFTIRQ,
    IRQ_POLL_SOFTIRQ,
    TASKLET_SOFTIRQ,
    SCHED_SOFTIRQ,
    HRTIMER_SOFTIRQ,
    RCU_SOFTIRQ, /* Preferable RCU should always be the last softirq */
    NR_SOFTIRQS
};
Registering NIC interrupts involves both the hardware interrupt and the softirq.
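How the two registrations cooperate can be sketched in userspace C: an action is registered per softirq number, the hardware interrupt handler only marks the softirq as pending, and a later pass runs whatever is pending. This is a simplified model under stated assumptions, not the kernel implementation; all toy_* names are hypothetical, and per-CPU state, interrupt masking, and restart limits are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A reduced softirq numbering, modelled on the enum above. */
enum { TOY_NET_TX, TOY_NET_RX, TOY_NR_SOFTIRQS };

typedef void (*toy_action_t)(void);

static toy_action_t toy_vec[TOY_NR_SOFTIRQS]; /* registered actions */
static uint32_t toy_pending;                  /* one bit per softirq */

/* Registration step: remember which function handles which softirq. */
static void toy_open_softirq(unsigned int nr, toy_action_t fn)
{
    assert(nr < TOY_NR_SOFTIRQS && fn != NULL);
    toy_vec[nr] = fn;
}

/* Mark a softirq as pending; cheap enough to call from the upper half. */
static void toy_raise_softirq(unsigned int nr)
{
    assert(nr < TOY_NR_SOFTIRQS);
    toy_pending |= 1u << nr;
}

/* Lower half: run every pending action once, lowest number first. */
static int toy_do_softirq(void)
{
    uint32_t pending = toy_pending;
    int ran = 0;

    toy_pending = 0;
    for (unsigned int nr = 0; pending != 0; nr++, pending >>= 1) {
        if ((pending & 1u) && toy_vec[nr] != NULL) {
            toy_vec[nr]();
            ran++;
        }
    }
    return ran;
}

static int toy_rx_queue;     /* packets waiting for the lower half */
static int toy_rx_delivered; /* packets handed to the "stack" */

/* Deferred work: drain everything that has arrived so far. */
static void toy_net_rx_action(void)
{
    toy_rx_delivered += toy_rx_queue;
    toy_rx_queue = 0;
}

/* Upper half: note the arrival and defer the real work. */
static void toy_nic_irq(void)
{
    toy_rx_queue++;
    toy_raise_softirq(TOY_NET_RX);
}
```

Two calls to toy_nic_irq followed by a single toy_do_softirq deliver both packets in one pass, which illustrates why deferring work lets several interrupts be batched.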
Register NIC interrupt
Example using the e1000 driver:
static int __init e1000_init_module(void)
{
    int ret;
    pr_info("%s - version %s\n", e1000_driver_string, e1000_driver_version);
    pr_info("%s\n", e1000_copyright);
    ret = pci_register_driver(&e1000_driver);
    ...
    return ret;
}
The driver’s .probe method (e1000_probe) initializes the adapter and sets netdev_ops.
static int e1000_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
{
    ...
    netdev->netdev_ops = &e1000_netdev_ops;
    e1000_set_ethtool_ops(netdev);
    ...
}
The open routine registers the interrupt:
int e1000_open(struct net_device *netdev)
{
    struct e1000_adapter *adapter = netdev_priv(netdev);
    ...
    err = e1000_request_irq(adapter);
    ...
}

static int e1000_request_irq(struct e1000_adapter *adapter)
{
    struct net_device *netdev = adapter->netdev;
    irq_handler_t handler = e1000_intr;
    int irq_flags = IRQF_SHARED;
    int err;
    err = request_irq(adapter->pdev->irq, handler, irq_flags, netdev->name, ...);
    ...
    return err;
}
Softirq registration during kernel initialization:
void __init softirq_init(void)
{
    ...
    open_softirq(TASKLET_SOFTIRQ, tasklet_action);
    open_softirq(HI_SOFTIRQ, tasklet_hi_action);
}
NET_TX_SOFTIRQ and NET_RX_SOFTIRQ are registered in net_dev_init:
    open_softirq(NET_TX_SOFTIRQ, net_tx_action);
    open_softirq(NET_RX_SOFTIRQ, net_rx_action);
One ksoftirqd kernel thread is created per CPU to run softirqs that are deferred out of interrupt context.
During initialization, spawn_ksoftirqd in kernel/softirq.c calls smpboot_register_percpu_thread (kernel/smpboot.c) to create these threads from the following description:
static struct smp_hotplug_thread softirq_threads = {
    .store = &ksoftirqd,
    .thread_should_run = ksoftirqd_should_run,
    .thread_fn = run_ksoftirqd,
    .thread_comm = "ksoftirqd/%u",
};
ksoftirqd runs a loop checking for pending softirqs and executes __do_softirq.
static void run_ksoftirqd(unsigned int cpu)
{
    local_irq_disable();
    if (local_softirq_pending()) {
        __do_softirq();
        rcu_note_context_switch(cpu);
        local_irq_enable();
        cond_resched();
        return;
    }
    local_irq_enable();
}
Network protocol registration (IP, TCP, UDP) is performed in inet_init, which adds handlers to inet_protos and ptype_base.
static struct packet_type ip_packet_type __read_mostly = {
    .type = cpu_to_be16(ETH_P_IP),
    .func = ip_rcv,
};

static const struct net_protocol udp_protocol = {
    .handler = udp_rcv,
    .err_handler = udp_err,
    .no_policy = 1,
    .netns_ok = 1,
};

static const struct net_protocol tcp_protocol = {
    .early_demux = tcp_v4_early_demux,
    .handler = tcp_v4_rcv,
    .err_handler = tcp_v4_err,
    .no_policy = 1,
    .netns_ok = 1,
};
When a packet arrives, the NIC places it in its RX FIFO, DMA copies it into a pre-allocated sk_buff buffer, and a hardware interrupt notifies the kernel. The upper half does the minimum (acknowledging the interrupt and scheduling the softirq); the lower half (softirq) processes the packet via NAPI or a fallback path.
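The two registration tables mentioned above amount to a two-stage lookup on receive: ptype_base selects a network-layer handler by EtherType, and inet_protos selects a transport handler by IP protocol number. The sketch below models only that lookup. The toy_* names are hypothetical, and the frame layout (two EtherType bytes, one protocol byte, then payload) is an invented simplification rather than a real Ethernet or IP header; the numeric values 0x0800, 6, and 17 are the standard EtherType for IPv4 and the protocol numbers for TCP and UDP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ETH_P_IP    0x0800 /* EtherType for IPv4 */
#define TOY_IPPROTO_TCP 6
#define TOY_IPPROTO_UDP 17

enum { TOY_RX_DROP = 0, TOY_RX_TCP = 1, TOY_RX_UDP = 2 };

typedef int (*toy_l4_handler)(const uint8_t *payload, size_t len);

/* Transport handlers indexed by protocol number, like inet_protos. */
static toy_l4_handler toy_inet_protos[256];

static int toy_add_protocol(toy_l4_handler h, uint8_t proto)
{
    assert(h != NULL);
    if (toy_inet_protos[proto] != NULL)
        return -1; /* slot already taken */
    toy_inet_protos[proto] = h;
    return 0;
}

static int toy_tcp_rcv(const uint8_t *payload, size_t len)
{
    (void)payload; (void)len;
    return TOY_RX_TCP;
}

static int toy_udp_rcv(const uint8_t *payload, size_t len)
{
    (void)payload; (void)len;
    return TOY_RX_UDP;
}

/* Network layer: read the protocol byte and dispatch to the transport. */
static int toy_ip_rcv(const uint8_t *pkt, size_t len)
{
    toy_l4_handler h;

    if (len < 1)
        return TOY_RX_DROP;
    h = toy_inet_protos[pkt[0]];
    return h != NULL ? h(pkt + 1, len - 1) : TOY_RX_DROP;
}

/* Link layer exit: read the big-endian EtherType and pick a handler. */
static int toy_netif_receive(const uint8_t *frame, size_t len)
{
    uint16_t type;

    if (len < 2)
        return TOY_RX_DROP;
    type = (uint16_t)((frame[0] << 8) | frame[1]);
    if (type != TOY_ETH_P_IP)
        return TOY_RX_DROP; /* only IPv4 is registered in this sketch */
    return toy_ip_rcv(frame + 2, len - 2);
}
```

Frames whose EtherType or protocol number has no registered handler are dropped here, which is why registration in inet_init has to happen before any traffic can reach TCP or UDP.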
ksoftirqd processing softirqs
static void net_rx_action(struct softirq_action *h)
{
    struct softnet_data *sd = &__get_cpu_var(softnet_data);
    unsigned long time_limit = jiffies + 2;
    int budget = netdev_budget;
    local_irq_disable();
    while (!list_empty(&sd->poll_list)) {
        ...
        work = n->poll(n, weight);
        budget -= work;
    }
}
The remaining examples use the igb driver rather than e1000. Its NAPI poll function, igb_poll, invokes igb_clean_rx_irq to retrieve packets from the ring buffer, fill in the skb fields, and pass them to napi_gro_receive.
static bool igb_clean_rx_irq(struct igb_q_vector *q_vector, const int budget)
{
    do {
        skb = igb_fetch_rx_buffer(rx_ring, rx_desc, skb);
        if (igb_is_non_eop(rx_ring, rx_desc))
            continue;
        if (igb_cleanup_headers(rx_ring, rx_desc, skb)) {
            skb = NULL;
            continue;
        }
        igb_process_skb_fields(rx_ring, rx_desc, skb);
        napi_gro_receive(&q_vector->napi, skb);
    } while (...);
}
napi_gro_receive ultimately calls netif_receive_skb, delivering the packet to the protocol stack.
gro_result_t napi_gro_receive(struct napi_struct *napi, struct sk_buff *skb)
{
    skb_gro_reset_offset(skb);
    return napi_skb_finish(dev_gro_receive(napi, skb), skb);
}
Finally, netif_receive_skb hands the packet to the appropriate IP/TCP/UDP handler.
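The budget accounting in net_rx_action shown earlier is easier to follow in a stripped-down model. The sketch below is an assumption-laden userspace illustration, not the kernel algorithm: the toy_* names are hypothetical, a device is reduced to a backlog counter, the two-jiffy time limit is omitted, and a device is treated as finished when its poll returns less than it was allowed to process.

```c
#include <assert.h>
#include <stddef.h>

/* A device reduced to what the polling loop needs. */
struct toy_napi {
    int backlog;   /* packets waiting in the device ring */
    int weight;    /* maximum packets per poll call */
    int scheduled; /* non-zero while the device is on the poll list */
};

/* Driver poll: process at most 'limit' packets, return how many. */
static int toy_poll(struct toy_napi *n, int limit)
{
    int work = n->backlog < limit ? n->backlog : limit;

    n->backlog -= work;
    return work;
}

/* Poll scheduled devices round-robin until the budget is spent or
 * every device has drained. Returns the number of packets processed. */
static int toy_net_rx_action(struct toy_napi *list, size_t count, int budget)
{
    int total = 0;
    int progress = 1;

    while (budget > 0 && progress) {
        progress = 0;
        for (size_t i = 0; i < count; i++) {
            struct toy_napi *n = &list[i];
            int limit, work;

            if (!n->scheduled)
                continue;
            assert(n->weight > 0);
            limit = n->weight < budget ? n->weight : budget;
            work = toy_poll(n, limit);
            total += work;
            budget -= work;
            progress = 1;
            if (work < limit)
                n->scheduled = 0; /* ring drained: leave the poll list */
            if (budget <= 0)
                break; /* out of budget: remaining work waits */
        }
    }
    return total;
}
```

With a large budget every device drains and drops off the list; with a small one a busy device stays scheduled and its remaining backlog is left for the next softirq run, so one NIC cannot monopolize the CPU.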
Summary
Send Path
Driver creates a TX descriptor ring and programs the NIC’s TDBA register.
dev_queue_xmit() passes the sk_buff to the driver.
Driver places the sk_buff in the TX ring and updates TDT.
DMA sees the updated TDT, fetches the descriptor, and copies data to the TX FIFO.
MAC transmits the packet.
After transmission, the NIC updates TDH and raises a hardware interrupt so that the driver can free the transmitted buffers.
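The interplay of TDT and TDH in the send path can be modelled as a circular buffer with a producer index owned by the driver and a consumer index owned by the NIC. The following is a userspace sketch under simplifying assumptions: the toy_* names are hypothetical, a descriptor is reduced to a packet id, one slot is kept empty to distinguish a full ring from an empty one, and the extra "clean" index for reclaiming transmitted slots is an illustrative device rather than a hardware register.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_RING_SIZE 8

struct toy_tx_ring {
    int desc[TOY_RING_SIZE]; /* stand-in descriptors: just a packet id */
    unsigned int tdt;        /* tail: next slot the driver fills */
    unsigned int tdh;        /* head: next slot the NIC transmits */
    unsigned int clean;      /* next slot the driver reclaims */
};

static void toy_ring_init(struct toy_tx_ring *r)
{
    assert(r != NULL);
    r->tdt = r->tdh = r->clean = 0;
}

/* Driver: queue one packet and advance the tail. -1 if the ring is full. */
static int toy_xmit(struct toy_tx_ring *r, int pkt_id)
{
    unsigned int next = (r->tdt + 1) % TOY_RING_SIZE;

    if (next == r->clean)
        return -1; /* no reclaimed slot available yet */
    r->desc[r->tdt] = pkt_id;
    r->tdt = next; /* corresponds to the driver updating TDT */
    return 0;
}

/* NIC: transmit everything between head and tail, advancing the head. */
static int toy_nic_transmit(struct toy_tx_ring *r)
{
    int sent = 0;

    while (r->tdh != r->tdt) {
        r->tdh = (r->tdh + 1) % TOY_RING_SIZE;
        sent++;
    }
    return sent;
}

/* Driver, on the completion interrupt: reclaim slots the NIC is done with. */
static int toy_clean_tx(struct toy_tx_ring *r)
{
    int freed = 0;

    while (r->clean != r->tdh) {
        r->clean = (r->clean + 1) % TOY_RING_SIZE;
        freed++;
    }
    return freed;
}
```

Note that the ring stays full after the NIC has transmitted until toy_clean_tx runs, which mirrors the last step of the send path: slots only become reusable once the completion interrupt lets the driver release them.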
Receive Path
Driver creates an RX descriptor ring and programs the NIC’s RDBA register.
Driver allocates sk_buff buffers and maps them for DMA.
NIC receives a packet into its RX FIFO.
DMA copies the packet from the FIFO into the sk_buff buffer.
NIC raises a hardware interrupt; the interrupt handler schedules a softirq.
The softirq, run on interrupt exit or by ksoftirqd, executes net_rx_action, which invokes the driver’s poll function (e.g., igb_poll).
The driver processes the packet and calls netif_receive_skb to hand it to the protocol stack.
Source: https://www.cnblogs.com/ypholic/p/14337328.html
Open Source Linux