Why Soft Interrupts Turn Into Hidden CPU Load in High‑Traffic Linux Servers
The article explains how Linux soft interrupts work and why they can become a CPU bottleneck under heavy network traffic, then walks through the kernel implementation, monitoring tools, and optimization strategies for diagnosing and mitigating the issue.
Linux Soft Interrupts Overview
Linux uses two interrupt layers: hard interrupts generated directly by hardware (e.g., NIC packet arrival) and soft interrupts, a software‑level buffering layer that defers non‑urgent work. Soft interrupts are scheduled after the hard interrupt finishes, either directly or via the ksoftirqd kernel thread.
Implementation
Soft interrupt types are defined in include/linux/interrupt.h (e.g., HI_SOFTIRQ, TIMER_SOFTIRQ, NET_TX_SOFTIRQ, NET_RX_SOFTIRQ). The kernel keeps a global array softirq_vec of struct softirq_action, where each entry stores a handler function. (Older 2.6-era kernels also carried a void *data pointer in the struct and in the registration call; modern kernels dropped it.) Registration is performed with
open_softirq(int nr, void (*action)(struct softirq_action *)):
struct softirq_action {
void (*action)(struct softirq_action *);
};
static struct softirq_action softirq_vec[NR_SOFTIRQS] __cacheline_aligned_in_smp;
To trigger a soft interrupt, code calls raise_softirq(unsigned int nr), which sets the corresponding bit in the per‑CPU pending bitmap __softirq_pending. The pending bits are later processed either immediately after a hard interrupt returns or by the ksoftirqd thread.
Processing Flow
Abort if already in interrupt context to avoid nesting.
Read the per‑CPU pending bitmap.
Iterate over set bits using ffs, clear each bit, and invoke the registered handler from softirq_vec.
Update statistics and exit.
The per‑CPU ksoftirqd thread does not busy‑poll; it sleeps until woken when pending work remains after the post‑interrupt pass, then calls do_softirq() until the bitmap is drained. Deferring overflow work to a schedulable thread keeps soft‑interrupt handling from starving normal kernel execution.
Impact of High Network Load
When many small packets arrive, hard interrupts fire frequently, each scheduling NET_RX_SOFTIRQ. If the packet rate exceeds the kernel’s processing capacity, the soft‑interrupt backlog grows, causing a large fraction of CPU time to be spent in soft‑interrupt handlers. This appears as high si percentages in top, CPU saturation, increased latency, and packet loss.
Monitoring Tools
/proc/softirqs – per‑CPU counts for each soft‑interrupt type.
top – shows the si column (soft‑interrupt CPU usage).
perf top -e irq:softirq_entry and perf record -g -e irq:softirq_entry -a -- sleep 10 – fine‑grained profiling of soft‑interrupt activity via the kernel tracepoint.
Example C++ Monitoring Utility
The article provides a C++ program that reads /proc/softirqs, aggregates NET_RX and NET_TX counts per CPU, warns if a single CPU exceeds 60% of the total, and prints the current si usage via top. It also suggests running the perf commands for deeper analysis.
Optimization Strategies
Multi‑queue NICs & RPS/RFS – Distribute packets across multiple hardware queues and steer flows to specific CPUs.
Interrupt Coalescing – Batch packets to reduce interrupt frequency.
Interrupt Affinity – Soft interrupts run on the CPU that raised them, so bind the NIC's hard interrupts to dedicated CPUs via /proc/irq/*/smp_affinity (or let the irqbalance service distribute them); the associated NET_RX/NET_TX work follows.
Kernel Parameters – Tune net.core.netdev_budget (packets per soft‑interrupt) and net.core.netdev_max_backlog (receive queue length) to balance latency and throughput.
Driver & Kernel Updates – Use newer NIC drivers and kernel versions that improve soft‑interrupt handling.
By monitoring soft‑interrupt statistics, profiling handler execution, and applying the above tuning steps, administrators can identify and mitigate hidden CPU load caused by soft interrupts in high‑traffic Linux environments.
Deepin Linux
Research areas: Windows & Linux platforms, C/C++ backend development, embedded systems and Linux kernel, etc.