
Boost Linux Network Performance with NIC Multi‑Queue and IRQ Affinity

This article explains why a Janus gateway’s QPS stalled under large payloads, how enabling NIC multi‑queue and binding interrupts to specific CPUs (IRQ affinity) resolves the bottleneck, and provides practical commands and a script for Linux network interrupt tuning.

Vipshop Quality Engineering

During load testing of the Janus service gateway, QPS plateaued under large payloads: network throughput was less than half of the dual 10GbE NIC capacity, CPU usage hovered around 50%, and there were no I/O or JVM GC bottlenecks. The root cause was that NIC multi‑queue was not enabled — the system ran CentOS 5.5 on kernel 2.6.18, which predates multi‑queue support (added in kernel 2.6.21). After upgrading the OS, QPS rose dramatically.

What Is an Interrupt?

An interrupt is an event that forces the CPU to suspend the currently running program and jump to a handler routine; after the event is handled, the CPU resumes the original program. Interrupts come in two kinds: software interrupts, raised by the running program itself, and hardware interrupts, raised by external I/O devices such as NICs and disks.

x86 systems use the interrupt mechanism to coordinate the CPU with other devices. Historically, all NIC interrupts were handled by CPU0, which became a hotspot in high packet‑rate environments while the other CPUs sat idle. NIC multi‑queue technology spreads interrupt handling across multiple CPUs.

Network Soft Interrupts

When a NIC receives a packet, it raises a hardware interrupt to notify the kernel. The kernel’s interrupt handler copies the packet from the NIC’s buffer to memory; because the NIC buffer is limited, this copy must happen quickly to avoid packet loss. The remaining packet processing is delegated to a soft interrupt, which can become a bottleneck on heavily loaded NICs.
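Soft‑interrupt activity is visible directly in /proc/softirqs, whose NET_RX and NET_TX rows hold per‑CPU counters; a quick way to inspect them:

```shell
# Per-CPU softirq counters; the NET_RX/NET_TX rows count network
# receive/transmit soft interrupts serviced by each CPU.
head -1 /proc/softirqs                     # CPU column headers
grep -E "NET_RX|NET_TX" /proc/softirqs     # per-CPU network softirq counts
```

A single CPU whose NET_RX counter grows far faster than the others is exactly the soft‑interrupt hotspot described above.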

You can view per‑CPU interrupt statistics in /proc/interrupts:

<code># cat /proc/interrupts
            CPU0        CPU1   CPU2   CPU3
 124:  125935088           0      0      0   IR-PCI-MSI-edge   eth2-TxRx-0
 125:          0   125840723      0      0   IR-PCI-MSI-edge   eth2-TxRx-1
...</code>

In the example, interrupt numbers 124 and 125 correspond to the first two queues of eth2. The count under each CPU column shows how many times that CPU has serviced the interrupt: here queue 0 (IRQ 124) is handled by CPU0 and queue 1 (IRQ 125) by CPU1. On a 24‑CPU system whose NIC exposes 24 queues, there would be 24 such interrupt lines, one per queue.

Multi‑queue alone does not balance the load: if every queue's interrupt still lands on CPU0, nothing is gained. NICs that support Receive Side Scaling (RSS) hash incoming flows across queues, each with its own interrupt line, so binding each interrupt to a different CPU (IRQ affinity) spreads the processing across cores.

NIC multi‑queue support is available in Linux kernels after version 2.6.21. Use cat /proc/version to check your kernel version.
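A quick sketch of checking for multi‑queue support (eth2 below is an example interface name; `ethtool -l` works only if the driver supports channel reporting):

```shell
# Kernel version: multi-queue needs a kernel newer than 2.6.21
cat /proc/version

# Each rx-N/tx-N directory here corresponds to one NIC queue
# (replace eth2 with your interface name)
ls /sys/class/net/eth2/queues/

# If the driver supports it, ethtool reports channel (queue) counts
ethtool -l eth2
```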

Interrupt Binding – IRQ Affinity

IRQ affinity ties one or more interrupt sources to particular CPU cores.

The file /proc/irq/[irq_num]/smp_affinity_list contains a decimal list of CPUs that may handle the interrupt (CPU numbering starts at 0).

<code># Bind interrupt 124 (first eth2 queue) to CPU0
echo 0 > /proc/irq/124/smp_affinity_list</code>
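The older /proc/irq/[irq_num]/smp_affinity file holds the same setting as a hexadecimal CPU bitmask rather than a decimal list; a small sketch of the correspondence (IRQ 124 as in the example above):

```shell
# smp_affinity takes a hex bitmask: bit N set means CPU N may handle the IRQ
cpu=3
mask=$(printf '%x' $((1 << cpu)))
echo "CPU$cpu -> mask 0x$mask"    # CPU3 -> mask 0x8
# The two forms below are equivalent:
# echo 3       > /proc/irq/124/smp_affinity_list
# echo "$mask" > /proc/irq/124/smp_affinity
```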

Using the information from /proc/interrupts, you can script automatic binding of all NIC queues:

<code>#!/bin/sh
# Auto-bind multi-queue NIC interrupts: queue N -> CPU N
for item in $(grep "TxRx" /proc/interrupts | awk '{sub(":", "", $1); print $1":"$NF}')
do
  # item looks like "124:eth2-TxRx-0"
  irq=$(echo "$item" | cut -d":" -f1)
  num=$(echo "$item" | awk -F'-' '{print $NF}')
  echo "set IRQ[$irq] binding to CPU[$num]"
  echo "$num" > "/proc/irq/$irq/smp_affinity_list"
done
</code>

The user‑space daemon irqbalance automatically redistributes interrupts across CPUs based on load; disable it when managing IRQ affinity manually, otherwise it will periodically overwrite your settings.
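Disabling it looks like this (systemctl on systemd distributions; service/chkconfig on older SysV‑init systems such as CentOS 5/6):

```shell
# systemd distributions
systemctl stop irqbalance
systemctl disable irqbalance

# older SysV-init distributions (e.g. CentOS 5/6)
service irqbalance stop
chkconfig irqbalance off
```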

How to Monitor Soft‑Interrupt Usage

Run top and press 1 to view per‑CPU statistics. The %si column shows the percentage of CPU time spent handling soft interrupts (softirqs). If a few CPUs show high %si while the rest stay low, soft‑interrupt imbalance may be the performance bottleneck.

<code>top - 15:37:48 up 54 days, 21:46,  1 user,  load average: 0.00, 0.00, 0.00
Tasks: 121 total,   1 running, 120 sleeping,   0 stopped,   0 zombie
Cpu0  : 0.0%us, 0.0%sy, 0.0%ni, 99.3%id, 0.3%wa, 0.0%hi, 0.0%si, 0.3%st
Cpu1  : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
...</code>
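For continuous per‑CPU numbers, sysstat's mpstat reports a %soft column; if sysstat is not installed, the raw counters are also in /proc/stat, where the eighth field of each cpuN line is cumulative softirq time in ticks. A sketch:

```shell
# Cumulative softirq ticks per CPU from /proc/stat
# (the 8th field after the cpuN label is softirq time)
awk '/^cpu[0-9]/ {printf "%s softirq_ticks=%s\n", $1, $8}' /proc/stat

# With sysstat installed, sampled percentages instead:
# mpstat -P ALL 1 3    # %soft column, 1-second samples
```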