Can Linux Reach True Line‑Speed? Deep Dive into Kernel Packet Forwarding Bottlenecks
This article explains why Linux’s native networking stack falls short of line‑speed, analyzes the core latency sources such as DMA, memory copying, routing look‑ups and lock contention, and proposes a comprehensive set of kernel‑level redesigns—including VOQ queues, lock‑free data structures, and user‑space stacks—to achieve near‑wire‑speed forwarding.
Background and Misconceptions
Many people mistakenly equate "line‑speed" with the raw bandwidth of a cable, ignoring the inevitable processing delay inside routers and switches caused by routing table look‑ups, MAC/port mapping, and other core operations that cannot be avoided.
For routers, line‑speed is measured by the ability to output packets at the maximum rate, not by input speed, because input queues can be made arbitrarily large.
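To make the target concrete, the packet rate a link demands can be worked out from its bit rate and frame size. The sketch below assumes standard Ethernet framing, where each frame costs 20 extra bytes on the wire (7-byte preamble, 1-byte start frame delimiter, 12-byte inter-frame gap); the function names are illustrative and do not come from the original article.

```c
#include <assert.h>
#include <stdint.h>

/* Per-frame on-wire overhead on Ethernet: 7-byte preamble, 1-byte start
 * frame delimiter, and a 12-byte inter-frame gap. */
#define ETH_WIRE_OVERHEAD_BYTES 20u

/* Maximum frames per second for a link of link_bps bits per second carrying
 * frames of frame_bytes bytes (frame size includes the 4-byte FCS). */
static uint64_t line_rate_pps(uint64_t link_bps, uint32_t frame_bytes)
{
    assert(frame_bytes >= 64);   /* minimum Ethernet frame size */
    uint64_t bits_on_wire =
        (uint64_t)(frame_bytes + ETH_WIRE_OVERHEAD_BYTES) * 8u;
    return link_bps / bits_on_wire;
}

/* Time available to process one packet at that rate, in picoseconds. */
static uint64_t per_packet_budget_ps(uint64_t link_bps, uint32_t frame_bytes)
{
    uint64_t bits_on_wire =
        (uint64_t)(frame_bytes + ETH_WIRE_OVERHEAD_BYTES) * 8u;
    /* bits / (bits per second) = seconds; scale to picoseconds first. */
    return bits_on_wire * 1000000000000ull / link_bps;
}
```

For a 10 Gb/s link and 64-byte frames this works out to about 14.88 million packets per second, or roughly 67 ns per packet, which is the budget every lookup, lock, and cache miss has to fit into.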
Key Bottlenecks in Linux Packet Forwarding
The main performance penalties stem from:
Memory operations: copying packets between NIC buffers and kernel memory, cache misses, and bus contention.
Routing look‑ups: longest‑prefix matching and an inefficient route cache.
Locking overhead: SMP locks around routing tables and packet queues.
Interrupt handling: soft‑irq and NAPI scheduling that bind processing to a single CPU.
Proposed Optimizations
1. DMA and Queue Redesign
Introduce a cross‑bar‑style Virtual Output Queue (VOQ) that maps input‑to‑output queues via DMA pointer exchange, eliminating costly packet copies.
/* Simplified VOQ initialization: one queue per output port */
struct voq *voqs[MAX_OUTPUT];
int i;
for (i = 0; i < MAX_OUTPUT; i++)
	voqs[i] = alloc_voq();
Bind each VOQ to a specific output NIC and CPU core to preserve locality.
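The pointer-exchange idea can be sketched in user-space C as one single-producer/single-consumer ring per (input, output) pair, so that only a pointer moves between ports and the packet data stays where DMA placed it. This is a minimal illustration under stated assumptions, not the article's kernel design: voq_matrix, NUM_PORTS, VOQ_SLOTS, and struct pkt are hypothetical names, and each ring assumes exactly one enqueuing and one dequeuing thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define VOQ_SLOTS 256u   /* ring capacity, must be a power of two */
#define NUM_PORTS 4u     /* hypothetical number of NICs */

struct pkt;              /* opaque packet buffer filled by NIC DMA */

/* One single-producer/single-consumer ring per (input, output) pair. */
struct voq {
    _Atomic size_t head;             /* next slot the consumer reads  */
    _Atomic size_t tail;             /* next slot the producer writes */
    struct pkt *slot[VOQ_SLOTS];
};

static struct voq voq_matrix[NUM_PORTS][NUM_PORTS];

/* Input side: hand the packet pointer to the queue for its output port. */
static bool voq_enqueue(unsigned in, unsigned out, struct pkt *p)
{
    assert(in < NUM_PORTS && out < NUM_PORTS);
    struct voq *q = &voq_matrix[in][out];
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (tail - head == VOQ_SLOTS)
        return false;                /* queue full: caller drops or retries */
    q->slot[tail & (VOQ_SLOTS - 1)] = p;
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}

/* Output side: take the next packet pointer queued by a given input. */
static struct pkt *voq_dequeue(unsigned in, unsigned out)
{
    assert(in < NUM_PORTS && out < NUM_PORTS);
    struct voq *q = &voq_matrix[in][out];
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head == tail)
        return NULL;                 /* queue empty */
    struct pkt *p = q->slot[head & (VOQ_SLOTS - 1)];
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return p;
}
```

Because each output port drains its own column of queues, a congested output does not block packets headed elsewhere, which is the head-of-line blocking problem that virtual output queues are meant to avoid.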
2. Separate Routing and Forwarding Tables
Maintain distinct routing (lookup) and forwarding (output) tables, synchronizing them with RCU to avoid read‑write contention. Use the DxR‑Pro++ three‑step location structure in place of a trie‑based longest‑prefix lookup, so that the longest‑prefix result is precomputed rather than searched for on every packet.
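DxR-style schemes precompute longest-prefix results into a directly indexed table plus sorted address ranges. The following is a simplified two-stage sketch of that general idea, not the DxR‑Pro++ structure itself, whose details the article does not give. It assumes IPv4 addresses in host byte order, and all type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A range covers low-16-bit values from 'start' up to the next range's start. */
struct range {
    uint16_t start;
    uint16_t nexthop;
};

/* Direct-table entry for one /16 chunk of the address space. */
struct chunk {
    uint32_t base;     /* index of the first range in the range table  */
    uint16_t count;    /* number of ranges; 0 means 'nexthop' is final */
    uint16_t nexthop;  /* answer when the whole /16 maps to one hop    */
};

struct fwd_table {
    struct chunk direct[1u << 16];
    const struct range *ranges;
};

static uint16_t fwd_lookup(const struct fwd_table *t, uint32_t dst)
{
    /* Stage 1: index the direct table with the top 16 address bits. */
    const struct chunk *c = &t->direct[dst >> 16];
    if (c->count == 0)
        return c->nexthop;

    /* Stage 2: ranges in a chunk are sorted and the first starts at 0,
     * so find the last range whose start is <= the low 16 bits. */
    const struct range *r = &t->ranges[c->base];
    uint16_t key = (uint16_t)(dst & 0xffffu);
    assert(r[0].start == 0);
    size_t lo = 0, hi = c->count;    /* answer lies in [lo, hi) */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (r[mid].start <= key)
            lo = mid;
        else
            hi = mid;
    }
    return r[lo].nexthop;
}
```

The table is built offline from the routing table, so the per-packet cost is one array index plus a short binary search, with no prefix comparison at lookup time.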
3. Lock‑Free Data Paths
Replace global spinlocks with per‑CPU or lock‑free structures, e.g., per‑CPU packet buffers and RCU‑protected routing entries.
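The read-mostly pattern behind RCU-protected entries can be imitated in portable C11 with an atomically published pointer. This is a user-space analogy only: in the kernel the corresponding primitives are rcu_dereference(), rcu_assign_pointer(), and synchronize_rcu(), and the sketch below deliberately leaves out the grace-period tracking that makes freeing the old entry safe. The names active_route, route_nexthop, and route_update are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

struct route_entry {
    uint32_t prefix;
    uint32_t nexthop;
};

/* Readers follow this pointer; writers replace it as a whole. */
static _Atomic(struct route_entry *) active_route;

/* Read side: no lock, just an acquire load of the current entry.
 * Returns 0 when no route has been published yet. */
static uint32_t route_nexthop(void)
{
    struct route_entry *e =
        atomic_load_explicit(&active_route, memory_order_acquire);
    return e ? e->nexthop : 0;
}

/* Write side: build a private copy, then publish it with one atomic
 * exchange. Returns the old entry, which the caller may free only once
 * no reader can still hold it (the grace period that RCU provides). */
static struct route_entry *route_update(uint32_t prefix, uint32_t nexthop)
{
    struct route_entry *e = malloc(sizeof(*e));
    assert(e != NULL);
    e->prefix = prefix;
    e->nexthop = nexthop;
    return atomic_exchange_explicit(&active_route, e, memory_order_acq_rel);
}
```

Readers never write to shared memory here, so the forwarding path scales across cores without bouncing a lock's cache line between them.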
4. Interrupt and Soft‑IRQ Scheduling
Distribute RX soft‑irqs across CPUs in a round‑robin fashion, avoiding the “single‑CPU bottleneck”. Example modification to __napi_schedule:
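/* Sketch only: polll[] is a hypothetical per-CPU array of poll-list heads,
 * rx_handler_lock a hypothetical lock protecting them, and the per-CPU
 * pending accessor is written as in the original pseudocode. */
void __napi_schedule(struct napi_struct *n)
{
	static unsigned int cur;
	unsigned long flags;
	unsigned int cpu;

	/* Pick the target CPU under the lock so that 'cur' is not raced on. */
	spin_lock_irqsave(&rx_handler_lock, flags);
	cpu = cur++ % NR_CPUS;
	list_add_tail(&n->poll_list, polll[cpu]);
	/* The pending word is a bitmask, so shift by the softirq number. */
	local_softirq_pending(cpu) |= 1UL << NET_RX_SOFTIRQ;
	spin_unlock_irqrestore(&rx_handler_lock, flags);
}
5. Flow‑Table Caching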
Implement a fast flow table (similar to conntrack or SDN flow tables) that caches routing, neighbour, NAT, ACL, and L2 header information, allowing the fast path to bypass the full stack after the first packet.
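A minimal sketch of such a cache, assuming a direct-mapped table keyed on the 5-tuple, might look like the following. The structure layout, the hash mix, and the function names are illustrative assumptions; a production table would also need entry expiry, invalidation on route or neighbour changes, and a keyed hash to resist collision attacks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FLOW_BUCKETS 1024u   /* power of two */

struct flow_key {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
    uint8_t  proto;
};

/* Cached result of the slow path for one flow. */
struct flow_entry {
    bool            valid;
    struct flow_key key;
    uint16_t        out_port;     /* egress interface index   */
    uint8_t         l2_hdr[14];   /* prebuilt Ethernet header */
};

static struct flow_entry flow_table[FLOW_BUCKETS];

static bool key_equal(const struct flow_key *a, const struct flow_key *b)
{
    return a->saddr == b->saddr && a->daddr == b->daddr &&
           a->sport == b->sport && a->dport == b->dport &&
           a->proto == b->proto;
}

/* Simple multiplicative mix over the 5-tuple, reduced to a bucket index. */
static uint32_t flow_hash(const struct flow_key *k)
{
    uint32_t h = k->saddr * 2654435761u;
    h ^= k->daddr * 2246822519u;
    h ^= (((uint32_t)k->sport << 16) | k->dport) * 3266489917u;
    h ^= k->proto;
    return (h ^ (h >> 16)) & (FLOW_BUCKETS - 1);
}

/* Fast path: return the cached entry, or NULL to fall back to the full stack. */
static const struct flow_entry *flow_lookup(const struct flow_key *k)
{
    const struct flow_entry *e = &flow_table[flow_hash(k)];
    return (e->valid && key_equal(&e->key, k)) ? e : NULL;
}

/* Slow path: once the first packet has been fully processed, cache the
 * result. A colliding older flow is simply overwritten. */
static void flow_insert(const struct flow_key *k, uint16_t out_port,
                        const uint8_t l2_hdr[14])
{
    assert(k != NULL && l2_hdr != NULL);
    struct flow_entry *e = &flow_table[flow_hash(k)];
    e->key = *k;
    e->out_port = out_port;
    memcpy(e->l2_hdr, l2_hdr, 14);
    e->valid = true;
}
```

On a hit, the forwarder can copy the cached L2 header onto the packet and queue it on out_port directly, skipping the route, neighbour, and filter lookups that the first packet paid for.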
6. User‑Space Stack Option
Provide a user‑space packet processing path (e.g., DPDK‑style) that directly maps NIC DMA buffers to user memory, eliminating kernel copy overhead. Use Intel I/OAT DCA to keep cache lines hot.
Implementation Sketch
Allocate a pool of skb containers at boot, bind them to NICs, and avoid per‑packet alloc_skb / kfree_skb calls. Extend skb with an owner field to track its current location (NIC or socket pool).
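The preallocation idea can be illustrated with a fixed buffer pool and a free list, again as a user-space sketch rather than kernel code. The pkt_buf type stands in for an extended skb, the owner values are hypothetical, and the pool is not thread-safe, on the assumption that each NIC or CPU gets its own instance.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 1024u
#define BUF_BYTES 2048u

enum buf_owner { OWNER_POOL, OWNER_NIC, OWNER_SOCKET };

struct pkt_buf {
    struct pkt_buf *next;         /* free-list link, valid while in pool */
    enum buf_owner  owner;        /* who currently holds the buffer      */
    unsigned char   data[BUF_BYTES];
};

struct pkt_pool {
    struct pkt_buf  bufs[POOL_SIZE];
    struct pkt_buf *free_list;
};

/* Done once at start-up: thread every buffer onto the free list. */
static void pool_init(struct pkt_pool *p)
{
    p->free_list = NULL;
    for (size_t i = 0; i < POOL_SIZE; i++) {
        p->bufs[i].owner = OWNER_POOL;
        p->bufs[i].next = p->free_list;
        p->free_list = &p->bufs[i];
    }
}

/* Per-packet path: pop a buffer, no allocator call. NULL when exhausted. */
static struct pkt_buf *pool_get(struct pkt_pool *p, enum buf_owner new_owner)
{
    struct pkt_buf *b = p->free_list;
    if (b == NULL)
        return NULL;
    p->free_list = b->next;
    b->owner = new_owner;
    return b;
}

/* Return a buffer to the pool instead of freeing it. */
static void pool_put(struct pkt_pool *p, struct pkt_buf *b)
{
    assert(b != NULL && b->owner != OWNER_POOL);  /* catch double release */
    b->owner = OWNER_POOL;
    b->next = p->free_list;
    p->free_list = b;
}
```

Because the most recently released buffer is handed out first, it is likely to still be warm in the cache, which fits the locality goal of binding pools to a NIC and core.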
Integrate the VOQ design with the Linux CFS scheduler by adjusting each output task’s virtual runtime based on packet length:
outcard_tx_task_dec_vruntime(outcard, skb->len);
Finally, replace the legacy route cache with a compact, RCU‑protected lookup structure that can fall back to DxR‑Pro++ when necessary.
These changes collectively aim to bring Linux packet forwarding performance close to the theoretical line‑speed of the underlying hardware.
Liangxu Linux
Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge—fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc. (Reply “Linux” to receive essential resources.)
