How eBPF Can Cut Container Network Latency by Up to 82%
This article demonstrates how eBPF can bypass the traditional bridge, netfilter, and routing steps to accelerate container network packet forwarding, cutting receive and transmit latency by roughly 40–82%. It includes detailed perf-ftrace analysis, code examples, and a performance comparison of three forwarding modes.
Background
Linux provides a feature‑rich network protocol stack with excellent performance, but when multiple subsystems are combined to satisfy real‑world workloads, the balance between functionality and performance can tilt.
Container networking historically relied on bridge, netfilter + iptables (or LVS), and veth, which introduced significant forwarding overhead.
eBPF brings kernel programmability that can create shortcuts in the long forwarding path, allowing packets to reach their destination faster.
Network Topology
Two devices Node‑A and Node‑B are directly connected via eth1 (192.168.1.0/24).
Each node runs a container Pod‑A/B with a veth interface ve0 (172.17.0.0/16).
Each node creates a bridge br0 (172.17.0.0/16) linked to the veth devices.
Static routes are configured in both the node and pod network namespaces: pods use br0's address as their default gateway, and each node reaches the remote pods via the peer node's eth1 IP.
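The topology above can be sketched on one node with iproute2. This is a hypothetical reconstruction, not the article's exact setup: the namespace name (pod-a), pod address (172.17.0.2), peer address (192.168.1.2), and the per-node split of 172.17.0.0/16 into /24s are all assumptions.

```shell
# Assumed names/addresses; run as root. The split of 172.17.0.0/16 into
# one /24 per node is an assumption to make the cross-node route concrete.
ip netns add pod-a
ip link add lxc0 type veth peer name ve0
ip link set ve0 netns pod-a

ip link add br0 type bridge
ip link set lxc0 master br0
ip addr add 172.17.0.1/24 dev br0
ip link set br0 up
ip link set lxc0 up

ip netns exec pod-a ip addr add 172.17.0.2/24 dev ve0
ip netns exec pod-a ip link set ve0 up
# Pod uses br0 as its default gateway
ip netns exec pod-a ip route add default via 172.17.0.1
# Node reaches the peer node's pod subnet via the peer's eth1 address
ip route add 172.17.1.0/24 via 192.168.1.2 dev eth1
```

The mirror-image commands run on Node-B with the subnets swapped.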
For testing, eth1’s queue is limited to 1 and its interrupt is bound to CPU 0.
<code># ethtool -L eth1 combined 1</code>
<code># echo 0 > /proc/irq/$(cat /proc/interrupts | awk -F ':' '/eth1/ {gsub(/ /,""); print $1}')/smp_affinity_list</code>
Bridge
Bridge + veth is the earliest container‑network forwarding mode. The packet flow is:
Incoming packet on eth1 destined for 172.17.0.0/16 is routed to br0.
br0 looks up the FDB; if no entry, it floods, otherwise forwards to the appropriate veth (lxc0).
lxc0 (veth) sends the packet to its peer, where ve0 on the pod receives it.
For outbound traffic, the pod’s static route sends the packet to ve0, which reaches lxc0, then br0, then the L3 stack, and finally out through eth1 to the remote node.
Perf ftrace shows that this path traverses multiple subsystems (netfilter, routing, bridge), causing noticeable latency.
Receive Path
<code># perf ftrace -C0 -G '__netif_receive_skb_list_core' -g 'smp_*'</code>
The packet passes through routing lookup, bridge forwarding, veth forwarding, and multiple netfilter hook points.
It finally reaches enqueue_to_backlog, where it is queued on a per-CPU input packet queue; this first soft-interrupt finishes in ~79 µs.
A second soft‑interrupt later processes the queue, delivering the packet to the pod’s protocol stack and ultimately to the socket.
The receive path therefore consumes two soft‑interrupts.
Transmit Path
<code># perf ftrace -C0 -G '__netif_receive_skb_core' -g 'smp_*'</code>
The packet leaves the pod via ve0, then traverses veth, bridge, routing, and the physical NIC.
It also passes through netfilter hook points.
The NIC driver’s transmit function completes the soft‑interrupt in ~62 µs.
Analysis
The bridge + veth mode incurs many kernel subsystem traversals, leading to higher latency.
TC Redirect
eBPF can be attached to several kernel hook points relevant to networking: XDP (eXpress Data Path), TC (Traffic Control), and LWT (Light‑Weight Tunnel). For container networking, TC is preferred because it sits at the stack’s ingress and egress, providing full context (socket, cgroup, etc.) that XDP lacks.
Accelerate Receive Path
<code># tc qdisc add dev eth1 clsact</code>
<code># tc filter add dev eth1 ingress bpf da obj ingress_redirect.o sec classifier-redirect</code>
The eBPF program, attached at eth1's TC ingress hook, redirects packets directly to the lxc0 interface:
<code>#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("classifier-redirect")
int cls_redirect(struct __sk_buff *skb) {
    /* The ifindex of lxc0 is 2 */
    return bpf_redirect(2, 0);
}

char _license[] SEC("license") = "GPL";</code>
Accelerate Transmit Path
<code># tc qdisc add dev lxc0 clsact</code>
<code># tc filter add dev lxc0 ingress bpf da obj egress_redirect.o sec classifier-redirect</code>
The eBPF program redirects packets arriving from the pod on lxc0 back out through eth1:
<code>#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("classifier-redirect")
int cls_redirect(struct __sk_buff *skb) {
    /* The ifindex of eth1 is 1 */
    return bpf_redirect(1, 0);
}

char _license[] SEC("license") = "GPL";</code>
Analysis
By skipping bridge, routing, and netfilter, the TC‑based redirect reduces receive latency to ~43 µs (‑45%) and transmit latency to ~36 µs (‑42%) compared with the original bridge mode.
TC Redirect Peer
<code># tc qdisc add dev eth1 clsact</code>
<code># tc filter add dev eth1 ingress bpf da obj ingress_redirect_peer.o sec classifier-redirect</code>
The eBPF program uses bpf_redirect_peer to forward the packet directly to lxc0's namespace peer (ve0), eliminating the enqueue_to_backlog step and saving one soft-interrupt.
<code>#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("classifier-redirect")
int cls_redirect(struct __sk_buff *skb) {
    /* The ifindex of lxc0 is 2; bpf_redirect_peer delivers the packet
     * straight to its namespace peer, ve0 */
    return bpf_redirect_peer(2, 0);
}

char _license[] SEC("license") = "GPL";</code>
Perf results show a total forwarding time of ~14 µs once the pod-namespace processing overhead is excluded, a 67%–82% reduction compared with the bridge and TC-redirect modes.
Summary
The article compares three container‑network forwarding methods, analyzes their performance with perf‑ftrace, and demonstrates that a few lines of eBPF code can dramatically shorten the packet path, cutting network forwarding latency by up to 82%.
ByteDance SYS Tech
Focused on system technology, sharing cutting‑edge developments, innovation and practice, and analysis of industry tech hotspots.