Kubernetes Service Load Balancing at Scale with BPF and XDP
This article explains Kubernetes' core networking model, details the ways workloads can be reached (direct Pod IP access, HostPort mapping, and the ClusterIP, NodePort, ExternalIP, and LoadBalancer Service types), describes Cilium's eBPF/XDP implementation for high‑performance load balancing, and presents performance benchmarks and recent BPF kernel extensions.
The article introduces Kubernetes networking fundamentals, emphasizing that each Pod runs in its own network namespace and that the actual network implementation is delegated to CNI plugins, using Cilium as a concrete example.
It then enumerates the five primary ways to access an application: direct Pod IP access, HostPort mapping, NodePort Services, ExternalIP Services, and LoadBalancer Services, discussing the advantages and drawbacks of each method, such as IP stability, built‑in load balancing, and external accessibility.
Next, the Cilium implementation is described: a cilium‑agent runs on every node, watches the Kubernetes API server, and updates BPF maps that store Service‑to‑backend mappings. Load balancing is performed at two layers: a socket‑level BPF program that rewrites connections for east‑west traffic, and a TC/XDP‑level BPF program that handles north‑south traffic with DNAT/SNAT, supporting features like session affinity and wildcard matching.
The socket‑level BPF attaches to system‑call hooks (e.g., connect(2), sendmsg(2), recvmsg(2)) to translate Service IPs to Pod IPs without generating packets, reducing latency and preserving TCP back‑pressure. Helper functions such as bpf_get_socket_cookie() and bpf_get_netns_cookie() are used for flow identification and affinity.
For the TC/XDP layer, Cilium employs generic BPF helpers, inline assembly to avoid verifier issues, and custom memory operations to improve performance. New kernel extensions like bpf_redirect_neigh() and bpf_redirect_peer() enable direct redirection between host and pod network namespaces, eliminating the need to traverse the kernel networking stack.
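A minimal kernel‑side fragment sketches how such a helper is used from a tc BPF program (this is not runnable standalone; it must be compiled with clang for the BPF target and attached to the host‑facing device, and the ifindex below is a placeholder; a real program would look it up from a BPF map):

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#define POD_VETH_IFINDEX 42  /* placeholder for the Pod's veth ifindex */

SEC("tc")
int redirect_to_pod(struct __sk_buff *skb)
{
    /* bpf_redirect_peer() hands the packet straight to the peer
     * device inside the Pod's network namespace, skipping the extra
     * pass through the host stack that a normal veth traversal and
     * per-CPU backlog queue would require. */
    return bpf_redirect_peer(POD_VETH_IFINDEX, 0);
}

char _license[] SEC("license") = "GPL";
```

bpf_redirect_neigh() plays the complementary role in the egress direction, resolving the next hop via the kernel's neighbor tables instead of a full routing traversal.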
Performance measurements show that Cilium's XDP mode can sustain the full 10 Mpps test traffic, outperforming TC mode as well as kube‑proxy's iptables and IPVS modes, while also reducing soft‑IRQ overhead. Additional benchmarks demonstrate significant gains in TCP throughput and transaction rates when using the new BPF redirection helpers.
Finally, the article mentions upcoming BPF kernel extensions, such as improved context handling, per‑CPU scratch maps, and optimized map updates, which further enhance scalability and efficiency of Kubernetes Service load balancing.
Cloud Native Technology Community
The Cloud Native Technology Community, part of the CNBPA Cloud Native Technology Practice Alliance, focuses on evangelizing cutting‑edge cloud‑native technologies and practical implementations. It shares in‑depth content, case studies, and event/meetup information on containers, Kubernetes, DevOps, Service Mesh, and other cloud‑native tech, along with updates from the CNBPA alliance.