
Boosting Kubernetes Service Performance: Inside Tencent’s IPVS‑BPF Optimization

This article examines the limitations of traditional Kubernetes Service implementations, introduces Tencent TKE’s IPVS‑BPF mode that bypasses nf_conntrack using eBPF for SNAT, details its design and implementation steps, and presents extensive performance measurements showing significant latency and throughput improvements.

Cloud Native Technology Community

1. Current Service Modes and Their Limitations

Kubernetes Service provides intra‑cluster communication and load balancing via three main implementations: userspace, iptables, and IPVS. IPVS offers the best performance but still relies on nf_conntrack for SNAT, introducing considerable CPU and latency overhead due to its complex state machine.

In iptables mode, scalability suffers on both planes. On the control plane, adding a rule means traversing and rewriting the entire rule set, so installing n rules costs O(n²) in total. On the data plane, each packet is matched against the rules sequentially, which is O(n).

IPVS uses hash tables for O(1) service lookup, yet its dependence on nf_conntrack for SNAT creates a performance bottleneck.
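
The difference between the two lookup strategies can be illustrated with a toy cost model. This is only a sketch: the rule list and dictionary below stand in for an iptables chain and an IPVS service table, and the names linear_match, hashed_match, and full_reload_cost are hypothetical helpers, not kernel or kube-proxy APIs.

```python
# Toy cost model only; this is not real iptables or IPVS code.

def linear_match(rules, key):
    """Scan rules in order, the way a chain of rules is evaluated.

    Returns (backend, number_of_comparisons).
    """
    comparisons = 0
    for rule_key, backend in rules:
        comparisons += 1
        if rule_key == key:
            return backend, comparisons
    return None, comparisons


def hashed_match(table, key):
    """Single hash-table probe, as in a hash-based service lookup."""
    return table.get(key)


def full_reload_cost(n):
    """Rules touched when each of n insertions rewrites the whole rule set."""
    return sum(range(1, n + 1))  # 1 + 2 + ... + n = n(n+1)/2, i.e. O(n^2)


n = 1000
# Hypothetical services keyed by (protocol, virtual IP, port).
services = [
    (("tcp", f"10.0.{i // 256}.{i % 256}", 80), f"backend-{i}") for i in range(n)
]
table = dict(services)
last_key = services[-1][0]

backend, comparisons = linear_match(services, last_key)
print(comparisons)                    # 1000 comparisons for the last rule
print(hashed_match(table, last_key))  # backend-999, found with one probe
print(full_reload_cost(n))            # 500500 rule writes for 1000 insertions
```

With 1,000 services, the worst-case linear match needs 1,000 comparisons while the hashed lookup needs one probe, and the cumulative control-plane cost grows quadratically.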

2. IPVS‑BPF Design Overview

The Tencent TKE team created an IPVS‑BPF mode that completely bypasses nf_conntrack and implements SNAT with eBPF. The core ideas are:

Introduce a switch in the IPVS kernel module to toggle between native IPVS logic and the new IPVS‑BPF logic.

Move the IPVS hook from LOCAL_IN to PREROUTING so that Service requests skip nf_conntrack.

Update connection‑creation and deletion code to add or remove session entries in an eBPF map.

Attach eBPF SNAT code to the qdisc layer, using the session map to perform address translation.

Additional handling for ICMP and packet fragmentation is also provided. The solution keeps the code size manageable: roughly 500 lines of BPF code and 1,000 lines of IPVS modifications.
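
The interaction between the session map and the SNAT step can be sketched in user space. The following is a minimal model under stated assumptions, not the TKE implementation: a Python dict stands in for the eBPF map, and on_conn_create, on_conn_delete, egress_snat, the key layout, and all addresses are hypothetical. It covers only the forward direction; reverse translation of replies, ICMP, and fragments are left out.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


# Stand-in for the eBPF map shared by the IPVS module and the SNAT program.
# Assumed key: (client ip, client port, backend ip, backend port)
# Assumed value: (snat ip, snat port)
session_map = {}


def on_conn_create(client, backend, snat):
    """Hypothetical hook: called where IPVS creates a connection."""
    session_map[(*client, *backend)] = snat


def on_conn_delete(client, backend):
    """Hypothetical hook: called where IPVS expires a connection."""
    session_map.pop((*client, *backend), None)


def egress_snat(pkt):
    """Hypothetical egress hook: rewrite the source if a session exists.

    By this point the destination is assumed to be the backend already,
    because load balancing happened earlier in the path.
    """
    entry = session_map.get((pkt.src_ip, pkt.src_port, pkt.dst_ip, pkt.dst_port))
    if entry is None:
        return pkt  # not Service traffic: leave the packet untouched
    snat_ip, snat_port = entry
    return replace(pkt, src_ip=snat_ip, src_port=snat_port)


client = ("192.0.2.10", 40000)
backend = ("10.1.0.5", 8080)
on_conn_create(client, backend, snat=("10.0.0.1", 50001))

pkt = Packet(*client, *backend)
out = egress_snat(pkt)
print(out.src_ip, out.src_port)  # 10.0.0.1 50001

on_conn_delete(client, backend)
print(egress_snat(pkt) == pkt)   # True: no session, no translation
```

Keeping the session state in one map that both sides touch is what lets the translation step stay a single lookup per packet instead of a walk through a connection-tracking state machine.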

3. Performance Evaluation

Measurements were performed with perf for CPU counters and with wrk (short‑connection) and iperf (long‑connection) workloads. The test environment paired a single‑core LB node with an eight‑core backend node, so that the LB node rather than the other endpoints was the limiting factor.

Results:

NodePort short‑connection throughput increased by 64% and p99 latency dropped by 47%.

ClusterIP short‑connection throughput increased by 40% and p99 latency dropped by 31%.

Long‑connection bandwidth (iperf) improved by 22%.

Average CPU instructions per request decreased by 38%, the primary cause of the speedup.

CPI (cycles per instruction) rose modestly (~16%), meaning each remaining instruction took more cycles on average. This partly offsets the instruction savings and warrants further study.
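
A quick back-of-the-envelope calculation shows how the instruction and CPI figures combine. It assumes a fixed clock frequency and a fully CPU-bound LB core, so that throughput is inversely proportional to cycles per request; the variable names are illustrative only.

```python
# Ratios relative to the baseline IPVS mode, taken from the figures above.
instr_ratio = 1 - 0.38  # instructions per request fell by 38%
cpi_ratio = 1 + 0.16    # cycles per instruction rose by about 16%

# cycles per request = instructions per request * cycles per instruction
cycles_ratio = instr_ratio * cpi_ratio

# Assumption: CPU-bound single core at a fixed clock, so throughput
# scales with 1 / (cycles per request).
throughput_gain = 1 / cycles_ratio - 1

print(f"cycles per request: {cycles_ratio:.3f}x baseline")    # 0.719x
print(f"estimated throughput gain: {throughput_gain:.0%}")    # 39%
```

Under these assumptions the two counters imply roughly 28% fewer cycles per request and an estimated gain of about 39%, which is in the same range as the measured short‑connection improvements.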

4. Additional Optimizations, Limitations, and Future Work

During development, the team also fixed issues such as low performance when conn_reuse_mode=1, occasional DNS delays, and external IP health‑check failures. Two limitations remain: a pod cannot reach itself through its own Service (such requests are redirected to other pods), and the feature can only be enabled after a whitelist request is approved.

Future directions involve leveraging Cilium‑style eBPF techniques to further accelerate ClusterIP traffic and investigating the root cause of the CPI increase.

5. Enabling IPVS‑BPF in Tencent Cloud TKE

To activate the mode, create a cluster in the TKE console, go to Advanced Settings → Kube‑proxy Mode, and select ipvs-bpf. The feature currently requires a whitelist request via the official application page.

Written by

Cloud Native Technology Community

The Cloud Native Technology Community, part of the CNBPA Cloud Native Technology Practice Alliance, focuses on evangelizing cutting‑edge cloud‑native technologies and practical implementations. It shares in‑depth content, case studies, and event/meetup information on containers, Kubernetes, DevOps, Service Mesh, and other cloud‑native tech, along with updates from the CNBPA alliance.
