
How eBPF Can Replace iptables to Boost Service Mesh Performance

This article explains how replacing the traditional iptables‑based sidecar traffic‑hijacking in Istio with eBPF programs and sockmap can cut CPU usage, increase QPS and lower latency, while detailing the implementation, architecture, and performance results on Kubernetes.

Tencent Cloud Middleware

Background

Istio and similar service meshes typically use a sidecar architecture where iptables redirects all inbound and outbound traffic to the Envoy proxy. While non‑intrusive, the sidecar adds latency and consumes extra resources, which is a major concern for users evaluating service‑mesh solutions.

iptables Traffic Hijacking

The current community approach injects two containers into each pod: istio-init (an init container that creates iptables rules and then exits) and istio-proxy (running Envoy). iptables redirects inbound traffic to port 15006 (Envoy’s VirtualInboundListener) and outbound traffic to port 15001 (Envoy’s VirtualOutboundListener).
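
For orientation, the redirection that istio-init sets up looks roughly like the following. This is a simplified sketch rather than the exact rule set: the real istio-iptables tool creates dedicated chains (ISTIO_INBOUND, ISTIO_OUTPUT, ISTIO_REDIRECT, ISTIO_IN_REDIRECT) and adds exclusions such as traffic from the proxy's own UID.

```
# Simplified sketch of the NAT rules istio-init installs (illustrative only)
iptables -t nat -N ISTIO_REDIRECT
iptables -t nat -A ISTIO_REDIRECT    -p tcp -j REDIRECT --to-ports 15001  # outbound -> VirtualOutboundListener
iptables -t nat -N ISTIO_IN_REDIRECT
iptables -t nat -A ISTIO_IN_REDIRECT -p tcp -j REDIRECT --to-ports 15006  # inbound  -> VirtualInboundListener
iptables -t nat -A PREROUTING -p tcp -j ISTIO_IN_REDIRECT
iptables -t nat -A OUTPUT     -p tcp -j ISTIO_REDIRECT
```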

eBPF‑Based Inbound Hijacking

eBPF programs hook the bind system call. When an application binds to 0.0.0.0:80, the eBPF program rewrites the address to 127.0.0.1:80. Two separate eBPF programs handle IPv4 and IPv6 binds.

Unlike iptables rules, which are created per network namespace, the eBPF programs attach globally, so a mechanism is needed to identify which pods should have their traffic hijacked. Each pod's network namespace (netns) has a unique cookie, which is stored in a map called cookie_map; the eBPF program checks whether the current socket's netns cookie exists in cookie_map before rewriting the address.
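
A minimal sketch of how this can fit together, in libbpf-style C (the map layout and program name are illustrative; the real implementation also has an equivalent bind6 hook for IPv6, and this example rewrites any wildcard IPv4 bind, not just port 80):

```c
// Sketch of a cgroup/bind4 program: rewrite a wildcard bind to 127.0.0.1 for
// pods whose netns cookie is registered in cookie_map (illustrative only).
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 65535);
    __type(key, __u64);   /* netns cookie of a mesh-enabled pod */
    __type(value, __u32); /* unused flag */
} cookie_map SEC(".maps");

SEC("cgroup/bind4")
int mesh_bind4(struct bpf_sock_addr *ctx)
{
    __u64 cookie = bpf_get_netns_cookie(ctx);

    /* Only touch sockets created inside mesh-enabled pods. */
    if (!bpf_map_lookup_elem(&cookie_map, &cookie))
        return 1; /* allow the bind unchanged */

    /* Rewrite 0.0.0.0 to 127.0.0.1 so only Envoy can reach the app directly. */
    if (ctx->user_ip4 == 0)
        ctx->user_ip4 = bpf_htonl(0x7f000001);

    return 1;
}

char LICENSE[] SEC("license") = "GPL";
```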

After the bind address is rewritten, a listener configuration for pod_ip:80 is pushed to Envoy so that inbound traffic arriving at the pod IP is forwarded to the application's actual listening address (now 127.0.0.1:80). To allow the non-root istio-proxy user to bind privileged ports, the kernel parameter net.ipv4.ip_unprivileged_port_start is set to 0 (sysctl net.ipv4.ip_unprivileged_port_start=0). Because the eBPF logic runs only once, when the socket is created, it avoids the per-packet conntrack overhead of iptables.

eBPF‑Based Outbound Hijacking

TCP Traffic

The connect4 program hijacks the connect system call, rewrites the destination to 127.0.0.1:15001 (Envoy's VirtualOutboundListener), and stores the original destination in sk_storage_map.
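
A sketch of what this outbound TCP hook can look like under those assumptions (map layout and program name are illustrative, and the cookie_map filter from the inbound section is omitted for brevity; 1337 is the default istio-proxy UID, which the real program must exclude to avoid re-hijacking Envoy's own connections):

```c
// Sketch of a cgroup/connect4 program: remember the original destination in
// socket-local storage, then divert the connection to Envoy's outbound
// listener on 127.0.0.1:15001 (illustrative only).
#include <linux/bpf.h>
#include <linux/in.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

struct orig_dst {
    __u32 ip;   /* network byte order */
    __u16 port; /* network byte order */
};

struct {
    __uint(type, BPF_MAP_TYPE_SK_STORAGE);
    __uint(map_flags, BPF_F_NO_PREALLOC);
    __type(key, int);
    __type(value, struct orig_dst);
} sk_storage_map SEC(".maps");

SEC("cgroup/connect4")
int mesh_connect4(struct bpf_sock_addr *ctx)
{
    struct orig_dst *dst;

    if (ctx->protocol != IPPROTO_TCP)
        return 1;

    /* Skip Envoy's own outbound traffic (istio-proxy runs as UID 1337). */
    if ((bpf_get_current_uid_gid() & 0xffffffff) == 1337)
        return 1;

    /* Remember where the application really wanted to go. */
    dst = bpf_sk_storage_get(&sk_storage_map, ctx->sk, NULL,
                             BPF_SK_STORAGE_GET_F_CREATE);
    if (!dst)
        return 1;
    dst->ip = ctx->user_ip4;
    dst->port = (__u16)ctx->user_port;

    /* Divert the connection to Envoy's VirtualOutboundListener. */
    ctx->user_ip4 = bpf_htonl(0x7f000001);
    ctx->user_port = bpf_htons(15001);
    return 1;
}

char LICENSE[] SEC("license") = "GPL";
```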

After the TCP connection is established, the sockops program reads sk_storage_map and stores the original destination in origin_dst_map, keyed by the connection four-tuple (source IP, destination IP, source port, destination port). The getsockopt program later looks up origin_dst_map to return the original destination address to Envoy.
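
A sketch of the corresponding sockops and getsockopt pieces follows. The map layouts, the byte-order normalization of the ports, and the SO_ORIGINAL_DST handling are illustrative assumptions about how the described flow can be wired together, not the actual implementation:

```c
// Sketch: once the TCP connection is established, sockops copies the saved
// destination into origin_dst_map keyed by the four-tuple; cgroup/getsockopt
// then answers Envoy's SO_ORIGINAL_DST query from that map instead of
// relying on iptables/conntrack (illustrative only).
#include <linux/bpf.h>
#include <linux/in.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

#ifndef SOL_IP
#define SOL_IP 0
#endif
#ifndef AF_INET
#define AF_INET 2
#endif
#define SO_ORIGINAL_DST 80 /* value from <linux/netfilter_ipv4.h> */

struct orig_dst {
    __u32 ip;   /* network byte order */
    __u16 port; /* network byte order */
};

struct four_tuple {
    __u32 sip, dip;     /* network byte order */
    __u16 sport, dport; /* network byte order */
};

struct {
    __uint(type, BPF_MAP_TYPE_SK_STORAGE);
    __uint(map_flags, BPF_F_NO_PREALLOC);
    __type(key, int);
    __type(value, struct orig_dst);
} sk_storage_map SEC(".maps");

struct {
    __uint(type, BPF_MAP_TYPE_LRU_HASH);
    __uint(max_entries, 65535);
    __type(key, struct four_tuple);
    __type(value, struct orig_dst);
} origin_dst_map SEC(".maps");

SEC("sockops")
int mesh_sockops(struct bpf_sock_ops *skops)
{
    struct four_tuple key = {};
    struct orig_dst *dst;

    /* Only the connecting (application) side carries the saved destination. */
    if (skops->family != AF_INET || !skops->sk ||
        skops->op != BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB)
        return 0;

    dst = bpf_sk_storage_get(&sk_storage_map, skops->sk, NULL, 0);
    if (!dst)
        return 0;

    key.sip = skops->local_ip4;
    key.dip = skops->remote_ip4;
    key.sport = bpf_htons((__u16)skops->local_port);             /* host order in ctx */
    key.dport = bpf_htons((__u16)bpf_ntohl(skops->remote_port)); /* u32, network order */
    bpf_map_update_elem(&origin_dst_map, &key, dst, BPF_ANY);
    return 0;
}

SEC("cgroup/getsockopt")
int mesh_getsockopt(struct bpf_sockopt *ctx)
{
    struct sockaddr_in *sa = ctx->optval;
    struct four_tuple key = {};
    struct bpf_sock *sk = ctx->sk;
    struct orig_dst *dst;

    if (ctx->level != SOL_IP || ctx->optname != SO_ORIGINAL_DST || !sk)
        return 1; /* not ours, let the kernel handle it */
    if ((void *)(sa + 1) > ctx->optval_end)
        return 1;

    /* Envoy asks on the socket it accepted, so the tuple is mirrored. */
    key.sip = sk->dst_ip4;
    key.dip = sk->src_ip4;
    key.sport = (__u16)sk->dst_port;            /* exposed in network order */
    key.dport = bpf_htons((__u16)sk->src_port); /* exposed in host order    */

    dst = bpf_map_lookup_elem(&origin_dst_map, &key);
    if (!dst)
        return 1;

    sa->sin_family = AF_INET;
    sa->sin_addr.s_addr = dst->ip;
    sa->sin_port = dst->port;
    ctx->optlen = sizeof(*sa);
    ctx->retval = 0; /* report success to the caller */
    return 1;
}

char LICENSE[] SEC("license") = "GPL";
```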

UDP Traffic

The connect4 and sendmsg4 programs modify the UDP destination to 127.0.0.1:15053 (Istio's DNS proxy listener) and store the original destination in sk_storage_map. Two cases are handled, as sketched below:

connect → send (handled by connect4)

direct sendto (handled by sendmsg4)

The recvmsg4 program reads sk_storage_map and rewrites the source address of the reply packet back to the original destination, which applications such as nslookup require.
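
A sketch of the UDP path under the same assumptions (in the real implementation the TCP and UDP cases share the connect4 hook shown earlier; here only DNS traffic on port 53 is diverted, matching the Istio DNS proxy use case):

```c
// Sketch of the UDP (DNS) hijack: divert outgoing DNS queries to Istio's DNS
// proxy on 127.0.0.1:15053, remember the real resolver address in socket
// storage, and rewrite the reply's source address back on recvmsg so tools
// like nslookup see the address they queried (illustrative only).
#include <linux/bpf.h>
#include <linux/in.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

struct orig_dst {
    __u32 ip;   /* network byte order */
    __u16 port; /* network byte order */
};

struct {
    __uint(type, BPF_MAP_TYPE_SK_STORAGE);
    __uint(map_flags, BPF_F_NO_PREALLOC);
    __type(key, int);
    __type(value, struct orig_dst);
} sk_storage_map SEC(".maps");

static __always_inline int divert_to_dns_proxy(struct bpf_sock_addr *ctx)
{
    struct orig_dst *dst;

    if (ctx->protocol != IPPROTO_UDP || ctx->user_port != bpf_htons(53))
        return 1; /* not DNS, leave it alone */

    dst = bpf_sk_storage_get(&sk_storage_map, ctx->sk, NULL,
                             BPF_SK_STORAGE_GET_F_CREATE);
    if (!dst)
        return 1;
    dst->ip = ctx->user_ip4;
    dst->port = (__u16)ctx->user_port;

    ctx->user_ip4 = bpf_htonl(0x7f000001); /* 127.0.0.1       */
    ctx->user_port = bpf_htons(15053);     /* Istio DNS proxy */
    return 1;
}

/* Case 1: connect() followed by send(). */
SEC("cgroup/connect4")
int dns_connect4(struct bpf_sock_addr *ctx)
{
    return divert_to_dns_proxy(ctx);
}

/* Case 2: unconnected sendto()/sendmsg(). */
SEC("cgroup/sendmsg4")
int dns_sendmsg4(struct bpf_sock_addr *ctx)
{
    return divert_to_dns_proxy(ctx);
}

/* Make the reply appear to come from the resolver the application asked for. */
SEC("cgroup/recvmsg4")
int dns_recvmsg4(struct bpf_sock_addr *ctx)
{
    struct orig_dst *dst;

    if (ctx->protocol != IPPROTO_UDP)
        return 1;

    dst = bpf_sk_storage_get(&sk_storage_map, ctx->sk, NULL, 0);
    if (!dst)
        return 1;

    ctx->user_ip4 = dst->ip;
    ctx->user_port = dst->port;
    return 1;
}

char LICENSE[] SEC("license") = "GPL";
```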

Both TCP and UDP eBPF paths execute only once per socket, avoiding the per‑packet conntrack overhead of iptables.

Sockmap Acceleration

The solution also adopts the sockmap technique pioneered by Cilium. The key components are:

sock_hash: an eBPF map that stores socket information keyed by the four-tuple (src IP, dst IP, src port, dst port).

sockops: listens for socket establishment events and populates sock_hash.

sk_msg: intercepts sendmsg, looks up the peer socket in sock_hash, and uses bpf_msg_redirect_hash to send data directly to the peer, bypassing the TCP/IP stack.
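
A sketch of this fast path, loosely following the Cilium sockops example referenced below (key layout, byte-order handling and map size are illustrative; the netns-cookie field is discussed next and left at 0 here; the sk_msg program is attached to sock_hash as a BPF_SK_MSG_VERDICT program):

```c
// Sketch of the sockmap fast path: sockops registers established sockets in
// sock_hash, and sk_msg splices sendmsg data straight to the peer socket on
// the same node, bypassing the TCP/IP stack (illustrative only).
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

#ifndef AF_INET
#define AF_INET 2
#endif

struct sock_key {
    __u32 sip, dip;     /* network byte order */
    __u32 sport, dport; /* host byte order    */
    __u64 netns_cookie; /* left at 0 here; see the next snippet */
};

struct {
    __uint(type, BPF_MAP_TYPE_SOCKHASH);
    __uint(max_entries, 65535);
    __type(key, struct sock_key);
    __type(value, __u64);
} sock_hash SEC(".maps");

SEC("sockops")
int accel_sockops(struct bpf_sock_ops *skops)
{
    struct sock_key key = {};

    if (skops->family != AF_INET)
        return 0;
    if (skops->op != BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB &&
        skops->op != BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB)
        return 0;

    /* Register this socket under its own 4-tuple. */
    key.sip   = skops->local_ip4;
    key.dip   = skops->remote_ip4;
    key.sport = skops->local_port;             /* host byte order      */
    key.dport = bpf_ntohl(skops->remote_port); /* u32 in network order */
    bpf_sock_hash_update(skops, &sock_hash, &key, BPF_ANY);
    return 0;
}

SEC("sk_msg")
int accel_sk_msg(struct sk_msg_md *msg)
{
    struct sock_key peer = {};

    if (msg->family != AF_INET)
        return SK_PASS;

    /* The peer socket was registered under the reversed 4-tuple. */
    peer.sip   = msg->remote_ip4;
    peer.dip   = msg->local_ip4;
    peer.sport = bpf_ntohl(msg->remote_port);
    peer.dport = msg->local_port;

    /* If the peer is on this node the data is delivered directly to its
     * receive queue; otherwise the lookup fails and the normal stack is used. */
    bpf_msg_redirect_hash(msg, &sock_hash, &peer, BPF_F_INGRESS);
    return SK_PASS;
}

char LICENSE[] SEC("license") = "GPL";
```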

A potential key collision arises when connections in two pods on the same node share the same four-tuple: for example, Envoy in each pod may use source port 50000 for its loopback connection to the local application, producing identical four-tuples in different pods. To avoid such collisions, the netns cookie is added to the key; for non-localhost requests the cookie is set to 0, which keeps keys unique while still enabling acceleration of pod-to-pod traffic on the same node.

Kernel patches were submitted to expose netns cookie information to eBPF programs (sockops and sk_msg) and were merged in Linux 5.15.
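
Building on the sock_hash sketch above, the cookie handling could look like the following (illustrative; bpf_get_netns_cookie is callable from sockops and sk_msg programs only with the 5.15 patches mentioned above):

```c
// Illustrative extension of the sock_key sketch above: qualify loopback
// connections with the pod's netns cookie; keep 0 otherwise so pod-to-pod
// traffic on the same node still produces matching keys on both ends.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

static __always_inline __u64 key_cookie(void *ctx, __u32 remote_ip4)
{
    if (remote_ip4 == bpf_htonl(0x7f000001)) /* 127.0.0.1: same-pod traffic */
        return bpf_get_netns_cookie(ctx);    /* needs the 5.15 patches      */
    return 0;
}

/* In accel_sockops():  key.netns_cookie  = key_cookie(skops, skops->remote_ip4); */
/* In accel_sk_msg():   peer.netns_cookie = key_cookie(msg,   msg->remote_ip4);   */
```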

Architecture

The overall architecture runs an istio‑ebpf DaemonSet on each node, which loads and attaches the eBPF programs and creates the required eBPF maps. The existing istio‑init container remains but no longer creates iptables rules; instead it writes the pod’s netns cookie into cookie_map. istiod is extended to emit different xDS configurations depending on whether a pod uses iptables or eBPF for traffic hijacking.
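
One possible shape for the cookie registration step, assuming a pinned cookie_map and the SO_NETNS_COOKIE socket option (Linux 5.14+); the pin path, option value, and error handling are illustrative assumptions, not the actual istio-init code:

```c
// Sketch: read the netns cookie of the pod's network namespace via
// SO_NETNS_COOKIE and insert it into the pinned cookie_map created by the
// istio-ebpf DaemonSet (illustrative only).
#include <stdio.h>
#include <stdint.h>
#include <sys/socket.h>
#include <bpf/bpf.h>

#ifndef SO_NETNS_COOKIE
#define SO_NETNS_COOKIE 71
#endif

int main(void)
{
    uint64_t cookie = 0;
    socklen_t len = sizeof(cookie);
    uint32_t enabled = 1;

    /* Any socket created in this netns carries the namespace's cookie. */
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0 || getsockopt(s, SOL_SOCKET, SO_NETNS_COOKIE, &cookie, &len)) {
        perror("netns cookie");
        return 1;
    }

    /* Hypothetical pin path for the map shared with the eBPF programs. */
    int map_fd = bpf_obj_get("/sys/fs/bpf/istio/cookie_map");
    if (map_fd < 0 || bpf_map_update_elem(map_fd, &cookie, &enabled, BPF_ANY)) {
        perror("cookie_map update");
        return 1;
    }

    printf("registered netns cookie %llu\n", (unsigned long long)cookie);
    return 0;
}
```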

Performance Comparison

Test environment: Ubuntu 21.04, kernel 5.15.7.

eBPF reduces System CPU usage by ~20% under identical conditions.

eBPF increases QPS by ~20%.

eBPF lowers request latency.

Conclusion

The sidecar architecture of service meshes inevitably adds latency and resource consumption. By replacing iptables-based traffic hijacking with eBPF and accelerating sidecar-application communication via sockmap, the solution achieves noticeable reductions in latency and CPU overhead. The approach depends on kernel 5.15 features and is expected to be rolled out early next year, with the TCM team continuing to explore further performance optimizations.

References

https://istio.io

https://jimmysong.io/blog/sidecar-injection-iptables-and-traffic-routing

https://ebpf.io

https://cilium.io

https://istio.io/latest/blog/2020/dns-proxy

https://arthurchiao.art/blog/socket-acceleration-with-ebpf-zh

https://github.com/cilium/cilium/tree/v1.11.0/bpf/sockops

https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=6cf1770d

https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=fab60e29f

Tags: cloud-native, Kubernetes, eBPF, iptables replacement

Written by

Tencent Cloud Middleware

Official account of Tencent Cloud Middleware. Focuses on microservices, messaging middleware and other cloud‑native technology trends, publishing product updates, case studies, and technical insights. Regularly hosts tech salons to share effective solutions.
