
Designing Large‑Scale Hybrid Cloud Container Networks with Kubernetes

This article explains how to build a high‑performance, 5K‑node hybrid‑cloud container network by reviewing Kubernetes pod and service networking, comparing iptables, IPVS and eBPF implementations, and describing large‑scale overlay, BGP, IPVlan, and NodeLocal DNS solutions.

TAL Education Technology

Background: Building a large hybrid-cloud container network requires proven performance, a validated architecture, and cloud-provider support. Cilium, which is built on eBPF, is one of the few projects that has demonstrated single-cluster deployments of up to 5,000 nodes.

Kubernetes Pod Network Overview: The Kubernetes networking model focuses on Pods, Services, and external clients rather than on host or VM interfaces. Pods, Services, and Nodes communicate via IP addresses; the key terms are ClusterIP (a Service's stable virtual IP), Pod IP, and Node IP.

Classic Container Network Modes: The two classic approaches are tunnel-based overlays (VXLAN, IPIP, GRE) and routing-based networks.

Kubernetes Service Overview: Services solve the short‑lived Pod IP problem by providing stable ClusterIP load‑balancing. Design considerations include load‑balancing across multiple Pods, session persistence, and handling IP churn.
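The three design considerations above can be illustrated with a toy model. This is only a sketch: the class name `ServiceBalancer`, its methods, and the IP addresses are hypothetical and do not correspond to kube-proxy internals. It assumes round-robin selection and a simple per-client affinity map, and it expects a non-empty backend list.

```python
import itertools
from typing import Dict, List


class ServiceBalancer:
    """Toy model of a ClusterIP: a stable front end over changing Pod IPs."""

    def __init__(self, backends: List[str], sticky: bool = False) -> None:
        self._sticky = sticky
        self._affinity: Dict[str, str] = {}
        self.set_backends(backends)

    def set_backends(self, backends: List[str]) -> None:
        """Replace the backend set when Pods come and go (IP churn)."""
        self._backends = list(backends)
        self._cycle = itertools.cycle(self._backends)
        # Forget sticky entries that point at Pods that no longer exist.
        self._affinity = {
            client: pod for client, pod in self._affinity.items()
            if pod in self._backends
        }

    def pick(self, client_ip: str) -> str:
        """Return the Pod IP that should serve this client."""
        if self._sticky and client_ip in self._affinity:
            return self._affinity[client_ip]
        backend = next(self._cycle)  # plain round-robin across Pods
        if self._sticky:
            self._affinity[client_ip] = backend
        return backend


lb = ServiceBalancer(["10.0.1.5", "10.0.2.7"], sticky=True)
first = lb.pick("192.168.0.10")
assert lb.pick("192.168.0.10") == first      # same client, same Pod
lb.set_backends(["10.0.2.7", "10.0.3.9"])    # Pod 10.0.1.5 was replaced
print(lb.pick("192.168.0.10"))               # served by a Pod from the new set
```

The client only ever addresses the Service; the Pod IPs behind it can change without the client noticing, which is the point of the ClusterIP abstraction.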

iptables vs. IPVS: The iptables proxy mode became the kube-proxy default in Kubernetes 1.2 for Service load-balancing, but it suffers from full-table updates, scalability limits, and service disruption. Example metrics: adding one rule to a set of 5,000 Services (40,000 rules) takes ~11 minutes, while with 20,000 Services (160,000 rules) it takes ~5 hours. IPVS mode, generally available since Kubernetes 1.11, stores rules in hash tables, offering constant-time updates and supporting large-scale clusters.
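The difference between a sequential rule chain and a hash table can be sketched in a few lines. This is a simplified model, not kube-proxy code: the `Rule` shape, the function names, and the one-rule-per-Service layout are assumptions made for illustration (the figures above imply about eight rules per Service).

```python
from typing import Dict, List, Optional, Tuple

Rule = Tuple[str, int, str]  # (ClusterIP, port, backend): hypothetical shape


def match_linear(rules: List[Rule], ip: str, port: int) -> Tuple[Optional[str], int]:
    """iptables-style lookup: walk the chain in order; return (backend, rules examined)."""
    for examined, (rule_ip, rule_port, backend) in enumerate(rules, start=1):
        if rule_ip == ip and rule_port == port:
            return backend, examined
    return None, len(rules)


def reload_with_new_rule(rules: List[Rule], new_rule: Rule) -> Tuple[List[Rule], int]:
    """Full-table update: the whole rule set is written again, not just the delta."""
    rebuilt = list(rules) + [new_rule]
    return rebuilt, len(rebuilt)  # second value: number of rules written


def build_hash_table(rules: List[Rule]) -> Dict[Tuple[str, int], str]:
    """IPVS-style storage: key each virtual service by (ip, port)."""
    return {(ip, port): backend for ip, port, backend in rules}


rules = [(f"10.96.{i // 256}.{i % 256}", 80, f"backend-{i}") for i in range(5000)]
table = build_hash_table(rules)

last_ip = rules[-1][0]
print(match_linear(rules, last_ip, 80))      # worst case: every rule is examined
print(table[(last_ip, 80)])                  # a single hash lookup

_, written = reload_with_new_rule(rules, ("10.97.0.1", 80, "backend-new"))
table[("10.97.0.1", 80)] = "backend-new"     # hash table: one insert
print(written)                               # chain: all rules rewritten
```

In this model both lookup and update cost grow with the number of Services for the chain, while the hash table stays flat, which matches the direction of the measurements quoted above without reproducing their absolute values.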

Kernel conntrack race issue: IPVS has no SNAT module of its own, so IPVS mode still relies on nf_conntrack, which can drop SYN packets on older kernels because of a race condition. Mitigations include enabling the --random-fully option for iptables, disabling parallel DNS lookups, and avoiding Alpine-based containers.

eBPF Implementation: BPF provides programmable, high‑performance packet processing. Projects such as Cilium use eBPF for pod networking and Service load‑balancing, replacing iptables and conntrack. Facebook, Brendan Gregg, and the Cilium team are major contributors.

Replacing kube‑proxy with Cilium/BPF: Cilium implements its own connection tracker and SNAT using LRU BPF maps, handling source IP/port translation and fast NAT entry recycling. It also supports Direct‑Server‑Return (DSR) mode, allowing backend Pods to reply directly to clients.
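The idea of an LRU-backed connection table can be sketched in user space. This is an illustration only, not Cilium's implementation: the class `LruConntrack`, the `Flow` tuple, and the Pod names are hypothetical, and a Python `OrderedDict` stands in for an LRU BPF map.

```python
from collections import OrderedDict
from typing import Tuple

Flow = Tuple[str, int, str, int]  # (src_ip, src_port, dst_ip, dst_port)


class LruConntrack:
    """Fixed-size flow table that evicts the least recently used entry when full."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._entries = OrderedDict()

    def track(self, flow: Flow, backend: str) -> str:
        """Return the backend bound to this flow, creating an entry if needed."""
        if flow in self._entries:
            self._entries.move_to_end(flow)      # mark as most recently used
            return self._entries[flow]
        if len(self._entries) >= self.capacity:
            self._entries.popitem(last=False)    # evict the least recently used
        self._entries[flow] = backend
        return backend

    def __contains__(self, flow: Flow) -> bool:
        return flow in self._entries


ct = LruConntrack(capacity=2)
flow_a = ("10.0.1.5", 40000, "10.96.0.10", 80)
flow_b = ("10.0.1.6", 40001, "10.96.0.10", 80)
flow_c = ("10.0.1.7", 40002, "10.96.0.10", 80)

ct.track(flow_a, "pod-1")
ct.track(flow_b, "pod-2")
kept = ct.track(flow_a, "pod-9")   # existing flow keeps its original backend
ct.track(flow_c, "pod-3")          # table is full, so flow_b is evicted
```

Because eviction happens on insert, a full table never rejects a new connection; stale entries are recycled instead, which is the property the paragraph above attributes to LRU maps.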

Building Large‑Scale Container Networks:

VXLAN Overlay: Widely used for massive data-center networks; Cilium can build multi-cloud VXLAN meshes with its Cluster Mesh feature for inter-cluster connectivity.
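The cost of a VXLAN overlay is the extra encapsulation on every packet. The sketch below builds the 8-byte VXLAN header as laid out in RFC 7348 and derives the inner MTU for an IPv4 underlay. The function names are hypothetical, and the calculation assumes no VLAN tag and no IP options.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag from RFC 7348
# Bytes added inside the underlay IP packet:
# outer IPv4 (20) + UDP (8) + VXLAN (8) + inner Ethernet header (14).
VXLAN_OVERHEAD_IPV4 = 20 + 8 + 8 + 14


def vxlan_header(vni: int) -> bytes:
    """Pack the VXLAN header for a given 24-bit VXLAN Network Identifier."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + reserved (24 bits).
    # Word 2: VNI (24 bits) + reserved (8 bits).
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def inner_mtu(underlay_mtu: int) -> int:
    """Largest inner IP packet that fits without fragmenting the outer packet."""
    return underlay_mtu - VXLAN_OVERHEAD_IPV4


print(vxlan_header(42).hex())
print(inner_mtu(1500))
```

The 24-bit identifier allows about 16 million isolated segments, which is one reason the format suits large data-center networks; the 50-byte overhead is why Pod interfaces on a VXLAN overlay usually run with a reduced MTU.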

BGP Router Mode: A full mesh (e.g., Calico, Kube-router), in which every node peers with every other node, works for fewer than 100 nodes; larger deployments benefit from Route Reflectors, which reduce the BGP peer count. Top-of-Rack (ToR) BGP designs can support ~2,000 nodes per cluster.
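The reason a full mesh stops scaling is simple arithmetic, sketched below. The function names are hypothetical, and the route-reflector model assumes dedicated reflectors that are fully meshed among themselves while every node peers with each of them.

```python
def full_mesh_sessions(nodes: int) -> int:
    """Every node peers with every other node: n * (n - 1) / 2 sessions."""
    return nodes * (nodes - 1) // 2


def route_reflector_sessions(nodes: int, reflectors: int) -> int:
    """Every node peers with each reflector; reflectors are meshed among themselves."""
    return nodes * reflectors + full_mesh_sessions(reflectors)


for n in (100, 2000):
    print(n, full_mesh_sessions(n), route_reflector_sessions(n, reflectors=2))
```

With two reflectors, each node holds two sessions regardless of cluster size, whereas in a full mesh each node holds one session per peer.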

IPVlan L2 Mode: Similar to macvlan, except that all sub-interfaces share the parent interface's MAC address; it offers a simple data path that bypasses the host network namespace and avoids conntrack contention.

NodeLocal DNS: Deploying a DNS cache DaemonSet on each node reduces latency, lowers CoreDNS query volume, and eliminates iptables DNAT and conntrack overhead for DNS traffic.
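The effect of a per-node cache on upstream query volume can be sketched as follows. This models only the caching behavior, not the iptables or conntrack bypass, and it is not the NodeLocal DNSCache implementation: the class name, the fixed TTL, and the injected clock are assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class NodeLocalCache:
    """Toy per-node DNS cache that counts how often the upstream is queried."""

    def __init__(self, upstream: Callable[[str], str],
                 ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._upstream = upstream
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[str, float]] = {}
        self.upstream_queries = 0

    def resolve(self, name: str) -> str:
        now = self._clock()
        hit = self._cache.get(name)
        if hit is not None and hit[1] > now:
            return hit[0]                        # answered on the node
        self.upstream_queries += 1               # cache miss: ask the cluster DNS
        address = self._upstream(name)
        self._cache[name] = (address, now + self._ttl)
        return address


now = [0.0]                                      # fake clock for the demo
cache = NodeLocalCache(lambda name: "10.96.0.20", clock=lambda: now[0])

for _ in range(3):
    cache.resolve("api.default.svc.cluster.local")
queries_before_expiry = cache.upstream_queries   # only the first lookup went upstream

now[0] = 31.0                                    # entry has expired
cache.resolve("api.default.svc.cluster.local")
queries_after_expiry = cache.upstream_queries
```

Repeated lookups from Pods on the same node are served locally until the entry expires, so the central DNS service sees one query per name per TTL window from each node in this model.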

TC Redirection Example (IPVlan integration):

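# Requires a clsact qdisc on $ENIA so that the egress hook exists.
# Redirect traffic destined for the Service CIDR from $ENIA egress to ipvl_a0 ingress.
tc filter add dev $ENIA egress protocol ip u32 \
    match ip dst $SERVICE_CIDR \
    action tunnel_key unset pipe \
    action mirred ingress redirect dev ipvl_a0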

References: The article lists several Chinese and English sources covering BGP in data centers, large‑scale K8s Service performance, eBPF adoption, IPVlan implementations, and cloud‑provider specific networking solutions.
