Unraveling Kubernetes Networking: From Pod IPs to Service Load Balancing
This article explains the evolution of Kubernetes networking, detailing how container networks originated with Docker, how Kubernetes assigns each Pod a unique IP, the Flannel host‑gw routing model, and the inner workings of Services, including LVS‑based implementations and the four Service types.
Evolution of Container Networking
Docker originally used a simple bridge network with a reserved IP range. Containers were isolated from the host network; outbound traffic was NAT‑ed (SNAT) to the node’s IP, while inbound traffic required a DNAT rule on the node to forward a host port to the container. This model makes it difficult to distinguish container traffic from host traffic, which complicates high‑availability scenarios where multiple containers provide the same service.
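The two address rewrites described above can be modeled in a few lines. This is a minimal sketch, not Docker's actual implementation: the node IP, the published port, and the container address are hypothetical values, and real SNAT may also rewrite the source port.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


NODE_IP = "192.168.1.10"  # hypothetical node address
# Hypothetical DNAT table: host port -> (container IP, container port)
DNAT_RULES = {8080: ("172.17.0.2", 80)}


def snat_outbound(pkt: Packet) -> Packet:
    """Outbound: replace the container's source IP with the node IP."""
    return replace(pkt, src_ip=NODE_IP)


def dnat_inbound(pkt: Packet) -> Packet:
    """Inbound: forward a published host port to the container behind it."""
    if pkt.dst_ip == NODE_IP and pkt.dst_port in DNAT_RULES:
        ip, port = DNAT_RULES[pkt.dst_port]
        return replace(pkt, dst_ip=ip, dst_port=port)
    return pkt  # no matching rule: the packet stays addressed to the host
```

After SNAT, a remote peer sees only the node IP, which is why container traffic and host traffic become hard to tell apart.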
Pod‑IP Identity
Kubernetes assigns each Pod a unique IP address that becomes the pod’s identity in the TCP/IP stack. Accessing a Pod via its IP reaches the Pod directly without any NAT, and the same IP is used for all containers inside the Pod, enabling a group of containers to be treated as a single deployment unit.
Implementation Choices
Kubernetes does not mandate a specific networking implementation. An underlay network can provide external routing, while an overlay network can be added on top of the existing infrastructure. The only requirement is that the chosen solution satisfies the per‑Pod IP model.
Flannel Host‑gw Routing
Flannel’s host‑gw mode gives each node a dedicated subnet. The node’s cni0 bridge acts as the gateway for that subnet. This design is simple to operate, but because each subnet is bound to a single node, a Pod cannot keep its IP address when it moves to another node.
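As an illustration of the per-node subnet idea, the sketch below carves one /24 per node out of a cluster-wide Pod range using Python's `ipaddress` module. The 10.244.0.0/16 range and the node names are assumptions chosen to match the addresses used in this article; this is not Flannel's allocation code.

```python
import ipaddress

# Assumed cluster-wide Pod range; the article's examples fall inside it.
CLUSTER_CIDR = ipaddress.ip_network("10.244.0.0/16")


def allocate_node_subnets(cluster_cidr, node_names, prefix=24):
    """Hand out one subnet of the given prefix length to each node, in order."""
    subnets = cluster_cidr.subnets(new_prefix=prefix)
    return {name: next(subnets) for name in node_names}


def gateway_for(subnet):
    """First usable address of the subnet, used here as the cni0 gateway."""
    return next(iter(subnet.hosts()))


# Hypothetical node names
allocation = allocate_node_subnets(CLUSTER_CIDR, ["node-a", "node-b"])
for node, subnet in allocation.items():
    print(node, subnet, "gateway", gateway_for(subnet))
```

Any Pod address identifies its node through the subnet it belongs to, which is what makes plain routing between nodes possible.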
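Three kinds of routes are involved:
Default route : inside the Pod’s network namespace, sends all traffic to the gateway address held by the node’s cni0 bridge (e.g., default via 10.244.0.1 dev eth0, where eth0 is the Pod’s end of the veth pair).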
Local subnet rule : for the node’s own subnet, e.g., 10.244.0.0/24 dev cni0 proto kernel scope link src 10.244.0.1.
Remote subnet rule : routes traffic destined for another node’s subnet to that node’s host IP, e.g., 10.244.1.0/24 via 10.168.0.3 dev eth0.
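How a node picks among these rules can be sketched as a longest-prefix-match lookup. The two subnet routes are taken from the examples above; the node's own default route (via 10.168.0.1) is a hypothetical entry added only to show that the more specific route wins.

```python
import ipaddress

# (destination, next hop or None for directly connected, device)
ROUTES = [
    ("0.0.0.0/0", "10.168.0.1", "eth0"),      # hypothetical node default route
    ("10.244.0.0/24", None, "cni0"),          # local subnet rule
    ("10.244.1.0/24", "10.168.0.3", "eth0"),  # remote subnet rule
]


def lookup(dst: str):
    """Return (next_hop, device) for the most specific route matching dst."""
    addr = ipaddress.ip_address(dst)
    matches = [
        (ipaddress.ip_network(net), via, dev)
        for net, via, dev in ROUTES
        if addr in ipaddress.ip_network(net)
    ]
    if not matches:
        raise LookupError(f"no route to {dst}")
    _, via, dev = max(matches, key=lambda m: m[0].prefixlen)
    return via, dev


print(lookup("10.244.1.3"))  # remote Pod: goes to the other node's host IP
print(lookup("10.244.0.5"))  # local Pod: delivered on the bridge
```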
Packet flow example :
# Container 10.244.0.2 sends to 10.244.1.3
# 1. The container builds the TCP/UDP packet and the Ethernet frame: src MAC is its own veth interface, dst MAC is the gateway's (10.244.0.1 on cni0), resolved via ARP.
# 2. The frame follows the container's default route to cni0, which hands the packet to the host stack.
# 3. The host matches the remote subnet rule (10.244.1.0/24 via 10.168.0.3 dev eth0), resolves the MAC of the next hop 10.168.0.3 via ARP, and sends the packet out via eth0.
# 4. The remote node receives the packet and routes it to its own cni0 bridge, which resolves the destination MAC for 10.244.1.3 and delivers the packet to the target Pod.
Limitation: this works only when the two nodes share a Layer‑2 link; otherwise the next hop's MAC cannot be resolved and the frame cannot be delivered directly.
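The Layer‑2 requirement can be approximated by checking whether a route's next hop lies in the same subnet as the local node interface. This is a rough sketch: it assumes that sharing an IP subnet implies sharing a Layer‑2 segment, and the local address 10.168.0.2/24 is hypothetical.

```python
import ipaddress


def next_hop_is_on_link(local_iface_cidr: str, next_hop: str) -> bool:
    """True if next_hop falls inside the subnet configured on the local interface."""
    iface = ipaddress.ip_interface(local_iface_cidr)
    return ipaddress.ip_address(next_hop) in iface.network


# Same subnet: the next hop can be resolved with ARP, so host-gw can work.
print(next_hop_is_on_link("10.168.0.2/24", "10.168.0.3"))
# Different subnet: a router sits in between, so a tunnel or BGP is needed instead.
print(next_hop_is_on_link("10.168.0.2/24", "10.168.5.3"))
```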
Protocol‑Stack Perspective
Transmission follows the TCP/IP model: Application → Transport (TCP/UDP) → IP → MAC → Physical wire. Reception reverses the order, stripping MAC then IP headers before delivering the payload to the appropriate process.
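The layering can be pictured as headers being prepended on the way down and stripped in reverse on the way up. The sketch below uses toy byte markers in place of real headers, so it illustrates only the ordering, not actual header formats.

```python
# Order in which headers are added when sending (innermost first).
LAYERS = ["transport", "ip", "mac"]

# Toy markers standing in for real protocol headers.
HEADERS = {"transport": b"[TCP]", "ip": b"[IP]", "mac": b"[MAC]"}


def send(payload: bytes) -> bytes:
    """Wrap the payload: transport header first, MAC header outermost."""
    data = payload
    for layer in LAYERS:
        data = HEADERS[layer] + data
    return data


def receive(frame: bytes) -> bytes:
    """Strip headers in reverse order: MAC, then IP, then transport."""
    data = frame
    for layer in reversed(LAYERS):
        header = HEADERS[layer]
        if not data.startswith(header):
            raise ValueError(f"missing {layer} header")
        data = data[len(header):]
    return data


frame = send(b"hello")
print(frame)           # b'[MAC][IP][TCP]hello'
print(receive(frame))  # b'hello'
```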
Network‑Topology View
Pod traffic must cross two boundaries:
From the container network namespace to the host namespace (typically via a veth pair and a bridge such as cni0).
From the host to the remote destination (via routing, BGP, or tunnels).
Key considerations are:
Access : mechanism that connects container to host (veth + bridge, macvlan, IPVLAN, etc.).
Flow control : whether NetworkPolicy hooks are needed and where they are placed in the data path.
Channel : the method used to transport packets between nodes (direct routing, BGP, VXLAN, IPIP, etc.).
Service Mechanism
A Kubernetes Service provides client‑side load balancing: the translation from a virtual IP (VIP) to one of the backend Pod IPs (real IPs, RIP) happens on the node where the client runs. kube-proxy watches the API server for changes to Services and their endpoints (the backend Pods) and programs either iptables or IPVS rules accordingly.
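The watch-and-program behavior amounts to a reconcile loop: compare the backends currently programmed on the node with the desired set and apply the difference. The following is a simplified sketch of that idea with hypothetical function names; it is not kube-proxy's code and leaves out the API watch itself.

```python
def diff_backends(programmed: set, desired: set):
    """Return (to_add, to_remove) needed to move programmed toward desired."""
    to_add = sorted(desired - programmed)
    to_remove = sorted(programmed - desired)
    return to_add, to_remove


def reconcile(programmed: set, desired: set):
    """Apply the difference in place and report what changed."""
    to_add, to_remove = diff_backends(programmed, desired)
    for backend in to_remove:
        programmed.discard(backend)  # stands in for deleting a rule
    for backend in to_add:
        programmed.add(backend)      # stands in for adding a rule
    return to_add, to_remove


# One Pod was deleted and a new one was scheduled (addresses are illustrative).
rules = {"10.244.0.2:80", "10.244.1.3:80"}
wanted = {"10.244.1.3:80", "10.244.2.4:80"}
print(reconcile(rules, wanted))
```

Computing a difference rather than rebuilding everything keeps unchanged backends untouched while the rule set is updated.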
LVS‑based Service Implementation
Reproducing a Service by hand with LVS takes three steps:
1. Bind the VIP to the node (add a local IP address, e.g., ip address add 10.96.0.1/32 dev lo).
2. Create an IPVS virtual server for the VIP (e.g., ipvsadm -A -t 10.96.0.1:80 -s rr).
3. Add each backend Pod IP as a real server (e.g., ipvsadm -a -t 10.96.0.1:80 -r 10.244.0.2:80 -w 1).
kube-proxy performs the same steps automatically and updates the rules when Pods are added or removed, or when the Service is deleted.
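To make the virtual-server setup concrete, the sketch below generates the same ipvsadm command lines for a list of backends and mimics what the rr (round-robin) scheduler does with equal weights. The helper names are hypothetical and the commands are only built as strings, never executed.

```python
from itertools import cycle


def ipvs_commands(vip: str, port: int, backends, scheduler: str = "rr"):
    """Build the ipvsadm commands for one virtual server and its real servers."""
    cmds = [f"ipvsadm -A -t {vip}:{port} -s {scheduler}"]
    cmds += [
        f"ipvsadm -a -t {vip}:{port} -r {ip}:{bport} -w 1"
        for ip, bport in backends
    ]
    return cmds


def round_robin(backends):
    """With equal weights, rr simply rotates through the real servers."""
    return cycle(backends)


backends = [("10.244.0.2", 80), ("10.244.1.3", 80)]
for cmd in ipvs_commands("10.96.0.1", 80, backends):
    print(cmd)

picker = round_robin(backends)
print([next(picker) for _ in range(3)])
```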
Service Types
ClusterIP : internal virtual IP, reachable only inside the cluster.
NodePort : exposes the Service on a static port on every node; external clients reach it via NodeIP:NodePort.
LoadBalancer : integrates with cloud‑provider load balancers; automatically creates a NodePort and a ClusterIP.
ExternalName : maps the Service to an external DNS name by returning a CNAME record; Kubernetes does no proxying, so load balancing is delegated to whatever sits behind that name.
Ingress Integration
An Ingress controller runs as a set of Pods, typically exposed through a NodePort. A cloud provider’s external load balancer (ELB) forwards traffic to that NodePort; the controller then routes each request to the appropriate ClusterIP Service and finally to the backend Pods. Because each layer can run multiple replicas, this path avoids a single point of failure.
Key Takeaways
Understanding the evolution from Docker bridge networking to the per‑Pod IP model clarifies why Kubernetes assigns a unique IP to each Pod.
Packet flow follows the layered TCP/IP model: application → transport → IP → MAC → wire, and the reverse for reception.
Flannel host‑gw demonstrates a simple routing‑based implementation; it works when nodes share a Layer‑2 network.
Services (ClusterIP, NodePort, LoadBalancer, ExternalName) and Ingress together provide flexible, production‑grade networking and load balancing.
Alibaba Cloud Native