How Kubernetes Routes Packets: Inside Pods, CNI, and Services
This article explains how Kubernetes forwards packets inside a cluster, covering Pod network namespaces, the role of the pause container, veth pairs, bridges, CNI plugins, and iptables‑based Service NAT. It walks step by step through traffic within a Pod, between Pods, and from Pods to Services.
Kubernetes Network Requirements
Kubernetes defines a set of basic networking rules:
Pods must be able to communicate with any other Pod in the cluster without using Network Address Translation (NAT).
Processes running on a node must be able to communicate with any Pod on that node without NAT.
Each Pod receives its own IP address (IP‑per‑Pod) and can be reached directly via that address.
These requirements are abstract and do not dictate a specific implementation.
To satisfy them, the following challenges must be addressed:
Ensuring containers in the same Pod behave as if they share a single host.
Enabling Pod‑to‑Pod communication.
Providing load‑balanced access to Pods via Services.
Allowing Pods to receive traffic from outside the cluster.
This article focuses on the first three points, starting with networking inside a Pod.
Linux Network Namespaces in a Pod
Consider a Pod that runs two containers – a busybox container and an nginx container:
apiVersion: v1
kind: Pod
metadata:
  name: multi-container-Pod
spec:
  containers:
    - name: container-1
      image: busybox
      command: ['/bin/sh', '-c', 'sleep 1d']
    - name: container-2
      image: nginx
When the Pod is created:
The Pod receives an independent network namespace on the node.
An IP address is allocated to the Pod, and the two containers share that address.
Both containers share the same network namespace and can see each other locally.
Linux network namespaces are isolated logical spaces that can contain their own interfaces, routing tables, firewall rules, and other network resources.
The root network namespace holds the physical network interfaces; each additional namespace is created as an isolated view of the network.
You can list the namespaces on a host with:
ip netns list
On a Kubernetes node you will see namespaces created by the CNI plugin, e.g.:
cni-0f226515-e28b-df13-9f16-dd79456825ac
cni-4e4dfaac-89a6-2034-6098-dd8b2ee51dcd
cni-7e94f0cc-9ee8-6a46-178a-55c73ce58f2e
cni-7619c818-5b66-5d45-91c1-1c516f559291
cni-3004ec2c-9ac2-2928-b556-82c7fb37a4d8
When a Pod is scheduled, the CNI plugin:
Assigns an IP address to the Pod.
Connects the containers to the network.
If a Pod contains multiple containers, they all share the same network namespace created by the runtime (containerd or CRI‑O).
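One way to confirm that two processes share a network namespace is to compare the namespace links the kernel exposes under /proc. The sketch below assumes a Linux host with /proc mounted; the helper names netns_id and share_netns are illustrative and not part of any Kubernetes tooling.

```python
import os


def netns_id(pid="self"):
    """Return the kernel's identifier for a process's network namespace.

    On Linux, /proc/<pid>/ns/net is a symbolic link whose target looks like
    'net:[<inode>]'. Two processes share a network namespace exactly when
    their links have the same target.
    """
    return os.readlink(f"/proc/{pid}/ns/net")


def share_netns(pid_a, pid_b):
    """True if both processes live in the same network namespace."""
    return netns_id(pid_a) == netns_id(pid_b)


if __name__ == "__main__":
    # Compare this process with itself, addressed in two different ways.
    print(netns_id(), share_netns("self", os.getpid()))
```

On a node, passing the PIDs of a process from each container in the same Pod should report a shared namespace; reading another process's /proc entry usually requires root.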
The Pause Container
Every Pod has an additional hidden container called the pause container. It creates and holds the Pod’s network namespace.
Listing containers on a node shows a pause container for each Pod:
docker ps | grep pause
fa9666c1d9c6 k8s.gcr.io/pause:3.4.1 "/pause" k8s_POD_kube-dns-599484b884-sv2js…
44218e010aeb k8s.gcr.io/pause:3.4.1 "/pause" k8s_POD_blackbox-exporter-55c457d…
The pause container runs a minimal process that sleeps until it is signalled. It holds the Pod’s network namespace open and otherwise remains idle, providing a stable namespace for the other containers even when they restart.
Assigning an IP Address to a Pod
Retrieve the Pod’s IP address:
kubectl get pod multi-container-Pod -o jsonpath='{.status.podIP}'
10.244.4.40
Inspect the network namespace to see the interface and IP:
ip netns exec cni-0f226515-e28b-df13-9f16-dd79456825ac ip a
The output shows eth0 with the assigned IP 10.244.4.40/32. You can also verify that the nginx container is listening on port 80 inside the namespace:
ip netns exec cni-0f226515-e28b-df13-9f16-dd79456825ac netstat -lnp
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 692698/nginx: master
Pod‑to‑Pod Traffic
Pod‑to‑Pod communication can occur either on the same node or across different nodes.
Each Pod is connected to the node’s root namespace via a veth pair – one end lives in the Pod’s namespace, the other in the root namespace. The root ends of all veth devices are attached to a Linux bridge, which acts as a virtual switch.
When Pod‑A on the same node sends traffic to Pod‑B:
The packet leaves Pod‑A’s eth0, enters the veth pair, and reaches the bridge in the root namespace.
Pod‑A resolves Pod‑B’s MAC address with ARP. The bridge forwards the ARP exchange, learns which veth each MAC address sits behind, and then forwards the frame to the veth pair belonging to Pod‑B.
Pod‑B receives the packet on its eth0.
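The same-node path can be modelled as a learning switch. The sketch below is a toy simulation, not the Linux bridge implementation; the port names (veth-a, veth-b, veth-c) and MAC addresses are made up for illustration.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class Bridge:
    """Toy learning bridge: remembers which port each MAC address was seen on."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.mac_table = {}  # MAC address -> port name

    def receive(self, in_port, src_mac, dst_mac):
        """Process one frame and return the list of ports it is sent out of."""
        self.mac_table[src_mac] = in_port  # learn where the sender lives
        if dst_mac in self.mac_table:      # known unicast destination
            out_port = self.mac_table[dst_mac]
            return [] if out_port == in_port else [out_port]
        # Unknown or broadcast destination: flood to all ports except ingress.
        return sorted(self.ports - {in_port})


bridge = Bridge(["veth-a", "veth-b", "veth-c"])
# Pod-A broadcasts an ARP request asking who owns Pod-B's IP.
arp_request = bridge.receive("veth-a", "aa:aa:aa:aa:aa:aa", BROADCAST)
# Pod-B answers directly to Pod-A; the bridge now knows both locations.
arp_reply = bridge.receive("veth-b", "bb:bb:bb:bb:bb:bb", "aa:aa:aa:aa:aa:aa")
# Pod-A's data frame goes only to Pod-B's veth.
data_frame = bridge.receive("veth-a", "aa:aa:aa:aa:aa:aa", "bb:bb:bb:bb:bb:bb")
print(arp_request, arp_reply, data_frame)
```

The ARP request is flooded to every other port, while the reply and the data frame each go out of a single port, because the bridge learned the sender's location from the earlier frames.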
For cross‑node traffic, after reaching the bridge the packet is routed to the node’s default gateway (typically the physical eth0) because the destination IP is not in the local subnet. The packet then traverses the underlying network to the destination node, where the same bridge and veth mechanism deliver it to the target Pod.
Bitwise Operations for Routing Decisions
The sender’s network stack determines whether a destination IP is in the same subnet by performing a bitwise AND between each IP address and the subnet mask. If the destination’s network differs from the sender’s own, the packet is sent to the default gateway.
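The subnet check can be written out in a few lines. This is a minimal sketch of the arithmetic only; the /24 prefix length, the gateway address, and the second and third IP addresses are assumptions chosen for illustration, and the function names are hypothetical.

```python
import ipaddress


def network_of(ip, prefix_len):
    """Return the network part of an IPv4 address as an integer (ip AND mask)."""
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    return int(ipaddress.IPv4Address(ip)) & mask


def next_hop(src_ip, dst_ip, prefix_len, gateway):
    """Deliver directly if both addresses share a network, else use the gateway."""
    if network_of(src_ip, prefix_len) == network_of(dst_ip, prefix_len):
        return dst_ip
    return gateway


# Same subnet: the destination is reached directly.
local = next_hop("10.244.4.40", "10.244.4.57", 24, "10.244.4.1")
# Different subnet: the packet is handed to the default gateway.
remote = next_hop("10.244.4.40", "10.244.7.12", 24, "10.244.4.1")
print(local, remote)
```

With a /24 mask, 10.244.4.40 and 10.244.4.57 both reduce to 10.244.4.0, while 10.244.7.12 reduces to 10.244.7.0, so only the second lookup falls through to the gateway.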
Container Network Interface (CNI)
CNI is responsible for configuring networking for Pods on a node. A CNI plugin implements four operations:
ADD – attach a container to the network.
DEL – detach a container.
CHECK – verify that an attached container’s networking is still configured as expected.
VERSION – report the CNI specification versions the plugin supports.
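To make the operations concrete, the sketch below shows how a plugin might dispatch on them. Under the CNI specification the runtime passes the operation in the CNI_COMMAND environment variable, along with variables such as CNI_NETNS and CNI_IFNAME, and sends the network configuration as JSON on standard input. Everything else here is an assumption for illustration: the handle_cni function, the "toy" plugin type, the hard-coded address, and the simplified result, which omits fields a real plugin would return.

```python
import json

# Spec versions this toy plugin claims to support (illustrative values).
SUPPORTED_VERSIONS = ["0.3.0", "0.3.1", "0.4.0"]


def handle_cni(env, stdin_text):
    """Dispatch one CNI invocation and return a JSON-serialisable result."""
    command = env["CNI_COMMAND"]
    if command == "VERSION":
        return {"cniVersion": "0.4.0", "supportedVersions": SUPPORTED_VERSIONS}

    conf = json.loads(stdin_text)  # network configuration arrives on stdin
    if command == "ADD":
        # A real plugin would create the interface inside env["CNI_NETNS"]
        # and ask its IPAM plugin for an address; this address is a placeholder.
        return {
            "cniVersion": conf["cniVersion"],
            "interfaces": [{"name": env["CNI_IFNAME"], "sandbox": env["CNI_NETNS"]}],
            "ips": [{"version": "4", "address": "10.244.4.40/32", "interface": 0}],
        }
    if command in ("DEL", "CHECK"):
        return {}  # nothing to report on success in this sketch
    raise ValueError(f"unsupported CNI_COMMAND: {command}")


env = {
    "CNI_COMMAND": "ADD",
    "CNI_CONTAINERID": "example-container-id",  # hypothetical value
    "CNI_NETNS": "/var/run/netns/cni-example",  # hypothetical path
    "CNI_IFNAME": "eth0",
}
conf_text = json.dumps({"cniVersion": "0.3.1", "name": "k8s-pod-network", "type": "toy"})
result = handle_cni(env, conf_text)
print(json.dumps(result, indent=2))
```

A real plugin is an executable that reads os.environ and sys.stdin and prints the result to standard output; taking them as arguments here keeps the sketch easy to run and inspect.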
Common CNI plugins include Calico, Cilium, Flannel, and Weave Net. A typical Calico configuration looks like:
{
  "name": "k8s-pod-network",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "calico",
      "datastore_type": "kubernetes",
      "ipam": {"type": "calico-ipam", "assign_ipv4": "true"},
      "policy": {"type": "k8s"},
      "kubernetes": {"k8s_api_root": "https://10.96.0.1:443", "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"}
    },
    {"type": "bandwidth", "capabilities": {"bandwidth": true}},
    {"type": "portmap", "snat": true, "capabilities": {"portMappings": true}}
  ]
}
Calico provides a layer‑3 BGP network, while Cilium uses eBPF for a layer‑3‑to‑layer‑7 overlay. The choice depends on cluster size and operational preferences.
Services and iptables NAT
A Service gets a stable virtual IP (ClusterIP). When a Pod sends traffic to a Service IP, iptables rules reached from the PREROUTING chain of the nat table perform Destination NAT (DNAT), rewriting the packet’s destination to the IP of one of the backing Pods.
When the DNAT is applied, conntrack records the translation for the flow. On the reply, the source address is rewritten from the Pod IP back to the Service IP, so the client sees the Service as the source.
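The interplay of DNAT and conntrack can be illustrated with a small simulation. This is a toy model, not how Netfilter is implemented: the ServiceNat class, the Service address 10.96.45.152, and the client and backend addresses are all hypothetical.

```python
class ServiceNat:
    """Toy model of Service DNAT with a conntrack-style table for replies."""

    def __init__(self, service, backends):
        self.service = service    # (ip, port) of the ClusterIP
        self.backends = backends  # list of (ip, port) backing Pods
        self.conntrack = {}       # (client, backend) -> original destination

    def outbound(self, packet, pick=0):
        """Rewrite the destination of a client packet addressed to the Service."""
        src, dst = packet
        if dst != self.service:
            return packet                     # not Service traffic, untouched
        backend = self.backends[pick]
        self.conntrack[(src, backend)] = dst  # remember the translation
        return (src, backend)

    def reply(self, packet):
        """Reverse the translation so the reply appears to come from the Service."""
        src, dst = packet                     # src is the backend, dst the client
        original = self.conntrack.get((dst, src))
        if original is None:
            return packet                     # no tracked flow, untouched
        return (original, dst)


client = ("10.244.4.40", 51000)
service = ("10.96.45.152", 80)
backend = ("10.244.7.12", 80)

nat = ServiceNat(service, [backend])
request = nat.outbound((client, service))
response = nat.reply((backend, client))
print(request, response)
```

The request leaves with the backend Pod as its destination, and the response returns with the Service as its source, which is why the client never sees the Pod IP.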
These iptables rules can be inspected on a node with iptables-save. They are generated automatically by kube-proxy from the Service definition and its endpoints.
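When a Service has several backing Pods, the generated rules have to pick one of them. Assuming kube-proxy runs in iptables mode, the rules are evaluated in order, each matching with a random probability, and the last one always matches. The sketch below works through that arithmetic to show why the result is an even split; the function names are illustrative.

```python
from fractions import Fraction


def rule_probabilities(n):
    """Per-rule match probability for n endpoints: 1/n, 1/(n-1), ..., 1."""
    return [Fraction(1, n - i) for i in range(n)]


def overall_share(n):
    """Fraction of all new connections that ends up at each endpoint."""
    shares = []
    remaining = Fraction(1)  # traffic that has not matched an earlier rule
    for p in rule_probabilities(n):
        shares.append(remaining * p)
        remaining *= 1 - p
    return shares


print(rule_probabilities(3))
print(overall_share(3))
```

For three endpoints the per-rule probabilities are 1/3, 1/2, and 1, yet each endpoint receives exactly one third of new connections, because later rules only see the traffic that earlier rules let through.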
Summary
How containers communicate within a Pod and across Pods.
Pod‑to‑Pod traffic on the same node versus different nodes.
Pod‑to‑Service traffic and how iptables performs DNAT/SNAT.
Key concepts: network namespaces, veth pairs, bridges, CNI plugins, overlay networks, Netfilter, conntrack.
Open Source Linux
Focused on sharing Linux/Unix content, covering fundamentals, system development, network programming, automation/operations, cloud computing, and related professional knowledge.
