Mastering Kubernetes Networking: From CNI Fundamentals to Advanced Troubleshooting
This comprehensive guide explains Kubernetes networking fundamentals, compares major CNI plugins such as Flannel, Calico, and Cilium, and provides detailed troubleshooting steps for pod communication, service routing, DNS issues, and eBPF‑based enhancements, helping operators build reliable, high‑performance clusters.
Kubernetes Network Model and CNI Role
Kubernetes assigns each pod a unique IP address that is routable anywhere inside the cluster, lets containers in the same pod share a network namespace, and relies on the node's kubelet (via CNI plugins) to wire up pod‑to‑pod connectivity.
Kubernetes Network Communication Matrix:
Container‑to‑Container (same pod): localhost (shared network namespace, no CNI needed)
Pod‑to‑Pod (same node): veth pair → bridge (cni0/br0)
Pod‑to‑Pod (cross node): veth pair → local cni0 → routing/encapsulation → remote cni0 → remote pod
Pod → external: NAT via Node IP + SNAT
External → Pod: Service → NodePort/Ingress → Pod
The Container Network Interface (CNI) standard defines three core operations, which kubelet invokes on each plugin: ADD (create the network), DEL (delete the network), and CHECK (verify status).
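To make the CNI contract concrete, here is a minimal sketch (not a real plugin) of how a CNI binary dispatches on ADD/DEL/CHECK; the runtime sets the command in the CNI_COMMAND environment variable and passes the network config as JSON on stdin. All names here are hypothetical.

```python
import json

# Hypothetical minimal CNI plugin skeleton. A real plugin is an executable
# that reads CNI_COMMAND from the environment and the net config from stdin.
def cni_plugin(command: str, net_conf: dict) -> dict:
    if command == "ADD":
        # A real plugin would create a veth pair, move one end into the
        # pod's network namespace, and allocate an IP via its IPAM plugin.
        return {
            "cniVersion": net_conf.get("cniVersion", "1.0.0"),
            "ips": [{"address": "10.244.0.15/24", "gateway": "10.244.0.1"}],
        }
    if command == "DEL":
        return {}  # tear down interfaces and IPAM state; empty result on success
    if command == "CHECK":
        return {}  # verify the earlier ADD result still holds
    raise ValueError(f"unsupported CNI_COMMAND: {command}")

if __name__ == "__main__":
    conf = {"cniVersion": "1.0.0", "name": "demo-net", "type": "demo"}
    # On success the plugin prints a result JSON for the runtime to consume.
    print(json.dumps(cni_plugin("ADD", conf)))
```

The key design point is that kubelet never speaks to the network directly; it delegates every lifecycle event to whichever plugin the node is configured with.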
Popular CNI Plugins in 2026
Three plugins dominate production environments:
Calico : excels at NetworkPolicy enforcement and security‑focused deployments.
Flannel : prioritizes simplicity and quick start‑up.
Cilium : leverages eBPF for high performance, advanced security, and observability.
Flannel: Simple Overlay Network
Flannel allocates a /24 subnet per node and assigns pod IPs from that range. Cross‑node traffic is encapsulated in VXLAN or UDP overlay packets.
Flannel cross‑node flow:
Pod‑A (10.244.0.15) → eth0 → veth pair → cni0 (10.244.0.1)
→ kernel routing discovers target IP 10.244.2.30 not in local subnet
→ encapsulate as VXLAN (outer IP = Node‑2 IP 10.112.0.52)
→ forward over physical network to Node‑2
→ Node‑2 decapsulates and delivers to Pod‑B (10.244.2.30)
Flannel creates a flannel.1 VXLAN device on each host. Advantages: zero configuration, works on any physical network. Limitations: no NetworkPolicy support and roughly 5‑10% performance overhead from encapsulation.
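The subnet-per-node scheme above is easy to model: carve one /24 per node out of the 10.244.0.0/16 cluster CIDR, then decide whether a destination pod IP is local or needs encapsulation. A sketch using Python's standard ipaddress module (function names are illustrative):

```python
import ipaddress

# Flannel-style IPAM sketch: each node owns one /24 slice of the cluster CIDR.
CLUSTER_CIDR = ipaddress.ip_network("10.244.0.0/16")

def node_subnet(node_index: int) -> ipaddress.IPv4Network:
    # Node 0 gets 10.244.0.0/24, node 1 gets 10.244.1.0/24, and so on.
    return list(CLUSTER_CIDR.subnets(new_prefix=24))[node_index]

def owning_node(pod_ip: str) -> int:
    # Locate which node's subnet contains a pod IP; cross-node destinations
    # trigger VXLAN encapsulation toward that node.
    ip = ipaddress.ip_address(pod_ip)
    for i, subnet in enumerate(CLUSTER_CIDR.subnets(new_prefix=24)):
        if ip in subnet:
            return i
    raise ValueError("IP outside cluster CIDR")

print(node_subnet(0))              # 10.244.0.0/24 (Pod-A's node)
print(owning_node("10.244.2.30"))  # 2 -> remote node, traffic is encapsulated
```

In the flow above, the kernel route lookup plays exactly this role: 10.244.2.30 falls outside the local 10.244.0.0/24, so the packet is handed to flannel.1 for encapsulation.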
Calico: BGP Routing and NetworkPolicy
Calico uses BGP to advertise pod subnets, allowing direct L3 routing without encapsulation. Each node runs the Felix agent to program routes, iptables, and ACLs, while BIRD handles BGP sessions. Typha aggregates datastore traffic for large clusters.
# Calico installation (v3.28.x)
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    bgp: Enabled
    ipPools:
      - name: default-ipv4-pool
        cidr: 10.244.0.0/16
        natOutgoing: Enabled
        blockSize: 26
        encapsulation: VXLANCrossSubnet
  nodeMetricsPort: 9091
  typha:
    enabled: true
    count: 3
Calico’s declarative NetworkPolicy (including GlobalNetworkPolicy) can span namespaces and supports rich rule attributes such as ICMP types, DSCP marks, and directionality.
# Calico NetworkPolicy example – allow only the API server to access MySQL
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: db-access-policy
  namespace: production
spec:
  selector: app == 'mysql'
  types:
    - Ingress
  ingress:
    - action: Allow
      protocol: TCP
      source:
        selector: app == 'api-server' && role == 'backend'
      destination:
        ports: [3306]
    - action: Log
      source: {}
    - action: Deny
      source: {}
Cilium: eBPF‑Based Networking
Cilium attaches eBPF programs to kernel hook points, chiefly the traffic‑control (TC) hooks on each interface, replacing much of the traditional iptables datapath. It maintains three core eBPF maps: endpoint (local pod IPs), routing (cross‑node destinations), and identity (security labels).
Packet flow in Cilium:
Pod → veth pair → kernel stack → eBPF TC hook →
- endpoint map (local pod forwarding)
- routing map (cross‑node lookup)
- identity map (policy enforcement)
If policy matches, forward; otherwise drop.
Key advantages (2026 production validation):
L7 policies for HTTP, gRPC, etc.
Pod‑level bandwidth shaping via eBPF.
Service‑topology awareness using egress-cached-svc (30%+ latency reduction).
Transparent WireGuard encryption for pod‑to‑pod traffic between nodes.
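The map-driven packet flow described above can be illustrated with a toy in-userspace model. This is an assumed simplification for intuition only: real eBPF maps live in the kernel and are keyed by packed IP/identity structures, not Python strings.

```python
import ipaddress

# Toy model of the Cilium datapath decision (hypothetical data).
ENDPOINT_MAP = {"10.244.0.15": "local-pod-a"}            # local endpoints
ROUTING_MAP = {"10.244.2.0/24": "node-2"}                # cross-node subnets
IDENTITY_MAP = {"10.244.0.15": "app=api", "10.244.2.30": "app=db"}
POLICY = {("app=api", "app=db")}                         # allowed (src, dst) identities

def forward_decision(src_ip: str, dst_ip: str) -> str:
    # 1. Resolve both IPs to security identities.
    src_id = IDENTITY_MAP.get(src_ip)
    dst_id = IDENTITY_MAP.get(dst_ip)
    # 2. Enforce policy on identities, not IPs.
    if (src_id, dst_id) not in POLICY:
        return "DROP"
    # 3. Local endpoint? Deliver directly.
    if dst_ip in ENDPOINT_MAP:
        return "FORWARD local"
    # 4. Otherwise consult the routing map for the owning node.
    for cidr, node in ROUTING_MAP.items():
        if ipaddress.ip_address(dst_ip) in ipaddress.ip_network(cidr):
            return f"FORWARD via {node}"
    return "DROP"

print(forward_decision("10.244.0.15", "10.244.2.30"))  # FORWARD via node-2
```

Identity-based policy is the design choice worth noting: because enforcement keys on labels rather than IPs, policy does not need reprogramming every time a pod is rescheduled and changes address.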
# Cilium installation (v1.13)
apiVersion: cilium.io/v1alpha1
kind: CiliumConfig
metadata:
  name: cilium-config
spec:
  ipam:
    mode: cluster-pool
    operator:
      clusterPoolIPv4PodCIDRList: [10.244.0.0/16]
      clusterPoolIPv4MaskSize: 26
  eBPF:
    enabled: true
    lbMode: snat
    hostRouting: true
  bandwidthManager:
    enabled: true
  encryption:
    enabled: true
    type: WireGuard
  hubble:
    enabled: true
    relay:
      enabled: true
    ui:
      enabled: true
Pod Network Communication Path
When a pod is scheduled, kubelet invokes the selected CNI plugin, which creates a vethXXXX pair. One end attaches to the host bridge (e.g., cni0), the other is renamed eth0 inside the pod’s network namespace.
Host view:
ens160 (physical NIC)
↑
cni0 (bridge 10.244.0.1/24)
├─ veth1a2b3c4d → eth0 @ pod-nginx-abc123 (10.244.0.15)
├─ veth5d6e7f8g → eth0 @ pod-api-def456 (10.244.0.16)
└─ veth9h0i1j2k → eth0 @ pod-db-ghi789 (10.244.0.17)
/proc/sys/net/ipv4/ip_forward = 1 (must be enabled)
The Linux bridge forwards frames based on its MAC address table, while most CNI plugins still rely on iptables for NAT and packet filtering.
# View the KUBE-SERVICES chain (Service NAT rules)
iptables -t nat -L KUBE-SERVICES -n --line-numbers | head -30
# View the NodePort chain
iptables -t nat -L KUBE-NODEPORTS -n --line-numbers
# Find CNI-added FORWARD rules
iptables -L FORWARD -n | grep -iE 'calico|flannel|cni'
Cross‑Node Pod Communication Modes
Two common approaches:
Overlay (VXLAN) : encapsulates packets; works on any physical network but adds ~18% bandwidth overhead.
Routing (Calico BGP) : routes pod IPs directly; minimal overhead (~4%) but requires BGP peering.
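The header cost of the overlay approach is easy to quantify. VXLAN wraps each inner frame in an outer IP/UDP/VXLAN stack plus the inner Ethernet header, 50 bytes in total; note that header bytes alone account for only a few percent, so the larger throughput losses measured below come mostly from per-packet CPU cost and lost hardware offloads. A sketch of the arithmetic (function names are illustrative):

```python
# Standard VXLAN encapsulation overhead per packet, in bytes:
# inner Ethernet (14) + VXLAN (8) + outer UDP (8) + outer IP (20).
INNER_ETH, VXLAN_HDR, OUTER_UDP, OUTER_IP = 14, 8, 8, 20

def vxlan_overhead_bytes() -> int:
    return INNER_ETH + VXLAN_HDR + OUTER_UDP + OUTER_IP  # 50

def goodput_ratio(mtu: int = 1500) -> float:
    # Fraction of each physical-MTU-sized packet left for inner payload.
    return (mtu - vxlan_overhead_bytes()) / mtu

print(vxlan_overhead_bytes())     # 50
print(round(goodput_ratio(), 3))  # 0.967 on a standard 1500-byte network
```

So on header bytes alone VXLAN costs about 3.3% at MTU 1500; the gap up to the ~18% seen in the benchmark below is the encapsulation work itself.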
# iperf3 benchmark (10 Gbps NICs)
Scenario: cross‑node Pod‑to‑Pod TCP
Overlay (VXLAN): 8.2 Gbps (≈18% overhead)
Routing (BGP): 9.6 Gbps (≈4% overhead)
Physical baseline: 9.8 Gbps
Service and ClusterIP Mechanics
Kubernetes Service provides a virtual IP (ClusterIP) that load‑balances traffic to backend pods. kube‑proxy implements this via two modes:
iptables mode : creates multiple DNAT rules; linear lookup becomes a latency bottleneck when Services > 500.
IPVS mode : uses a hash table for O(1) lookup, offering stable performance at large scale.
# Switch kube-proxy to IPVS mode
kubectl edit configmap -n kube-system kube-proxy
# change 'mode: ""' to 'mode: "ipvs"', then restart the kube-proxy pods
# Verify IPVS rules
ipvsadm -L -n
IPVS supports scheduling algorithms such as round‑robin (rr), weighted round‑robin (wrr), least‑connection (lc), and weighted least‑connection (wlc). For long‑lived connections (e.g., gRPC), least‑connection is recommended.
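Why least-connection for long-lived traffic? The difference is easiest to see in a small simulation (hypothetical backends; a toy model of the schedulers, not IPVS itself):

```python
from itertools import cycle

# Three hypothetical backend pods behind one Service.
BACKENDS = ["10.244.0.15", "10.244.1.20", "10.244.2.30"]

_rr = cycle(BACKENDS)

def round_robin() -> str:
    # rr: hand out backends in fixed rotation, ignoring load.
    return next(_rr)

active = {b: 0 for b in BACKENDS}

def least_conn() -> str:
    # lc: pick the backend with the fewest active connections (ties: first wins).
    chosen = min(active, key=active.get)
    active[chosen] += 1
    return chosen

# A batch of long-lived gRPC streams happens to pin one backend...
active["10.244.0.15"] = 10
# ...round-robin would still send it every third new connection,
# while least-connection steers new connections to the idle pods:
print(least_conn())  # 10.244.1.20
```

With short-lived HTTP requests the two behave similarly; with long-lived streams, round-robin can leave one pod saturated while others idle, which is the failure mode lc avoids.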
ClusterIP DNS Resolution
Pods resolve Service names via CoreDNS. The typical resolution flow is:
Pod resolver (per /etc/resolv.conf, which points at the CoreDNS Service IP) → CoreDNS → Service ClusterIP → iptables/IPVS DNAT → Pod IP
CoreDNS runs as a Deployment in kube-system and watches Service/EndpointSlice objects to keep DNS records up to date.
# CoreDNS ConfigMap (simplified)
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods verified
            fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . 10.112.0.1   # upstream DNS
        cache 30
        loop
        reload
        loadbalance
    }
Ingress and NodePort
Ingress is an L7 entry point defined by an API object; the actual proxying is done by an Ingress Controller (e.g., Nginx, Traefik, a cloud ALB). The Nginx controller applies configuration changes with a graceful nginx -s reload to avoid dropping connections.
# Example NodePort Service
apiVersion: v1
kind: Service
metadata:
  name: api-service
  namespace: production
spec:
  type: NodePort
  selector:
    app: api
  ports:
    - name: http
      port: 80          # ClusterIP port
      targetPort: 8080
      nodePort: 30080
    - name: grpc
      port: 9090
      targetPort: 9090
      nodePort: 30090
Setting hostNetwork: true lets a pod use the host’s network namespace directly, which eliminates CNI overhead but reduces isolation and risks port conflicts.
NetworkPolicy Practice
Basic namespace isolation can be achieved with a default‑deny policy, then selectively allow traffic based on pod labels.
# Default deny all ingress in namespace "production"
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
---
# Allow same-namespace pods with label role=backend to talk to each other
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: production
spec:
  podSelector:
    matchLabels:
      role: backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              role: backend
Calico’s GlobalNetworkPolicy can enforce cross‑namespace rules, such as allowing a monitoring namespace to scrape Prometheus endpoints.
# GlobalNetworkPolicy – allow the monitoring namespace to access Prometheus ports
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: allow-prometheus-scraping
spec:
  namespaceSelector: has(projectcalico.org/name)
  order: 50
  ingress:
    - action: Allow
      protocol: TCP
      source:
        namespaceSelector: name == "monitoring"
        selector: app == "prometheus"
      destination:
        ports: [9090, 9100]
  egress:
    - action: Allow
DNS Troubleshooting
Common DNS problems include mis‑configured ndots, CoreDNS resource exhaustion, or unreachable upstream DNS. The ndots option controls how many label components trigger search‑path expansion; setting it too high forces many unnecessary lookups.
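The expansion behavior that ndots controls can be modeled directly from resolv.conf semantics: a name containing fewer than ndots dots is tried against every search domain before being tried as an absolute name. A sketch (the function name is hypothetical):

```python
# Model of the resolver's ndots/search logic per resolv.conf semantics.
def candidate_queries(name: str, searches: list, ndots: int) -> list:
    if name.endswith("."):
        return [name]  # already fully qualified: no expansion at all
    expanded = [f"{name}.{s}." for s in searches]
    absolute = name + "."
    if name.count(".") >= ndots:
        # Enough dots: try the name as-is first, search path as fallback.
        return [absolute] + expanded
    # Too few dots: walk the whole search path before trying it as-is.
    return expanded + [absolute]

SEARCHES = ["production.svc.cluster.local", "svc.cluster.local", "cluster.local"]

# With Kubernetes' default ndots:5, "api.example.com" (2 dots) wastes
# three cluster-internal lookups before the real query:
print(candidate_queries("api.example.com", SEARCHES, ndots=5))
# With ndots:2 it is resolved absolute-first:
print(candidate_queries("api.example.com", SEARCHES, ndots=2)[0])  # api.example.com.
```

This is why lowering ndots (or using fully qualified names with a trailing dot) is the standard fix for clusters that hammer CoreDNS with NXDOMAIN lookups for external hostnames.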
# Typical ndots troubleshooting steps
kubectl exec -it pod-test -- /bin/sh
# Test DNS resolution
nslookup kubernetes.default
# Verify /etc/resolv.conf
cat /etc/resolv.conf
# Check CoreDNS logs
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=200

# Adjust ndots via the pod spec (dnsPolicy "None" hands full control to dnsConfig)
spec:
  dnsPolicy: "None"
  dnsConfig:
    nameservers: [10.96.0.10]
    searches: [production.svc.cluster.local, svc.cluster.local, cluster.local]
    options:
      - name: ndots
        value: "2"
      - name: timeout
        value: "2"
      - name: attempts
        value: "2"
Cross‑Node Communication Failure Cases
Case 1 – Calico BGP Neighbor Failure
Symptoms: pods can communicate on the same node but not across nodes. Root cause often is a firewall blocking TCP 179 (BGP). Fix by opening the port and ensuring net.ipv4.ip_forward=1 on all nodes.
# Step 1: Verify Calico node status
calicoctl node status
# Expect BGP state "Established"
# Step 2: Check routing table for pod subnets
ip route | grep 10.244
# Step 3: Inspect BIRD logs for BGP errors
kubectl logs -n calico-system -l k8s-app=calico-node --tail=50 | grep -i bgp
# Step 4: Ping the remote node's pod subnet gateway
ping -I 10.244.0.1 10.244.2.1
Case 2 – Flannel VXLAN Packet Loss
Symptoms: intermittent packet loss and high latency on cross‑node traffic. The typical cause is an MTU mismatch: VXLAN encapsulation adds 50 bytes of headers, so the flannel.1 device must use an MTU 50 bytes below the physical NIC’s (e.g., 1450 on a standard 1500‑byte network). If flannel.1 keeps the full physical MTU, encapsulated packets exceed it and are fragmented or dropped.
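The correct overlay MTU follows mechanically from the physical MTU; a one-line sketch of the rule (function name is illustrative):

```python
# VXLAN adds 50 bytes per packet:
# inner Ethernet (14) + VXLAN (8) + outer UDP (8) + outer IP (20).
VXLAN_OVERHEAD = 50

def vxlan_mtu(physical_mtu: int) -> int:
    # The overlay device must leave room for the encapsulation headers,
    # otherwise encapsulated packets exceed the physical MTU.
    return physical_mtu - VXLAN_OVERHEAD

print(vxlan_mtu(1500))  # 1450 (standard Ethernet)
print(vxlan_mtu(9000))  # 8950 (jumbo frames)
```

On jumbo-frame networks the same rule applies: set flannel.1 to 8950, not to the default derived from a 1500-byte assumption, or pods lose the bandwidth benefit of the larger frames.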
# Verify VXLAN device status
ip -d link show flannel.1
# Check FDB entries
bridge fdb show | grep flannel.1
# Adjust MTU if needed
ip link set flannel.1 mtu 1450
# Monitor error counters
cat /sys/class/net/flannel.1/statistics/rx_errors
cat /sys/class/net/flannel.1/statistics/tx_dropped
Cilium eBPF Enhancements
Kube‑proxy Replacement
Cilium can fully replace kube‑proxy; Service load‑balancing runs in eBPF programs, eliminating iptables/IPVS rules.
# Verify kube-proxy replacement status
kubectl exec -it -n kube-system ds/cilium -- cilium-dbg status | grep KubeProxyReplacement
# Expected output includes "KubeProxyReplacement: True"
# Confirm iptables no longer contains a KUBE-SERVICES chain
iptables -t nat -L | grep KUBE
Hubble Observability
Hubble provides built‑in L7 visibility, showing real‑time flow logs and service dependency graphs without external tracing systems.
# Enable Hubble UI
cilium hubble enable --ui
# Observe traffic from api‑server to order service
cilium hubble observe --from-label app=api-server
# Sample output:
# TIMESTAMP SOURCE DESTINATION TYPE VERDICT
# 10:23:45 api-server:8080 order-svc:80 HTTP/GET FORWARDED
# 10:23:46 order-svc:80 mysql:3306 L4/TCP FORWARDED
# 10:23:47 api-server:8080 redis:6379 HTTP/GET DENIED
Conclusion
The article walks through Kubernetes networking fundamentals, evaluates Flannel, Calico, and Cilium for different operational goals, and presents concrete troubleshooting procedures for pod connectivity, Service routing, DNS resolution, and eBPF‑based enhancements. The evidence shows that Calico’s BGP routing delivers near‑bare‑metal performance, while Cilium’s eBPF implementation scales to tens of thousands of Services with microsecond‑level latency.
MaGe Linux Operations
Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.