
Mastering Kubernetes Networking: From CNI Fundamentals to Advanced Troubleshooting

This comprehensive guide explains Kubernetes networking fundamentals, compares major CNI plugins such as Flannel, Calico, and Cilium, and provides detailed troubleshooting steps for pod communication, service routing, DNS issues, and eBPF‑based enhancements, helping operators build reliable, high‑performance clusters.


Kubernetes Network Model and CNI Role

Kubernetes assigns each pod a unique IP address that is routable across the entire cluster, lets containers in the same pod share a network namespace, and relies on the node's kubelet (together with the CNI plugin it invokes) to provide pod-to-pod connectivity.

Kubernetes Network Communication Matrix:

Container-to-Container (same pod): localhost (shared network namespace, no CNI involved)
Pod‑to‑Pod (same node): veth pair → bridge (cni0/br0)
Pod‑to‑Pod (cross node): veth pair → local cni0 → routing/encapsulation → remote cni0 → remote pod
Pod → external: SNAT (masquerade) to the node IP
External → Pod: Service → NodePort/Ingress → Pod

The Container Network Interface (CNI) standard defines three core operations: ADD (create the network), DEL (tear it down), and CHECK (verify status), which the kubelet (via the container runtime) invokes on the configured plugin whenever a pod sandbox is created or destroyed.
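
On each node the plugin is discovered from a configuration file under /etc/cni/net.d; a minimal bridge-plugin conflist, with illustrative names and addresses, looks roughly like this:

{
  "cniVersion": "1.0.0",
  "name": "example-net",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "cni0",
      "isGateway": true,
      "ipMasq": true,
      "ipam": { "type": "host-local", "subnet": "10.244.0.0/24" }
    },
    { "type": "portmap", "capabilities": { "portMappings": true } }
  ]
}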

Popular CNI Plugins in 2026

Three plugins dominate production environments:

Calico: excels at NetworkPolicy enforcement and security-focused deployments.

Flannel: prioritizes simplicity and quick start-up.

Cilium: leverages eBPF for high performance, advanced security, and observability.

Flannel: Simple Overlay Network

Flannel allocates a /24 subnet per node and assigns pod IPs from that range. Cross‑node traffic is encapsulated in VXLAN or UDP overlay packets.

Flannel cross‑node flow:
Pod‑A (10.244.0.15) → eth0 → veth pair → cni0 (10.244.0.1)
→ kernel routing discovers target IP 10.244.2.30 not in local subnet
→ encapsulate as VXLAN (outer IP = Node‑2 IP 10.112.0.52)
→ forward over physical network to Node‑2
→ Node‑2 decapsulates and delivers to Pod‑B (10.244.2.30)

Flannel creates a flannel.1 VXLAN device on each host. Advantages: zero‑configuration, works on any physical network. Limitations: no NetworkPolicy support and ~5‑10% performance overhead from encapsulation.
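
To see exactly what Flannel allocated on a node, the per-node subnet lease and the VXLAN device can be inspected directly (paths shown are the Flannel defaults):

# Per-node subnet lease written by flanneld
cat /run/flannel/subnet.env
# VXLAN device details: VNI, local address, MTU
ip -d link show flannel.1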

Calico: BGP Routing and NetworkPolicy

Calico uses BGP to advertise pod subnets, allowing direct L3 routing without encapsulation. Each node runs the Felix agent to program routes, iptables, and ACLs, while BIRD handles BGP sessions. Typha aggregates datastore traffic for large clusters.

# Calico installation (v3.28.x)
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    bgp: Enabled
    ipPools:
    - name: default-ipv4-pool
      cidr: 10.244.0.0/16
      natOutgoing: Enabled
      blockSize: 26
      encapsulation: VXLANCrossSubnet
  nodeMetricsPort: 9091
  # Typha is deployed and autoscaled by the Tigera operator; no manual replica count is required

Calico's declarative policy model (including GlobalNetworkPolicy) can span namespaces and supports rule attributes beyond the upstream API, such as explicit Allow/Deny/Log actions, ICMP type matching, service account selectors, and rule ordering.

# Calico NetworkPolicy example – allow only API server to access MySQL
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: db-access-policy
  namespace: production
spec:
  selector: app == 'mysql'
  types:
  - Ingress
  ingress:
  - action: Allow
    source:
      selector: app == 'api-server' && role == 'backend'
    protocol: TCP
    destination:
      ports: [3306]
  - action: Deny
    source: {}

Cilium: eBPF‑Based Networking

Cilium replaces much of the traditional iptables-based datapath with eBPF programs attached at the kernel's traffic-control (tc) hooks. Its datapath revolves around three core eBPF maps: the endpoint map (local pod lookup), the routing map (cross-node lookup), and the identity map (security labels used for policy).

Packet flow in Cilium:
Pod → veth pair → kernel stack → eBPF TC hook →
  - endpoint map (local pod forwarding)
  - routing map (cross‑node lookup)
  - identity map (policy enforcement)
If policy matches, forward; otherwise drop.

Key advantages (2026 production validation):

L7 policies for HTTP, gRPC, etc.

Pod‑level bandwidth shaping via eBPF.

Topology-aware Service routing that keeps traffic on local or zone-local backends (30%+ latency reduction).

Transparent WireGuard encryption for pod-to-pod traffic between nodes.

# Cilium installation (v1.13); the CiliumConfig spec mirrors the Cilium Helm values
apiVersion: cilium.io/v1alpha1
kind: CiliumConfig
metadata:
  name: cilium-config
spec:
  ipam:
    mode: cluster-pool
    operator:
      clusterPoolIPv4PodCIDRList: [10.244.0.0/16]
      clusterPoolIPv4MaskSize: 26
  bpf:
    hostLegacyRouting: false   # use eBPF host routing instead of the legacy kernel path
  loadBalancer:
    mode: snat
  bandwidthManager:
    enabled: true
  encryption:
    enabled: true
    type: wireguard
  hubble:
    enabled: true
    relay:
      enabled: true
    ui:
      enabled: true

Pod Network Communication Path

When a pod is scheduled, kubelet invokes the selected CNI plugin, which creates a vethXXXX pair. One end attaches to the host bridge (e.g., cni0), the other is renamed eth0 inside the pod’s network namespace.

Host view:
ens160 (physical NIC)
  ↑
  cni0 (bridge 10.244.0.1/24)
    ├─ veth1a2b3c4d → eth0 @ pod-nginx-abc123 (10.244.0.15)
    ├─ veth5d6e7f8g → eth0 @ pod-api-def456 (10.244.0.16)
    └─ veth9h0i1j2k → eth0 @ pod-db-ghi789 (10.244.0.17)
/proc/sys/net/ipv4/ip_forward = 1 (must be enabled)
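
A quick way to map a host-side veth back to its pod (assuming the pod image ships basic coreutils) is to read the peer interface index from inside the pod and look it up on the node:

# Inside the pod: eth0's peer ifindex on the host (pod name taken from the diagram above)
kubectl exec pod-nginx-abc123 -- cat /sys/class/net/eth0/iflink
# On the node: find the veth with that index (e.g., 23)
ip -o link | grep '^23:'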

The Linux bridge forwards frames based on MAC address tables, while most CNI plugins still rely on iptables for NAT and packet filtering.

# View KUBE-SERVICES chain (Service NAT rules)
iptables -t nat -L KUBE-SERVICES -n --line-numbers | head -30
# View NodePort chain
iptables -t nat -L KUBE-NODEPORTS -n --line-numbers
# Find CNI-added FORWARD rules
iptables -L FORWARD -n | grep -iE 'calico|flannel|cni'

Cross‑Node Pod Communication Modes

Two common approaches:

Overlay (VXLAN): encapsulates packets; works on any physical network but adds ~18% bandwidth overhead.

Routing (Calico BGP): routes pod IPs directly; minimal overhead (~4%) but requires BGP peering.

# iperf3 benchmark (10 Gbps NICs)
Scenario: cross‑node Pod‑to‑Pod TCP
Overlay (VXLAN): 8.2 Gbps (≈18% overhead)
Routing (BGP): 9.6 Gbps (≈4% overhead)
Physical baseline: 9.8 Gbps
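
A rough way to reproduce this kind of measurement (image and node names are illustrative) is to pin an iperf3 server pod and a client pod to different nodes:

# Server pod pinned to node-1
kubectl run iperf-server --image=networkstatic/iperf3 --restart=Never \
  --overrides='{"apiVersion":"v1","spec":{"nodeName":"node-1"}}' -- -s
kubectl get pod iperf-server -o wide   # note the pod IP
# Client pod pinned to node-2, targeting the server pod IP for 30 seconds
kubectl run iperf-client --image=networkstatic/iperf3 --restart=Never --rm -it \
  --overrides='{"apiVersion":"v1","spec":{"nodeName":"node-2"}}' -- -c <server-pod-ip> -t 30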

Service and ClusterIP Mechanics

Kubernetes Service provides a virtual IP (ClusterIP) that load‑balances traffic to backend pods. kube‑proxy implements this via two modes:

iptables mode: creates multiple DNAT rules; linear lookup becomes a latency bottleneck when Services > 500.

IPVS mode: uses a hash table for O(1) lookup, offering stable performance at large scale.

# Switch kube-proxy to IPVS mode
kubectl edit configmap -n kube-system kube-proxy
# change mode: "" to mode: "ipvs"
# Restart kube-proxy so the new mode takes effect
kubectl rollout restart -n kube-system daemonset/kube-proxy
# Verify IPVS rules
ipvsadm -L -n

IPVS supports scheduling algorithms such as round-robin (rr), weighted round-robin (wrr), least-connection (lc), and weighted least-connection (wlc). For long-lived connections (e.g., gRPC), the least-connection scheduler is recommended, as shown in the excerpt below.
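
The scheduler is chosen in the same kube-proxy configuration edited above; a minimal KubeProxyConfiguration excerpt (a sketch, assuming the kubeadm-style config.conf key):

# config.conf inside the kube-proxy ConfigMap
mode: "ipvs"
ipvs:
  scheduler: "lc"   # lc = least connection; empty/default falls back to round robin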

ClusterIP DNS Resolution

Pods resolve Service names via CoreDNS. The typical resolution flow is:

Pod resolver (/etc/resolv.conf, glibc/musl) → kube-dns Service ClusterIP (CoreDNS) → answer: target Service ClusterIP → iptables/IPVS DNAT → backend Pod IP

CoreDNS runs as a Deployment in kube-system and watches Service/Endpoint objects to keep DNS records up‑to‑date.

# CoreDNS ConfigMap (simplified)
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
      errors
      health
      ready
      kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods verified
        fallthrough in-addr.arpa ip6.arpa
      }
      prometheus :9153
      forward . 10.112.0.1   # upstream DNS
      cache 30
      loop
      reload
      loadbalance
    }

Ingress and NodePort

Ingress is an L7 entry point defined by an API object; the actual proxying is done by an Ingress Controller (e.g., NGINX, Traefik, or a cloud ALB). The NGINX controller applies configuration changes with a graceful reload (nginx -s reload), so established connections are drained rather than dropped.
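
A minimal Ingress object (hostname and ingress class are illustrative) that routes HTTP traffic to the api-service defined below:

# Ingress routing api.example.com to api-service
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  namespace: production
spec:
  ingressClassName: nginx
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 80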

# Example NodePort Service
apiVersion: v1
kind: Service
metadata:
  name: api-service
  namespace: production
spec:
  type: NodePort
  selector:
    app: api
  ports:
  - name: http
    port: 80   # ClusterIP port
    targetPort: 8080
    nodePort: 30080
  - name: grpc
    port: 9090
    targetPort: 9090
    nodePort: 30090

Setting hostNetwork: true lets a pod use the host’s network namespace directly, which eliminates CNI overhead but reduces isolation and risks port conflicts.
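
A minimal pod spec excerpt for this pattern (container name and image are illustrative):

# hostNetwork pod spec excerpt
spec:
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet   # keep resolving cluster Services via CoreDNS
  containers:
  - name: node-exporter
    image: prom/node-exporter:v1.8.1
    ports:
    - containerPort: 9100   # bound directly on the node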

NetworkPolicy Practice

Basic namespace isolation can be achieved with a default‑deny policy, then selectively allow traffic based on pod labels.

# Default deny all ingress in namespace "production"
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
---
# Allow same‑namespace pods with label role=backend to talk to each other
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: production
spec:
  podSelector:
    matchLabels:
      role: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: backend

Calico’s GlobalNetworkPolicy can enforce cross‑namespace rules, such as allowing a monitoring namespace to scrape Prometheus endpoints.

# GlobalNetworkPolicy – allow monitoring namespace to access Prometheus ports
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: allow-prometheus-scraping
spec:
  namespaceSelector: has(projectcalico.org/name)
  order: 50
  ingress:
  - action: Allow
    protocol: TCP
    destination:
      ports: [9090, 9100]
    source:
      namespaceSelector: projectcalico.org/name == "monitoring"
      selector: app == "prometheus"
  egress:
  - action: Allow

DNS Troubleshooting

Common DNS problems include mis‑configured ndots, CoreDNS resource exhaustion, or unreachable upstream DNS. The ndots option controls how many label components trigger search‑path expansion; setting it too high forces many unnecessary lookups.
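
For illustration, with the Kubernetes defaults (ndots:5 and the standard three search domains) an external name is tried against every search domain before the absolute query, assuming the usual kube-dns Service IP:

# /etc/resolv.conf injected into a pod in namespace "production" (cluster domain cluster.local)
nameserver 10.96.0.10
search production.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
# Lookup order for "api.example.com" (fewer than 5 dots):
#   api.example.com.production.svc.cluster.local.  -> NXDOMAIN
#   api.example.com.svc.cluster.local.             -> NXDOMAIN
#   api.example.com.cluster.local.                 -> NXDOMAIN
#   api.example.com.                                -> answer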

# Typical ndots troubleshooting steps
kubectl exec -it pod-test -- /bin/sh
# Test DNS resolution
nslookup kubernetes.default
# Verify /etc/resolv.conf
cat /etc/resolv.conf
# Check CoreDNS logs
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=200
# Adjust ndots via pod spec
spec:
  dnsPolicy: ClusterFirstWithHostNet
  dnsConfig:
    nameservers: [10.96.0.10]
    searches: [production.svc.cluster.local, svc.cluster.local, cluster.local]
    options:
    - name: ndots
      value: "2"
    - name: timeout
      value: "2"
    - name: attempts
      value: "2"

Cross‑Node Communication Failure Cases

Case 1 – Calico BGP Neighbor Failure

Symptoms: pods can communicate on the same node but not across nodes. The root cause is often a firewall blocking TCP port 179 (BGP). The fix is to open that port and ensure net.ipv4.ip_forward=1 on every node; diagnostic steps and a typical remediation follow.

# Step 1: Verify Calico node status
calicoctl node status
# Expect BGP state "Established"
# Step 2: Check routing table for pod subnets
ip route | grep 10.244
# Step 3: Inspect BIRD logs for BGP errors
kubectl logs -n calico-system -l k8s-app=calico-node --tail=50 | grep -i bgp
# Step 4: Ping remote node’s pod subnet gateway
ping -I 10.244.0.1 10.244.2.1
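
Once a blocked BGP port is confirmed, a typical remediation (firewalld shown; adapt to your firewall and distribution) is:

# Open BGP (TCP 179) between cluster nodes
firewall-cmd --permanent --add-port=179/tcp
firewall-cmd --reload
# Enable IP forwarding now and persistently
sysctl -w net.ipv4.ip_forward=1
echo 'net.ipv4.ip_forward = 1' > /etc/sysctl.d/99-ip-forward.conf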

Case 2 – Flannel VXLAN Packet Loss

Symptoms: intermittent packet loss and high latency on cross-node traffic. The typical cause is an MTU mismatch: VXLAN encapsulation adds roughly 50 bytes of overhead, so the flannel.1 device must be at least 50 bytes smaller than the physical MTU. If both are left at 1500, encapsulated packets exceed the physical MTU and are fragmented or dropped; set the VXLAN MTU to the physical MTU minus 50 (e.g., 1450 on a standard 1500-byte underlay, or 8950 with 9000-byte jumbo frames).

# Verify VXLAN device status
ip -d link show flannel.1
# Check FDB entries
bridge fdb show | grep flannel.1
# Adjust MTU if needed
ip link set flannel.1 mtu 1450
# Monitor error counters
cat /sys/class/net/flannel.1/statistics/rx_errors
cat /sys/class/net/flannel.1/statistics/tx_dropped

Cilium eBPF Enhancements

Kube‑proxy Replacement

Cilium can fully replace kube‑proxy; Service load‑balancing runs in eBPF programs, eliminating iptables/IPVS rules.

# Verify kube‑proxy replacement status
kubectl exec -it -n kube-system ds/cilium -- cilium-dbg status | grep KubeProxyReplacement
# Expect "KubeProxyReplacement" to report True (or Strict on older releases)
# Confirm iptables no longer contains the KUBE-SERVICES chain
iptables -t nat -L | grep KUBE
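
If the cluster was not deployed with kube-proxy replacement, it can typically be enabled through the Helm values (a sketch; older releases used the string value "strict", and the API server address must be supplied so Cilium can reach it without kube-proxy):

# Enable kube-proxy replacement via Helm (adjust release name, namespace, and chart version)
helm upgrade cilium cilium/cilium -n kube-system --reuse-values \
  --set kubeProxyReplacement=true \
  --set k8sServiceHost=<API_SERVER_IP> \
  --set k8sServicePort=6443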

Hubble Observability

Hubble provides built‑in L7 visibility, showing real‑time flow logs and service dependency graphs without external tracing systems.

# Enable Hubble UI
cilium hubble enable --ui
# Observe traffic from api-server (requires the hubble CLI; port-forward the Hubble relay first)
cilium hubble port-forward &
hubble observe --from-label app=api-server
# Sample output:
# TIMESTAMP   SOURCE                DESTINATION   TYPE    VERDICT
# 10:23:45    api-server:8080       order-svc:80  HTTP/GET  FORWARDED
# 10:23:46    order-svc:80          mysql:3306    L4/TCP   FORWARDED
# 10:23:47    api-server:8080       redis:6379    HTTP/GET  DENIED

Conclusion

This article walked through Kubernetes networking fundamentals, compared Flannel, Calico, and Cilium against different operational goals, and presented concrete troubleshooting procedures for pod connectivity, Service routing, DNS resolution, and eBPF-based enhancements. In the benchmarks above, Calico's BGP routing delivers near-bare-metal throughput, while Cilium's eBPF datapath is designed to scale to tens of thousands of Services with microsecond-level lookup latency.

Tags: Kubernetes, eBPF, CNI, Calico, Flannel, Cilium, NetworkPolicy
Written by

MaGe Linux Operations

Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.
