How to Diagnose and Fix Pod Network Issues in Kubernetes Clusters
This article introduces a systematic approach to troubleshooting Kubernetes pod network anomalies, covering classification of common failures, essential tools such as tcpdump, nsenter, paping and mtr, detailed packet‑capture techniques, a step‑by‑step troubleshooting workflow, and real‑world case studies to illustrate root‑cause analysis and resolution.
Pod Network Anomaly Classification
Network Unreachable: ping fails; possible causes include firewall or security-policy restrictions (iptables, SELinux), routing errors, high system load, or link failures.
Port Unreachable: ping works but telnet to the port fails; possible causes include firewall blocks, exhausted ports, or the service not listening.
DNS Resolution Failure: the domain name cannot be resolved while IP connectivity works; possible causes include incorrect pod DNS settings, DNS service issues, or broken communication with the DNS server.
Large Packet Loss: small packets succeed but large packets are dropped; this is often due to an MTU mismatch (test with ping -s and increasing payload sizes).
CNI Exceptions: nodes can communicate but pods cannot reach cluster addresses; possible causes include kube‑proxy misconfiguration, an exhausted PodCIDR, or other CNI plugin problems.
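The first four categories above amount to a small decision table. The sketch below encodes it as a shell function. The name classify_failure and its four ok/fail arguments are illustrative, not part of any tool, and the CNI category is left out because it cannot be decided from probe results alone.

```shell
# classify_failure PING PORT DNS LARGE
# Each argument is "ok" or "fail", recorded by hand from inside the pod:
#   PING  - ping to the target IP
#   PORT  - TCP connection to the target port (telnet or paping)
#   DNS   - resolving the target's domain name
#   LARGE - ping with a large payload (ping -s)
# The checks run in the same order as the classification list.
classify_failure() {
  ping_result="$1"
  port_result="$2"
  dns_result="$3"
  large_result="$4"
  if [ "$ping_result" != ok ]; then
    echo "network unreachable"
  elif [ "$port_result" != ok ]; then
    echo "port unreachable"
  elif [ "$dns_result" != ok ]; then
    echo "DNS resolution failure"
  elif [ "$large_result" != ok ]; then
    echo "large packet loss"
  else
    echo "no failure detected"
  fi
}

# Usage: classify_failure ok fail ok ok
```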
Common Network Troubleshooting Tools
tcpdump
Capture and analyze traffic on a specific interface.
tcpdump -i eth0 -nn -w capture.pcap
Key options: -i (interface), -n (numeric output, no name resolution), -w (write packets to a file), -X (print hex and ASCII), and the logical filter operators and, or, not.
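The filter operators are easiest to see in a combined expression. The helper below is a minimal sketch that builds one. The name pod_filter and the address in the usage comment are placeholders, not values from the original article.

```shell
# pod_filter IP PORT
# Prints a tcpdump filter expression that matches TCP traffic to or
# from IP on PORT.
pod_filter() {
  printf 'host %s and tcp port %s\n' "$1" "$2"
}

# Usage (10.244.1.5 is a made-up pod IP; capturing requires root):
#   tcpdump -i eth0 -nn -w capture.pcap "$(pod_filter 10.244.1.5 80)"
```

Quoting the expression keeps the shell from splitting or interpreting it before tcpdump sees it.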
nsenter
Enter a container’s network namespace to run commands such as ifconfig directly inside the pod.
nsenter -t 30858 -n ifconfig
paping
Continuously ping a TCP port to test connectivity and packet loss, complementing ping and telnet.
paping -p 80 -c 10 example.com
mtr
Combine traceroute and ping, providing loss percentages and latency statistics for each hop.
mtr -c 5 -r google.com
Packet Capture Guidance
Capture on both the source and destination nodes (e.g., on the veth, docker0, and flannel interfaces) to verify NAT translation and confirm that responses are received. For a VXLAN‑based CNI, also capture the VXLAN UDP port on the physical NIC (8472 by default for flannel; other plugins may use the IANA‑assigned port 4789).
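To see traffic from the pod's own point of view, tcpdump can be run inside the pod's network namespace through nsenter. The sketch below assumes the node uses Docker as its container runtime, so that docker inspect can report the container's PID; on other runtimes the PID has to be obtained another way. The function names are illustrative.

```shell
# container_pid CONTAINER
# Prints the host PID of a Docker container's main process.
container_pid() {
  docker inspect -f '{{.State.Pid}}' "$1"
}

# capture_in_pod CONTAINER FILE [FILTER...]
# Runs the node's tcpdump inside the container's network namespace,
# capturing on all of the pod's interfaces and writing packets to FILE.
capture_in_pod() {
  container="$1"
  outfile="$2"
  shift 2
  pid="$(container_pid "$container")" || return 1
  nsenter -t "$pid" -n tcpdump -i any -nn -w "$outfile" "$@"
}

# Usage (as root on the node; <container-id> is a placeholder):
#   capture_in_pod <container-id> /tmp/pod.pcap tcp port 80
# Encapsulated flannel traffic on the physical NIC (eth0 assumed):
#   tcpdump -i eth0 -nn -w /tmp/vxlan.pcap udp port 8472
```

Because nsenter -n switches only the network namespace, the tcpdump binary comes from the node, so the pod image does not need to ship it.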
Pod Network Troubleshooting Workflow
Identify the failure type using the classification above.
Verify basic connectivity (ping, telnet) from the pod to the target.
Check CNI plugin status and node‑level routes.
Inspect kube‑proxy iptables rules for the service IP.
Use tcpdump on relevant interfaces to trace packets and NAT behavior.
Analyze VXLAN encapsulation if using flannel or similar CNI.
Apply fixes such as correcting MTU, removing duplicate IP configurations, or adjusting firewall rules.
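For the kube‑proxy step, a quick check is to look for NAT rules that target the service's ClusterIP. The sketch below assumes kube‑proxy runs in iptables mode (in IPVS mode the rules live elsewhere). The function name and the address in the usage comment are illustrative.

```shell
# service_nat_rules CLUSTER_IP
# Reads `iptables-save -t nat` output on stdin and prints the rules
# whose destination is exactly CLUSTER_IP.
service_nat_rules() {
  grep -F -- "-d $1/32 "
}

# Usage (as root on the node; 10.96.0.10 is a made-up ClusterIP):
#   iptables-save -t nat | service_nat_rules 10.96.0.10
```

If a node that cannot reach the service prints nothing here while a healthy node prints rules, kube‑proxy on the failing node is a likely place to look next.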
Case Studies
1. Service Unreachable After Node Expansion
After adding a new worker node, the node could not reach a ClusterIP service while other nodes could. Packet captures showed duplicate IP configuration on enp26s0f0 (static + DHCP), causing IP conflicts and broken VXLAN encapsulation. The issue was resolved by disabling DHCP in the interface config and restarting Docker and kubelet.
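A duplicate configuration of this kind can also be spotted without a packet capture by listing the IPv4 addresses on the interface. The helper below is a sketch that parses the one-line output of ip -o -4 addr show; the function name is illustrative. An address marked dynamic has a limited lifetime, which usually means it came from DHCP. Several addresses on one interface are not an error in themselves.

```shell
# ipv4_addrs
# Reads `ip -o -4 addr show dev IFACE` output on stdin and prints one
# line per IPv4 address, with " dynamic" appended when the address
# carries the dynamic flag.
ipv4_addrs() {
  awk '$3 == "inet" {
    flag = ""
    for (i = 5; i <= NF; i++) if ($i == "dynamic") flag = " dynamic"
    print $4 flag
  }'
}

# Usage: ip -o -4 addr show dev enp26s0f0 | ipv4_addrs
```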
2. External Host Timeout to Cluster Service
An external VM could telnet to a NodePort service but HTTP POST timed out. Wireshark revealed that packets larger than 1400 bytes were dropped due to an MTU mismatch (host MTU 1500 vs. Calico tunnel MTU 1440). Aligning MTU values on both ends fixed the problem.
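An MTU problem like this can be confirmed with ping by sending the largest payload that should still fit and forbidding fragmentation. For IPv4 that payload is the MTU minus 28 bytes (a 20-byte IP header without options plus an 8-byte ICMP header). The helper name below is illustrative, and the -M do option assumes the Linux iputils version of ping.

```shell
# icmp_payload_for_mtu MTU
# Prints the largest ICMP echo payload that fits in a single IPv4
# packet of size MTU: MTU - 20 (IPv4 header) - 8 (ICMP header).
icmp_payload_for_mtu() {
  echo $(( $1 - 28 ))
}

# Usage (<target-ip> is a placeholder; -M do sets the don't-fragment bit):
#   ping -M do -c 3 -s "$(icmp_payload_for_mtu 1440)" <target-ip>
#   ping -M do -c 3 -s "$(icmp_payload_for_mtu 1500)" <target-ip>
```

If the first command succeeds and the second fails, some hop on the path has an MTU below 1500.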
3. Pod Access to Object Storage Fails
Pods could ping the storage IP but could not resolve its DNS name. The cause was that kube‑proxy pods on newly added nodes were evicted because they lacked a priority class, leaving DNS and other services unavailable. Setting priorityClassName: system-node-critical for kube‑proxy restored DNS functionality.
References
Original article (Chinese): https://www.cnblogs.com/Cylon/p/16611503.html
paping tool archive: https://code.google.com/archive/p/paping/
Liangxu Linux