How to Diagnose and Fix Kubernetes Pod Network Issues: Tools, Models, and Real Cases
This article introduces a systematic approach for troubleshooting Kubernetes cluster network anomalies, covering common failure models, essential diagnostic tools such as tcpdump, nsenter, paping and mtr, and detailed case studies that illustrate step‑by‑step resolution of pod connectivity problems.
1. Pod Network Anomalies
Network anomalies can be roughly divided into the following categories:
Network unreachable: ping fails. Possible causes include firewall rules, incorrect routing, high system load, or link failures.
Port unreachable: ping works but telnet to the port fails. Possible causes include firewall rules, high load, or the application not listening on that port.
DNS resolution failure: IP connectivity works but the domain name cannot be resolved. Possible causes include an incorrect pod DNS configuration, a faulty DNS service, or broken communication between the pod and the DNS service.
Large packet loss: small packets get through but large ones are dropped, usually because of an MTU mismatch on the path. You can test this with ping -s <size>; on Linux, add -M do to set the Don't Fragment bit so that oversized packets fail instead of being fragmented.
CNI abnormality: the node can communicate, but pods cannot reach cluster addresses. Possible causes include kube-proxy failures, exhausted pod CIDR ranges, or other CNI plugin problems.
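The categories above can be read as an ordered decision: if the pod fails a check that the node passes, suspect the CNI layer; otherwise check basic reachability, then the port, then large packets, then name resolution. The following Python sketch encodes that order. It assumes you have already run the manual checks (ping, telnet or paping, ping -s, a DNS lookup) from inside the pod and, for comparison, from the node; the `Observations` class, the `classify` function and the category strings are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Observations:
    """Outcome of quick manual checks run from inside the affected pod (hypothetical)."""
    ping_ok: bool                       # ICMP to the target IP succeeds
    port_ok: bool                       # TCP connect (telnet/paping) to the target port succeeds
    dns_ok: bool                        # the target's domain name resolves
    large_packets_ok: bool              # e.g. ping -s 1472 -M do <ip> succeeds (1500-byte IP packet)
    node_reaches_target: bool = False   # the same ping/port checks pass from the node itself


def classify(obs: Observations) -> str:
    """Map check results onto the anomaly categories, most basic layer first."""
    pod_reaches_target = obs.ping_ok and obs.port_ok
    if not pod_reaches_target and obs.node_reaches_target:
        # The node gets through but the pod does not: suspect kube-proxy or the CNI plugin.
        return "cni_abnormality"
    if not obs.ping_ok:
        return "network_unreachable"
    if not obs.port_ok:
        return "port_unreachable"
    if not obs.large_packets_ok:
        return "large_packet_loss"
    if not obs.dns_ok:
        return "dns_resolution_failure"
    return "no_anomaly_detected"


# Example: IP and port checks pass, but the name does not resolve.
print(classify(Observations(ping_ok=True, port_ok=True, dns_ok=False, large_packets_ok=True)))
# dns_resolution_failure
```

Real incidents can span several categories at once; the fixed order simply reflects checking lower layers before higher ones.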
The overall classification is illustrated in the diagram below:
2. Common Network Diagnostic Tools
tcpdump
tcpdump is a powerful command‑line packet sniffer that can capture and display network traffic.
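Several of the capture filters in this section match individual bits of the TCP flags byte (offset 13 of the TCP header, written `tcp[13]` in filter syntax). The short Python sketch below shows the bit values. It also contrasts a mask test (`tcp[13] & 4 != 0`), which matches whenever a flag is set, with an exact comparison (`tcp[13] = 4`), which matches only when that flag is the sole one set. The helper names are illustrative only.

```python
# Bit values in the TCP flags byte (tcp[13] in tcpdump filter syntax).
# ECE (64) and CWR (128) are omitted for brevity.
TCP_FLAGS = {"FIN": 1, "SYN": 2, "RST": 4, "PSH": 8, "ACK": 16, "URG": 32}


def decode_flags(value: int) -> list:
    """Names of the flags set in a tcp[13] value, lowest bit first."""
    return [name for name, bit in TCP_FLAGS.items() if value & bit]


def matches_mask(value: int, flag: str) -> bool:
    """Like the filter 'tcp[13] & BIT != 0': true whenever the flag is set."""
    return (value & TCP_FLAGS[flag]) != 0


def matches_exact(value: int, flag: str) -> bool:
    """Like 'tcp[13] = BIT': true only when that flag is the only one set."""
    return value == TCP_FLAGS[flag]


if __name__ == "__main__":
    print(decode_flags(18))          # ['SYN', 'ACK']: what 'tcp[13] = 18' selects
    print(matches_mask(20, "RST"))   # True: RST+ACK has the RST bit set
    print(matches_exact(20, "RST"))  # False: an exact comparison misses RST+ACK
```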
Installation:
Ubuntu/Debian: apt-get install -y tcpdump
CentOS/Fedora: yum install -y tcpdump
Alpine: apk add tcpdump --no-cache
Typical commands:
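tcpdump -D  # list the interfaces available for capture
tcpdump host 1.1.1.1  # traffic to or from 1.1.1.1
tcpdump src 1.1.1.1  # traffic from 1.1.1.1; use dst for traffic to it
tcpdump net 1.2.3.0/24  # traffic to or from a subnet
tcpdump -c 1 -X icmp  # capture one ICMP packet and print it in hex and ASCII
tcpdump -w capture_file port 3389  # save traffic on port 3389 to a file
tcpdump -i eth0 -nn host 220.181.57.216 and 10.0.0.1  # traffic between two hosts on eth0, without name or port resolution
tcpdump -ttnnvvS -i eth0  # verbose output with raw timestamps and absolute sequence numbers
tcpdump -nnvvS src 10.5.2.3 and dst port 3389
tcpdump dst 192.168.0.2 and src net and not icmp
tcpdump -vv src mars and not dst port 22  # traffic from host "mars", excluding traffic to port 22
# TCP flag filters: tcp[13] is the flags byte, and each tcp[tcpflags] form is equivalent to the line above it.
# Note: 'tcp[tcpflags] == tcp-rst' would match only packets whose sole flag is RST, missing RST+ACK.
tcpdump 'tcp[13] & 4 != 0'  # RST
tcpdump 'tcp[tcpflags] & tcp-rst != 0'
tcpdump 'tcp[13] & 2 != 0'  # SYN
tcpdump 'tcp[tcpflags] & tcp-syn != 0'
tcpdump 'tcp[13] = 18'  # exactly SYN+ACK
tcpdump 'tcp[13] & 32 != 0'  # URG
tcpdump 'tcp[tcpflags] & tcp-urg != 0'
tcpdump 'tcp[13] & 16 != 0'  # ACK
tcpdump 'tcp[tcpflags] & tcp-ack != 0'
tcpdump 'tcp[13] & 8 != 0'  # PSH
tcpdump 'tcp[tcpflags] & tcp-push != 0'
tcpdump 'tcp[13] & 1 != 0'  # FIN
tcpdump 'tcp[tcpflags] & tcp-fin != 0'
# Plaintext payload inspection: -A prints payloads as ASCII, -l line-buffers output for grep, -s0 captures whole packets.
tcpdump -vvAls0 | grep 'User-Agent:'
tcpdump -vvAls0 | grep 'GET'
tcpdump -vvAls0 | grep 'Host:'
tcpdump -vvAls0 | grep -E 'Set-Cookie|Host:|Cookie:'  # -E is required for | alternation
tcpdump -vvAs0 port 53  # DNS traffic
tcpdump -lA port http or port ftp or port smtp or port imap or port pop3 or port telnet | grep -Ei 'pass=|pwd=|log=|login=|user=|username=|pw=|passw=|passwd=|password=|pass:|user:|username:|password:|login:'  # plaintext credentials
nsenter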
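nsenter runs a command inside the namespaces of another process. With -n it enters the target's network namespace, so you can use the node's own tools against a container that lacks them. Example: nsenter -t 30858 -n ifconfig, where 30858 is the PID of the container's process on the node. To find that PID for a pod: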
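# Find the node each pod runs on (prints NAME and NODE)
kubectl get pods -owide | awk '{print $1,$7}'
# On that node, find the container ID
docker ps | grep <pod-name>
# Get the PID of the container's main process, for use with nsenter -t
docker inspect --format "{{ .State.Pid }}" <container-id>
paping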
paping repeatedly connects to a target address on a specified TCP port (a "ping" over TCP) and reports connection times and failed attempts, which makes it useful for testing port connectivity and packet loss.
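To make the measurement concrete, here is a rough Python equivalent of a TCP "ping": it times one full TCP connect per attempt and counts failures as loss. It is a hypothetical stand-in built on the standard socket module, not paping's actual implementation. The parameters loosely mirror paping's -c (count) and -t (timeout) options, except that the timeout here is in seconds.

```python
import socket
import time
from typing import List, Optional, Tuple


def tcp_ping(host: str, port: int, count: int = 4, timeout: float = 1.0,
             interval: float = 0.0) -> Tuple[List[Optional[float]], float]:
    """Attempt `count` TCP connections to host:port.

    Returns (connect_times_ms, loss_pct); a failed attempt is recorded as None.
    For hostnames, the measured time also includes name resolution.
    """
    times: List[Optional[float]] = []
    for attempt in range(count):
        if attempt and interval:
            time.sleep(interval)  # pause between attempts, like a ping interval
        start = time.perf_counter()
        try:
            # Success means the three-way handshake completed.
            with socket.create_connection((host, port), timeout=timeout):
                times.append((time.perf_counter() - start) * 1000.0)
        except OSError:
            # Refused, timed out (socket.timeout subclasses OSError) or name resolution failed.
            times.append(None)
    failures = sum(1 for t in times if t is None)
    loss_pct = 100.0 * failures / count if count else 0.0
    return times, loss_pct
```

For example, tcp_ping("example.com", 80, count=10, timeout=1.0) corresponds roughly to `paping -p 80 -c 10 -t 1000 example.com`.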
paping -h  # show usage
paping -p 80 -c 10 -t 1000 example.com  # 10 connection attempts to port 80, 1000 ms timeout each
mtr
mtr combines traceroute and ping, providing loss percentage, latency statistics, and more.
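mtr google.com  # interactive, continuously updated view
mtr -n google.com  # numeric addresses only, no DNS lookups
mtr -b google.com  # show both IP addresses and hostnames
mtr -c 5 google.com  # send 5 probes, then stop
mtr -r -c 5 google.com > result.txt  # report mode, saved to a file
Key options: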
-n: show IP only
-b: show both IP and hostname
-c N: stop after N pings
-r: generate report
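-i SECONDS: set the interval between probes (default 1 second); the probe protocol is ICMP by default and can be switched with -T (TCP) or -u (UDP)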
-m MAX_HOPS: limit hops
-s SIZE: set packet size
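mtr's loss column is lost probes divided by probes sent. When reading a report, loss that appears at one intermediate hop but not at later hops usually means that router rate-limits or deprioritizes its own ICMP replies, not that it drops forwarded traffic; loss that persists through to the final hop is the one worth chasing. The following is a minimal Python sketch of that reading. It assumes you have transcribed each hop's address and Loss% from an `mtr -r` report into a list, and its helper names are hypothetical.

```python
from typing import List, Optional, Tuple


def loss_pct(sent: int, received: int) -> float:
    """Loss% as mtr reports it: lost probes as a share of probes sent."""
    return 100.0 * (sent - received) / sent if sent else 0.0


def first_persistent_loss(hops: List[Tuple[str, float]]) -> Optional[str]:
    """Return the first hop from which loss continues to the final hop, if any.

    `hops` is a list of (address, loss_pct) pairs in path order.
    """
    candidate: Optional[str] = None
    for address, loss in hops:
        if loss > 0:
            if candidate is None:
                candidate = address  # loss starts here
        else:
            # A later hop answered every probe, so earlier loss was likely ICMP rate limiting.
            candidate = None
    return candidate
```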
3. Pod Network Troubleshooting Process
The troubleshooting flow is illustrated below:
4. Case Studies
Node Expansion Causing Service Inaccessibility
After a new worker node was added, the ClusterIP of a registry service became unreachable from that node, while the other nodes could still reach it.
Analysis:
CNI plugin was healthy (all pod‑to‑pod communication worked).
Registry pod itself was reachable via its Pod IP.
Service IP was likely the problem on the new node.
Investigation steps included checking kube‑proxy status, iptables NAT rules, and routing tables. Packet captures showed that the source IP of encapsulated VXLAN packets was incorrect (10.153.204.228 instead of the node's actual IP 10.153.204.15). The node had both a static IP and a DHCP‑assigned IP, causing an IP conflict.
Resolution: Remove the DHCP configuration (set BOOTPROTO="none") and restart Docker and kubelet.
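The root cause here was an interface holding a static address and a DHCP-assigned address at the same time. As a minimal sketch, assuming you capture the output of `ip -o -4 addr show` on the node, the following Python parser flags interfaces that carry more than one IPv4 address. The sample output, including its /24 prefixes, is hypothetical and built from the two addresses mentioned in this case.

```python
from typing import Dict, List

# Hypothetical `ip -o -4 addr show` output, built from the two addresses in this case.
SAMPLE = """\
1: lo    inet 127.0.0.1/8 scope host lo
2: eth0    inet 10.153.204.15/24 brd 10.153.204.255 scope global eth0
2: eth0    inet 10.153.204.228/24 brd 10.153.204.255 scope global dynamic eth0
"""


def ipv4_by_interface(ip_output: str) -> Dict[str, List[str]]:
    """Group addresses (CIDR form) from `ip -o -4 addr show` output by interface."""
    result: Dict[str, List[str]] = {}
    for line in ip_output.splitlines():
        fields = line.split()
        # Expected layout: "<index>: <ifname> inet <addr>/<prefix> ..."
        if len(fields) >= 4 and fields[2] == "inet":
            result.setdefault(fields[1], []).append(fields[3])
    return result


def multi_address_interfaces(ip_output: str) -> Dict[str, List[str]]:
    """Interfaces holding more than one IPv4 address, e.g. a static IP plus a DHCP lease."""
    return {name: addrs for name, addrs in ipv4_by_interface(ip_output).items() if len(addrs) > 1}


if __name__ == "__main__":
    print(multi_address_interfaces(SAMPLE))
    # {'eth0': ['10.153.204.15/24', '10.153.204.228/24']}
```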
External Cloud Host Timing Out When Calling Cluster Service
A cloud host could telnet to a NodePort service but HTTP POST requests timed out.
Packet capture revealed that after the TCP three‑way handshake, a large 1514‑byte packet was repeatedly retransmitted without ACK, indicating an MTU mismatch between the host (MTU 1500) and the Calico tunnel interface (MTU 1440).
Resolution: Align the MTU values, either by setting the host interface MTU to 1440 or, if the underlying network can carry the extra encapsulation overhead, by raising Calico's MTU to 1500.
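The numbers in this case fit together as follows. tcpdump reports Ethernet frame lengths, so a 1514-byte frame carries a 1500-byte IP packet (14 bytes of Ethernet header, FCS not included), which is larger than the 1440-byte tunnel MTU. The small handshake packets fit, which is why telnet succeeded. A plausible explanation for the silent retransmissions is that the oversized packet was dropped and no ICMP "fragmentation needed" reply reached the sender. The Python sketch below only does the arithmetic and assumes IPv4 and TCP headers without options.

```python
ETHERNET_HEADER = 14  # bytes; tcpdump frame lengths exclude the 4-byte FCS
IPV4_HEADER = 20      # without IP options
TCP_HEADER = 20       # without TCP options


def ip_packet_size(frame_size: int) -> int:
    """IP packet length inside an untagged Ethernet frame as reported by tcpdump."""
    return frame_size - ETHERNET_HEADER


def fits_mtu(frame_size: int, mtu: int) -> bool:
    """True if the frame's IP packet can cross an interface with this MTU unfragmented."""
    return ip_packet_size(frame_size) <= mtu


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload per segment for an IPv4 path with this MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


if __name__ == "__main__":
    print(ip_packet_size(1514))  # 1500
    print(fits_mtu(1514, 1500))  # True: fits the cloud host's 1500-byte MTU
    print(fits_mtu(1514, 1440))  # False: too big for the 1440-byte Calico tunnel
    print(mss_for_mtu(1440))     # 1400
```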
Pod Accessing Object Storage Timing Out
Pods could reach the storage IP directly but DNS resolution of the storage domain failed, leading to timeouts.
Investigation showed that the kube‑proxy pods on the newly added nodes were stuck in Pending because of insufficient priority, so the DNS service was unavailable to pods on those nodes.
Resolution: Assign system-node-critical priority class to kube‑proxy and add readiness probes to ensure DNS is functional before scheduling workloads.
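A DNS check for such a readiness probe could be as small as a name lookup through the system resolver. The sketch below assumes the container image ships Python and that the probe target is `kubernetes.default.svc.cluster.local` (this assumes the default cluster domain; adjust it for your cluster). `dns_ready` and `probe_exit_code` are hypothetical helper names, not part of any Kubernetes API.

```python
import socket


def dns_ready(name: str) -> bool:
    """True if `name` resolves through the system resolver (typically /etc/hosts, then /etc/resolv.conf)."""
    try:
        return len(socket.getaddrinfo(name, None)) > 0
    except (socket.gaierror, UnicodeError):
        # gaierror covers lookup failures such as unknown names and unreachable nameservers.
        return False


def probe_exit_code(name: str = "kubernetes.default.svc.cluster.local") -> int:
    """Exit status for an exec readiness probe: 0 means ready, 1 means not ready."""
    return 0 if dns_ready(name) else 1
```

An exec readiness probe could run a small script that ends with sys.exit(probe_exit_code()); Kubernetes treats exit code 0 as success.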
Tips: For more network tool usage, refer to the referenced article.
Open Source Linux
Focused on sharing Linux/Unix content, covering fundamentals, system development, network programming, automation/operations, cloud computing, and related professional knowledge.
