
How to Diagnose and Fix Kubernetes Pod Network Issues: Tools, Models, and Real Cases

This article introduces a systematic approach for troubleshooting Kubernetes cluster network anomalies, covering common failure models, essential diagnostic tools such as tcpdump, nsenter, paping and mtr, and detailed case studies that illustrate step‑by‑step resolution of pod connectivity problems.

Open Source Linux

May 16, 2023


1. Pod Network Anomalies

Network anomalies can be roughly divided into the following categories:

Network unreachable: ping fails. Typical causes are firewall rules, incorrect routing, high system load, or link failures.

Port unreachable: ping works but telnet to the port fails. Typical causes are firewall rules, high system load, or the application not listening on that port.

DNS resolution failure: the domain name cannot be resolved while connectivity by IP works. Typical causes are incorrect pod DNS configuration, a faulty DNS service, or communication problems between the pod and the DNS service.

Large packet loss: small packets succeed but large packets are dropped, often because of an MTU mismatch. You can test this by varying the payload size with ping -s.

CNI abnormality: the node can communicate but pods cannot reach cluster addresses, possibly because of kube-proxy failures, CIDR exhaustion, or other CNI plugin issues.
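The first three categories form a small decision table. Below is a minimal sketch of that table, assuming the three checks (ping, telnet to the port, DNS lookup) have already been run by hand. The function name classify_network_symptom and its yes/no arguments are illustrative and not part of any tool.

```shell
# classify_network_symptom PING_OK PORT_OK DNS_OK
# Each argument is "yes" or "no", recorded from manual checks
# (ping, telnet to the service port, and a DNS lookup).
classify_network_symptom() {
    ping_ok=$1
    port_ok=$2
    dns_ok=$3
    if [ "$ping_ok" != "yes" ]; then
        echo "network unreachable"
    elif [ "$port_ok" != "yes" ]; then
        echo "port unreachable"
    elif [ "$dns_ok" != "yes" ]; then
        echo "DNS resolution failure"
    else
        echo "basic connectivity OK: check MTU and CNI next"
    fi
}
```

For example, classify_network_symptom yes no yes prints "port unreachable". The ordering matters: a failed ping makes the later checks uninformative, so it is tested first.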

The overall classification is illustrated in the diagram below:

2. Common Network Diagnostic Tools

tcpdump

tcpdump is a powerful command‑line packet sniffer that can capture and display network traffic.

Installation:

Ubuntu/Debian: apt-get install -y tcpdump

CentOS/Fedora: yum install -y tcpdump

Alpine: apk add tcpdump --no-cache

Typical commands:

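# List the interfaces available for capture
tcpdump -D

# Capture traffic to or from a single host
tcpdump host 1.1.1.1

# Filter by source or by destination address
tcpdump src 1.1.1.1
tcpdump dst 1.1.1.1

# Capture traffic to or from a network
tcpdump net 1.2.3.0/24

# Capture one ICMP packet and print it in hex and ASCII
tcpdump -c 1 -X icmp

# Write traffic on port 3389 to a file for later analysis
tcpdump port 3389 -w capture_file

# Capture traffic between two hosts on eth0, without resolving names or ports
tcpdump -i eth0 -nn host 220.181.57.216 and 10.0.0.1

# Verbose capture on eth0 with raw timestamps, no name resolution, and absolute sequence numbers
tcpdump -ttnnvvS -i eth0

# Traffic from 10.5.2.3 to destination port 3389
tcpdump -nnvvS src 10.5.2.3 and dst port 3389

# Non-ICMP traffic to 192.168.0.2 from a given source network
tcpdump dst 192.168.0.2 and src net <source-network> and not icmp

# Traffic from the host named mars that is not going to the SSH port
tcpdump -vv src mars and not dst port 22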

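# Filter on TCP flags. In each pair, the tcp[13] mask form matches any packet
# with that flag set; the tcp[tcpflags] == form matches only packets where it
# is the sole flag set.

# RST
tcpdump 'tcp[13] & 4!=0'
tcpdump 'tcp[tcpflags] == tcp-rst'

# SYN
tcpdump 'tcp[13] & 2!=0'
tcpdump 'tcp[tcpflags] == tcp-syn'

# SYN and ACK both set, and nothing else
tcpdump 'tcp[13]=18'

# URG
tcpdump 'tcp[13] & 32!=0'
tcpdump 'tcp[tcpflags] == tcp-urg'

# ACK
tcpdump 'tcp[13] & 16!=0'
tcpdump 'tcp[tcpflags] == tcp-ack'

# PSH
tcpdump 'tcp[13] & 8!=0'
tcpdump 'tcp[tcpflags] == tcp-push'

# FIN
tcpdump 'tcp[13] & 1!=0'
tcpdump 'tcp[tcpflags] == tcp-fin'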
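The numeric masks in these filters come from byte 13 of the TCP header, where each flag occupies one bit: FIN=1, SYN=2, RST=4, PSH=8, ACK=16, URG=32. Below is a small sketch that decodes such a value back into flag names, which shows why 18 means SYN plus ACK. The helper name decode_tcp_flags is hypothetical, and it ignores the two higher bits (ECE and CWR).

```shell
# decode_tcp_flags VALUE
# Print the names of the TCP flags set in VALUE, the decimal value of
# byte 13 of the TCP header (the byte that tcp[13] refers to).
decode_tcp_flags() {
    value=$1
    out=""
    for pair in 1:FIN 2:SYN 4:RST 8:PSH 16:ACK 32:URG; do
        bit=${pair%%:*}     # number before the colon
        name=${pair#*:}     # flag name after the colon
        if [ $(( value & bit )) -ne 0 ]; then
            out="${out:+$out }$name"
        fi
    done
    echo "$out"
}
```

Running decode_tcp_flags 18 prints "SYN ACK", and decode_tcp_flags 24 prints "PSH ACK", the usual flags on a data segment.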

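# Extract User-Agent headers from plaintext HTTP traffic
tcpdump -vvAls0 | grep 'User-Agent:'

# Show HTTP GET requests
tcpdump -vvAls0 | grep 'GET'

# Show HTTP Host headers
tcpdump -vvAls0 | grep 'Host:'

# Show cookies set by servers and sent by clients (alternation requires grep -E)
tcpdump -vvAls0 | grep -E 'Set-Cookie|Host:|Cookie:'

# Capture DNS traffic
tcpdump -vvAs0 port 53

# Look for credentials sent in cleartext over common unencrypted protocols
tcpdump port http or port ftp or port smtp or port imap or port pop3 or port telnet -lA | grep -Ei 'pass=|pwd=|log=|login=|user=|username=|pw=|passw=|passwd=|password=|pass:|user:|username:|password:|login:'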

nsenter

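nsenter runs a command inside the namespaces of another process, so you can use the node's network tools inside a container's network namespace. Example usage:

nsenter -t 30858 -n ifconfig

Here 30858 is the PID of the container's main process. To find that PID for a pod: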

# Get pod node

kubectl get pods -owide | awk '{print $1,$7}'

# Get container ID

docker ps | grep <pod-name>

# Get PID of container

docker inspect --format "{{ .State.Pid }}" <container-id>
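The last two steps can be combined into one helper. This is a minimal sketch, assuming a Docker-based node and root access on it. The function name netns_exec is hypothetical, and clusters that use another container runtime would need a different PID lookup.

```shell
# netns_exec CONTAINER_ID COMMAND [ARGS...]
# Look up the container's main PID with docker inspect, then run
# COMMAND inside that process's network namespace via nsenter.
# Requires root on the node where the container runs.
netns_exec() {
    container_id=$1
    shift
    pid=$(docker inspect --format '{{ .State.Pid }}' "$container_id") || return 1
    nsenter -t "$pid" -n "$@"
}
```

A call such as netns_exec <container-id> ip addr then lists the pod's interfaces. Because only the network namespace is entered, the command binary comes from the node, so this also works for container images that ship without ip, ifconfig, or tcpdump.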

paping

paping continuously pings a target address on a specified TCP port, useful for testing port connectivity and packet loss.

paping -h

paping -p 80 -c 10 -t 1000 example.com

mtr

mtr combines traceroute and ping, providing loss percentage, latency statistics, and more.

mtr google.com

mtr -n google.com

mtr -b google.com

mtr -c 5 google.com

mtr -r -c 5 google.com > result.txt

Key options:

-n: show IP only

-b: show both IP and hostname

-c N: stop after N pings

-r: generate report

-i SECONDS: set the interval between probes (default 1 second); probes use ICMP by default, and -u or -T switches to UDP or TCP

-m MAX_HOPS: limit hops

-s SIZE: set packet size
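A saved report can be filtered for hops that show loss. The sketch below assumes the usual mtr -r -n layout, in which each hop line starts with a token such as "2.|--" followed by the host and the loss percentage. The function name lossy_hops is hypothetical, and the column layout should be checked against your mtr version.

```shell
# lossy_hops [THRESHOLD]
# Read `mtr -r -n` report text on stdin and print "hop host loss" for
# every hop whose loss percentage exceeds THRESHOLD (default 0).
lossy_hops() {
    awk -v limit="${1:-0}" '
        $1 ~ /^[0-9]+[.][|]--$/ {
            loss = $3 + 0              # "20.0%" converts to 20
            if (loss > limit) {
                hop = $1
                sub(/[.].*/, "", hop)  # "2.|--" becomes "2"
                printf "%s %s %s\n", hop, $2, loss
            }
        }'
}
```

Typical use would be mtr -r -n -c 5 google.com | lossy_hops 10. Loss that appears at an intermediate hop but not at later hops is often ICMP rate limiting on that router and not a real fault, so the final hop is the one to read first.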

3. Pod Network Troubleshooting Process

The troubleshooting flow is illustrated below:

4. Case Studies

Node Expansion Causing Service Inaccessibility

After adding a new worker node, the ClusterIP of a registry service became unreachable from that node while other nodes worked fine.

Analysis:

CNI plugin was healthy (all pod‑to‑pod communication worked).

Registry pod itself was reachable via its Pod IP.

Service IP was likely the problem on the new node.

Investigation steps included checking kube‑proxy status, iptables NAT rules, and routing tables. Packet captures showed that the source IP of encapsulated VXLAN packets was incorrect (10.153.204.228 instead of the node's actual IP 10.153.204.15). The node had both a static IP and a DHCP‑assigned IP, causing an IP conflict.

Resolution: Remove the DHCP configuration (set BOOTPROTO="none") and restart Docker and kubelet.
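A node in this state can be spotted before any packet capture by looking for interfaces that carry more than one IPv4 address. This is a rough sketch that parses the one-line-per-address output of ip -o -4 addr show. The function name multi_ip_interfaces is hypothetical, and a second address is not always an error (some setups add them on purpose), so treat a match as a hint to investigate.

```shell
# multi_ip_interfaces
# Read `ip -o -4 addr show` output on stdin and print every interface
# that carries more than one IPv4 address, followed by those addresses.
multi_ip_interfaces() {
    awk '$3 == "inet" {
             count[$2]++
             addrs[$2] = addrs[$2] " " $4
         }
         END {
             for (ifname in count)
                 if (count[ifname] > 1)
                     print ifname ":" addrs[ifname]
         }'
}
```

On the affected node, ip -o -4 addr show | multi_ip_interfaces would be expected to list the interface holding both 10.153.204.15 and 10.153.204.228.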

External Cloud Host Timing Out When Calling Cluster Service

A cloud host could telnet to a NodePort service but HTTP POST requests timed out.

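Packet capture revealed that after the TCP three-way handshake, a 1514-byte frame (a 1500-byte IP packet plus the 14-byte Ethernet header) was retransmitted repeatedly and never acknowledged. Small packets such as the handshake got through while full-size packets did not, which indicated an MTU mismatch between the host (MTU 1500) and the Calico tunnel interface (MTU 1440).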

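Resolution: Align the MTU values, either by setting the host interface MTU to 1440 or by raising Calico's MTU to 1500. Raising the tunnel MTU is only safe if the underlying network can carry the encapsulated packets, which are larger than the tunnel MTU.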
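An MTU mismatch like this can be confirmed with ping by sending packets that are not allowed to fragment. The largest ICMP payload that fits in one IPv4 packet is the MTU minus 28 bytes (20 for the IPv4 header without options, 8 for the ICMP header). A small sketch of that arithmetic follows; the helper name max_ping_payload is hypothetical.

```shell
# max_ping_payload MTU
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header.
max_ping_payload() {
    echo $(( $1 - 20 - 8 ))
}
```

With Linux iputils ping, where -M do forbids fragmentation, ping -M do -c 3 -s "$(max_ping_payload 1440)" <target-ip> sends packets of exactly 1440 bytes. If that size succeeds while the payload for 1500 (1472 bytes) fails, the path MTU lies between the two values.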

Pod Accessing Object Storage Timing Out

Pods could reach the storage IP directly but DNS resolution of the storage domain failed, leading to timeouts.

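Investigation showed that the kube-proxy pods on newly added nodes were stuck in Pending because their priority was too low. Without kube-proxy, the Service rules for the cluster DNS address were never set up on those nodes, so pods running there could not reach the DNS service.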

Resolution: Assign the system-node-critical priority class to kube-proxy, and add readiness probes that verify DNS resolution so that a workload is not marked Ready until DNS works.
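One way to apply the priority class is a merge patch on the kube-proxy workload. The sketch below only builds the patch body. It assumes kube-proxy runs as a DaemonSet named kube-proxy in the kube-system namespace, which is common but not universal, and the helper name priority_patch is hypothetical.

```shell
# priority_patch CLASS
# Print a JSON merge patch that sets priorityClassName on a workload's
# pod template.
priority_patch() {
    printf '{"spec":{"template":{"spec":{"priorityClassName":"%s"}}}}\n' "$1"
}

# Example (assumes a DaemonSet named kube-proxy in kube-system):
# kubectl -n kube-system patch daemonset kube-proxy --type merge \
#     -p "$(priority_patch system-node-critical)"
```

After patching, kubectl -n kube-system get pods -o wide should show a Running kube-proxy pod on each new node before workloads there are expected to resolve names.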

Tips: For more network tool usage, refer to the referenced article.
