
How to Diagnose Kubernetes Pod Network Issues: Tools, Models, and Real‑World Cases

This article introduces a systematic approach for troubleshooting Kubernetes pod network problems, covering common failure models, essential diagnostic tools such as tcpdump, nsenter, paping and mtr, and detailed case studies that illustrate step‑by‑step analysis and resolution techniques.


1. Pod Network Anomalies

Network anomalies can be classified into several categories:

Network unreachable – ping fails, caused by firewall rules, incorrect routing, high system load, or link failures.

Port unreachable – ping works but telnet fails, caused by firewall restrictions, high load, or the application not listening.

DNS resolution failure – domain names cannot be resolved while IP connectivity works, caused by incorrect pod DNS settings, DNS service issues, or communication problems with the DNS service.

Large packet loss – small packets succeed but large packets are dropped, often due to MTU mismatches; you can test with <code>ping -s &lt;size&gt;</code>.
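As a sketch of that large-packet test, the ICMP payload size that exactly fills a given MTU can be computed and then probed with the don't-fragment bit set (the 1500-byte MTU and the target address below are assumptions, not values from a real cluster):

```shell
# Payload that exactly fills an assumed 1500-byte MTU:
# payload = MTU - 20 (IP header) - 8 (ICMP header)
MTU=1500
PAYLOAD=$((MTU - 28))
echo "probe payload: ${PAYLOAD} bytes"
# Probe with don't-fragment set; if this fails while smaller payloads
# succeed, the path MTU is lower than assumed (needs a live network):
# ping -c 3 -M do -s "${PAYLOAD}" 10.0.0.1
```

With a 1500-byte MTU the probe payload works out to 1472 bytes; anything larger must be fragmented or dropped.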

CNI plugin issues – node can communicate but pods cannot reach cluster addresses, often due to kube‑proxy or CIDR exhaustion.

The overall classification is illustrated in the following diagram:

[Diagram: classification of pod network anomalies]

In summary, the most common pod network failures are network unreachable, port unreachable, DNS resolution errors, and large‑packet loss.
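These failure classes can be checked in order from inside an affected pod. The sketch below assumes ping, nc, and nslookup exist in the pod image, and the target address passed in is a placeholder service IP:

```shell
# Layered triage mirroring the failure classes above (a sketch, not a
# complete diagnostic). Usage: triage 10.96.0.10
triage() {
  target=$1
  ping -c 2 "$target"                # network unreachable?
  nc -zv -w 3 "$target" 53           # port unreachable? (53 as an example port)
  nslookup kubernetes.default        # DNS resolution failure?
  ping -c 2 -s 1472 -M do "$target"  # large-packet loss / MTU mismatch?
}
```

Each step maps to one failure class, so the first step that fails tells you which branch of the classification to pursue.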

2. Common Network Diagnostic Tools

tcpdump

tcpdump is a powerful packet capture tool. Installation commands:

Ubuntu/Debian:

apt-get install -y tcpdump

CentOS/Fedora:

yum install -y tcpdump

Alpine:

apk add tcpdump --no-cache

Typical usage examples:

<code>tcpdump -D                      # list available capture interfaces</code>
<code>tcpdump host 1.1.1.1            # traffic to or from 1.1.1.1</code>
<code>tcpdump src 1.1.1.1             # traffic from 1.1.1.1 (use dst for the reverse)</code>
<code>tcpdump net 1.2.3.0/24          # traffic to or from a subnet</code>
<code>tcpdump -c 1 -X icmp            # capture one ICMP packet with a hex/ASCII dump</code>
<code>tcpdump port 3389               # traffic on port 3389</code>
<code>tcpdump portrange 21-23         # traffic on ports 21 through 23</code>
<code>tcpdump less 32                 # packets smaller than 32 bytes</code>
<code>tcpdump greater 64              # packets larger than 64 bytes</code>
<code>tcpdump -w capture_file        # write raw packets to a file</code>

Logical operators can be combined, e.g.:

<code>tcpdump -i eth0 -nn host 220.181.57.216 and 10.0.0.1</code>
<code>tcpdump -i eth0 -nn host 220.181.57.216 or 10.0.0.1</code>
<code>tcpdump -i eth0 -nn 'host 10.0.0.1 and (10.0.0.9 or 10.0.0.3)'</code>

(The parentheses must be quoted so the shell does not interpret them.)

TCP flag filters (RST, SYN, ACK, etc.) are also supported.
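For example, isolating SYN or RST segments uses the standard pcap flag tests; the interface name below is an assumption, and the capture commands themselves need root on the node:

```shell
# pcap filters for specific TCP flags:
syn_only='tcp[tcpflags] & tcp-syn != 0'   # new connection attempts
rst_only='tcp[tcpflags] & tcp-rst != 0'   # reset-terminated/refused connections
# Run as root on the node (interface name is an assumption):
# tcpdump -i eth0 -nn "$syn_only"
# tcpdump -i eth0 -nn "$rst_only"
echo "$syn_only"
```

A burst of RST packets toward a service is a quick tell that something is actively refusing connections rather than silently dropping them.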

nsenter

nsenter allows entering a process’s network namespace. Example syntax:

<code>nsenter -t &lt;pid&gt; -n &lt;command&gt;</code>

To inspect a pod’s network from the host:

<code># Find the PID of the pod's main process (this example container runs tail)</code>
<code>ps -ef | grep tail</code>
<code># Enter that process's network namespace and inspect its interfaces</code>
<code>nsenter -t 30858 -n ifconfig</code>
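Grepping ps is fragile: it matches the grep process itself and assumes you know the container's command line. A sketch of two alternatives follows; the crictl path assumes a containerd runtime and the container ID is a placeholder taken from `crictl ps`, while the pgrep variant assumes a hypothetical pod whose main process is `tail -f /dev/null`:

```shell
# Ask the container runtime for the PID directly (containerd/crictl,
# requires a live node; CONTAINER_ID is a placeholder):
# PID=$(crictl inspect --output go-template --template '{{.info.pid}}' "$CONTAINER_ID")
# nsenter -t "$PID" -n ip addr
# pgrep variant that avoids matching the grep process itself:
PID=$(pgrep -f 'tail -f /dev/null' | head -n1)
echo "target pid: ${PID:-not found}"
```

Either way, once the PID is known, `nsenter -t "$PID" -n` runs any host tool inside the pod's network namespace.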

paping

paping continuously pings a target TCP port, useful for testing connectivity and packet loss.

<code>paping -p 80 -c 10 example.com</code>

Installation dependencies vary by OS (e.g., libstdc++.i686 on RHEL/CentOS).

mtr

mtr combines traceroute and ping, providing loss percentage, latency statistics, and more.

<code>mtr google.com</code>
<code>mtr -n google.com</code>
<code>mtr -b google.com</code>
<code>mtr -c 5 google.com</code>

Key columns: Loss%, Snt, Last, Avg, Best, Wrst, StDev. Loss at an intermediate hop may be nothing more than ICMP rate limiting on that router; loss that persists through to the final hop indicates a real problem, and a high StDev suggests unstable latency.
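A report-mode capture (`mtr -r -n`) can also be parsed after the fact. The sketch below runs awk over a saved report; the hop addresses and loss figures are illustrative sample data, not real measurements:

```shell
# Saved `mtr -r -n` report (illustrative data only):
cat <<'EOF' > /tmp/mtr-report.txt
HOST: node1                       Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 10.0.0.1                   0.0%     5    0.4   0.5   0.4   0.7   0.1
  2.|-- 203.0.113.1               20.0%     5    8.2   9.1   8.0  12.3   1.8
EOF
# Field 2 is the hop address, field 3 is Loss%; print any hop reporting loss:
awk 'NR > 1 && $3 + 0 > 0 {print $2, "loss:", $3}' /tmp/mtr-report.txt
```

On the sample above this prints only the second hop, the one reporting 20% loss.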

Tips: For more network tools, refer to additional resources.

3. Pod Network Troubleshooting Workflow

The troubleshooting process follows the diagram below:

[Diagram: pod network troubleshooting idea]

4. Case Studies

Node Expansion – Service Unreachable

After adding a new worker node, the node could not reach the ClusterIP of a registry service, while other nodes worked fine.

Investigation steps:

Verified CNI plugin (flannel vxlan) and kube‑proxy (iptables) were functioning.

Confirmed the registry pod itself was reachable.

Checked iptables NAT rules – they were correct.

Examined routing tables; the problematic node had two IP addresses on the same NIC (static + DHCP), causing IP conflict.

Resolution: Removed the DHCP configuration (set BOOTPROTO="none"), then restarted Docker and kubelet.
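A quick way to spot this condition is to count IPv4 addresses per interface. The sketch below runs awk over saved `ip -4 -o addr show` output; the addresses are illustrative, not taken from the affected node:

```shell
# Saved one-line-per-address output of `ip -4 -o addr show` (illustrative):
cat <<'EOF' > /tmp/addrs.txt
1: lo    inet 127.0.0.1/8 scope host lo
2: eth0    inet 10.0.0.5/24 brd 10.0.0.255 scope global eth0
2: eth0    inet 10.0.0.77/24 brd 10.0.0.255 scope global dynamic eth0
EOF
# Field 2 is the interface name; flag any NIC carrying more than one address:
awk '{count[$2]++} END {for (d in count) if (count[d] > 1) print d}' /tmp/addrs.txt
```

In the sample, eth0 carries both a static and a dynamic address, the same static-plus-DHCP conflict found in this case.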

External Cloud Host – Timeout

A cloud VM could telnet to a NodePort service but HTTP POST requests timed out.

Analysis revealed large packets (>1400 bytes) were repeatedly retransmitted due to MTU mismatch (host MTU 1500 vs. Calico tunnel MTU 1440).

Fix: Align MTU values by setting the host NIC to 1440 or adjusting Calico’s MTU to 1500.
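The 60-byte gap between 1500 and 1440 is encapsulation overhead. The arithmetic below uses typical per-encapsulation overheads, not values confirmed from the original cluster:

```shell
# Tunnel MTU = host MTU - encapsulation overhead (typical figures):
HOST_MTU=1500
echo "IP-in-IP:  $((HOST_MTU - 20))"   # 20-byte outer IP header
echo "VXLAN:     $((HOST_MTU - 50))"   # outer IP + UDP + VXLAN headers
echo "WireGuard: $((HOST_MTU - 60))"   # matches the 1440 seen in this case
```

Whichever side you adjust, the rule is the same: the tunnel MTU plus its overhead must not exceed the smallest MTU on the physical path.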

Pod Accessing Object Storage – DNS Timeout

Pods could reach the storage IP but failed DNS resolution for the storage domain.

Root cause: kube‑proxy pods on newly added nodes were pending because they lacked the highest priority class; when resources were scarce, kube‑proxy was evicted, breaking service/DNS access.

Solution: Assign the <code>system-node-critical</code> priority class to kube‑proxy and add readiness probes for dependent pods.
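As a sketch, assuming kube-proxy runs as a DaemonSet in kube-system (the layout kubeadm produces), the priority class can be set with a merge patch; actually applying it requires a live cluster and cluster-admin access:

```shell
# Patch body: priorityClassName belongs in the pod template spec.
PATCH='{"spec":{"template":{"spec":{"priorityClassName":"system-node-critical"}}}}'
# Apply against a live cluster (requires cluster-admin):
# kubectl -n kube-system patch daemonset kube-proxy -p "$PATCH"
echo "$PATCH"
```

With that class set, the scheduler will evict lower-priority pods rather than leave kube-proxy pending, which is exactly the behavior this incident was missing.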

Source: https://www.cnblogs.com/Cylon/p/16611503.html

Tags: Kubernetes, Network Troubleshooting, iptables, CNI, Pod, tcpdump, nsenter
Written by Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.