How to Diagnose Kubernetes Pod Network Failures: Models, Tools, and Real Cases

This article introduces a systematic approach to troubleshooting Kubernetes pod network issues, covering common failure types, essential diagnostic tools like tcpdump, nsenter, paping, and mtr, and detailed case studies that illustrate step‑by‑step analysis and resolution of real‑world connectivity problems.

Open Source Linux

Sep 14, 2022

Pod Network Anomalies

Pod network problems fall into four broad categories: network unreachable (ping fails), port unreachable (telnet to the port fails), DNS resolution errors, and loss of large packets. Possible causes include firewall rules, routing misconfigurations, high system load, and MTU mismatches.

Common Diagnostic Tools

tcpdump

tcpdump captures packets on interfaces and can filter by host, port, protocol, etc.

tcpdump -i eth0 -nn host 220.181.57.216 and host 10.0.0.1

nsenter

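nsenter runs a command inside the namespaces of another process. With -t <PID> -n it enters the network namespace of a container's process, so host utilities such as ifconfig or netstat can be used even when the container image lacks them.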

nsenter -t 30858 -n ifconfig
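The PID passed to -t has to be looked up on the node first. The sketch below only builds the two command lines involved; it assumes a Docker runtime (other runtimes expose the PID differently), and the helper names are hypothetical, not part of any tool.

```python
import shlex


def pid_lookup_command(container):
    """Command that prints the host PID of a container's main process.

    Assumes the Docker CLI; `container` is a container name or ID.
    """
    return ["docker", "inspect", "-f", "{{.State.Pid}}", container]


def nsenter_command(pid, *tool_argv):
    """Command that runs a host tool inside the network namespace of `pid`."""
    return ["nsenter", "-t", str(int(pid)), "-n", *tool_argv]


# Reproduces the command shown above for PID 30858.
print(shlex.join(nsenter_command(30858, "ifconfig")))
```

Each list can be handed to subprocess.run on the node; nsenter itself normally needs root privileges.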

paping

paping repeatedly opens TCP connections to a port to test connectivity and packet loss, which is useful where ICMP ping is blocked.

paping -p 80 -c 10 example.com
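Where paping is not installed, the same idea can be approximated with a few lines of Python. This is a minimal sketch of a TCP connect probe, not a reimplementation of paping; the function names and defaults are illustrative assumptions.

```python
import socket
import time


def tcp_ping(host, port, count=4, timeout=1.0):
    """Attempt `count` TCP connections; return (successes, latencies_ms)."""
    latencies = []
    for _ in range(count):
        start = time.perf_counter()
        try:
            # A completed three-way handshake counts as a successful probe.
            with socket.create_connection((host, port), timeout=timeout):
                latencies.append((time.perf_counter() - start) * 1000.0)
        except OSError:
            # Refused, timed out, or unreachable: count the probe as lost.
            pass
    return len(latencies), latencies


def loss_percent(sent, received):
    """Percentage of probes that did not complete a connection."""
    return 100.0 * (sent - received) / sent if sent else 0.0
```

For example, tcp_ping("example.com", 80, count=10) followed by loss_percent(10, successes) gives a rough equivalent of the paping run above.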

mtr

mtr combines traceroute and ping, showing loss percentage and latency per hop.

mtr -n google.com

Pod Network Troubleshooting Process

The workflow starts with confirming pod‑to‑pod communication, then checking service IP reachability, followed by inspecting CNI plugins, kube‑proxy rules, and finally capturing packets on relevant interfaces (veth, cni0, flannel) to pinpoint the failing node.
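The ordering matters because each stage narrows where to capture packets. As a rough sketch, the workflow can be expressed as an ordered list of probes that stops at the first failure; the stage names paraphrase the text above, and the probe callables are placeholders that would have to be supplied by the operator.

```python
# Stages in the order the workflow describes them.
STAGES = [
    "pod-to-pod connectivity",
    "service IP reachability",
    "CNI plugin health",
    "kube-proxy rules",
]


def first_failing_stage(probes):
    """Run one probe per stage, in order, and return the first stage that fails.

    `probes` maps each stage name to a zero-argument callable returning
    True on success. Returns None when every stage passes. Later stages
    are not run once a failure is found.
    """
    for stage in STAGES:
        if not probes[stage]():
            return stage
    return None
```

The returned stage indicates which interfaces (veth, cni0, flannel) are worth capturing on next.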

Case Study 1: Service Unreachable After Node Expansion

A newly added node could not reach a ClusterIP service (10.233.0.100:5000), while other nodes could. The CNI plugin and kube‑proxy were healthy, but packet captures revealed mismatched source IPs: the node had both a static and a DHCP‑assigned address, which caused an IP conflict. The fix was to remove the duplicate DHCP configuration and restart Docker and kubelet.
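A condition like this can be spotted before reaching for packet captures. The sketch below parses text in the one-line format printed by `ip -o -4 addr show` and flags interfaces that carry both a static and a dynamic (DHCP) address. The sample text is hypothetical and trimmed, not output from the node in this case, and the function names are illustrative.

```python
from collections import defaultdict

# Hypothetical, trimmed sample in the style of `ip -o -4 addr show`.
SAMPLE = """\
1: lo    inet 127.0.0.1/8 scope host lo
2: eth0    inet 10.0.0.5/24 brd 10.0.0.255 scope global eth0
2: eth0    inet 10.0.0.77/24 brd 10.0.0.255 scope global secondary dynamic eth0
"""


def addresses_by_interface(ip_output):
    """Map interface name -> list of (cidr, is_dynamic) tuples."""
    result = defaultdict(list)
    for line in ip_output.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "inet":
            continue
        ifname, cidr = fields[1], fields[3]
        # The `dynamic` flag marks addresses with a finite lifetime, e.g. from DHCP.
        result[ifname].append((cidr, "dynamic" in fields))
    return dict(result)


def mixed_static_and_dhcp(ip_output):
    """Interfaces that hold both a static and a dynamic IPv4 address."""
    flagged = []
    for ifname, addrs in addresses_by_interface(ip_output).items():
        kinds = {is_dynamic for _, is_dynamic in addrs}
        if kinds == {True, False}:
            flagged.append(ifname)
    return flagged


print(mixed_static_and_dhcp(SAMPLE))
```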

Case Study 2: External Host Timeout to NodePort Service

An external VM could telnet to the NodePort, but HTTP requests timed out. Wireshark showed a successful TCP handshake, but large packets (>1400 bytes) were retransmitted repeatedly. The cause was an MTU mismatch between the VM (1500) and the Calico tunnel (1440), which led to fragmentation issues for the large packets. Aligning the MTU on either the VM or Calico resolved the problem.
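The two MTU values line up with the 1400-byte threshold seen in the capture. As a back-of-the-envelope check, assuming plain IPv4 and TCP headers of 20 bytes each with no options (real traffic often carries TCP options, which lowers the payload further), the largest TCP payload per packet works out as follows.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def max_tcp_payload(mtu):
    """Largest TCP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - TCP_HEADER


def fits(payload_len, mtu):
    """True if a TCP payload of this size fits in one packet at this MTU."""
    return payload_len <= max_tcp_payload(mtu)


vm_payload = max_tcp_payload(1500)      # 1460 bytes on the VM's link
tunnel_payload = max_tcp_payload(1440)  # 1400 bytes through the tunnel
print(vm_payload, tunnel_payload)
```

Under these assumptions, a payload between 1401 and 1460 bytes fits on the VM's link but not through the tunnel, which would explain why the small handshake packets succeeded while larger HTTP packets did not.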

Case Study 3: Pod DNS Failure Accessing Object Storage

Pods could reach the object‑storage IP but failed to resolve its domain name. DNS queries to the cluster DNS succeeded, but upstream DNS lookups timed out. The root cause was that kube‑proxy pods on newly added nodes were being evicted because they lacked a priority class, which broke service DNS for those nodes. Assigning the system-node-critical priority class to kube‑proxy restored DNS functionality.
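The first symptom, an IP that is reachable while its name does not resolve, is easy to confirm from inside a pod that has Python available. The sketch below separates the two checks; the hostname in the usage note is hypothetical, and the resolver is injectable only so the logic can be exercised without a network.

```python
import socket


def resolve(name, resolver=socket.getaddrinfo):
    """Return the sorted set of addresses for `name`, or [] if lookup fails."""
    try:
        infos = resolver(name, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def diagnose(name, ip_reachable, resolver=socket.getaddrinfo):
    """Classify the symptom from a name lookup plus a known IP reachability result."""
    if resolve(name, resolver):
        return "dns-ok"
    return "dns-failure" if ip_reachable else "network-or-dns-failure"
```

A call such as diagnose("storage.example.internal", ip_reachable=True) returning "dns-failure" points at name resolution rather than routing, which matches the symptom in this case.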
