Master Kubernetes Troubleshooting: 10 Essential Steps to Diagnose Pod Failures
This guide walks you through ten systematic steps—from inspecting pod status and cluster health to verifying network connectivity, storage configuration, DNS resolution, and container logs—to effectively troubleshoot common Kubernetes issues and keep your applications running smoothly.
1. Pod startup anomalies
Pods are the smallest schedulable unit in Kubernetes; containers inside a pod share its network namespace, volumes, and resource allocation. Common causes of pod failures include node resource exhaustion, memory leaks or CPU spikes, network issues, storage mount problems, code errors, and misconfigured manifests.
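Resource overload: too many pods on a single node can exhaust its CPU or memory, leading to pod evictions or a NotReady node.
Memory/CPU spikes: a container that exceeds its memory limit is OOM-killed, while one that exceeds its CPU limit is throttled; set resource requests and limits based on load testing.
Network problems: check the CNI plugin (for example, Calico).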
Storage issues: verify that shared volumes are accessible.
Code errors: container entrypoint failures.
Configuration errors: incorrect Deployment or StatefulSet specs.
Use monitoring tools to detect these problems.
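As one illustration of telling these causes apart, the sketch below scans a pod object (as returned by kubectl get pod <pod-name> -o json) for containers that were OOM-killed. The function name and the sample data are hypothetical; only the status fields it reads come from the Kubernetes pod status.

```python
def find_oom_killed(pod):
    """Return (container name, restart count) for containers killed for exceeding memory."""
    hits = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        # Check the current state first, then the previous run of the container.
        for key in ("state", "lastState"):
            terminated = cs.get(key, {}).get("terminated")
            if terminated and terminated.get("reason") == "OOMKilled":
                hits.append((cs["name"], cs.get("restartCount", 0)))
                break
    return hits


# Hypothetical sample, trimmed to the fields this sketch reads.
SAMPLE_POD = {
    "metadata": {"name": "web-0"},
    "status": {
        "containerStatuses": [
            {
                "name": "app",
                "restartCount": 4,
                "state": {"waiting": {"reason": "CrashLoopBackOff"}},
                "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
            },
            {"name": "sidecar", "restartCount": 0, "state": {"running": {}}, "lastState": {}},
        ]
    },
}

print(find_oom_killed(SAMPLE_POD))
```

A non-empty result points toward the memory-limit cause rather than a code or configuration error.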
2. Inspect cluster health
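Run kubectl get nodes to confirm that every node reports Ready. Node status reflects the kubelet on each node; to check other core components such as etcd and kube‑proxy, inspect their pods, which in many clusters run in the kube-system namespace (kubectl get pods -n kube-system).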
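For clusters with many nodes, the same check can be scripted. The following is a minimal sketch, assuming kubectl is on the PATH and already configured; the helper names and the sample data are illustrative, not part of any Kubernetes tooling.

```python
import json
import subprocess


def not_ready_nodes(node_list):
    """Return names of nodes whose Ready condition is anything other than "True"."""
    bad = []
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True" for c in conditions
        )
        if not ready:
            bad.append(node["metadata"]["name"])
    return bad


def fetch_nodes():
    """Run kubectl and parse its JSON output (requires access to a cluster)."""
    result = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


# Hypothetical sample, trimmed to the fields this sketch reads.
SAMPLE_NODES = {
    "items": [
        {"metadata": {"name": "node-a"},
         "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
        {"metadata": {"name": "node-b"},
         "status": {"conditions": [{"type": "Ready", "status": "Unknown"}]}},
    ]
}

print(not_ready_nodes(SAMPLE_NODES))
```

Against a live cluster, not_ready_nodes(fetch_nodes()) would return the nodes that need a closer look with kubectl describe node.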
3. Review event logs
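Use kubectl get events --all-namespaces --sort-by=.metadata.creationTimestamp to list recent cluster events in chronological order; Warning events often reveal component‑level failures. Without --all-namespaces, only events in the current namespace are shown.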
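When the event list is long, filtering it down to warnings helps. This sketch works on the parsed output of kubectl get events -o json; the function name and the sample events are made up for illustration, and timestamps are compared as plain strings, which is a simplification.

```python
def recent_warnings(event_list, limit=10):
    """Return (timestamp, reason, object name, message) for the newest Warning events."""

    def stamp(event):
        # Fall back through the timestamp fields an event may carry.
        return (
            event.get("lastTimestamp")
            or event.get("eventTime")
            or event.get("metadata", {}).get("creationTimestamp")
            or ""
        )

    warnings = [e for e in event_list.get("items", []) if e.get("type") == "Warning"]
    warnings.sort(key=stamp, reverse=True)  # newest first
    return [
        (
            stamp(e),
            e.get("reason", ""),
            e.get("involvedObject", {}).get("name", ""),
            e.get("message", ""),
        )
        for e in warnings[:limit]
    ]


# Hypothetical sample, trimmed to the fields this sketch reads.
SAMPLE_EVENTS = {
    "items": [
        {"type": "Normal", "reason": "Pulled", "lastTimestamp": "2024-01-01T10:06:00Z",
         "involvedObject": {"name": "web-0"}, "message": "Image pulled"},
        {"type": "Warning", "reason": "FailedScheduling", "lastTimestamp": "2024-01-01T10:00:00Z",
         "involvedObject": {"name": "db-0"}, "message": "0/3 nodes are available"},
        {"type": "Warning", "reason": "BackOff", "lastTimestamp": "2024-01-01T10:05:00Z",
         "involvedObject": {"name": "web-0"}, "message": "Back-off restarting failed container"},
    ]
}

for row in recent_warnings(SAMPLE_EVENTS):
    print(row)
```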
4. Focus on pod status
Run kubectl get pods --all-namespaces to spot pods that are Pending, in CrashLoopBackOff, or not Ready, then run kubectl describe pod <pod-name> -n <namespace> for detailed information, including the Events section at the end of the output.
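The triage described above can be sketched as a small classifier over pod objects from kubectl get pods -o json. The function name and the labels it returns are assumptions of this example, and init containers are ignored for brevity.

```python
def pod_problem(pod):
    """Return a short label for what looks wrong with a pod, or None if it looks healthy."""
    status = pod.get("status", {})
    statuses = status.get("containerStatuses", [])
    # A waiting reason such as CrashLoopBackOff or ImagePullBackOff is the most specific signal.
    for cs in statuses:
        waiting = cs.get("state", {}).get("waiting")
        if waiting and waiting.get("reason"):
            return waiting["reason"]
    phase = status.get("phase", "Unknown")
    if phase == "Succeeded":
        return None  # completed Job pods are not a problem
    if phase != "Running":
        return phase  # Pending, Failed, or Unknown
    if not all(cs.get("ready") for cs in statuses):
        return "NotReady"
    return None


# Hypothetical samples, trimmed to the fields this sketch reads.
PENDING = {"status": {"phase": "Pending"}}
CRASHING = {"status": {"phase": "Running", "containerStatuses": [
    {"name": "app", "ready": False, "state": {"waiting": {"reason": "CrashLoopBackOff"}}}]}}
UNREADY = {"status": {"phase": "Running", "containerStatuses": [
    {"name": "app", "ready": False, "state": {"running": {}}}]}}
HEALTHY = {"status": {"phase": "Running", "containerStatuses": [
    {"name": "app", "ready": True, "state": {"running": {}}}]}}

for sample in (PENDING, CRASHING, UNREADY, HEALTHY):
    print(pod_problem(sample))
```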
5. Verify network connectivity
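Check communication between services, pods, and nodes. kubectl get services and kubectl describe service <service-name> show each Service's selector, ports, and endpoints; a Service with no endpoints usually means its selector matches no Ready pods. Ensure network policies and firewall rules allow the required traffic.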
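A frequent cause of an unreachable Service is a selector that does not match the pod labels. The sketch below reproduces the equality-based matching a Service selector performs, using objects shaped like kubectl get ... -o json output. The function names and sample objects are illustrative, and pod readiness is not checked here.

```python
def selector_matches(selector, labels):
    """True if every key/value in the selector is present in the pod labels."""
    return bool(selector) and all(labels.get(k) == v for k, v in selector.items())


def pods_behind_service(service, pods):
    """Return names of pods in the Service's namespace that its selector matches."""
    selector = service.get("spec", {}).get("selector") or {}
    namespace = service["metadata"].get("namespace")
    return [
        p["metadata"]["name"]
        for p in pods
        if p["metadata"].get("namespace") == namespace
        and selector_matches(selector, p["metadata"].get("labels") or {})
    ]


# Hypothetical samples, trimmed to the fields this sketch reads.
SERVICE = {"metadata": {"name": "hostnames", "namespace": "default"},
           "spec": {"selector": {"app": "hostnames"}}}
PODS = [
    {"metadata": {"name": "hostnames-1", "namespace": "default",
                  "labels": {"app": "hostnames"}}},
    {"metadata": {"name": "hostnames-2", "namespace": "default",
                  "labels": {"app": "hostname"}}},  # label typo: not selected
    {"metadata": {"name": "hostnames-3", "namespace": "staging",
                  "labels": {"app": "hostnames"}}},  # other namespace: not selected
]

print(pods_behind_service(SERVICE, PODS))
```

An empty result for a Service that should have backends suggests comparing the selector in kubectl describe service with the labels shown by kubectl get pods --show-labels.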
6. Examine storage configuration
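If your workloads use PersistentVolumes or StorageClasses, confirm their status with kubectl get pv, kubectl get pvc, and kubectl get storageclass. A PersistentVolumeClaim that stays Pending instead of Bound will keep the pods that mount it from starting.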
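To pick out the claims that need attention, the output of kubectl get pvc --all-namespaces -o json can be filtered by phase. This is a small sketch; the function name and the sample claims are hypothetical.

```python
def unbound_claims(pvc_list):
    """Return (namespace, name, phase) for every PersistentVolumeClaim that is not Bound."""
    problems = []
    for pvc in pvc_list.get("items", []):
        phase = pvc.get("status", {}).get("phase", "Unknown")
        if phase != "Bound":
            meta = pvc["metadata"]
            problems.append((meta.get("namespace", "default"), meta["name"], phase))
    return problems


# Hypothetical sample, trimmed to the fields this sketch reads.
SAMPLE_PVCS = {
    "items": [
        {"metadata": {"namespace": "default", "name": "data-web-0"},
         "status": {"phase": "Bound"}},
        {"metadata": {"namespace": "default", "name": "data-web-1"},
         "status": {"phase": "Pending"}},
    ]
}

print(unbound_claims(SAMPLE_PVCS))
```

For each claim reported, kubectl describe pvc <pvc-name> typically shows events explaining why binding has not happened.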
7. Inspect container logs
Retrieve logs with kubectl logs <pod-name>. For multi‑container pods, specify the container with kubectl logs <pod-name> -c <container-name>. If the container has restarted, add --previous to see the logs from the crashed instance.
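When collecting logs from many pods in a script, it helps to build the command as an argument list rather than a shell string. The helper below is a hypothetical wrapper; the flags it emits (-n, -c, --previous, --tail) are standard kubectl logs options, and the pod and container names are placeholders.

```python
def logs_command(pod, namespace="default", container=None, previous=False, tail=None):
    """Build the argument list for a kubectl logs call without running it."""
    cmd = ["kubectl", "logs", pod, "-n", namespace]
    if container:
        cmd += ["-c", container]  # needed for multi-container pods
    if previous:
        cmd.append("--previous")  # logs of the last terminated instance
    if tail is not None:
        cmd.append(f"--tail={tail}")  # only the last N lines
    return cmd


# Placeholder pod and container names.
print(" ".join(logs_command("web-0", container="app", previous=True, tail=50)))
```

The returned list can be passed directly to subprocess.run, which avoids quoting problems with unusual pod names.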
8. Understand cluster networking
Kubernetes relies on a CNI plugin (e.g., Calico, Flannel, Cilium) for intra‑cluster communication. Typical communication patterns include container‑to‑container within a pod, pod‑to‑pod, pod‑to‑service, and external‑to‑service traffic.
9. Verify Service DNS resolution
From a pod in the same namespace, run:
u@pod$ nslookup hostnames
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local
Name: hostnames
Address 1: 10.0.1.175 hostnames.default.svc.cluster.local
If it fails, try a fully qualified name:
u@pod$ nslookup hostnames.default.svc.cluster.local
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local
Name: hostnames.default.svc.cluster.local
Address 1: 10.0.1.175 hostnames.default.svc.cluster.local
Check /etc/resolv.conf to ensure the nameserver points to the cluster DNS service and that the search line includes the appropriate suffixes (e.g., default.svc.cluster.local, svc.cluster.local, cluster.local). The options ndots:5 setting should be present.
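The /etc/resolv.conf checks can be sketched as a small parser. The DNS address 10.0.0.10 is taken from the transcript above, and the default namespace and cluster.local domain are assumptions that may differ in your cluster; the function names are illustrative.

```python
def parse_resolv_conf(text):
    """Parse the nameserver, search, and options lines of a resolv.conf file."""
    conf = {"nameservers": [], "search": [], "options": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":  # skip blanks and comments
            continue
        key, *values = line.split()
        if key == "nameserver":
            conf["nameservers"].extend(values[:1])
        elif key == "search":
            conf["search"] = values
        elif key == "options":
            conf["options"].extend(values)
    return conf


def check_cluster_dns(text, dns_ip="10.0.0.10", namespace="default",
                      cluster_domain="cluster.local"):
    """Return a list of problems found; an empty list means the file looks as expected."""
    conf = parse_resolv_conf(text)
    problems = []
    if dns_ip not in conf["nameservers"]:
        problems.append(f"nameserver {dns_ip} is missing")
    expected = (f"{namespace}.svc.{cluster_domain}", f"svc.{cluster_domain}", cluster_domain)
    for suffix in expected:
        if suffix not in conf["search"]:
            problems.append(f"search list is missing {suffix}")
    if "ndots:5" not in conf["options"]:
        problems.append("options ndots:5 is missing")
    return problems


# Sample file contents matching the values described in the text.
GOOD = """nameserver 10.0.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
"""

print(check_cluster_dns(GOOD))
```

Inside a pod, the input would come from reading /etc/resolv.conf, for example via kubectl exec <pod-name> -- cat /etc/resolv.conf.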
10. Summary
The exact troubleshooting steps depend on your cluster setup, deployment method, and observed symptoms. By systematically examining pod health, node status, events, networking, storage, logs, and DNS configuration, you can pinpoint most Kubernetes issues and maintain stable application operation.
MaGe Linux Operations
