Essential Kubernetes Troubleshooting Guide: Diagnose Pod Failures and DNS Issues

This guide walks through ten practical steps for diagnosing Kubernetes problems: Pod startup failures, cluster state, events, Pod status, network connectivity, storage configuration, container logs, cluster networking, and DNS resolution, followed by a short summary. The goal is to help you keep your clusters stable and reliable.

Open Source Linux

Kubernetes Troubleshooting Overview

This article outlines a systematic approach to identify and resolve common issues in a Kubernetes (K8s) cluster.

1. Pod Startup Failures

A Pod is the smallest deployable unit in K8s; the containers inside a Pod share a network namespace and can share storage volumes. Abnormal Pod behavior can stem from:

Resource exhaustion when too many Pods run on a single node, which can destabilize or crash the node.

Memory or CPU overuse caused by application leaks; set resource requests and limits based on load testing.

Network problems that prevent Pod communication; check the network plugin (for example, Calico).

Storage issues where mounted volumes are unavailable.

Application code errors that cause the container to fail at startup.

Misconfigured Deployment or StatefulSet manifests.

Use monitoring tools to detect these problems.
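
As a rough illustration of how the causes above tend to show up in practice, the following sketch maps a few common values of the STATUS column from kubectl get pods to the most likely category of cause. The function name explain_pod_status and the wording of each hint are illustrative assumptions, not an official classification; treat the result as a starting point for kubectl describe pod, not a diagnosis.

```shell
# Map a Pod status string (the STATUS column of `kubectl get pods`)
# to a likely cause category. The mapping is a heuristic, not exhaustive.
explain_pod_status() {
  case "$1" in
    Pending)
      echo "scheduling: insufficient node resources or an unbound volume claim" ;;
    ContainerCreating)
      echo "setup: volume mount or network setup has not finished" ;;
    ImagePullBackOff|ErrImagePull)
      echo "manifest: wrong image name, tag, or registry credentials" ;;
    CrashLoopBackOff)
      echo "application: the container starts and then exits repeatedly" ;;
    OOMKilled)
      echo "resources: the container exceeded its memory limit" ;;
    *)
      echo "unknown: inspect with kubectl describe pod" ;;
  esac
}

# Example:
#   explain_pod_status CrashLoopBackOff
```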

2. Inspect Cluster State

Start by checking node health with kubectl get nodes. Ensure core components (etcd, kubelet, kube-proxy) are running and all nodes are Ready.
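
On a cluster with many nodes it can help to print only the nodes that need attention. The sketch below assumes the default column layout of kubectl get nodes (NAME first, STATUS second); the helper name not_ready_nodes is hypothetical.

```shell
# Reads `kubectl get nodes --no-headers` output on stdin and prints the
# name of every node whose STATUS column does not start with "Ready".
# A cordoned node ("Ready,SchedulingDisabled") still counts as Ready here.
not_ready_nodes() {
  awk '$2 !~ /^Ready/ { print $1 }'
}

# Usage against a live cluster:
#   kubectl get nodes --no-headers | not_ready_nodes
```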

3. Review Event Logs

Run kubectl get events to see recent events and errors in the current namespace (add --all-namespaces to cover the whole cluster); these help pinpoint failing components.
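
Event lists get noisy quickly, so one possible approach is to keep only Warning events and count them by reason. The sketch below assumes you ask kubectl to print just the reason of each event, one per line; tally_reasons is a hypothetical helper name.

```shell
# Reads one event reason per line on stdin and prints "count reason",
# most frequent first.
tally_reasons() {
  awk '{ n[$1]++ } END { for (r in n) print n[r], r }' | sort -rn
}

# Usage against a live cluster (Warning events only, all namespaces):
#   kubectl get events --all-namespaces --field-selector type=Warning \
#     -o custom-columns=REASON:.reason --no-headers | tally_reasons
```

A reason that dominates the tally (for example, repeated scheduling failures) is usually the first thing worth investigating.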

4. Focus on Pod Status

List all Pods across namespaces: kubectl get pods --all-namespaces. For non‑Running Pods, use kubectl describe pod <pod-name> -n <namespace> to get detailed information, including recent events for that Pod.
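
To narrow the full listing down to the Pods worth describing, a small filter like the one below may be enough. It assumes the default columns of kubectl get pods --all-namespaces (NAMESPACE, NAME, READY, STATUS, ...); unhealthy_pods is a hypothetical name.

```shell
# Reads `kubectl get pods --all-namespaces --no-headers` on stdin and prints
# "namespace/name status" for Pods that are neither Running nor Completed.
unhealthy_pods() {
  awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}

# Usage against a live cluster:
#   kubectl get pods --all-namespaces --no-headers | unhealthy_pods
```

Note that this does not catch a Pod that is Running but not ready (for example, READY 0/1); check the READY column separately for those.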

5. Check Network Connectivity

Verify communication between Services, Pods, and nodes. Use kubectl get services and kubectl describe service <svc-name>. Review network policies and firewall rules.
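
A frequent cause of an unreachable Service is that its selector matches no ready Pods, which shows up as an empty endpoint list. The sketch below assumes the default output of kubectl get endpoints, where an empty list is printed as <none>; services_without_endpoints is a hypothetical helper name.

```shell
# Reads `kubectl get endpoints --no-headers` output on stdin and prints the
# name of every Service whose ENDPOINTS column is "<none>".
services_without_endpoints() {
  awk '$2 == "<none>" { print $1 }'
}

# Usage against a live cluster (current namespace):
#   kubectl get endpoints --no-headers | services_without_endpoints
```

For any Service it prints, compare the selector shown by kubectl describe service with the labels on the Pods you expect it to reach.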

6. Examine Storage Configuration

If your application uses PersistentVolumes (PV), PersistentVolumeClaims (PVC), or StorageClasses, check their status with kubectl get pv, kubectl get pvc, and kubectl get storageclass.
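
A claim that never reaches the Bound state keeps the Pods that mount it from starting. As a minimal sketch, assuming the default columns of kubectl get pvc (NAME first, STATUS second), the hypothetical helper unbound_claims lists the claims to look at; the sample names in the usage line are placeholders.

```shell
# Reads `kubectl get pvc --no-headers` output on stdin and prints
# "name status" for every claim that is not Bound.
unbound_claims() {
  awk '$2 != "Bound" { print $1, $2 }'
}

# Usage against a live cluster (current namespace):
#   kubectl get pvc --no-headers | unbound_claims
# Then inspect a reported claim:
#   kubectl describe pvc <pvc-name>
```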

7. Analyze Container Logs

Inspect logs with kubectl logs <pod-name>. For multi‑container Pods, specify the container: kubectl logs <pod-name> -c <container-name>.
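
When a container keeps restarting, the logs of the previous instance are often more useful than those of the current one, and a quick filter can surface the relevant lines. The sketch below is one possible filter; the keyword list and the name error_lines are assumptions to adapt to your application's log format.

```shell
# Reads log text on stdin and prints only lines that mention a common
# failure keyword (case-insensitive).
error_lines() {
  grep -Ei 'error|exception|fatal|panic'
}

# Usage against a live cluster, reading the previous (crashed) container:
#   kubectl logs <pod-name> --previous --tail=200 | error_lines
```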

8. Understand K8s Cluster Networking

K8s relies on a network plugin (e.g., Calico, Flannel). Common communication patterns include:

Container‑to‑container communication within the same Pod.

Pod‑to‑Pod communication.

Pod‑to‑Service communication.

External‑to‑Service communication.

9. Verify Service DNS Resolution

Test DNS from a Pod in the same namespace as the Service (here, a Service named hostnames in the default namespace):

u@pod$ nslookup hostnames
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local
Name: hostnames
Address 1: 10.0.1.175 hostnames.default.svc.cluster.local

If it fails, try a fully qualified name:

u@pod$ nslookup hostnames.default.svc.cluster.local

Ensure /etc/resolv.conf contains the correct nameserver and search suffixes (e.g., default.svc.cluster.local, svc.cluster.local, cluster.local).
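
The resolv.conf check can be scripted. The sketch below assumes the default cluster domain cluster.local (some clusters are configured with a different one) and works on a local copy of the file; check_resolv_conf is a hypothetical helper name.

```shell
# Succeeds and prints "ok" if the given resolv.conf-style file has at least
# one nameserver line and a search line containing the cluster.local suffix.
# Otherwise prints what is missing and returns a non-zero status.
check_resolv_conf() {
  grep -Eq '^nameserver[[:space:]]+[^[:space:]]+' "$1" \
    || { echo "missing nameserver"; return 1; }
  grep -Eq '^search[[:space:]](.*[[:space:]])?cluster\.local([[:space:]]|$)' "$1" \
    || { echo "missing cluster.local search suffix"; return 1; }
  echo "ok"
}

# Usage against a live cluster:
#   kubectl exec <pod-name> -- cat /etc/resolv.conf > /tmp/pod-resolv.conf
#   check_resolv_conf /tmp/pod-resolv.conf
```

If the check fails, the short name lookup shown earlier cannot be expanded to the fully qualified Service name, which explains why only the long form resolves.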

10. Summary

The exact troubleshooting steps depend on your cluster setup and the symptoms observed. Following the checklist above (node health, events, Pod status, network, storage, logs, and DNS) helps you diagnose and resolve Kubernetes issues more effectively and keep your applications stable.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
