
Master Kubernetes Troubleshooting: From Pod Failures to DNS Issues

This guide walks through a systematic approach to diagnosing Kubernetes problems: pod startup failures, cluster health checks, event logs, pod status, network and storage verification, container logs, and DNS service checks, with practical commands and tips for each step.

dbaplus Community

Nov 16, 2023


1. Pod startup failures

Pods are the smallest scheduling unit in Kubernetes; containers inside a pod share its network, storage, and resources. Common reasons for pod failures include:

Resource overallocation: Scheduling too many pods on a node can exhaust its CPU or memory, leading to pod evictions and, in the worst case, an unstable node.

Memory or CPU limits exceeded: A container that exceeds its memory limit (for example, because of a memory leak) is OOM-killed, while a container that reaches its CPU limit is throttled rather than killed. Mitigate by load-testing and setting resource requests and limits that match real usage.

Network problems: A misconfigured CNI plugin (e.g., Calico) prevents pods from communicating.

Storage issues: Unavailable persistent volumes or incorrectly mounted storage cause startup errors.

Code errors: The application crashes after the container starts.

Configuration errors: Faulty Deployment or StatefulSet manifests prevent pods from being created.

Monitoring systems that track node and pod resource usage help narrow down which of these causes applies.
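The causes above usually surface as a value in the STATUS column of kubectl get pods. As a rough sketch, the hypothetical helper below maps a few common status strings to the category they most often point to. The mapping is a heuristic assumed for illustration, not an exhaustive or authoritative list.

```shell
# likely_cause STATUS
# Prints the failure category that a pod status string most often indicates.
# The function name and the wording of each category are illustrative assumptions.
likely_cause() {
  case "$1" in
    Pending)                       echo "scheduling: insufficient resources or unbound volume" ;;
    OOMKilled)                     echo "memory limit exceeded" ;;
    ImagePullBackOff|ErrImagePull) echo "image name, tag, or registry credentials" ;;
    CrashLoopBackOff)              echo "application crash or bad startup command" ;;
    CreateContainerConfigError)    echo "missing ConfigMap or Secret" ;;
    ContainerCreating)             echo "volume mount or CNI setup still in progress" ;;
    *)                             echo "unknown: inspect with kubectl describe pod" ;;
  esac
}
```

For example, likely_cause CrashLoopBackOff points to the application itself, which suggests reading the container logs before looking at the cluster.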

2. Inspect cluster status

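Check the overall health of the cluster with kubectl get nodes and verify that every node reports Ready. Then confirm that core components such as etcd, kubelet, and kube-proxy are running correctly. On many clusters etcd and kube-proxy run as pods listed by kubectl get pods -n kube-system, while kubelet runs as a service on each node.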
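On a cluster with many nodes it can help to filter the node list down to the ones that need attention. The sketch below assumes the default column layout of kubectl get nodes --no-headers (name first, status second). The function name not_ready_nodes is hypothetical.

```shell
# not_ready_nodes
# Reads `kubectl get nodes --no-headers` output on stdin and prints the name
# and status of every node whose status is not exactly "Ready". Cordoned nodes
# (Ready,SchedulingDisabled) are listed too, since they accept no new pods.
not_ready_nodes() {
  awk '$2 != "Ready" { print $1, $2 }'
}

# Usage against a live cluster:
#   kubectl get nodes --no-headers | not_ready_nodes
```

An empty result means every node reports Ready. Any node that is listed is a candidate for kubectl describe node <node-name>.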

3. Trace event logs

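Inspect recent cluster events with kubectl get events (add --all-namespaces to cover every namespace and --sort-by=.lastTimestamp to order them by time). Events carry timestamps and messages about failures in components or applications, which helps you locate the root cause. Events are kept only for a limited time (one hour by default), so check them soon after a failure.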
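When the event list is long, counting Warning events by reason gives a quick overview of what is failing most often. This is a sketch that assumes the default column order of kubectl get events --all-namespaces --no-headers (namespace, last seen, type, reason, object, message). The function name warning_reasons is hypothetical.

```shell
# warning_reasons
# Reads event lines on stdin and prints "<count> <reason>" for every Warning
# event, most frequent reason first. Normal events are ignored.
warning_reasons() {
  awk '$3 == "Warning" { count[$4]++ }
       END { for (reason in count) print count[reason], reason }' |
    sort -rn
}

# Usage against a live cluster:
#   kubectl get events --all-namespaces --no-headers | warning_reasons
```

A reason such as FailedScheduling at the top of the list points toward resource or volume problems, while BackOff points toward a crashing container.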

4. Focus on pod status

List all pods across namespaces: kubectl get pods --all-namespaces. Identify pods that are neither Running nor Completed (e.g., Pending, CrashLoopBackOff). For a specific pod, run kubectl describe pod <pod-name> -n <namespace> to see detailed status and events.
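The filtering step can be scripted. The sketch below assumes the default columns of kubectl get pods --all-namespaces --no-headers (namespace, name, ready count, status). It also flags pods that are Running but whose ready count is below the total, since those are often failing a readiness probe. The function name and the Running-but-not-ready label are hypothetical.

```shell
# unhealthy_pods
# Reads pod lines on stdin and prints "<namespace>/<name> <status>" for pods
# that are not Running or Completed, plus Running pods that are not fully ready.
unhealthy_pods() {
  awk '{
    split($3, ready, "/")   # READY column looks like "1/2"
    if ($4 != "Running" && $4 != "Completed")
      print $1 "/" $2, $4
    else if ($4 == "Running" && ready[1] != ready[2])
      print $1 "/" $2, "Running-but-not-ready"
  }'
}

# Usage against a live cluster:
#   kubectl get pods --all-namespaces --no-headers | unhealthy_pods
```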

5. Check network connectivity

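Verify service, pod, and node network communication. Use kubectl get services and kubectl describe service <svc-name> to confirm that the service exists, that its selector matches the intended pods, and that it lists endpoints. Ensure network policies and firewall rules allow the expected traffic.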
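A service whose selector matches no ready pods has no endpoints, and traffic to it fails even though the service itself looks fine. As a sketch, the hypothetical helper below picks those services out of kubectl get endpoints --all-namespaces --no-headers output. It assumes the default columns (namespace, name, endpoints) and that an empty endpoint list is printed as <none>.

```shell
# services_without_endpoints
# Reads endpoints lines on stdin and prints "<namespace>/<name>" for every
# entry whose ENDPOINTS column is "<none>".
services_without_endpoints() {
  awk '$3 == "<none>" { print $1 "/" $2 }'
}

# Usage against a live cluster:
#   kubectl get endpoints --all-namespaces --no-headers | services_without_endpoints
```

For each service that is listed, compare the Selector shown by kubectl describe service <svc-name> with the labels on the pods it should target.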

6. Review storage configuration

If your application uses PersistentVolumes (PV), PersistentVolumeClaims (PVC), or StorageClasses, confirm their status with kubectl get pv, kubectl get pvc, and kubectl get storageclass. Check that claims are Bound and volumes are accessible.
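The bound check can be automated with a small filter. This sketch assumes the default columns of kubectl get pvc --all-namespaces --no-headers, where the third column is the claim status. The function name unbound_claims is hypothetical.

```shell
# unbound_claims
# Reads PVC lines on stdin and prints "<namespace>/<name> <status>" for every
# claim whose status is not "Bound" (for example Pending or Lost).
unbound_claims() {
  awk '$3 != "Bound" { print $1 "/" $2, $3 }'
}

# Usage against a live cluster:
#   kubectl get pvc --all-namespaces --no-headers | unbound_claims
```

A claim that stays Pending often means that no PersistentVolume or StorageClass can satisfy it. kubectl describe pvc <pvc-name> shows the related events.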

7. Examine container logs

Fetch logs from a pod with kubectl logs <pod-name>. For pods with multiple containers, specify the container: kubectl logs <pod-name> -c <container-name>. If the container has restarted, add --previous to read the logs of the instance that crashed. Logs often reveal application-level errors.
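Long logs are easier to scan when reduced to the lines that look like failures. The sketch below is one possible filter. The keyword list is an assumption and should be adapted to the log format of your application. The function name error_lines is hypothetical.

```shell
# error_lines
# Reads log text on stdin and prints lines that contain a common failure
# keyword (case-insensitive), each prefixed with its line number.
# Returns success even when nothing matches.
error_lines() {
  grep -Ein 'error|exception|fatal|panic' || true
}

# Usage against a live cluster:
#   kubectl logs <pod-name> --tail=200 | error_lines
```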

8. Cluster network communication

Kubernetes clusters rely on CNI plugins for internal networking. Common plugins include:

Calico – provides IP address allocation and network policy enforcement.

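Flannel – a simple overlay network that handles IP address allocation but does not enforce network policy.

Canal – combines Flannel networking with Calico network policy enforcement.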
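Before debugging the network it helps to know which plugin the cluster actually runs. The sketch below guesses the plugin from DaemonSet names. It assumes the names used by the commonly published manifests (calico-node, kube-flannel-ds, canal), which may differ in a customized installation. The function name detect_cni is hypothetical.

```shell
# detect_cni
# Reads `kubectl get daemonsets --all-namespaces --no-headers` output on stdin
# and prints the name of each recognized CNI plugin once.
# The DaemonSet name patterns are assumptions based on common manifests.
detect_cni() {
  awk '
    $2 ~ /^calico-node/  { print "calico" }
    $2 ~ /^kube-flannel/ { print "flannel" }
    $2 ~ /^canal/        { print "canal" }
  ' | sort -u
}

# Usage against a live cluster:
#   kubectl get daemonsets --all-namespaces --no-headers | detect_cni
```

An empty result only means that none of these three name patterns matched. The cluster may use a different plugin or differently named resources.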

Network communication patterns:

Between containers in the same pod.

Pod‑to‑Pod communication.

Pod‑to‑Service communication.

External-to-Service traffic.

9. Service DNS verification

Test DNS resolution from a pod:

u@pod$ nslookup hostnames
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local
Name: hostnames
Address 1: 10.0.1.175 hostnames.default.svc.cluster.local

If the lookup fails, the pod and service may be in different namespaces. Try a qualified name:

u@pod$ nslookup hostnames.default
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local
Name: hostnames.default
Address 1: 10.0.1.175 hostnames.default.svc.cluster.local

Or use the fully‑qualified name:

u@pod$ nslookup hostnames.default.svc.cluster.local
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local
Name: hostnames.default.svc.cluster.local
Address 1: 10.0.1.175 hostnames.default.svc.cluster.local
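The three lookups above follow a fixed pattern: the bare service name, the name qualified with its namespace, and the fully qualified name. As a sketch, the hypothetical helper below generates all three forms so they can be tried in order. The default namespace and the cluster.local domain are assumptions that match the example output.

```shell
# service_dns_names SERVICE [NAMESPACE] [CLUSTER_DOMAIN]
# Prints the short, namespace-qualified, and fully qualified DNS names of a
# service, one per line. Namespace defaults to "default" and the cluster
# domain to "cluster.local".
service_dns_names() {
  svc="$1"
  ns="${2:-default}"
  domain="${3:-cluster.local}"
  printf '%s\n' "$svc" "$svc.$ns" "$svc.$ns.svc.$domain"
}

# Usage from a shell inside a pod:
#   for name in $(service_dns_names hostnames default); do nslookup "$name"; done
```

If only the short name fails, the pod is most likely in a different namespace than the service. If all three fail, the problem is more likely in the cluster DNS service itself.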

Check /etc/resolv.conf to ensure the nameserver points to the cluster DNS service and that the search line contains the appropriate suffixes:

u@pod$ cat /etc/resolv.conf
nameserver 10.0.0.10
search default.svc.cluster.local svc.cluster.local cluster.local example.com
options ndots:5

The values shown here are examples. The DNS service IP and search domains in your cluster may differ, so compare the file against your own cluster's settings.
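The resolv.conf check can be scripted as well. The sketch below verifies that an expected nameserver is present and that the search line contains the svc.<cluster-domain> suffix. The function name, its arguments, and the output wording are hypothetical, and cluster.local is assumed as the default cluster domain.

```shell
# check_resolv_conf FILE DNS_IP [CLUSTER_DOMAIN]
# Reports whether FILE lists DNS_IP as a nameserver and whether its search
# line includes "svc.<CLUSTER_DOMAIN>" (cluster.local by default).
check_resolv_conf() {
  file="${1:-/etc/resolv.conf}"
  dns_ip="$2"
  domain="${3:-cluster.local}"
  awk -v ip="$dns_ip" -v suffix="svc.$domain" '
    $1 == "nameserver" && $2 == ip { ns_ok = 1 }
    $1 == "search" { for (i = 2; i <= NF; i++) if ($i == suffix) search_ok = 1 }
    END {
      print (ns_ok ? "nameserver: ok" : "nameserver: expected " ip " not found")
      print (search_ok ? "search: ok" : "search: missing " suffix)
    }' "$file"
}

# Usage from a shell inside a pod, with the IP of your cluster DNS service:
#   check_resolv_conf /etc/resolv.conf 10.0.0.10
```

The expected IP is the cluster IP of the DNS service, which on many clusters is the kube-dns service in the kube-system namespace.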

10. Summary

The exact troubleshooting steps depend on your cluster configuration, deployment method, and observed symptoms. By working through the checks above (pod health, cluster status, events, networking, storage, logs, and DNS), you can identify and resolve Kubernetes issues with more confidence and keep your applications stable.
