Master Kubernetes Troubleshooting: Fix Common Pod, Service, and Ingress Issues
This guide walks you through a systematic, top‑to‑bottom troubleshooting flow for Kubernetes, covering pod pending problems, container start failures, readiness checks, service misconfigurations, ingress routing errors, and storage pitfalls, with concrete kubectl commands and practical fixes.
1. Pod Issue Troubleshooting
Check the overall pod status with kubectl get pods -o wide. If a pod remains Pending, the scheduler has not assigned it to any node. Investigate node health first:
List nodes: kubectl get nodes
Look for NotReady conditions. Typical reasons include:
Kubelet stopped or crashed
Container‑runtime failure
Disk full on the node
CNI network plugin failure
If nodes are healthy, examine the pod events for resource constraints: kubectl describe pod <pod-name>
Insufficient CPU, Insufficient Memory, or ephemeral-storage messages indicate that the pod requests more resources than are available. Resolve by:
Freeing resources on the node
Reducing the pod requests and limits
Scaling the node pool or adding new nodes
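As a sketch, requests and limits are set per container in the pod spec; the pod name, image, and sizes here are placeholder assumptions to adapt:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web            # placeholder name
spec:
  containers:
    - name: web
      image: nginx:1.25
      resources:
        requests:       # what the scheduler must find on a node
          cpu: 100m
          memory: 128Mi
        limits:         # hard cap enforced at runtime
          cpu: 500m
          memory: 256Mi
```

If the requests exceed what any node can offer, the pod stays Pending with an Insufficient CPU/Memory event.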
Check for ResourceQuota limits in the namespace: kubectl describe quota -n <namespace>
Increase the quota if it blocks the pod.
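For reference, a namespace quota that could block scheduling might look like the following sketch (name and sizes are assumed examples):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota      # placeholder name
  namespace: my-namespace
spec:
  hard:
    requests.cpu: "4"       # total CPU requests allowed in the namespace
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    pods: "20"              # maximum pod count
```

A pod whose requests would push the namespace past these totals is rejected at admission time.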
PersistentVolumeClaim (PVC) binding failures:
kubectl get pvc
kubectl describe pvc <pvc-name>
Verify that a matching PersistentVolume exists, the StorageClass is valid, the zone/region matches the node, and any nodeAffinity rules are satisfied.
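As an illustration of the nodeAffinity constraint, here is a sketch of a static local PV; the name, path, StorageClass, and node name are placeholder assumptions:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: data-pv          # placeholder name
spec:
  capacity:
    storage: 10Gi
  accessModes: ["ReadWriteOnce"]
  storageClassName: standard   # must match the PVC's storageClassName
  local:
    path: /mnt/disks/ssd1
  nodeAffinity:          # pods using this PV can only run on this node
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values: ["node-1"]
```

If the pod cannot be scheduled onto a node that satisfies this affinity, both the pod and the PVC stay Pending.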
2. Container Startup Failure
When a pod is scheduled but never reaches Running, run kubectl describe pod <pod> to inspect the container state. The most common error states are ImagePullBackOff and ErrImagePull.
Image pull problems
Incorrect image tag or digest
Wrong registry host in the image reference (image references take a plain hostname, not an https:// URL)
Missing imagePullSecret for a private registry
Node cannot reach the registry (network/firewall issue)
Fix the image reference in the pod spec or create the required secret:
kubectl create secret docker-registry my-reg-secret \
--docker-server=<registry> \
--docker-username=<user> \
--docker-password=<pass> \
--docker-email=<email>
Application crashes
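The secret then has to be referenced from the pod spec; in this sketch the secret name matches the command above, and the image path is a placeholder:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: private-app      # placeholder name
spec:
  imagePullSecrets:
    - name: my-reg-secret   # lets the kubelet authenticate to the registry
  containers:
    - name: app
      image: registry.example.com/team/app:1.0   # placeholder image
```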
Inspect container logs to identify runtime errors:
kubectl logs <pod>
kubectl logs <pod> --previous
Typical causes include null-pointer exceptions, database connection failures, port conflicts, missing configuration files, or failing readiness/liveness probes.
Dockerfile pitfalls
Wrong CMD or ENTRYPOINT syntax
Entrypoint script lacks execute permission (chmod +x)
Missing shebang line (#!/bin/sh or #!/usr/bin/env python)
In multi‑stage builds, required files not copied to the final stage
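A Dockerfile sketch that avoids these pitfalls might look like the following; the Go toolchain, binary name, and script name are assumed examples:

```dockerfile
# Build stage: compile the binary (placeholder project layout)
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN go build -o /out/app ./cmd/app

# Final stage: copy the artifacts the runtime actually needs
FROM alpine:3.19
COPY --from=build /out/app /usr/local/bin/app   # don't forget files from earlier stages
COPY entrypoint.sh /entrypoint.sh               # script must start with a shebang, e.g. #!/bin/sh
RUN chmod +x /entrypoint.sh                     # ensure execute permission
ENTRYPOINT ["/entrypoint.sh"]                   # exec form, not a shell string
CMD ["app"]
```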
3. Application Readiness Failure
If a pod is Running but Ready=false, the issue usually lies with the readiness/liveness probes.
Probe misconfiguration
Incorrect httpGet.path (e.g., /health vs /healthz)
Wrong port number
Thresholds (initialDelaySeconds, periodSeconds, failureThreshold) set too low, so probes fail before the application has started and trigger rapid restarts
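A sketch of conservative probe settings for a container spec; the path, port, and timings are assumptions to tune for the application:

```yaml
livenessProbe:
  httpGet:
    path: /healthz      # must match what the app actually serves
    port: 8080          # must match the listening port
  initialDelaySeconds: 15   # give the app time to boot before the first check
  periodSeconds: 10
  failureThreshold: 3       # tolerate transient failures before restarting
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 5
```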
Validate the probe locally:
kubectl exec -it <pod> -- curl -s localhost:<port><path>
Confirm that the container actually listens on the expected port. The containerPort in the pod spec must match the Service targetPort.
Use port‑forwarding to test service reachability:
kubectl port-forward <pod> 8888:<containerPort>
If the endpoint works via port-forward but not through the Service, the problem lies in the Service definition.
4. Service Issues
Common Service misconfigurations:
Selector mismatch: The Service selector does not match any pod labels, resulting in Endpoints: <none>. Verify with:
kubectl describe svc <svc-name>
kubectl get pods --show-labels
Adjust either the Service selector or the pod labels.
Port mapping inconsistency: Ensure the three fields line up:
Service.port – the port exposed to clients
Service.targetPort – the port on the pod
containerPort – the port the container actually listens on
Mismatches cause traffic to be dropped.
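A Service and pod whose labels, selector, and ports line up might look like this sketch; names, labels, and port numbers are placeholders:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web              # placeholder name
spec:
  selector:
    app: web             # must match the pod's labels
  ports:
    - port: 80           # port clients connect to
      targetPort: 8080   # must equal containerPort below
---
apiVersion: v1
kind: Pod
metadata:
  name: web
  labels:
    app: web             # matched by the Service selector
spec:
  containers:
    - name: web
      image: nginx:1.25  # placeholder image
      ports:
        - containerPort: 8080   # what the app actually listens on
```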
Kube-proxy or CNI failure: Rare but impactful. Diagnose with:
systemctl status kube-proxy
kubectl get pods -A | grep kube-proxy
iptables -t nat -L
ipvsadm -Ln
5. Ingress Problems
When an Ingress works with kubectl port-forward but not via its DNS name, check the following:
Host, path, and servicePort values are correct in the Ingress spec.
The Ingress controller pods are running, e.g.: kubectl get pods -n ingress-nginx
External factors: DNS resolution, LoadBalancer provisioning, or firewall rules blocking the advertised port.
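For orientation, a minimal Ingress routing a host and path to a Service is sketched below; the host, ingress class, and backend Service name are placeholder assumptions:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web              # placeholder name
spec:
  ingressClassName: nginx    # must match an installed controller
  rules:
    - host: app.example.com  # must resolve to the controller's address
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web    # must reference an existing Service
                port:
                  number: 80 # must match the Service port
```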
Inspect the Ingress resource:
kubectl describe ingress <ingress-name>
6. Storage Issues
PVC Pending (binding failure)
No suitable PersistentVolume (PV) exists.
Invalid or missing StorageClass provisioner.
Static PV cannot be bound (e.g., already claimed).
Zone/region mismatch between PV and node.
Commands to investigate:
kubectl get pvc
kubectl describe pvc <pvc-name>
FailedMount
Pod events show messages such as MountVolume.WaitForAttach failed or Could not mount device.
Root causes include:
Disk already attached to another node
Insufficient permissions on the storage backend
Unreachable Ceph/NFS server
Missing secret for encrypted volumes (e.g., Ceph RBD)
File‑system corruption
kubectl exec -it <pod> -- df -h
kubectl exec -it <pod> -- ls /mnt/path
Corrupted mounts will cause the container to crash or stay in CrashLoopBackOff.
Ephemeral‑storage exhaustion
Node reports evicted: ephemeral storage in kubectl describe node <node>.
Mitigation steps:
Expand the node’s disk size.
Clean up /var/lib/docker or containerd data.
Adjust the pod’s ephemeral-storage requests and limits.
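Per-container ephemeral-storage settings go in the same resources block as CPU and memory; the sizes in this sketch are assumed examples:

```yaml
resources:
  requests:
    ephemeral-storage: 1Gi   # reserved writable scratch space
  limits:
    ephemeral-storage: 4Gi   # exceeding this gets the pod evicted
```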
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.