
Master Kubernetes Troubleshooting: Fix Common Pod, Service, and Ingress Issues

This guide walks you through a systematic, top‑to‑bottom troubleshooting flow for Kubernetes: Pending pods, container startup failures, readiness‑probe failures, Service misconfigurations, Ingress routing errors, and storage pitfalls, with concrete kubectl commands and practical fixes.

dbaplus Community

1. Pod Issue Troubleshooting

Check the overall pod status with kubectl get pods -o wide. If a pod remains Pending, the scheduler has not assigned it to any node. Investigate node health first:

List the nodes and look for NotReady conditions:

kubectl get nodes

Typical reasons for a NotReady node include:

Kubelet stopped or crashed

Container‑runtime failure

Disk full on the node

CNI network plugin failure
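
To narrow down which of these causes applies, run a few standard host-level checks on the affected node (crictl assumes a CRI runtime such as containerd):

systemctl status kubelet                     # kubelet stopped or crashed?
journalctl -u kubelet --since "30 min ago"   # recent kubelet errors
df -h                                        # disk full?
crictl ps                                    # is the container runtime responding?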

If the nodes are healthy, examine the pod events for resource constraints:

kubectl describe pod <pod-name>

Insufficient CPU, Insufficient Memory, or ephemeral‑storage messages indicate that the pod requests more resources than any node has available. Resolve by:

Freeing resources on the node

Reducing the pod’s requests and limits

Scaling the node pool or adding new nodes

Check for ResourceQuota limits in the namespace:

kubectl describe quota -n <namespace>

Increase the quota if it is blocking the pod.

PersistentVolumeClaim (PVC) binding failures:

kubectl get pvc
kubectl describe pvc <pvc-name>

Verify that a matching PersistentVolume exists, the StorageClass is valid, zone/region matches the node, and any nodeAffinity rules are satisfied.
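
For reference, here is a sketch of a statically provisioned local PersistentVolume whose nodeAffinity pins it to a single node; the name, path, capacity, and node label value are illustrative:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv                    # illustrative name
spec:
  capacity:
    storage: 10Gi
  accessModes: ["ReadWriteOnce"]
  storageClassName: local-storage
  local:
    path: /mnt/disks/ssd1           # pre-provisioned directory on the node
  nodeAffinity:                     # pods using this PV can only run on node-1
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values: ["node-1"]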

2. Container Startup Failure

When a pod is scheduled but never reaches Running, run kubectl describe pod <pod> to see the container state. The most common failure states are ImagePullBackOff and ErrImagePull.

Image pull problems

Incorrect image tag or digest

Wrong registry hostname in the image reference

Missing imagePullSecret for a private registry

Node cannot reach the registry (network/firewall issue)

Fix the image reference in the pod spec or create the required secret:

kubectl create secret docker-registry my-reg-secret \
  --docker-server=<registry> \
  --docker-username=<user> \
  --docker-password=<pass> \
  --docker-email=<email>
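
The secret only takes effect once the pod (or its service account) references it. A minimal pod-spec fragment, reusing the secret name from the command above:

spec:
  imagePullSecrets:
  - name: my-reg-secret             # must match the secret created above
  containers:
  - name: app                       # illustrative name
    image: <registry>/<image>:<tag>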

Application crashes

Inspect container logs to identify runtime errors:

kubectl logs <pod>
kubectl logs <pod> --previous

Typical causes include null‑pointer exceptions, database connection failures, port conflicts, missing configuration files, or failing readiness/liveness probes.

Dockerfile pitfalls

Wrong CMD or ENTRYPOINT syntax

Entrypoint script lacks execute permission (fix with chmod +x)

Missing shebang line (#!/bin/sh or #!/usr/bin/env python)

In multi‑stage builds, required files not copied to the final stage
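
A Dockerfile shape that avoids these pitfalls might look like the following; the base images, paths, and script name are illustrative:

# Build stage: compile the application
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN go build -o /out/app .

# Final stage: copy everything the binary needs out of the build stage
FROM alpine:3.20
COPY --from=build /out/app /usr/local/bin/app
COPY entrypoint.sh /entrypoint.sh   # script begins with #!/bin/sh
RUN chmod +x /entrypoint.sh         # ensure execute permission
ENTRYPOINT ["/entrypoint.sh"]       # exec form, not shell form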

3. Application Readiness Failure

If a pod is Running but Ready=false, the issue is usually with the readiness/liveness probes.

Probe misconfiguration

Incorrect httpGet.path (e.g., /health vs /healthz)

Wrong port number

Timing values (initialDelaySeconds, periodSeconds, failureThreshold) set too low, so the probe fails before the application finishes starting; for liveness probes this also triggers repeated restarts
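
For reference, a readiness-probe stanza with more forgiving timing; the path, port, and threshold values are illustrative and must match your application:

readinessProbe:
  httpGet:
    path: /healthz             # must be the path the app actually serves
    port: 8080                 # must be a port the container listens on
  initialDelaySeconds: 10      # give the application time to start
  periodSeconds: 5
  failureThreshold: 3          # tolerate a few transient failures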

Validate the probe locally:

kubectl exec -it <pod> -- curl -s localhost:<port><path>

Confirm that the container actually listens on the expected port. The containerPort in the pod spec must match the Service targetPort.

Use port‑forwarding to test service reachability:

kubectl port-forward <pod> 8888:<containerPort>

If the endpoint works via port‑forward but not through the Service, the problem lies in the Service definition.

4. Service Issues

Common Service misconfigurations:

Selector mismatch: The Service selector does not match any pod labels, resulting in Endpoints: <none>. Verify with:

kubectl describe svc <svc-name>
kubectl get pods --show-labels

Adjust either the Service selector or the pod labels.

Port mapping inconsistency: Ensure the three fields line up:

Service.port – the port exposed to clients

Service.targetPort – the port on the pod

containerPort – the port the container actually listens on

Mismatches cause traffic to be dropped.
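
A sketch of correctly aligned selectors and ports; the names and numbers are illustrative:

apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web                # must match the pod labels exactly
  ports:
  - port: 80                # Service.port: what clients connect to
    targetPort: 8080        # Service.targetPort: forwarded to the pod

In the pod template, the container must listen on that same port:

containers:
- name: web
  image: <image>
  ports:
  - containerPort: 8080     # matches targetPort above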

Kube‑proxy or CNI failure: Rare but impactful. Diagnose with:

systemctl status kube-proxy               # if kube-proxy runs as a systemd service
kubectl get pods -A | grep kube-proxy     # if it runs as a DaemonSet
iptables -t nat -L                        # NAT rules programmed by kube-proxy (iptables mode)
ipvsadm -Ln                               # virtual-server table (IPVS mode)

5. Ingress Problems

When the application is reachable through kubectl port-forward but not via the Ingress DNS name, check the following:

Host, path, and servicePort values are correct in the Ingress spec.

The Ingress controller pods are running, e.g.:

kubectl get pods -n ingress-nginx

External factors: DNS resolution, LoadBalancer provisioning, or firewall rules blocking the advertised port.

Inspect the Ingress resource:

kubectl describe ingress <ingress-name>
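
For comparison, a minimal networking.k8s.io/v1 Ingress; the host, class, and backend names are illustrative:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
spec:
  ingressClassName: nginx          # must match an installed controller
  rules:
  - host: app.example.com          # must resolve to the controller's address
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web              # an existing Service in the same namespace
            port:
              number: 80           # a port that Service exposes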

6. Storage Issues

PVC Pending (binding failure)

No suitable PersistentVolume (PV) exists.

Invalid or missing StorageClass provisioner.

Static PV cannot be bound (e.g., already claimed).

Zone/region mismatch between PV and node.

Commands to investigate:

kubectl get pvc
kubectl describe pvc <pvc-name>
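
A PVC binds only when an available PV (or a dynamic provisioner) satisfies its storageClassName, accessModes, and requested size; a sketch with illustrative values:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data                        # illustrative name
spec:
  storageClassName: standard        # must name an existing StorageClass
  accessModes: ["ReadWriteOnce"]    # must be offered by the PV or provisioner
  resources:
    requests:
      storage: 10Gi                 # must not exceed the PV's capacity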

FailedMount

Pod events show messages such as MountVolume.WaitForAttach failed or Could not mount device.

Root causes include:

Disk already attached to another node

Insufficient permissions on the storage backend

Unreachable Ceph/NFS server

Missing secret for encrypted volumes (e.g., Ceph RBD)

File‑system corruption

Check whether the mount is usable from inside the pod:

kubectl exec -it <pod> -- df -h
kubectl exec -it <pod> -- ls /mnt/path

Corrupted mounts will cause the container to crash or stay in CrashLoopBackOff.

Ephemeral‑storage exhaustion

Pods are Evicted with a message such as The node was low on resource: ephemeral-storage; confirm with kubectl describe pod <pod> and kubectl describe node <node>.

Mitigation steps:

Expand the node’s disk size.

Clean up /var/lib/docker or containerd data.

Adjust the pod’s ephemeral-storage requests and limits, as in the sketch below.
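
A resources stanza that bounds ephemeral-storage usage; the values are illustrative:

resources:
  requests:
    ephemeral-storage: "1Gi"    # the scheduler reserves this much node disk
  limits:
    ephemeral-storage: "2Gi"    # exceeding the limit gets the pod evicted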

[Figure: Kubernetes troubleshooting flow diagram]