Master Kubernetes Troubleshooting: Common Issues and How to Fix Them

This guide walks you through the most frequent Kubernetes problems—from image pull failures and CrashLoopBackOff to DNS, storage, node readiness, and RBAC errors—providing clear diagnosis steps, essential kubectl commands, and concrete solutions to keep your clusters healthy.

Ray's Galactic Tech

Introduction

Two fundamental commands are the backbone of Kubernetes troubleshooting: kubectl describe (inspects resources and events) and kubectl logs (shows container output). Apply an "outside‑in, big‑to‑small" approach: Node → Pod → Container → Application.

1. Deployment & Configuration Issues

ImagePullBackOff / ErrImagePull

Symptoms: Pod status shows ImagePullBackOff or ErrImagePull.

Common causes:

Incorrect image name or tag.

Missing imagePullSecrets for private registries.

Network cannot reach the registry.

Image architecture mismatch (for example, an amd64-only image scheduled onto an arm64 node).

Resolution:

Verify the image reference and read the exact pull error in the Events section:

kubectl describe pod <pod-name>

For a private registry, create a Docker registry secret in the same namespace as the pod (--docker-email is optional):

kubectl create secret docker-registry my-registry-key \
  --docker-server=<registry> \
  --docker-username=<user> \
  --docker-password=<pass> \
  --docker-email=<email>

Then reference the secret in the pod spec:

imagePullSecrets:
- name: my-registry-key

Test pulling the image directly on the node:

docker pull <image>
# or
crictl pull <image>
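
The checks above can be narrowed to a single pod. The following is a minimal sketch, not a definitive tool: pod_events and pod_images are hypothetical helper names, and the namespace is assumed to be "default" unless a second argument is given.

```shell
# Show only the events that belong to one pod, oldest first, so the exact
# pull error (for example "unauthorized" or "manifest unknown") is easy to spot.
# $1 = pod name, $2 = namespace (assumed "default" when omitted)
pod_events() {
  kubectl get events \
    --namespace "${2:-default}" \
    --field-selector "involvedObject.kind=Pod,involvedObject.name=$1" \
    --sort-by=.lastTimestamp
}

# Print the image references the pod actually requested, one per line,
# to catch typos in the name or tag.
pod_images() {
  kubectl get pod "$1" --namespace "${2:-default}" \
    -o jsonpath='{range .spec.containers[*]}{.image}{"\n"}{end}'
}
```

For example, pod_events <pod-name> <ns> followed by pod_images <pod-name> <ns> shows the error and the image string side by side.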

CrashLoopBackOff

Symptoms: Pod repeatedly crashes, alternating between CrashLoopBackOff and Error.

Investigation & Fix:

View the logs of the previous (crashed) container instance:

kubectl logs <pod-name> --previous

Common causes: misconfiguration, missing dependencies, insufficient permissions, or an incorrect start command.

If the logs are not enough, temporarily override the container's command so that it stays alive, then inspect the environment with kubectl exec (this assumes the image ships /bin/sh):

command: ["/bin/sh"]
args: ["-c", "sleep 3600"]
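
The exit code of the crashed container often narrows the cause before any logs are read. The sketch below is illustrative only: explain_exit_code and last_exit_code are hypothetical helper names, the meanings listed are the conventional shell and signal interpretations, and the lookup assumes the first container in the pod is the one crashing.

```shell
# Map a container exit code to its most common meaning.
explain_exit_code() {
  case "$1" in
    0)   echo "exited cleanly: the main process finished instead of staying in the foreground" ;;
    1)   echo "generic application error: read the previous logs" ;;
    126) echo "command found but not executable: check file permissions" ;;
    127) echo "command not found: check command/args and the image contents" ;;
    137) echo "killed by SIGKILL (128+9): commonly the OOM killer or a forced kill after the grace period" ;;
    139) echo "segmentation fault (128+11)" ;;
    143) echo "terminated by SIGTERM (128+15)" ;;
    *)   echo "application-specific exit code: consult the application's documentation" ;;
  esac
}

# Read the exit code of the last terminated instance of the first container.
# $1 = pod name, $2 = namespace (assumed "default" when omitted)
last_exit_code() {
  kubectl get pod "$1" --namespace "${2:-default}" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
}
```

Usage: explain_exit_code "$(last_exit_code <pod-name>)".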

Pending

Symptoms: Pod remains in the Pending state.

Root cause: The scheduler cannot find a suitable node.

Inspect the events for the pod:

kubectl describe pod <pod-name>

Typical reasons:

Insufficient CPU or memory – increase node capacity or lower resource requests.

Node selectors, affinity rules, or taints that no node satisfies – adjust labels or affinity rules, or add tolerations.
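
Both typical reasons can be checked from the command line. This is a rough sketch with hypothetical helper names (scheduling_failures, node_capacity_and_taints); the column layout is an arbitrary choice.

```shell
# List FailedScheduling events; each message states why nodes were rejected
# (for example "Insufficient memory" or an untolerated taint).
# $1 = namespace (assumed "default" when omitted)
scheduling_failures() {
  kubectl get events --namespace "${1:-default}" \
    --field-selector reason=FailedScheduling \
    --sort-by=.lastTimestamp
}

# Show every node's allocatable CPU and memory next to its taint keys.
node_capacity_and_taints() {
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory,TAINTS:.spec.taints[*].key'
}
```

Comparing the pod's resource requests and tolerations against this table usually shows which constraint excludes every node.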

2. Runtime Issues

Pod Running but Service Unreachable

Confirm the application is listening on the expected port (check container spec and logs).

Verify the Service selector matches the Pod labels and that targetPort is correct.

Check that Endpoints exist for the Service:

kubectl get endpoints <svc>

If the list is empty, no ready Pods match the selector.

Inspect any NetworkPolicy that might block traffic.

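Debug from inside the cluster by starting a throwaway pod, then requesting the Service from its shell:

kubectl run debug --rm -it --restart=Never --image=busybox -- sh
# inside the debug pod's shell:
wget -qO- http://<svc>.<ns>.svc.cluster.local:<port>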
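
A selector mismatch is easier to see when the Service's selector is applied to the pods directly. The following sketch assumes hypothetical helper names (service_selector, pods_behind_service) and uses kubectl's go-template output to turn the selector map into a label query.

```shell
# Print the Service's selector as a label query, e.g. "app=web,tier=frontend".
# $1 = service name, $2 = namespace (assumed "default" when omitted)
service_selector() {
  kubectl get service "$1" --namespace "${2:-default}" \
    -o go-template='{{range $k, $v := .spec.selector}}{{$k}}={{$v}},{{end}}' |
    sed 's/,$//'   # drop the trailing comma left by the template
}

# List the pods that the Service's selector actually matches, with their labels.
pods_behind_service() {
  selector="$(service_selector "$1" "$2")"
  if [ -z "$selector" ]; then
    echo "Service $1 has no selector" >&2
    return 1
  fi
  kubectl get pods --namespace "${2:-default}" \
    --selector "$selector" --show-labels
}
```

If pods_behind_service <svc> <ns> returns no pods, compare the printed selector with the labels in the pod template.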

DNS Resolution Failures

Symptoms: Pods cannot resolve service names.

Check CoreDNS pods are healthy:

kubectl get pods -n kube-system -l k8s-app=kube-dns

Ensure /etc/resolv.conf inside the pod contains the cluster DNS server (e.g., nameserver 10.96.0.10).

Run a DNS query from a pod:

nslookup kubernetes.default.svc.cluster.local
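
The resolv.conf check can be scripted rather than eyeballed. This is a sketch under assumptions: the DNS Service is taken to be named kube-dns in kube-system (the usual name even when CoreDNS serves it), and cluster_dns_ip and has_nameserver are hypothetical helper names. On clusters that run a node-local DNS cache, pods may legitimately point at a different address.

```shell
# Cluster IP of the DNS Service (assumed to be kube-system/kube-dns).
cluster_dns_ip() {
  kubectl get service kube-dns --namespace kube-system \
    -o jsonpath='{.spec.clusterIP}'
}

# Read resolv.conf text on stdin; succeed only if it lists the given IP
# as a nameserver. $1 = expected nameserver IP
has_nameserver() {
  awk -v ip="$1" '$1 == "nameserver" && $2 == ip { found = 1 } END { exit !found }'
}

# Example (run manually against a real pod):
#   kubectl exec <pod-name> -- cat /etc/resolv.conf | has_nameserver "$(cluster_dns_ip)" \
#     && echo "pod uses cluster DNS" || echo "pod does NOT use cluster DNS"
```

A mismatch typically points at the pod's dnsPolicy or at the kubelet's cluster DNS setting rather than at CoreDNS itself.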

3. Storage Issues

PVC Pending

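List available StorageClasses:

kubectl get storageclass

With static provisioning, make sure the PVC's accessModes, requested size, and storageClassName match an available PersistentVolume. The events shown by kubectl describe pvc <pvc-name> usually state why the claim is unbound. Note that a StorageClass with volumeBindingMode: WaitForFirstConsumer keeps the PVC Pending until a Pod that uses it is scheduled, which is expected behaviour.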

If using dynamic provisioning, verify that the provisioner pod is running and has no errors.

FailedMount

Inspect pod events for mount errors:

kubectl describe pod <pod-name>

Confirm the backend storage (NFS, Ceph, etc.) is reachable, the export path is correct, and the required mount utilities are installed on the node.

Check permissions on the storage target.

4. Node & Cluster Issues

Node NotReady

Check kubelet status on the node:

systemctl status kubelet

Verify the container runtime is active (containerd on most current clusters; Docker only where it is still used as the runtime):

systemctl status containerd   # or: systemctl status docker

Inspect node resources – disk space and memory pressure:

df -h
free -h

Review kubelet logs for clues:

journalctl -u kubelet -f
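
Before logging in to the node, the conditions it reports to the API server often show which of these checks matters. A minimal sketch, with node_conditions as a hypothetical helper name:

```shell
# Print each node condition as "Type=Status<TAB>message", for example
# "DiskPressure=True" or "Ready=False", so the failing check stands out.
# $1 = node name
node_conditions() {
  kubectl get node "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\t"}{.message}{"\n"}{end}'
}
```

A Ready condition with status Unknown usually means the kubelet has stopped reporting, which makes the kubelet and runtime checks above the first thing to run on the node.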

5. Network & LoadBalancer Problems

NodePort Unreachable

Open the NodePort range (30000-32767 by default) in the host firewall (e.g., iptables) and in any cloud security groups.

Ensure the Service has active Endpoints.

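Check whether the node is listening on the port (ss -tunlp is the equivalent where netstat is not installed). Depending on the kube-proxy mode and version, NodePorts may be implemented purely as iptables or IPVS rules, so an empty result does not by itself prove a fault:

netstat -tunlp | grep <nodeport>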

LoadBalancer Not Working

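Verify that something in the cluster implements Services of type LoadBalancer: the cloud provider's controller in cloud environments, or an add-on such as MetalLB on bare metal. Without one, the Service's EXTERNAL-IP stays <pending>. An ingress controller such as ingress-nginx does not provide this; it is usually exposed through a LoadBalancer Service itself.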

6. Job & CronJob Issues

Job stuck: The pod's process is hanging or looping. Examine pod logs and exit codes.

CronJob not firing: Check the cron schedule expression, confirm that spec.suspend is not true, and inspect the CronJob's events.

kubectl get cronjob
kubectl describe cronjob <name>
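
Two further checks help separate a scheduling problem from a broken job template. This is a sketch with hypothetical helper names (cronjob_suspended, trigger_cronjob); the "-manual-<timestamp>" Job name is an arbitrary convention.

```shell
# Print the CronJob's suspend flag ("true" means no new Jobs are created).
# $1 = cronjob name, $2 = namespace (assumed "default" when omitted)
cronjob_suspended() {
  kubectl get cronjob "$1" --namespace "${2:-default}" \
    -o jsonpath='{.spec.suspend}'
}

# Run the CronJob's job template once, immediately, as a separately named Job.
# If this Job succeeds, the template is fine and the schedule is the suspect.
trigger_cronjob() {
  kubectl create job "$1-manual-$(date +%s)" \
    --from="cronjob/$1" --namespace "${2:-default}"
}
```

After trigger_cronjob <name>, kubectl get jobs shows whether the manual run completed.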

7. Security & RBAC Issues

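Forbidden: The identity making the request lacks a Role or ClusterRole that allows the action, or the RoleBinding or ClusterRoleBinding that grants it. Create the appropriate role and binding for the service account.

Pod API access failures: Set serviceAccountName in the pod spec and bind the required permissions to that service account.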
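
Permissions can be verified without redeploying the workload by impersonating the service account. A minimal sketch: sa_can_i and grant_view are hypothetical helper names, and granting the built-in view ClusterRole is only an example; choose the narrowest role that fits.

```shell
# Ask the API server whether a service account may perform an action.
# $1 = namespace, $2 = service account, $3 = verb, $4 = resource
sa_can_i() {
  kubectl auth can-i "$3" "$4" --namespace "$1" \
    --as "system:serviceaccount:$1:$2"
}

# Example fix: grant read-only access by binding the built-in "view"
# ClusterRole to the service account, scoped to its own namespace.
# $1 = namespace, $2 = service account
grant_view() {
  kubectl create rolebinding "$2-view" --namespace "$1" \
    --clusterrole=view --serviceaccount="$1:$2"
}
```

For example, sa_can_i <ns> <service-account> list pods prints yes or no; run it again after creating the binding to confirm the fix.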

8. Resource Limits

OOMKilled: The container exceeded its memory limit. Increase resources.limits.memory (and resources.requests.memory to match), or optimise the application.

CPU throttling: The CPU limit is too low. Raise resources.limits.cpu.
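
To confirm an OOM kill and see the limit that was hit, the termination reason and the configured limits can be read straight from the pod object. A rough sketch with hypothetical helper names:

```shell
# Print "<container><TAB><reason>" for each container's last terminated
# instance; "OOMKilled" confirms the memory limit was exceeded.
# $1 = pod name, $2 = namespace (assumed "default" when omitted)
termination_reasons() {
  kubectl get pod "$1" --namespace "${2:-default}" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\n"}{end}'
}

# Print "<container><TAB><memory limit><TAB><cpu limit>" as configured in the spec.
container_limits() {
  kubectl get pod "$1" --namespace "${2:-default}" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources.limits.memory}{"\t"}{.resources.limits.cpu}{"\n"}{end}'
}
```

Comparing these limits with kubectl top pods (which requires metrics-server) indicates how much headroom the container has in steady state.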

9. Toolbox

kubectl get <resource> – retrieve current status.

kubectl describe <resource> – detailed view with events.

kubectl logs <pod> – view container logs.

kubectl exec -it <pod> -- sh – open an interactive shell inside the container.

kubectl get events --all-namespaces – list cluster-wide events.

kubectl top nodes / kubectl top pods – show resource usage (requires metrics-server).

kubectl debug – attach a temporary debug container to an existing pod.

10. Practical Tips

Use kubectl explain <resource.field> to view API field documentation on the fly.

For aggregated logs, tools such as stern or kubetail can tail multiple pod logs simultaneously.

Leverage kubectl debug to inject a troubleshooting container without modifying the original pod spec.
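
The two most common kubectl debug invocations can be sketched as follows. The helper names are hypothetical, busybox is only an example image, and ephemeral containers must be supported by the cluster for the first form to work.

```shell
# Attach an ephemeral busybox container to a running pod and share the
# process namespace of the target container, so its processes are visible.
# $1 = pod name, $2 = target container, $3 = namespace (assumed "default")
debug_pod() {
  kubectl debug "$1" -it --namespace "${3:-default}" \
    --image=busybox --target="$2"
}

# Start a debugging pod on a specific node; the node's filesystem is
# mounted at /host inside that pod. $1 = node name
debug_node() {
  kubectl debug "node/$1" -it --image=busybox
}
```

The first form is useful for images that ship without a shell; the second helps when SSH access to the node is not available.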

11. Summary & Recommendations

Approximately 90 % of issues are resolved by examining kubectl describe output and container logs.

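Understand the Pod lifecycle: the phases are Pending, Running, Succeeded, Failed, and Unknown. CrashLoopBackOff is not a phase but the waiting reason of a container that the kubelet keeps restarting.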

Validate that the container image runs locally before deploying to the cluster.

Isolate problems by reproducing them with a minimal Deployment and Service.

Pay close attention to resource requests/limits and RBAC permissions – OOMKilled, Forbidden, and empty Endpoints are frequent culprits.

Master the core debugging tools (stern, kubectl debug, tcpdump) to accelerate root-cause analysis.
