
How to Diagnose CrashLoopBackOff in Kubernetes: A Practical Guide

This article explains that CrashLoopBackOff is a symptom, not the root cause, and walks through a production-grade troubleshooting workflow: checking pod status, describing events, reading current and previous logs, and exec-ing into containers. It covers common failures such as OOMKilled, liveness-probe misconfiguration, bad configuration files, database connection failures, bad image startup commands, and disk pressure, and warns against deleting pods prematurely.

Full-Stack DevOps & Kubernetes

CrashLoopBackOff Overview

CrashLoopBackOff means a container keeps crashing and the kubelet keeps restarting it, waiting longer between attempts each time. The pod itself was created and scheduled successfully; the failure occurs inside the container.

Pod created → container starts → application fails → main process exits → kubelet restarts the container after a back-off delay → repeat → CrashLoopBackOff
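The "BackOff" in the name refers to the kubelet's restart delay. The Kubernetes pod-lifecycle documentation describes how it works: the delay starts at 10 seconds, doubles after each crash, and is capped at five minutes. It resets once a container has run for 10 minutes without problems. The hypothetical helper below only prints that sequence, which shows why a crash-looping pod restarts more and more slowly.

```shell
#!/bin/sh
# Hypothetical helper: print the first N kubelet restart delays in seconds,
# assuming the documented defaults (start at 10s, double each time, cap at 300s).
# The body runs in a subshell "( ... )" so its variables stay local.
backoff_schedule() (
  n=$1
  delay=10
  out=""
  i=0
  while [ "$i" -lt "$n" ]; do
    out="$out$delay "
    delay=$((delay * 2))
    if [ "$delay" -gt 300 ]; then
      delay=300
    fi
    i=$((i + 1))
  done
  # Drop the trailing space before printing.
  printf '%s\n' "${out% }"
)
```

For example, backoff_schedule 6 prints 10 20 40 80 160 300.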

Production Investigation Order

Do not delete the pod before gathering evidence. Deleting it loses its container logs, and its events can no longer be viewed with kubectl describe.

1. kubectl get pod -A
2. kubectl describe pod pod-name
3. kubectl logs pod-name
4. kubectl logs --previous pod-name
5. kubectl exec -it pod-name -- sh
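The ordered steps above can be scripted so that evidence is saved to disk before anyone deletes or restarts the pod. This is a minimal sketch under a few assumptions:

- collect_evidence is a hypothetical name.
- The file layout is an arbitrary choice.
- kubectl is configured for the right cluster.

Errors, for example when no previous container instance exists yet, are written into the files rather than aborting the run.

```shell
#!/bin/sh
# Hypothetical helper: snapshot a crash-looping pod's evidence into a directory.
# Usage: collect_evidence NAMESPACE POD [DIR]. Assumes kubectl is configured.
# For multi-container pods, kubectl logs also needs -c CONTAINER.
collect_evidence() (
  ns=$1
  pod=$2
  dir=${3:-./evidence-$pod}
  mkdir -p "$dir" || exit 1
  kubectl get pod "$pod" -n "$ns" -o wide > "$dir/get.txt" 2>&1
  kubectl describe pod "$pod" -n "$ns" > "$dir/describe.txt" 2>&1
  kubectl logs "$pod" -n "$ns" > "$dir/logs.txt" 2>&1
  # Logs of the crashed instance; usually the most valuable file.
  kubectl logs "$pod" -n "$ns" --previous > "$dir/logs-previous.txt" 2>&1
  # Events expire (kube-apiserver --event-ttl, 1h by default), so save them too.
  kubectl get events -n "$ns" --field-selector "involvedObject.name=$pod" \
    > "$dir/events.txt" 2>&1
  echo "$dir"
)
```

Running collect_evidence default web-1 writes the files and prints the directory it used, ./evidence-web-1.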

Step 1 – Check Pod Status

Run kubectl get pod -A and examine the columns:

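READY : number of ready containers out of the total (e.g., 0/1)

STATUS : summarized pod state (e.g., CrashLoopBackOff, which is a container waiting reason rather than a pod phase)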

RESTARTS : number of restarts; a high value shows repeated crashes

AGE : how long the pod has existed
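On a busy cluster this table is long. The hypothetical filter below reads kubectl get pod -A output on stdin and keeps only CrashLoopBackOff rows, printing namespace/name and the restart count. It assumes the default column order: NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE. Newer kubectl versions may append text such as "(2m ago)" to RESTARTS; the filter prints only the leading number.

```shell
#!/bin/sh
# Hypothetical helper: keep CrashLoopBackOff rows from `kubectl get pod -A`.
# Assumes columns NAMESPACE NAME READY STATUS RESTARTS AGE (STATUS is field 4).
crashloop_pods() {
  # Skip the header row; print "namespace/name restarts" for matching pods.
  awk 'NR > 1 && $4 == "CrashLoopBackOff" { print $1 "/" $2, $5 }'
}
```

Usage: kubectl get pod -A | crashloop_pods | sort -k2 -nr lists the pods with the most restarts first.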

Step 2 – Describe Events

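Run kubectl describe pod pod-name and read each container's State and Last State fields together with the Events section at the bottom. They usually reveal the immediate cause. Look for an OOMKilled reason in Last State, probe failures in Events, or a non-zero exit code from an application or configuration error.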
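The same fields can be read directly from the pod status with a kubectl JSONPath query, which is handy in scripts. In the sketch below, last_state is a hypothetical name; it takes the namespace first, then the pod. It prints one line per container: name, restart count, last termination reason, and exit code. The last two fields are empty if the container has not terminated before.

```shell
#!/bin/sh
# Hypothetical helper: per-container restart count and last termination
# reason/exit code. Usage: last_state NAMESPACE POD. Assumes kubectl access.
last_state() {
  kubectl get pod "$2" -n "$1" -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
}
```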

Classic Fault 1 – OOMKilled

Last State:   Terminated
Reason:       OOMKilled
Exit Code:    137
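Exit code 137 follows the Unix convention that a process killed by a signal exits with 128 plus the signal number. So 137 = 128 + 9, which is SIGKILL, the signal the kernel OOM killer sends. The hypothetical helper below decodes a few common codes.

A code of 137 alone does not prove an OOM kill, because any SIGKILL produces it. The OOMKilled reason is what confirms memory exhaustion.

```shell
#!/bin/sh
# Hypothetical helper: explain a container exit code. Codes above 128 follow the
# convention "128 + signal number"; only a few common signals are named here.
decode_exit_code() (
  code=$1
  if [ "$code" -gt 128 ]; then
    sig=$((code - 128))
    case $sig in
      6)  name=SIGABRT ;;
      9)  name=SIGKILL ;;   # also what the kernel OOM killer sends
      11) name=SIGSEGV ;;
      15) name=SIGTERM ;;
      *)  name="signal $sig" ;;
    esac
    echo "killed by $name (128 + $sig)"
  elif [ "$code" -eq 0 ]; then
    echo "exited normally"
  else
    echo "application error (exit $code)"
  fi
)
```

decode_exit_code 137 prints killed by SIGKILL (128 + 9). Even exit code 0 can crash-loop: with restartPolicy Always, the kubelet also restarts containers that exit successfully.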

Example manifest:

resources:
  limits:
    memory: "512Mi"
# but the JVM is started with -Xmx1024m, so the heap alone may grow to 1Gi

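Fixes: raise the memory limit above what the JVM needs (e.g., memory: 2Gi for a 1 GiB heap). Alternatively, lower the heap (e.g., -Xms256m -Xmx512m) while keeping the limit comfortably above -Xmx, because the JVM also uses non-heap memory such as metaspace and thread stacks. If neither helps, investigate memory leaks.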
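The arithmetic behind this fault is easy to check before deploying. The sketch below converts both values to bytes and compares them:

- the Kubernetes limit, with binary suffixes Ki, Mi, and Gi only;
- the JVM -Xmx value, with suffixes k, m, and g.

All function names are hypothetical. The 25% headroom the sketch demands for non-heap memory is an illustrative assumption, not a JVM or Kubernetes rule.

```shell
#!/bin/sh
# Hypothetical helpers. k8s_bytes handles only Ki/Mi/Gi or plain bytes; note that
# in Kubernetes "M" is decimal and "m" means milli, so both are rejected here.
k8s_bytes() {
  case $1 in
    *Ki) echo $(( ${1%Ki} * 1024 )) ;;
    *Mi) echo $(( ${1%Mi} * 1048576 )) ;;
    *Gi) echo $(( ${1%Gi} * 1073741824 )) ;;
    *[!0-9]*) echo "unsupported quantity: $1" >&2; return 1 ;;
    *) echo "$1" ;;
  esac
}

# JVM sizes as used in -Xmx: k/m/g (either case) or plain bytes.
jvm_bytes() {
  case $1 in
    *[kK]) echo $(( ${1%?} * 1024 )) ;;
    *[mM]) echo $(( ${1%?} * 1048576 )) ;;
    *[gG]) echo $(( ${1%?} * 1073741824 )) ;;
    *[!0-9]*) echo "unsupported size: $1" >&2; return 1 ;;
    *) echo "$1" ;;
  esac
}

# Usage: check_heap LIMIT XMX, e.g. check_heap 512Mi 1024m.
# The 25% non-heap headroom is an assumed rule of thumb.
check_heap() (
  limit=$(k8s_bytes "$1") || exit 1
  heap=$(jvm_bytes "$2") || exit 1
  if [ "$heap" -ge "$limit" ]; then
    echo "OOM likely: heap $2 >= limit $1"
  elif [ $((heap * 4)) -gt $((limit * 3)) ]; then
    echo "tight: heap $2 leaves under 25% of $1 for non-heap memory"
  else
    echo "ok"
  fi
)
```

check_heap 512Mi 1024m reports the manifest above as OOM likely. So does check_heap 512Mi 512m, which is why shrinking -Xmx to the limit value alone is not enough.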

Classic Fault 2 – Liveness Probe Failure

Liveness probe failed: Get http://10.244.1.15:8080/healthz: dial tcp 10.244.1.15:8080: connect: connection refused

Typical cause: the probe runs before the application is ready (e.g., initialDelaySeconds: 5 while the app needs ~90 s to start).

Resolution: read the startup logs with kubectl logs pod-name to see how long the application really takes to start. Then adjust initialDelaySeconds, periodSeconds, timeoutSeconds, and failureThreshold so the probe does not fail during that time.
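A rough sanity check compares the startup time with the window the kubelet allows before restarting the container. The last failing probe lands roughly between two bounds:

- lower: initialDelaySeconds + (failureThreshold - 1) × periodSeconds
- upper: initialDelaySeconds + failureThreshold × periodSeconds

This sketch uses the lower bound so that it errs on the safe side, and it ignores timeoutSeconds. The Kubernetes defaults are periodSeconds 10 and failureThreshold 3. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: conservative seconds before a failing liveness probe
# triggers a restart. Args: initialDelaySeconds [periodSeconds] [failureThreshold],
# with Kubernetes defaults periodSeconds=10 and failureThreshold=3.
probe_window() {
  echo $(( ${1:-0} + (${3:-3} - 1) * ${2:-10} ))
}

# Usage: check_probe STARTUP_SECONDS INITIAL_DELAY [PERIOD] [FAILURE_THRESHOLD]
check_probe() (
  startup=$1
  window=$(probe_window "$2" "$3" "$4")
  if [ "$window" -le "$startup" ]; then
    echo "restart loop likely: window ${window}s <= startup ${startup}s"
  else
    echo "ok: window ${window}s > startup ${startup}s"
  fi
)
```

For the case above, check_probe 90 5 reports a likely restart loop (a window of 25s). check_probe 90 60 10 6 reports ok with a 110s window.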

Classic Fault 3 – Configuration File Errors

Failed to load property source
mapping values are not allowed here

Typical examples are malformed Spring Boot YAML, which produces messages like the two above, or Nginx configuration syntax errors. Either makes the container exit immediately on startup.

Investigation commands:

kubectl get cm
kubectl describe cm config-map-name
kubectl exec -it pod-name -- sh
cat /app/config/application.yml
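A quick local check can catch the most common YAML slips before the config is mounted. The sketch below is a heuristic, not a YAML parser, and yaml_hint is a hypothetical name. It flags two things:

- tab indentation, which YAML forbids;
- lines containing a second unquoted ": ", such as name: foo: bar, one classic trigger of "mapping values are not allowed here".

For Nginx, running nginx -t inside the container performs a real syntax check.

```shell
#!/bin/sh
# Hypothetical heuristic check for common YAML mistakes; NOT a full parser.
# Flags tabs in leading indentation and a second unquoted ": " on a line.
# Exits non-zero when anything is flagged.
yaml_hint() {
  awk '
    BEGIN { twomaps = "^[^\"\047#]*: [^\"\047#]*: " }   # \047 is a single quote
    /^[ \t]*#/ { next }                                  # skip comment lines
    /^ *\t/ { print FILENAME ":" NR ": tab in indentation"; bad = 1 }
    $0 ~ twomaps { print FILENAME ":" NR ": second unquoted colon-space"; bad = 1 }
    END { exit bad }
  ' "$1"
}
```

To check the ConfigMap content itself, export it first. For example, kubectl get cm config-map-name -o jsonpath='{.data.application\.yml}' > /tmp/app.yml && yaml_hint /tmp/app.yml, where the key name application.yml is an assumption.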

Classic Fault 4 – Database Connection Failure

Communications link failure
Connection refused
Unknown host mysql-service

Common root causes are a database that is not running, a wrong Service name, DNS resolution failures, bad credentials, or network problems between the pod and the database.

Check commands:

kubectl get svc
kubectl exec -it pod-name -- sh
nslookup mysql-service
telnet mysql-service 3306
nc -zv mysql-service 3306
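A frequent cause of Unknown host is calling a Service in another namespace by its short name. The sketch below builds the fully qualified Service name and runs the same DNS and TCP checks from inside the pod. It also lists the Service's endpoints; an empty list means no ready database pod backs the Service.

It assumes the default cluster domain cluster.local, which clusters can override. The helper names are hypothetical, and the in-pod checks only work if the image ships nslookup and nc.

```shell
#!/bin/sh
# Hypothetical helper: fully qualified Service name.
# Usage: svc_fqdn SERVICE NAMESPACE [CLUSTER_DOMAIN]; domain defaults to cluster.local.
svc_fqdn() {
  echo "$1.$2.svc.${3:-cluster.local}"
}

# Hypothetical helper: endpoint, DNS, and TCP checks for a database Service.
# Usage: check_db POD_NS POD SERVICE SERVICE_NS PORT. Assumes kubectl access
# and that the pod image contains nslookup and nc.
check_db() (
  ns=$1 pod=$2 svc=$3 svc_ns=$4 port=$5
  host=$(svc_fqdn "$svc" "$svc_ns")
  echo "== endpoints of $svc_ns/$svc (empty means no ready backend pods)"
  kubectl get endpoints "$svc" -n "$svc_ns"
  echo "== DNS lookup of $host from $ns/$pod"
  kubectl exec "$pod" -n "$ns" -- nslookup "$host"
  echo "== TCP connect to $host:$port"
  kubectl exec "$pod" -n "$ns" -- nc -zv -w 3 "$host" "$port"
)
```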

Classic Fault 5 – Image Startup Command Error

exec: "javaa": executable file not found
permission denied

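The first error is usually a typo in the Dockerfile CMD or ENTRYPOINT, or in the Deployment's command or args fields, which override them (e.g., "javaa" instead of "java"). The second usually means the entrypoint file is not executable.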

Verify the deployment spec:

kubectl get deploy app -o yaml
# check command, args, image fields
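Because the pod spec's command and args override the image's ENTRYPOINT and CMD, a typo can live in either place. The hypothetical helper below prints one line per container from the Deployment's pod template, showing the image, command, and args. Empty command and args mean the image defaults apply.

To see those defaults, use one of these:

- on Docker hosts: docker image inspect with --format '{{json .Config.Entrypoint}} {{json .Config.Cmd}}'
- on containerd nodes: crictl inspecti

```shell
#!/bin/sh
# Hypothetical helper: image, command, and args per container of a Deployment.
# Usage: show_entrypoint NAMESPACE DEPLOYMENT. Assumes kubectl access.
show_entrypoint() {
  kubectl get deploy "$2" -n "$1" -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{"\t"}{.image}{"\t"}{.command}{"\t"}{.args}{"\n"}{end}'
}
```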

Classic Fault 6 – Disk Space Exhaustion

Evicted
The node had condition: [DiskPressure]

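On the affected node, check disk usage and image storage. docker system df applies only to nodes running Docker; crictl works with CRI runtimes such as containerd.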

df -h
docker system df
crictl images
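DiskPressure is set when the kubelet's eviction thresholds are crossed. On Linux, the default hard thresholds include nodefs.available<10% and imagefs.available<15%. The hypothetical helper below reads df -P output and prints filesystems at or above a usage threshold. Its default of 85% is an illustrative value, not a kubelet setting. Inodes can run out too, so df -i is also worth checking.

```shell
#!/bin/sh
# Hypothetical helper: read `df -P` on stdin, print "mount usage" for filesystems
# at or above a threshold percent (default 85, an illustrative value).
# df -P columns: Filesystem 1024-blocks Used Available Capacity Mounted-on.
# Mount points containing spaces are truncated to their first word.
full_mounts() {
  awk -v limit="${1:-85}" 'NR > 1 {
    pct = $5
    sub(/%$/, "", pct)
    if (pct + 0 >= limit + 0) print $6, $5
  }'
}
```

Usage: df -P | full_mounts, or df -P | full_mounts 90 for a stricter cut-off.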

Most Important Command – logs --previous

If the container has already restarted, kubectl logs shows the new instance, which may not have failed yet. The crash output belongs to the previous instance and is retrieved with kubectl logs --previous pod-name; add -c container-name for multi-container pods. This single command resolves many CrashLoopBackOff investigations.

Complete Production Troubleshooting Flow

1. Identify the failing pod: kubectl get pod -A
2. Inspect events: kubectl describe pod pod-name
3. View current logs: kubectl logs pod-name
4. View previous logs: kubectl logs --previous pod-name
5. Enter the container for deeper inspection: kubectl exec -it pod-name -- sh

Common Mistakes to Avoid

Deleting pods before reading logs loses valuable evidence.

Ignoring --previous logs hides the actual crash information.

Assuming the problem lies in Kubernetes itself. In most cases the root cause is inside the container: application code, configuration, resources, dependencies, network, or probes.

Key Takeaway

CrashLoopBackOff is only a symptom. The real issue almost always lies inside the container: in the program, its configuration, resource limits, dependencies, network, or health probes. Follow the ordered steps (describe → logs → previous → exec) to locate the root cause efficiently.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
