How to Diagnose and Fix CrashLoopBackOff in Kubernetes: 10 Common Causes
This guide explains the CrashLoopBackOff state, provides quick kubectl commands, lists ten typical reasons such as misconfiguration, image errors, health‑probe issues, OOM kills, and offers step‑by‑step fixes, prevention tips, and best practices for reliable pod deployment.
Pod CrashLoopBackOff Troubleshooting: 10 Common Causes and Quick Fix Guide
Applicable Scenarios & Prerequisites
Applicable scenarios : Pods repeatedly restart, applications fail to start, image errors, configuration issues.
Prerequisites :
Kubernetes 1.20+
kubectl access
Basic container concepts
CrashLoopBackOff State Details
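Definition : CrashLoopBackOff means a container keeps crashing shortly after it starts and the kubelet is delaying the next restart. The delay grows exponentially (10s, 20s, 40s … capped at 5 minutes) and resets once the container has run for 10 minutes without failing.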
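To make the back-off arithmetic concrete, here is a minimal sketch that prints the expected wait before each restart. It assumes the schedule stated above (10 s base, doubling, 300 s cap); the function name backoff_schedule is hypothetical and not part of kubectl.

```shell
# Print the wait before each of the first N restarts, assuming a 10 s base
# delay that doubles after every crash and is capped at 300 s.
backoff_schedule() (
  restarts="${1:-8}"
  delay=10
  cap=300
  i=1
  while [ "$i" -le "$restarts" ]; do
    echo "restart ${i}: wait ${delay}s"
    delay=$((delay * 2))
    if [ "$delay" -gt "$cap" ]; then
      delay=$cap
    fi
    i=$((i + 1))
  done
)
```

Calling backoff_schedule 6 lists waits of 10, 20, 40, 80, 160 and 300 seconds, which is why a pod with a handful of restarts can sit idle for minutes between attempts.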
Check status :
kubectl get pod
# NAME READY STATUS RESTARTS AGE
# mypod 0/1 CrashLoopBackOff 5 10m
Quick Command Reference
# 1. View pod status
kubectl describe pod mypod
# 2. View current logs
kubectl logs mypod
# 3. View previous logs (important!)
kubectl logs mypod --previous
# 4. Get container exit code
kubectl get pod mypod -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
# 5. View events
kubectl get events --field-selector involvedObject.name=mypod
# 6. Debug mode (start a copy of the pod with a busybox container attached)
kubectl debug mypod -it --image=busybox --copy-to=mypod-debug
10 Common Causes and Fixes
1. Application startup failure (misconfiguration)
Symptoms :
kubectl logs mypod --previous
# ERROR: Database connection failed: host 'mysql' not found
Exit code : 1 (generic error)
Fix :
# Check environment variables
env:
  - name: DB_HOST
    value: "mysql.default.svc.cluster.local" # use FQDN
# Or use ConfigMap
envFrom:
  - configMapRef:
      name: app-config
Verify :
# exec needs a running container, so run this once the pod stays up
kubectl exec mypod -- env | grep DB_HOST
2. Image ENTRYPOINT/CMD error
Symptoms :
kubectl logs mypod --previous
# exec: "/app/start.sh": stat /app/start.sh: no such file or directory
Exit code : 127 (command not found); some container runtimes report 128 with reason StartError when the container never starts
Fix :
# Method 1: Override command
spec:
  containers:
    - name: app
      image: myapp:1.0
      command: ["/bin/sh"]
      args: ["-c", "/usr/local/bin/start.sh"]
# Method 2: Fix Dockerfile (make the script executable, then reference it)
# RUN chmod +x /app/start.sh
# ENTRYPOINT ["/app/start.sh"]
3. Liveness probe too strict
Symptoms :
kubectl describe pod mypod | grep -A 10 Liveness
# Liveness probe failed: Get http://10.0.0.1:8080/healthz: dial tcp 10.0.0.1:8080: connect: connection refused
Exit code : typically 143 (SIGTERM from the kubelet) or 137 (SIGKILL if the container does not stop within the grace period)
Fix :
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30 # increase start delay
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3 # default value; raise it if probes fail intermittently
# Or use startupProbe (beta in K8s 1.18, stable since 1.20)
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 0
  periodSeconds: 5
  failureThreshold: 30 # 30*5=150 s startup time
4. Out‑of‑Memory (OOMKilled)
Symptoms :
kubectl describe pod mypod | grep -i "oom\|killed"
# Reason: OOMKilled
# Exit Code: 137
Fix :
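resources:
  requests:
    memory: "512Mi"
  limits:
    memory: "1Gi" # increase limit
# Or optimise application memory usage, e.g. Java heap
# (JAVA_OPTS only applies if the image's start script passes it to java;
# JAVA_TOOL_OPTIONS is read by the JVM itself)
env:
  - name: JAVA_OPTS
    value: "-Xmx512m -Xms512m"
Verify memory usage :
# requires metrics-server in the cluster
kubectl top pod mypod
5. Volume mount failure (missing ConfigMap/Secret/PVC)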
Symptoms :
kubectl describe pod mypod | grep -i "mount\|volume"
# Error: configmap "app-config" not found
# (the pod usually stays in ContainerCreating or Pending until the volume source exists)
Fix :
# Create missing ConfigMap
kubectl create configmap app-config --from-literal=KEY=VALUE
# Or check PVC binding
kubectl get pvc
# NAME STATUS VOLUME CAPACITY ACCESS MODES
# data-pvc Pending - - - # not bound, check StorageClass
6. Permission issues (file/dir not writable)
Symptoms :
kubectl logs mypod --previous
# Error: failed to write to /data/app.log: permission denied
Exit code : 1
Fix :
# Method 1: Set securityContext (pod level)
securityContext:
  runAsUser: 1000
  fsGroup: 1000
# Method 2: Use initContainer to fix permissions
initContainers:
  - name: fix-permissions
    image: busybox
    command: ['sh', '-c', 'chown -R 1000:1000 /data']
    volumeMounts:
      - name: data
        mountPath: /data
    securityContext:
      runAsUser: 0
7. Dependent service not ready
Symptoms :
kubectl logs mypod --previous
# Error: connection to database 'mysql:3306' timed out
Fix :
# Use initContainer to wait for dependency
initContainers:
  - name: wait-for-db
    image: busybox:1.34
    command: ['sh', '-c', 'until nc -z mysql 3306; do echo waiting for mysql; sleep 2; done']
8. Missing environment variable reference
Symptoms :
kubectl describe pod mypod
# Error: configmap "app-config" not found (pod status: CreateContainerConfigError)
Fix :
# Create missing ConfigMap/Secret
kubectl create configmap app-config --from-literal=KEY=VALUE
# Or make reference optional
envFrom:
  - configMapRef:
      name: app-config
      optional: true # no error if missing
9. Image pull failure
Symptoms :
kubectl describe pod mypod | grep -i "pull"
# Warning: Failed to pull image "myapp:latest": rpc error: ... not found
# (the pod status is usually ErrImagePull or ImagePullBackOff)
Fix :
# 1. Verify image exists
docker pull myapp:latest
# 2. Use correct registry
image: registry.example.com/myapp:v1.0
# 3. Configure image pull secret
imagePullSecrets:
  - name: regcred
# Create regcred
kubectl create secret docker-registry regcred \
  --docker-server=registry.example.com \
  --docker-username=user \
  --docker-password=pass
10. Container process exits immediately
Symptoms :
kubectl logs mypod --previous
# (no output)
kubectl get pod mypod -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
# 0 # successful exit but container terminated
Cause : The container's main process finishes right away (e.g., sh -c "echo hello") or forks into the background, so the foreground process exits and the container stops.
Fix :
# Wrong: nginx daemonizes by default, so the foreground process exits with code 0
command: ["nginx"]
# Also wrong: sleep keeps the container up but hides nginx crashes
command: ["sh", "-c", "nginx && sleep infinity"]
# Correct: run Nginx in foreground
command: ["nginx", "-g", "daemon off;"] # Nginx stays alive
# Or keep container alive for debugging
command: ["sh", "-c", "while true; do sleep 3600; done"]
Exit Code Quick Reference
| Exit Code | Meaning | Common Causes |
| --- | --- | --- |
| 0 | Successful exit | Process ended normally (but container terminated) |
| 1 | Generic error | Application startup failure, misconfiguration |
| 2 | Misused shell command | Script syntax error |
| 126 | Command not executable | File permission issue |
| 127 | Command not found | ENTRYPOINT/CMD path error |
| 137 | SIGKILL (128 + 9) | Out‑of‑memory kill, or forced kill after a failed liveness probe |
| 139 | SIGSEGV (128 + 11) | Segmentation fault (app bug) |
| 143 | SIGTERM (128 + 15) | Graceful termination |
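The table can be turned into a small lookup helper for scripts. The sketch below is illustrative only: explain_exit_code is a hypothetical name, and the fallback for other codes above 128 relies on the usual shell convention that such codes equal 128 plus the signal number.

```shell
# Map a container exit code to the meaning given in the table above.
# Codes above 128 that are not listed are decoded as 128 + signal number.
explain_exit_code() (
  code="$1"
  case "$code" in
    0)   echo "successful exit: main process ended, check the container command" ;;
    1)   echo "generic error: check application logs and configuration" ;;
    2)   echo "misused shell command: check script syntax" ;;
    126) echo "command not executable: check file permissions" ;;
    127) echo "command not found: check ENTRYPOINT/CMD path" ;;
    137) echo "SIGKILL (signal 9): OOM kill or forced stop" ;;
    139) echo "SIGSEGV (signal 11): segmentation fault" ;;
    143) echo "SIGTERM (signal 15): termination requested" ;;
    *)
      # Non-numeric input makes the test fail quietly and falls through.
      if [ "$code" -gt 128 ] 2>/dev/null; then
        echo "killed by signal $((code - 128))"
      else
        echo "application-specific exit code"
      fi
      ;;
  esac
)
```

It can be fed directly from the jsonpath query shown earlier, for example explain_exit_code "$(kubectl get pod mypod -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}')".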
Quick Troubleshooting Flowchart
graph TD
A[Pod CrashLoopBackOff] --> B{Check logs}
B -->|Logs present| C[Analyze error]
B -->|No logs| D[Check exit code]
C --> E{Error type}
E -->|Config error| F[Fix env/ConfigMap]
E -->|Dependency not ready| G[Add initContainer wait]
E -->|Permission issue| H[Set securityContext]
D --> I{Exit code}
I -->|127| J[Fix ENTRYPOINT]
I -->|137| K[Increase memory limits]
I -->|0| L[Fix foreground/background process]
Practical Cases
Case 1: Database connection failure
Observation :
kubectl logs app-pod --previous
# Error: dial tcp: lookup mysql on 10.96.0.10:53: no such host
Analysis : DNS resolution failed; the Service name is wrong or the Service lives in a different namespace.
Fix :
env:
  - name: DB_HOST
    value: "mysql.database.svc.cluster.local" # use FQDN
Case 2: Slow start killed by liveness probe
Observation :
kubectl describe pod app-pod
# Liveness probe failed: Get http://10.0.0.1:8080/healthz: dial tcp 10.0.0.1:8080: connect: connection refused
# Container app failed liveness probe, will be restarted
Analysis : Application needs 30 s to start, but liveness probe checks after 10 s.
Fix :
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30
  periodSeconds: 5 # up to 150 s startup time
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
Prevention Measures
CI/CD stage :
Test images locally (docker run)
Validate environment variables in integration tests
Check ENTRYPOINT paths
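One way to validate environment variables, both in integration tests and at container start, is a guard that fails fast with a readable message instead of a late connection error. This is a sketch under assumptions: require_env is a hypothetical helper, and DB_HOST is only the example variable used earlier in this guide.

```shell
# Fail fast when a required variable is unset or empty.
# Variable names are passed as arguments by the script author.
require_env() (
  missing=""
  for name in "$@"; do
    # Indirect variable lookup that works in POSIX sh.
    eval "value=\${$name:-}"
    [ -n "$value" ] || missing="$missing $name"
  done
  if [ -n "$missing" ]; then
    echo "missing required environment variables:$missing" >&2
    exit 1  # leaves only this function's subshell; the caller sees status 1
  fi
)
```

In an entrypoint script this could be used as require_env DB_HOST || exit 1 before the application starts, so the cause shows up on the first line of kubectl logs --previous.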
Before deployment :
Validate with kubectl apply --dry-run=server
Ensure ConfigMap/Secret existence
Confirm StorageClass availability
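The pre-deployment checks above can be bundled into one function. The sketch assumes kubectl is already configured for the target cluster and namespace; preflight is a hypothetical name, and the manifest path and the kind/name references are supplied by the caller.

```shell
# Pre-deployment checks: server-side dry run, referenced objects, StorageClass.
# Usage: preflight MANIFEST [KIND/NAME ...]
preflight() (
  manifest="$1"
  shift
  status=0
  # A server-side dry run validates the manifest against the API server.
  kubectl apply --dry-run=server -f "$manifest" >/dev/null || status=1
  # Remaining arguments are references such as configmap/app-config.
  for ref in "$@"; do
    if ! kubectl get "$ref" >/dev/null 2>&1; then
      echo "missing: $ref" >&2
      status=1
    fi
  done
  # Only a warning: a StorageClass matters if the manifest uses PVCs.
  if [ -z "$(kubectl get storageclass -o name 2>/dev/null)" ]; then
    echo "warning: no StorageClass found" >&2
  fi
  exit "$status"
)
```

For example, preflight deploy.yaml configmap/app-config secret/regcred returns a non-zero status if the dry run fails or either object is absent, which makes it usable as a pipeline gate.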
Monitoring & alerts :
# Prometheus alert rule
- alert: PodCrashLooping
  expr: rate(kube_pod_container_status_restarts_total[15m]) > 0
  for: 5m
  annotations:
    summary: "Pod {{ $labels.pod }} restarting frequently"
Resource limits :
Set reasonable requests and limits
Avoid OOMKilled
Health checks :
Provide sufficient initialDelaySeconds
Set appropriate failureThreshold
Use startupProbe for slow starts
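To judge whether probe settings leave enough time, the same arithmetic used earlier (30*5=150 s) can be scripted. This is a rough estimate that ignores timeoutSeconds and probe jitter; probe_budget and covers_startup are hypothetical helper names.

```shell
# Approximate time before a probe gives up:
# initialDelaySeconds + failureThreshold * periodSeconds.
probe_budget() (
  initial_delay="$1"
  period="$2"
  threshold="$3"
  echo $((initial_delay + threshold * period))
)

# Succeeds when the probe budget is at least the measured startup time.
# Usage: covers_startup STARTUP_SECONDS INITIAL_DELAY PERIOD THRESHOLD
covers_startup() (
  startup_seconds="$1"
  shift
  [ "$(probe_budget "$@")" -ge "$startup_seconds" ]
)
```

With the startupProbe values from this guide, probe_budget 0 5 30 gives 150, so covers_startup 120 0 5 30 succeeds while a 200 s startup would not fit.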
Best Practices
Log first : Use --previous logs before describe.
Exit code analysis : 137 = SIGKILL (often OOM), 127 = command not found.
Debug container : Use kubectl debug to inspect a copy of the pod.
Layered investigation : Application → Config → Dependencies → Resources → Network.
Fast rollback : Revert to previous version immediately on failure.
Documentation : Record common errors and fixes.
Test environment : Verify in dev/staging first.
Resource reservation : Size requests from observed usage and keep limits above peak usage; requests set equal to limits leave no room to burst.
Graceful startup : Use readinessProbe to prevent early traffic.
Comprehensive monitoring : Alert on restarts, OOM, image pull failures.
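The first two practices (logs first, then exit code) can be combined into a first-pass triage helper. The sketch below assumes kubectl access to the cluster; triage_pod is a hypothetical name, and it inspects only the first container of the pod.

```shell
# First-pass triage: previous logs, then the last exit code.
# Usage: triage_pod POD [NAMESPACE]
triage_pod() (
  pod="$1"
  ns="${2:-default}"
  logs="$(kubectl logs "$pod" -n "$ns" --previous --tail=20 2>/dev/null)"
  code="$(kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}' \
    2>/dev/null)"
  echo "exit code: ${code:-unknown}"
  if [ -n "$logs" ]; then
    echo "previous logs (last 20 lines):"
    printf '%s\n' "$logs"
  else
    echo "no previous logs; rely on the exit code and kubectl describe pod"
  fi
)
```

Running triage_pod mypod gives enough context to pick the right branch of the layered investigation before digging into describe output and events.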
