
How to Diagnose and Fix CrashLoopBackOff in Kubernetes: 10 Common Causes

This guide explains the CrashLoopBackOff state, provides quick kubectl commands, walks through ten common causes (misconfiguration, image errors, health-probe issues, OOM kills, and more), and offers step-by-step fixes, prevention tips, and best practices for reliable pod deployment.


Pod CrashLoopBackOff Troubleshooting: 10 Common Causes and Quick Fix Guide

Applicable Scenarios & Prerequisites

Applicable scenarios: pods restarting repeatedly, applications failing to start, image errors, configuration issues.

Prerequisites:

Kubernetes 1.20+

kubectl access

Basic container concepts

CrashLoopBackOff State Details

Definition: After a container crashes, Kubernetes restarts it with exponential back-off (10s, 20s, 40s … up to a maximum of 5 minutes).

Check status:

kubectl get pod
# NAME   READY   STATUS           RESTARTS   AGE
# mypod  0/1     CrashLoopBackOff 5          10m
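The doubling-with-cap behaviour can be sketched numerically. This is an illustration of the schedule described above, not the kubelet's actual implementation:

```shell
# Illustrative sketch of the restart back-off: the delay doubles after each
# crash and is capped at 300 seconds (5 minutes).
delay=10
for crash in 1 2 3 4 5 6 7; do
  echo "crash $crash: wait ${delay}s"
  delay=$((delay * 2))
  if [ "$delay" -gt 300 ]; then delay=300; fi
done
```

This is why a pod in CrashLoopBackOff can sit idle for minutes between restart attempts even though each crash happens instantly.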

Quick Command Reference

# 1. View pod status
kubectl describe pod mypod

# 2. View current logs
kubectl logs mypod

# 3. View previous logs (important!)
kubectl logs mypod --previous

# 4. Get container exit code
kubectl get pod mypod -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'

# 5. View events
kubectl get events --field-selector involvedObject.name=mypod

# 6. Debug mode (enter container)
kubectl debug mypod -it --image=busybox --copy-to=mypod-debug

10 Common Causes and Fixes

1. Application startup failure (misconfiguration)

Symptoms:

kubectl logs mypod --previous
# ERROR: Database connection failed: host 'mysql' not found

Exit code: 1 (generic error)

Fix:

# Check environment variables
env:
- name: DB_HOST
  value: "mysql.default.svc.cluster.local"   # use FQDN
# Or use ConfigMap
envFrom:
- configMapRef:
    name: app-config

Verify:

kubectl exec mypod -- env | grep DB_HOST

2. Image ENTRYPOINT/CMD error

Symptoms:

kubectl logs mypod --previous
# exec: "/app/start.sh": stat /app/start.sh: no such file or directory

Exit code: 127 (command not found)

Fix:

# Method 1: Override command
spec:
  containers:
  - name: app
    image: myapp:1.0
    command: ["/bin/sh"]
    args: ["-c", "/usr/local/bin/start.sh"]

# Method 2: Fix Dockerfile
# ENTRYPOINT ["/app/start.sh"]
# RUN chmod +x /app/start.sh
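Exit code 127 can be reproduced locally without a cluster; the shell reports the same code the container runtime does when the ENTRYPOINT path is wrong:

```shell
# Running a nonexistent path: the shell exits with 127 (command not found),
# matching the exit code kubectl reports for a bad ENTRYPOINT/CMD.
sh -c '/app/start.sh' 2>/dev/null || echo "exit code: $?"
# prints: exit code: 127
```

Exit code 126 (found but not executable) is the sibling failure, which is why the Dockerfile fix above includes a chmod.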

3. Liveness probe too strict

Symptoms:

kubectl describe pod mypod | grep -A 10 Liveness
# Liveness probe failed: Get http://10.0.0.1:8080/healthz: dial tcp 10.0.0.1:8080: connect: connection refused

Exit code: 137 (SIGKILL by the kubelet)

Fix:

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30   # increase start delay
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3       # increase threshold

# Or use startupProbe (K8s 1.18+)
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 0
  periodSeconds: 5
  failureThreshold: 30    # 30*5=150 s startup time
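The startup window these settings allow is simply failureThreshold × periodSeconds; a quick check:

```shell
# Maximum time a container gets to start before the startupProbe gives up:
# failureThreshold consecutive failures, one probe every periodSeconds.
failure_threshold=30
period_seconds=5
echo "startup budget: $((failure_threshold * period_seconds))s"
# prints: startup budget: 150s
```

Size this budget to the slowest observed cold start plus headroom; while the startupProbe is running, the livenessProbe is held off entirely.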

4. Out‑of‑Memory (OOMKilled)

Symptoms:

kubectl describe pod mypod | grep -i "oom\|killed"
# Reason: OOMKilled
# Exit Code: 137

Fix:

resources:
  requests:
    memory: "512Mi"
  limits:
    memory: "1Gi"   # increase limit

# Or optimise application memory usage, e.g. Java heap
env:
- name: JAVA_OPTS
  value: "-Xmx512m -Xms512m"

Verify memory usage:

kubectl top pod mypod

5. Volume mount failure (missing ConfigMap/Secret/PVC)

Symptoms:

kubectl describe pod mypod | grep -i "mount\|volume"
# Error: configmap "app-config" not found

Fix:

# Create missing ConfigMap
kubectl create configmap app-config --from-literal=KEY=VALUE

# Or check PVC binding
kubectl get pvc
# NAME       STATUS   VOLUME   CAPACITY   ACCESS MODES
# data-pvc   Pending  -        -          -   # not bound, check StorageClass

6. Permission issues (file/dir not writable)

Symptoms:

kubectl logs mypod --previous
# Error: failed to write to /data/app.log: permission denied

Exit code: 1

Fix:

# Method 1: Set securityContext
securityContext:
  runAsUser: 1000
  fsGroup: 1000

# Method 2: Use initContainer to fix permissions
initContainers:
- name: fix-permissions
  image: busybox
  command: ['sh', '-c', 'chown -R 1000:1000 /data']
  volumeMounts:
  - name: data
    mountPath: /data
  securityContext:
    runAsUser: 0

7. Dependent service not ready

Symptoms:

kubectl logs mypod --previous
# Error: connection to database 'mysql:3306' timed out

Fix:

# Use initContainer to wait for dependency
initContainers:
- name: wait-for-db
  image: busybox:1.34
  command: ['sh', '-c', 'until nc -z mysql 3306; do echo waiting for mysql; sleep 2; done']
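The same retry logic written as a reusable shell function. This is illustrative: the function name and the timeout handling are additions of mine, not part of the manifest above.

```shell
# Retry a command every 2 seconds until it succeeds or the timeout expires.
# Mirrors the wait-for-db initContainer loop, with a timeout added so a
# permanently missing dependency fails fast instead of looping forever.
wait_for() {
  timeout="$1"; shift
  elapsed=0
  until "$@"; do
    elapsed=$((elapsed + 2))
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out waiting for: $*" >&2
      return 1
    fi
    sleep 2
  done
}

wait_for 10 true && echo "dependency ready"
# prints: dependency ready
```

With a timeout, a truly missing dependency surfaces as a clear initContainer failure in kubectl describe instead of a pod stuck forever in Init.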

8. Missing environment variable reference

Symptoms:

kubectl describe pod mypod
# Warning: container "app" has unresolved env reference: ConfigMap "nonexistent-config" not found

Fix:

# Create missing ConfigMap/Secret
kubectl create configmap app-config --from-literal=KEY=VALUE

# Or make reference optional
envFrom:
- configMapRef:
    name: app-config
    optional: true   # no error if missing

9. Image pull failure

Symptoms:

kubectl describe pod mypod | grep -i "imagepull"
# Warning: Failed to pull image "myapp:latest": rpc error: ... not found

Fix:

# 1. Verify image exists
docker pull myapp:latest

# 2. Use correct registry
image: registry.example.com/myapp:v1.0

# 3. Configure image pull secret
imagePullSecrets:
- name: regcred

# Create regcred
kubectl create secret docker-registry regcred \
  --docker-server=registry.example.com \
  --docker-username=user \
  --docker-password=pass

10. Container process exits immediately

Symptoms:

kubectl logs mypod --previous
# (no output)

kubectl get pod mypod -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
# 0  # successful exit, but the container terminated

Cause: The container's CMD runs a process that finishes instantly (e.g., sh -c "echo hello"), so the container exits cleanly and Kubernetes restarts it.

Fix:

# Wrong: nginx daemonizes, so the shell exits immediately (exit code 0)
command: ["sh", "-c", "nginx"]

# Correct: run Nginx in foreground
command: ["nginx", "-g", "daemon off;"]   # Nginx stays alive

# Or keep container alive for debugging
command: ["sh", "-c", "while true; do sleep 3600; done"]

Exit Code Quick Reference

| Exit Code | Meaning                | Common Causes                                                      |
| --------- | ---------------------- | ------------------------------------------------------------------ |
| 0         | Successful exit        | Process ended normally (but the container still terminated)        |
| 1         | Generic error          | Application startup failure, misconfiguration                      |
| 2         | Misused shell command  | Script syntax error                                                |
| 126       | Command not executable | File permission issue                                              |
| 127       | Command not found      | ENTRYPOINT/CMD path error                                          |
| 137       | SIGKILL                | Out-of-memory kill (or kubelet kill after a failed liveness probe) |
| 139       | SIGSEGV                | Segmentation fault (application bug)                               |
| 143       | SIGTERM                | Graceful termination                                               |
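The table can be folded into a small helper for triage scripts. The function name and wording here are illustrative, not a kubectl feature:

```shell
# Hypothetical helper: map a container exit code to its likely cause,
# mirroring the quick-reference table above.
explain_exit_code() {
  case "$1" in
    0)   echo "clean exit (check foreground vs daemonized process)" ;;
    1)   echo "generic error (check app logs and configuration)" ;;
    2)   echo "misused shell command (check script syntax)" ;;
    126) echo "command not executable (check file permissions)" ;;
    127) echo "command not found (check ENTRYPOINT/CMD path)" ;;
    137) echo "SIGKILL (OOMKilled or killed after failed liveness probe)" ;;
    139) echo "SIGSEGV (segmentation fault, application bug)" ;;
    143) echo "SIGTERM (graceful termination)" ;;
    *)   echo "unrecognised exit code: $1" ;;
  esac
}

explain_exit_code 137
# prints: SIGKILL (OOMKilled or killed after failed liveness probe)
```

Pair it with the jsonpath command from the quick reference to turn a raw exit code into a next step in one line.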

Quick Troubleshooting Flowchart

graph TD
    A[Pod CrashLoopBackOff] --> B{Check logs}
    B -->|Logs present| C[Analyze error]
    B -->|No logs| D[Check exit code]

    C --> E{Error type}
    E -->|Config error| F[Fix env/ConfigMap]
    E -->|Dependency not ready| G[Add initContainer wait]
    E -->|Permission issue| H[Set securityContext]

    D --> I{Exit code}
    I -->|127| J[Fix ENTRYPOINT]
    I -->|137| K[Increase memory limits]
    I -->|0| L[Fix foreground/background process]

Practical Cases

Case 1: Database connection failure

Observation:

kubectl logs app-pod --previous
# Error: dial tcp: lookup mysql on 10.96.0.10:53: no such host

Analysis: DNS resolution failed; the Service name is incorrect, or the Service lives in a different namespace.

Fix:

env:
- name: DB_HOST
  value: "mysql.database.svc.cluster.local"   # use FQDN

Case 2: Slow start killed by liveness probe

Observation:

kubectl describe pod app-pod
# Liveness probe failed: Get http://10.0.0.1:8080/healthz: dial tcp 10.0.0.1:8080: connect: connection refused
# Container app failed liveness probe, will be restarted

Analysis: The application needs 30 s to start, but the liveness probe begins checking after 10 s.

Fix:

startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30
  periodSeconds: 5   # up to 150 s startup time

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10

Prevention Measures

CI/CD stage:

Test images locally (docker run)

Validate environment variables in integration tests

Check ENTRYPOINT paths

Before deployment:

Validate with kubectl apply --dry-run=server

Ensure ConfigMaps and Secrets exist

Confirm StorageClass availability

Monitoring & alerts:

# Prometheus alert rule
- alert: PodCrashLooping
  expr: rate(kube_pod_container_status_restarts_total[15m]) > 0
  for: 5m
  annotations:
    summary: "Pod {{ $labels.pod }} restarting frequently"

Resource limits:

Set reasonable requests and limits

Avoid OOMKilled

Health checks:

Provide sufficient initialDelaySeconds

Set appropriate failureThreshold

Use startupProbe for slow starts

Best Practices

Log first: check --previous logs before running describe.

Exit code analysis: 137 = OOM, 127 = command not found.

Debug container: use kubectl debug to get an interactive shell.

Layered investigation: application → config → dependencies → resources → network.

Fast rollback: revert to the previous version immediately on failure.

Documentation: record common errors and fixes.

Test environment: verify in dev/staging first.

Resource reservation: avoid setting requests equal to limits across the board, which can exhaust schedulable capacity.

Graceful startup: use readinessProbe to prevent traffic arriving too early.

Comprehensive monitoring: alert on restarts, OOM kills, and image pull failures.

Tags: Kubernetes, kubectl, Pod troubleshooting, CrashLoopBackOff
Written by Ops Community

A leading IT operations community where professionals share and grow together.
