Mastering Kubernetes Pod Lifecycle and Restart Policies: A Hands‑On Guide
This guide walks through Kubernetes pod lifecycle phases, container states, restart policies, health‑check probes, lifecycle hooks, init containers, common troubleshooting scenarios, and best‑practice recommendations, providing concrete YAML examples and kubectl commands to help operators manage pods from creation to graceful termination.
Kubernetes Pod Lifecycle and Restart Policies
Applicable Scenarios & Prerequisites
Suitable for pod troubleshooting, graceful termination configuration, health check setup, and batch‑type pod management.
Kubernetes 1.20+
kubectl access
Understanding of pods and containers
Environment and Version Matrix
Kubernetes: 1.20‑1.30 (stable lifecycle features)
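Container Runtime: containerd, or Docker Engine (from Kubernetes 1.24 on, dockershim is removed, so Docker Engine needs the cri-dockerd adapter)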
Complete Pod Lifecycle Process
1. Pod Phases
Pending: Pod created, containers not started (waiting for scheduling or image pull)
Running: At least one container is running
Succeeded: All containers terminated successfully (Job/CronJob)
Failed: All containers terminated, at least one failed
Unknown: Unable to obtain pod status (node lost)
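The phase list above can be read as a small decision rule. The sketch below is a simplified model and not kubelet code: the helper name `pod_phase` and its inputs are invented for illustration, it assumes terminated containers are never restarted (as with `restartPolicy: Never`), and it leaves out Unknown, which depends on node connectivity rather than container results.

```python
from typing import List, Optional


def pod_phase(started: bool, exit_codes: List[Optional[int]]) -> str:
    """Simplified phase rule; hypothetical helper, not a Kubernetes API.

    exit_codes has one entry per container: None while the container is
    still running, otherwise its exit code. Assumes terminated containers
    are never restarted.
    """
    if not started:
        return "Pending"      # waiting for scheduling or image pull
    if any(code is None for code in exit_codes):
        return "Running"      # at least one container is still running
    if all(code == 0 for code in exit_codes):
        return "Succeeded"    # every container exited with 0
    return "Failed"           # all terminated, at least one non-zero exit


print(pod_phase(False, []))         # Pending
print(pod_phase(True, [None, 0]))   # Running
print(pod_phase(True, [0, 0]))      # Succeeded
print(pod_phase(True, [0, 1]))      # Failed
```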
Check pod status:
kubectl get pod mypod
# NAME READY STATUS RESTARTS AGE
# mypod 1/1 Running 0 5m
kubectl get pod mypod -o jsonpath='{.status.phase}'
# Running
2. Container States
Waiting: Waiting to start (image pull, volume mount)
Running: Normal operation
Terminated: Finished (success or failure)
Check container state:
kubectl describe pod mypod
# State: Running
# Started: 2025-10-24 10:00:00 +0800 CST
kubectl get pod mypod -o jsonpath='{.status.containerStatuses[0].state}'
# {"running":{"startedAt":"2025-10-24T02:00:00Z"}}
RestartPolicy
1. Three Policies
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  restartPolicy: Always  # Always | OnFailure | Never
  containers:
  - name: myapp
    image: nginx:1.21

Always: Restart container after termination (default). Used by Deployments, StatefulSets, DaemonSets.
OnFailure: Restart only when exit code is non‑zero. Used by Jobs.
Never: Never restart. Suitable for one‑off tasks or debugging pods.
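The three policies boil down to one decision taken each time a container exits. As a minimal sketch (the function `should_restart` is a made-up name for illustration, not part of any Kubernetes client library), the rule can be written as:

```python
def should_restart(restart_policy: str, exit_code: int) -> bool:
    """Return True if a container that exited with exit_code is restarted.

    Hypothetical helper mirroring the three restartPolicy values.
    """
    if restart_policy == "Always":
        return True               # restarted regardless of exit code
    if restart_policy == "OnFailure":
        return exit_code != 0     # restarted only after a non-zero exit
    if restart_policy == "Never":
        return False              # left terminated
    raise ValueError(f"unknown restartPolicy: {restart_policy}")


for policy in ("Always", "OnFailure", "Never"):
    print(policy, should_restart(policy, 0), should_restart(policy, 1))
# Always True True
# OnFailure False True
# Never False False
```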
2. RestartPolicy Verification
Always example:
apiVersion: v1
kind: Pod
metadata:
  name: test-always
spec:
  restartPolicy: Always
  containers:
  - name: test
    image: busybox
    command: ["sh","-c","echo hello && sleep 10 && exit 1"]

# Apply and watch
kubectl apply -f test-always.yaml
kubectl get pod test-always -w
# Shows automatic restarts and CrashLoopBackOff delay

OnFailure example:
apiVersion: batch/v1
kind: Job
metadata:
  name: test-job
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: test
        image: busybox
        command: ["sh","-c","echo job running && exit 0"]

# Apply and observe
kubectl apply -f test-job.yaml
kubectl get pod -l job-name=test-job
# Completed without restart

Health Checks (Probes)
1. Probe Types
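livenessProbe: Restarts the container on failure.
readinessProbe: Removes the pod from Service endpoints on failure; the container is not restarted.
startupProbe: For slow‑starting apps; liveness and readiness checks are held off until it succeeds (enabled by default since K8s 1.18, stable since 1.20).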
2. Probe Methods
HTTP GET example:
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
    httpHeaders:
    - name: Custom-Header
      value: Awesome
  initialDelaySeconds: 3
  periodSeconds: 10
  timeoutSeconds: 1
  successThreshold: 1
  failureThreshold: 3

TCP socket example:
livenessProbe:
  tcpSocket:
    port: 3306
  initialDelaySeconds: 15
  periodSeconds: 20

Exec example:
livenessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy
  initialDelaySeconds: 5
  periodSeconds: 5

3. Probe Parameters
initialDelaySeconds: Delay before the first check (default 0; set according to application startup time).
periodSeconds: Interval between checks (default 10, recommended 10‑30).
timeoutSeconds: Timeout per check (default 1, recommended 1‑5).
successThreshold: Consecutive successes required to be considered healthy again (default 1; readiness may use 1‑3, liveness and startup probes must use 1).
failureThreshold: Consecutive failures before marking unhealthy (default 3, recommended 3‑5).
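Two pieces of arithmetic follow from these parameters: how long a container gets to start, and at which check a run of failures trips the threshold. The sketch below models both under simplifying assumptions (per-check timeouts are ignored, and the function names are invented for illustration rather than taken from Kubernetes).

```python
from typing import Iterable, Optional


def startup_budget_seconds(initial_delay: int, period: int, failure_threshold: int) -> int:
    """Approximate time a container has to pass its startupProbe."""
    return initial_delay + period * failure_threshold


def first_unhealthy_check(results: Iterable[bool], failure_threshold: int = 3) -> Optional[int]:
    """Return the 1-based check number at which the probe is marked failed.

    Only consecutive failures count: a single success resets the counter.
    Returns None if the threshold is never reached.
    """
    consecutive = 0
    for index, ok in enumerate(results, start=1):
        consecutive = 0 if ok else consecutive + 1
        if consecutive >= failure_threshold:
            return index
    return None


print(startup_budget_seconds(0, 5, 30))                                         # 150
print(first_unhealthy_check([True, False, False, True, False, False, False]))   # 7
print(first_unhealthy_check([True, False, True, False]))                        # None
```

With periodSeconds: 10 and failureThreshold: 3, a liveness failure is therefore acted on roughly 30 seconds after the application stops responding, which is worth keeping in mind when tuning for false restarts.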
4. Full Probe Example
apiVersion: v1
kind: Pod
metadata:
  name: web-app
spec:
  containers:
  - name: nginx
    image: nginx:1.21
    ports:
    - containerPort: 80
    # Note: stock nginx answers 404 on /healthz and /ready, which fails HTTP probes;
    # point the paths at endpoints your image actually serves.
    startupProbe:
      httpGet:
        path: /healthz
        port: 80
      initialDelaySeconds: 0
      periodSeconds: 5
      failureThreshold: 30  # 30*5=150 s startup time
    livenessProbe:
      httpGet:
        path: /healthz
        port: 80
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 2
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /ready
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 5
      timeoutSeconds: 2
      failureThreshold: 3

Lifecycle Hooks
1. postStart
lifecycle:
  postStart:
    exec:
      command: ["/bin/sh","-c","echo 'Container started' > /tmp/start.log"]

Note: postStart runs asynchronously with the container ENTRYPOINT; failure kills the container.
2. preStop
lifecycle:
  preStop:
    exec:
      command: ["/bin/sh","-c","nginx -s quit; while killall -0 nginx; do sleep 1; done"]

Graceful shutdown flow:
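The kubelet runs the preStop hook first.
When preStop finishes, SIGTERM is sent to the container's main process.
The container then has the rest of terminationGracePeriodSeconds (default 30 s) to exit; the countdown starts when termination begins, so preStop time counts against it.
If the container is still running when the grace period expires, SIGKILL forces termination.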
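Because preStop and the application's own SIGTERM handling share one time budget, it helps to check the numbers before choosing a grace period. The following is a rough model only: `shutdown_outcome` is a hypothetical helper, and it ignores edge-case handling such as any short extension the kubelet may grant when preStop overruns.

```python
def shutdown_outcome(grace_period: float, pre_stop: float, app_drain: float) -> str:
    """Classify a pod shutdown as 'graceful' or 'SIGKILL'.

    grace_period: terminationGracePeriodSeconds
    pre_stop:     seconds the preStop hook takes
    app_drain:    seconds the app needs to exit after receiving SIGTERM
    """
    if pre_stop + app_drain <= grace_period:
        return "graceful"
    return "SIGKILL"


print(shutdown_outcome(30, 10, 15))   # graceful
print(shutdown_outcome(30, 10, 25))   # SIGKILL
print(shutdown_outcome(60, 10, 25))   # graceful
```

In the second case the default 30 s is too short for a 10 s preStop plus 25 s of draining; raising terminationGracePeriodSeconds to 60 fixes it.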
3. Graceful Shutdown Example
apiVersion: v1
kind: Pod
metadata:
  name: graceful-shutdown
spec:
  terminationGracePeriodSeconds: 60
  containers:
  - name: nginx
    image: nginx:1.21
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh","-c","sleep 10 && nginx -s quit"]
    ports:
    - containerPort: 80

# Delete pod and watch (--wait=false returns immediately so the watch can start;
# by default kubectl delete blocks until the pod is gone)
kubectl delete pod graceful-shutdown --wait=false
kubectl get pod graceful-shutdown -w
# Shows Running → Terminating → stopped after preStop runs

Init Containers
1. Concept
Run sequentially before the main containers.
Each must succeed before the next starts.
If an init container fails, the kubelet retries it until it succeeds; with restartPolicy: Never the pod is marked Failed instead.
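The ordering and retry rules can be sketched as a short simulation. This is a toy model, not kubelet code: `run_init_containers` is an invented helper, and the exit codes of successive attempts are scripted by hand.

```python
from typing import Dict, List, Tuple


def run_init_containers(
    attempts: Dict[str, List[int]], restart_policy: str
) -> Tuple[bool, List[str]]:
    """Simulate init containers running in order.

    attempts maps each init container name (in spec order) to the exit
    codes of its successive runs. Returns (main_containers_may_start, log).
    """
    log: List[str] = []
    for name, exit_codes in attempts.items():   # dicts preserve insertion order
        for exit_code in exit_codes:
            if exit_code == 0:
                log.append(f"{name}: succeeded")
                break                            # move on to the next init container
            log.append(f"{name}: failed (exit {exit_code})")
            if restart_policy == "Never":
                return False, log                # no retry: pod is treated as failed
        else:
            return False, log                    # scripted attempts ran out without success
    return True, log


ok, log = run_init_containers({"wait-for-db": [1, 1, 0], "init-config": [0]}, "Always")
print(ok)    # True
print(log)   # ['wait-for-db: failed (exit 1)', 'wait-for-db: failed (exit 1)', 'wait-for-db: succeeded', 'init-config: succeeded']

ok, log = run_init_containers({"wait-for-db": [1, 1, 0], "init-config": [0]}, "Never")
print(ok)    # False
print(log)   # ['wait-for-db: failed (exit 1)']
```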
2. Common Use Cases
Wait for dependent service:
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  initContainers:
  - name: wait-for-db
    image: busybox:1.34
    command: ['sh','-c','until nslookup mysql; do echo waiting for mysql; sleep 2; done']
  containers:
  - name: myapp
    image: myapp:1.0

Initialize configuration files:
initContainers:
- name: init-config
  image: busybox:1.34
  command: ['sh','-c','cp /config/*.conf /app/config/']
  volumeMounts:
  - name: config
    mountPath: /config
  - name: app-config
    mountPath: /app/config

Set permissions:
initContainers:
- name: fix-permissions
  image: busybox:1.34
  command: ['sh','-c','chown -R 1000:1000 /data']
  volumeMounts:
  - name: data
    mountPath: /data
  securityContext:
    runAsUser: 0

Troubleshooting Scenarios
1. CrashLoopBackOff
# Observe status
kubectl get pod
# NAME    READY   STATUS             RESTARTS   AGE
# mypod   0/1     CrashLoopBackOff   5          5m

Steps:
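View logs: kubectl logs mypod and kubectl logs mypod --previous (the latter shows output from the previous, crashed instance).
Inspect events: kubectl describe pod mypod | grep -A 20 Events.
Check the exit code of the last crash: kubectl get pod mypod -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}' (while the pod is backing off, the current state is waiting, so the exit code is recorded under lastState).
Debug with a temporary (ephemeral) container if the pod exits quickly: kubectl debug mypod -it --image=busybox --target=myapp (ephemeral containers are stable since K8s 1.25).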
Common causes: application start failure, overly strict health checks, resource exhaustion (OOMKilled), incorrect ENTRYPOINT.
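The growing gaps between restarts seen in CrashLoopBackOff come from an exponential back-off. Assuming the documented kubelet defaults (delays of 10 s, 20 s, 40 s and so on, capped at 300 s, and reset once a container has run for 10 minutes without problems), the schedule can be reproduced with a few lines; `crashloop_delays` is an illustrative name, and the exact timing of the very first restart may differ in practice.

```python
from typing import List


def crashloop_delays(restarts: int, base: int = 10, cap: int = 300) -> List[int]:
    """Back-off delay in seconds before each of the first `restarts` restarts."""
    return [min(base * 2 ** n, cap) for n in range(restarts)]


print(crashloop_delays(7))        # [10, 20, 40, 80, 160, 300, 300]
print(sum(crashloop_delays(7)))   # 910
```

So a pod showing 7 restarts has spent about 15 minutes in back-off alone, which is why a freshly fixed image can still take up to 5 minutes to be retried.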
2. Pod Pending
# Check scheduling events
kubectl describe pod mypod | grep -i "fail\|warn"
# Typical reasons:
# - Insufficient CPU/memory
# - PVC not bound
# - Node affinity mismatch
# - Taint/toleration issues

3. Pod Stuck in Terminating
# Inspect finalizers
kubectl get pod mypod -o jsonpath='{.metadata.finalizers}'
# Force delete (use with caution)
kubectl delete pod mypod --force --grace-period=0

Best Practices
Configure probes wisely:
  Use startupProbe for slow‑starting apps.
  Set reasonable livenessProbe thresholds to avoid false restarts.
  Use readinessProbe to keep traffic away from unhealthy pods.
Implement graceful shutdown:
  Define preStop to finish in‑flight requests.
  Set terminationGracePeriodSeconds ≥ max request time + 10 s.
Leverage init containers for dependency checks and lightweight setup tasks.
Choose appropriate restartPolicy:
  Deployments → Always (default).
  Jobs → OnFailure or Never.
  Debug pods → Never.
Define resource requests/limits (e.g., 256Mi/500m request, 512Mi/1000m limit).
Integrate logging and monitoring (Prometheus metric kube_pod_container_status_restarts_total > 5 for alerts).
Test lifecycle behavior by deleting pods, forcing probe failures, and observing events.
Be aware of version compatibility: startupProbe requires K8s 1.18+, otherwise use initialDelaySeconds.
Avoid zombie processes by using an init system like tini or dumb-init as PID 1.
Document probe endpoints, expected responses, and graceful shutdown procedures.
MaGe Linux Operations
