Master Kubernetes Pod Lifecycle and Restart Policies – From Creation to Graceful Termination
This guide walks through Kubernetes pod lifecycle phases, container states, restartPolicy options, health‑check probes, lifecycle hooks, init containers, common troubleshooting scenarios such as CrashLoopBackOff, Pending and Stuck Terminating, and provides best‑practice recommendations for configuration, graceful shutdown, resource limits and monitoring.
Applicable Scenarios & Prerequisites
Use cases include pod fault diagnosis, graceful termination configuration, health‑check setup, and task‑type pod management. Prerequisites: Kubernetes 1.20+, kubectl access, and basic understanding of pods and containers.
Environment & Version Matrix
Kubernetes 1.20‑1.30 – stable lifecycle features
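Container runtime: containerd, or Docker Engine (from Kubernetes 1.24 on, Docker requires the cri-dockerd adapter because dockershim was removed)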
Pod Lifecycle Overview
1. Pod Phases
Pending: pod created but containers not started (waiting for scheduling or image pull)
Running: at least one container is running
Succeeded: all containers terminated successfully (Job/CronJob)
Failed: all containers terminated, at least one failed
Unknown: pod status cannot be obtained (node lost)
Check pod status with kubectl get pod mypod or kubectl get pod mypod -o jsonpath='{.status.phase}'.
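When scripting around this check, it can help to poll until a pod reaches a given phase. The sketch below wraps the jsonpath query in a small loop. The function name wait_for_phase and the KUBECTL override variable are hypothetical names introduced here for illustration; they are not part of kubectl.

```shell
# wait_for_phase POD PHASE TIMEOUT_SECONDS
# Polls once per second until the pod reports PHASE or the timeout expires.
# KUBECTL is a hypothetical override so a different binary (or a stub) can be used.
wait_for_phase() {
  pod=$1
  want=$2
  timeout=$3
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # Treat a failed query (pod not found yet, API hiccup) as "no phase yet".
    phase=$(${KUBECTL:-kubectl} get pod "$pod" -o jsonpath='{.status.phase}' 2>/dev/null) || phase=""
    if [ "$phase" = "$want" ]; then
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 1
}
```

For example, wait_for_phase mypod Running 60 returns 0 as soon as the pod is Running and 1 if it is still in another phase after about 60 seconds.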
2. Container States
Waiting: waiting to start (image pull, storage)
Running: normal operation
Terminated: stopped (success or failure)
Inspect container state with kubectl describe pod mypod or
kubectl get pod mypod -o jsonpath='{.status.containerStatuses[0].state}'.
RestartPolicy
1. Policies
Always – always restart (default); used by Deployments, StatefulSets, DaemonSets
OnFailure – restart on non-zero exit code; used by Jobs
Never – never restart; for one-off tasks or debugging pods
2. Validation Examples
Always example:
apiVersion: v1
kind: Pod
metadata:
  name: test-always
spec:
  restartPolicy: Always
  containers:
  - name: test
    image: busybox
    command: ["sh", "-c", "echo hello && sleep 10 && exit 1"]
Apply and watch restarts with kubectl apply -f test-always.yaml and kubectl get pod test-always -w. The container exits with code 1 after about 10 seconds and the kubelet restarts it in place: the RESTARTS column increments, while the pod itself is not recreated.
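Restarts under Always and OnFailure are not immediate. The Kubernetes documentation describes an exponential back-off (10 s, 20 s, 40 s, ...) capped at five minutes, which is what the CrashLoopBackOff status reflects. A minimal sketch of that schedule follows; backoff_delay is a hypothetical helper written for this guide, and the 10 s base and 300 s cap are the documented defaults, which a cluster may configure differently.

```shell
# backoff_delay N
# Prints the approximate delay in seconds before the Nth restart (N >= 1),
# assuming a 10 s base that doubles each time and is capped at 300 s.
backoff_delay() {
  n=$1
  delay=10
  i=1
  while [ "$i" -lt "$n" ]; do
    delay=$((delay * 2))
    if [ "$delay" -ge 300 ]; then
      delay=300
      break
    fi
    i=$((i + 1))
  done
  echo "$delay"
}
```

With these assumptions the sixth restart already waits the full five minutes, which is why a crashing pod appears to "hang" between attempts.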
OnFailure example:
apiVersion: batch/v1
kind: Job
metadata:
  name: test-job
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: test
        image: busybox
        command: ["sh", "-c", "echo job running && exit 0"]
Apply and observe that the job completes without restart.
Health Checks (Probes)
1. Types
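livenessProbe: if it fails, the kubelet kills the container, which is then restarted according to restartPolicy
readinessProbe: if it fails, the pod is removed from the endpoints of its Services and stops receiving traffic; the container is not restarted
startupProbe: for slow-starting applications; liveness and readiness probes are held back until it succeeds (enabled by default since Kubernetes 1.18, stable in 1.20)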
2. Probe Methods
HTTP GET – specify path, port, optional headers, initialDelaySeconds, periodSeconds, timeoutSeconds, successThreshold, failureThreshold
TCP socket – specify port and timing parameters
Exec – run a command inside the container
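For the exec method, the probe succeeds when the command exits with status 0 and fails otherwise. As an illustration, the sketch below checks a heartbeat file that the application is assumed to refresh with the current Unix timestamp. The helper name heartbeat_fresh and the heartbeat-file convention are assumptions made for this example, not something Kubernetes provides.

```shell
# heartbeat_fresh FILE MAX_AGE_SECONDS
# Exits 0 if FILE holds a Unix timestamp no older than MAX_AGE_SECONDS,
# non-zero otherwise (missing file, garbage content, or stale timestamp).
heartbeat_fresh() {
  file=$1
  max_age=$2
  [ -r "$file" ] || return 1
  last=$(cat "$file")
  # Reject empty or non-numeric content before doing arithmetic on it.
  case "$last" in
    ''|*[!0-9]*) return 1 ;;
  esac
  now=$(date +%s)
  [ $((now - last)) -le "$max_age" ]
}
```

Saved as a script in the image, it could be referenced from an exec probe's command field, for example with a hypothetical path such as /tmp/heartbeat and a maximum age of 30 seconds.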
3. Parameter Details
initialDelaySeconds: delay before the first probe (default 0; set according to application start time)
periodSeconds: interval between probes (default 10, recommended 10-30)
timeoutSeconds: probe timeout (default 1, recommended 1-5)
successThreshold: consecutive successes required (default 1; must be 1 for liveness and startup probes, 1-3 for readiness)
failureThreshold: consecutive failures before the container is marked unhealthy (default 3, recommended 3-5)
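These parameters combine into a rough time budget: a container that never passes its probe is given about initialDelaySeconds + periodSeconds × failureThreshold seconds before the kubelet acts. The helper below, probe_budget, is a hypothetical name used only to make that arithmetic concrete. It is an approximation that ignores timeoutSeconds and scheduling jitter.

```shell
# probe_budget INITIAL_DELAY PERIOD FAILURE_THRESHOLD
# Prints the approximate number of seconds a never-healthy container is given
# before the probe is considered failed (timeouts and jitter are ignored).
probe_budget() {
  echo $(( $1 + $2 * $3 ))
}

# A startup probe with initialDelaySeconds=0, periodSeconds=5, failureThreshold=30
probe_budget 0 5 30    # prints 150
# A liveness probe with initialDelaySeconds=10, periodSeconds=10, failureThreshold=3
probe_budget 10 10 3   # prints 40
```

Comparing the first figure with the application's worst-case start time is a quick sanity check before rolling out a startup probe.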
4. Full Probe Example
apiVersion: v1
kind: Pod
metadata:
  name: web-app
spec:
  containers:
  - name: nginx
    image: nginx:1.21
    ports:
    - containerPort: 80
    # /healthz and /ready are placeholders: stock nginx returns 404 for both,
    # so point the probes at endpoints your application actually serves.
    # Startup probe for slow-starting apps
    startupProbe:
      httpGet:
        path: /healthz
        port: 80
      initialDelaySeconds: 0
      periodSeconds: 5
      failureThreshold: 30
    # Liveness probe
    livenessProbe:
      httpGet:
        path: /healthz
        port: 80
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 2
      failureThreshold: 3
    # Readiness probe
    readinessProbe:
      httpGet:
        path: /ready
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 5
      timeoutSeconds: 2
      failureThreshold: 3
Validate with kubectl describe pod web-app and check events via
kubectl get events --field-selector involvedObject.name=web-app.
Lifecycle Hooks
1. postStart
Runs immediately after the container starts. Example:
lifecycle:
  postStart:
    exec:
      command: ["/bin/sh", "-c", "echo 'Container started' > /tmp/start.log"]
Note: postStart runs asynchronously with the container entrypoint; if it fails, the container is killed and restarted.
2. preStop
Executes before SIGTERM is sent, enabling graceful shutdown. Example:
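lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "nginx -s quit; while killall -0 nginx; do sleep 1; done"]
Shutdown flow: Kubernetes runs preStop, then sends SIGTERM to the container's main process, and sends SIGKILL if the container is still running when terminationGracePeriodSeconds (default 30 s) expires. The grace period starts when termination begins, so time spent in preStop counts against it.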
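The SIGTERM step only helps if the container's main process reacts to it. For a shell entrypoint, that means installing a trap explicitly, since a process running as PID 1 in a container does not get the default terminate-on-signal behavior. The following is a minimal sketch under that assumption; graceful_worker and its one-second work loop are placeholders, not taken from any real application.

```shell
# graceful_worker
# Loops until SIGTERM arrives, then finishes the current iteration and exits 0.
graceful_worker() {
  stop=0
  trap 'stop=1' TERM
  while [ "$stop" -eq 0 ]; do
    # Placeholder for one unit of real work. Sleeping in the background and
    # waiting on it lets the shell run the trap as soon as the signal arrives,
    # instead of only after the sleep has finished.
    sleep 1 &
    wait $! || true
  done
  echo "drained, exiting"
  return 0
}
```

Used as the last command of an entrypoint script, this pattern lets the container exit well within the grace period, so SIGKILL is never needed.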
3. Graceful Shutdown Example
apiVersion: v1
kind: Pod
metadata:
  name: graceful-shutdown
spec:
  terminationGracePeriodSeconds: 60
  containers:
  - name: nginx
    image: nginx:1.21
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "sleep 10 && nginx -s quit"]
    ports:
    - containerPort: 80
Delete the pod with kubectl delete pod graceful-shutdown and watch the status change from Running to Terminating; once the containers have exited, the pod object is removed from the API rather than entering a separate stopped state.
Init Containers
1. Concept
Run sequentially before the main containers start.
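All init containers must succeed before the main containers start. If one fails, the kubelet retries it until it succeeds, unless the pod's restartPolicy is Never, in which case the pod is marked Failed.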
2. Common Use Cases
Wait for dependent services (e.g., MySQL) using a busybox loop.
Copy configuration files into a shared volume.
Fix permissions on mounted volumes.
# Wait for MySQL
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  initContainers:
  - name: wait-for-db
    image: busybox:1.34
    command: ['sh', '-c', 'until nslookup mysql; do echo waiting for mysql; sleep 2; done']
  containers:
  - name: myapp
    image: myapp:1.0
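The until nslookup mysql loop retries forever, which can hide a misconfigured dependency behind a pod that just sits in Init. A bounded variant is sketched below: it gives up after a fixed number of attempts so the init container exits non-zero and the failure shows up in the pod status. retry_until is a hypothetical helper name; it uses only POSIX shell features, so it should also run under busybox sh.

```shell
# retry_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, at most MAX_ATTEMPTS times,
# sleeping DELAY_SECONDS between attempts. Returns 1 if it never succeeds.
retry_until() {
  max=$1
  delay=$2
  shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "waiting ($attempt/$max)"
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

Inside an init container this could be called as retry_until 30 2 nslookup mysql, which waits for roughly a minute before failing.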
# Copy config files
initContainers:
- name: init-config
  image: busybox:1.34
  command: ['sh', '-c', 'cp /config/*.conf /app/config/']
  volumeMounts:
  - name: config
    mountPath: /config
  - name: app-config
    mountPath: /app/config
# Fix permissions
initContainers:
- name: fix-permissions
  image: busybox:1.34
  command: ['sh', '-c', 'chown -R 1000:1000 /data']
  volumeMounts:
  - name: data
    mountPath: /data
  securityContext:
    runAsUser: 0
Troubleshooting Scenarios
1. CrashLoopBackOff
Symptoms: pod shows CrashLoopBackOff status. Steps:
# View logs
kubectl logs mypod
kubectl logs mypod --previous
# View events
kubectl describe pod mypod | grep -A 20 Events
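# Check the exit code of the previous run (while the container waits in CrashLoopBackOff, state.terminated is empty)
kubectl get pod mypod -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'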
# Debug with a temporary container
kubectl debug mypod -it --image=busybox --target=myapp
Common causes: application start failure, overly strict health checks, resource exhaustion (OOMKilled), incorrect ENTRYPOINT.
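The exit code often narrows down which of these causes applies. By shell convention, codes above 128 mean the process was killed by a signal (code minus 128), so 137 is SIGKILL, which is typical of an OOM kill, and 143 is SIGTERM. The small lookup below encodes that convention; explain_exit_code is a hypothetical helper, and the wording of its messages is illustrative only.

```shell
# explain_exit_code CODE
# Prints a rough interpretation of a container exit code.
explain_exit_code() {
  code=$1
  case "$code" in
    0)   echo "success" ;;
    126) echo "command found but not executable" ;;
    127) echo "command not found" ;;
    *)
      if [ "$code" -gt 128 ]; then
        # 128 + N means the process was terminated by signal N.
        echo "killed by signal $((code - 128))"
      else
        echo "application error"
      fi
      ;;
  esac
}
```

For instance, explain_exit_code 137 reports signal 9, and 127 usually points at a wrong ENTRYPOINT or a binary missing from the image.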
2. Pod Pending
# Check scheduling events
kubectl describe pod mypod | grep -i "fail\|warn"
# Typical reasons:
# - Insufficient CPU/memory
# - PVC not bound
# - Node affinity mismatch
# - Taints/tolerations issues
3. Pod Stuck Terminating
# Inspect finalizers
kubectl get pod mypod -o jsonpath='{.metadata.finalizers}'
# Force delete (use with caution)
kubectl delete pod mypod --force --grace-period=0
Best Practices
Probe Configuration
Use startupProbe for applications that need a warm‑up period.
Set reasonable livenessProbe thresholds to avoid false restarts.
Configure readinessProbe so new versions don’t receive traffic before they are ready.
Graceful Shutdown
Define a preStop hook to finish in‑flight requests.
Set terminationGracePeriodSeconds longer than the maximum request time plus a safety buffer.
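Because the grace period also covers the time spent in preStop, the budget to check is preStop duration plus the longest request plus a buffer. The sketch below makes that comparison explicit; grace_period_ok is a hypothetical helper, and the numbers in the usage note are illustrative rather than measured values.

```shell
# grace_period_ok GRACE PRESTOP MAX_REQUEST BUFFER   (all in seconds)
# Succeeds if GRACE covers the preStop duration, the longest request and a buffer.
grace_period_ok() {
  needed=$(( $2 + $3 + $4 ))
  if [ "$1" -ge "$needed" ]; then
    echo "ok: ${1}s >= ${needed}s"
  else
    echo "too short: ${1}s < ${needed}s"
    return 1
  fi
}
```

With a 10 s preStop sleep, 30 s maximum request time and a 15 s buffer, grace_period_ok 60 10 30 15 passes, while the default of 30 s would not.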
Init Containers
Use lightweight images such as busybox to wait for services or prepare files.
RestartPolicy Choice
Deployments – Always (default).
Jobs – OnFailure or Never depending on desired retry behavior.
Debug pods – Never to prevent automatic restarts.
Resource Limits
resources:
  requests:
    memory: "256Mi"
    cpu: "500m"
  limits:
    memory: "512Mi"
    cpu: "1000m"
Logging & Monitoring
Export pod restart counts to Prometheus.
Alert on the restart rate rather than the raw counter, which only ever grows: for example, increase(kube_pod_container_status_restarts_total[1h]) > 5.
Testing Lifecycle
Delete a pod to verify graceful shutdown behavior.
Manually trigger probe failures to ensure correct responses.
Version Compatibility
startupProbe requires Kubernetes 1.18+; older clusters can emulate with a large initialDelaySeconds.
Avoid Zombie Processes
Run an init system such as tini or dumb-init as PID 1.
Documentation
Record probe endpoints, expected responses, and shutdown procedures.
Raymond Ops
Linux ops automation, cloud-native, Kubernetes, SRE, DevOps, Python, Golang and related tech discussions.
