
Master Kubernetes Pod Lifecycle and Restart Policies – From Creation to Graceful Termination

This guide walks through Kubernetes pod lifecycle phases, container states, restartPolicy options, health‑check probes, lifecycle hooks, init containers, common troubleshooting scenarios such as CrashLoopBackOff, Pending and Stuck Terminating, and provides best‑practice recommendations for configuration, graceful shutdown, resource limits and monitoring.

Raymond Ops

Applicable Scenarios & Prerequisites

Use cases include pod fault diagnosis, graceful termination configuration, health‑check setup, and task‑type pod management. Prerequisites: Kubernetes 1.20+, kubectl access, and basic understanding of pods and containers.

Environment & Version Matrix

Kubernetes 1.20‑1.30 – stable lifecycle features

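Container runtime: containerd or Docker Engine (from Kubernetes 1.24, when dockershim was removed, Docker Engine requires the cri-dockerd adapter)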

Pod Lifecycle Overview

1. Pod Phases

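Pending: the pod has been accepted by the cluster, but one or more containers have not started yet (waiting for scheduling or an image pull)

Running: the pod is bound to a node and at least one container is running, starting, or restarting

Succeeded: all containers terminated successfully and will not be restarted (typical for Job/CronJob pods)

Failed: all containers have terminated and at least one exited with a failure

Unknown: the pod status cannot be obtained, usually because the node is unreachable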

Check pod status with kubectl get pod mypod or kubectl get pod mypod -o jsonpath='{.status.phase}'.

2. Container States

Waiting: waiting to start (image pull, storage)

Running: normal operation

Terminated: stopped (success or failure)

Inspect container state with kubectl describe pod mypod or kubectl get pod mypod -o jsonpath='{.status.containerStatuses[0].state}'.

RestartPolicy

1. Policies

Always: always restart the container after it exits (default); the only value allowed in Deployment, StatefulSet, and DaemonSet pod templates

OnFailure: restart only when the container exits with a non-zero code; used by Jobs

Never: never restart; for one-off tasks or debugging pods

2. Validation Examples

Always example:

apiVersion: v1
kind: Pod
metadata:
  name: test-always
spec:
  restartPolicy: Always
  containers:
  - name: test
    image: busybox
    command: ["sh", "-c", "echo hello && sleep 10 && exit 1"]

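Apply and watch restarts with kubectl apply -f test-always.yaml and kubectl get pod test-always -w. After each exit, the kubelet restarts the container in place with an exponentially increasing back-off delay (capped at five minutes); the RESTARTS column increments and the status eventually shows CrashLoopBackOff.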

OnFailure example:

apiVersion: batch/v1
kind: Job
metadata:
  name: test-job
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: test
        image: busybox
        command: ["sh", "-c", "echo job running && exit 0"]

Apply and observe that the Job completes without a restart, because the container exits with code 0.
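The Never policy has no example above. A minimal sketch, mirroring the Always example but with restartPolicy: Never, could look like this (the name test-never is an assumption chosen for illustration):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-never          # hypothetical name
spec:
  restartPolicy: Never      # the kubelet will not restart the container
  containers:
  - name: test
    image: busybox
    # Exits with a non-zero code so the effect of the policy is visible
    command: ["sh", "-c", "echo hello && exit 1"]
```

After applying it, kubectl get pod test-never should show STATUS Error with RESTARTS 0, and kubectl get pod test-never -o jsonpath='{.status.phase}' should report Failed.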

Health Checks (Probes)

1. Types

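livenessProbe : if it fails, the kubelet kills the container, which is then restarted according to restartPolicy

readinessProbe : if it fails, the pod is removed from the endpoints of matching Services; the container is not restarted

startupProbe : for slow-starting applications (K8s 1.18+); liveness and readiness probes are held off until it succeeds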

2. Probe Methods

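HTTP GET – specify path, port, and optional headers, plus the timing parameters (initialDelaySeconds, periodSeconds, timeoutSeconds, successThreshold, failureThreshold); any response status from 200 to 399 counts as success

TCP socket – specify port and timing parameters; the probe succeeds if the connection can be opened

Exec – run a command inside the container; exit code 0 counts as success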
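The full example later in this guide only uses HTTP GET. As a rough sketch of the other two methods, the pod below uses a TCP socket check for liveness and an exec check for readiness. The pod name and the choice of the redis:7 image are assumptions made for illustration; redis-cli ping exits non-zero when it cannot connect to the server.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probe-methods       # hypothetical name
spec:
  containers:
  - name: cache
    image: redis:7          # assumed image, chosen because it ships redis-cli
    ports:
    - containerPort: 6379
    # TCP socket: succeeds if a connection to the port can be opened
    livenessProbe:
      tcpSocket:
        port: 6379
      initialDelaySeconds: 5
      periodSeconds: 10
    # Exec: succeeds if the command exits with code 0
    readinessProbe:
      exec:
        command: ["redis-cli", "ping"]
      periodSeconds: 5
```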

3. Parameter Details

initialDelaySeconds: delay before the first probe (default 0; set according to app start time)

periodSeconds: interval between probes (default 10, recommended 10-30)

timeoutSeconds: probe timeout (default 1, recommended 1-5)

successThreshold: consecutive successes required (default 1; must be 1 for liveness and startup probes, 1-3 for readiness)

failureThreshold: consecutive failures before the container is marked unhealthy (default 3, recommended 3-5)
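To see how these parameters combine, here is an annotated readiness probe fragment. It is only a sketch: the /ready path and port 8080 are assumed, and the values are illustrative rather than recommendations.

```yaml
readinessProbe:
  httpGet:
    path: /ready             # assumed endpoint
    port: 8080               # assumed port
  initialDelaySeconds: 5     # first probe about 5 s after the container starts
  periodSeconds: 10          # then one probe every 10 s
  timeoutSeconds: 2          # a probe fails if there is no response within 2 s
  successThreshold: 2        # 2 consecutive successes to be marked Ready again
  failureThreshold: 3        # 3 consecutive failures (roughly 3 x 10 s) to be marked NotReady
```

The same arithmetic applies to a startup probe: failureThreshold multiplied by periodSeconds is approximately the longest start-up time the container is allowed before it is killed.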

4. Full Probe Example

apiVersion: v1
kind: Pod
metadata:
  name: web-app
spec:
  containers:
  - name: nginx
    image: nginx:1.21
    ports:
    - containerPort: 80
    # Assumes the application serves /healthz and /ready; the stock nginx image returns 404 for both
    # Startup probe for slow-starting apps
    startupProbe:
      httpGet:
        path: /healthz
        port: 80
      initialDelaySeconds: 0
      periodSeconds: 5
      failureThreshold: 30
    # Liveness probe
    livenessProbe:
      httpGet:
        path: /healthz
        port: 80
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 2
      failureThreshold: 3
    # Readiness probe
    readinessProbe:
      httpGet:
        path: /ready
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 5
      timeoutSeconds: 2
      failureThreshold: 3

Validate with kubectl describe pod web-app and check events via kubectl get events --field-selector involvedObject.name=web-app.

Lifecycle Hooks

1. postStart

Runs immediately after the container starts. Example:

lifecycle:
  postStart:
    exec:
      command: ["/bin/sh", "-c", "echo 'Container started' > /tmp/start.log"]

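Note: postStart runs asynchronously with the container entrypoint, so there is no guarantee it runs before the application starts. The container is not marked Running until the hook completes; if the hook fails, the container is killed and then handled according to restartPolicy.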

2. preStop

Executes before SIGTERM is sent, enabling graceful shutdown. Example:

lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "nginx -s quit; while killall -0 nginx; do sleep 1; done"]

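Shutdown flow: when deletion starts, the terminationGracePeriodSeconds countdown (default 30 s) begins and the pod is removed from Service endpoints. K8s runs preStop, sends SIGTERM once the hook finishes, and sends SIGKILL to any container still running when the grace period expires. Time spent in preStop counts against the grace period.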

3. Graceful Shutdown Example

apiVersion: v1
kind: Pod
metadata:
  name: graceful-shutdown
spec:
  terminationGracePeriodSeconds: 60
  containers:
  - name: nginx
    image: nginx:1.21
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "sleep 10 && nginx -s quit"]
    ports:
    - containerPort: 80

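Delete the pod with kubectl delete pod graceful-shutdown and watch the status change from Running to Terminating; the pod disappears from the list once its containers have exited. Because preStop sleeps for 10 seconds, deletion should take at least about 10 seconds.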

Init Containers

1. Concept

Run sequentially before the main containers start.

All init containers must succeed before the main containers start. If one fails, the kubelet reruns it until it succeeds; with restartPolicy: Never, the pod is marked Failed instead.

2. Common Use Cases

Wait for dependent services (e.g., MySQL) using a busybox loop.

Copy configuration files into a shared volume.

Fix permissions on mounted volumes.

# Wait for MySQL
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  initContainers:
  - name: wait-for-db
    image: busybox:1.34
    command: ['sh', '-c', 'until nslookup mysql; do echo waiting for mysql; sleep 2; done']
  containers:
  - name: myapp
    image: myapp:1.0

# Copy config files
initContainers:
- name: init-config
  image: busybox:1.34
  command: ['sh', '-c', 'cp /config/*.conf /app/config/']
  volumeMounts:
  - name: config
    mountPath: /config
  - name: app-config
    mountPath: /app/config

# Fix permissions
initContainers:
- name: fix-permissions
  image: busybox:1.34
  command: ['sh', '-c', 'chown -R 1000:1000 /data']
  volumeMounts:
  - name: data
    mountPath: /data
  securityContext:
    runAsUser: 0
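The config-copy fragment above does not show the volumes it mounts. A fuller sketch, assuming a ConfigMap named myapp-config already exists (that name and the pod name are hypothetical), might wire the pieces together like this:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp-with-config     # hypothetical name
spec:
  initContainers:
  - name: init-config
    image: busybox:1.34
    # Copy the read-only ConfigMap files into a writable shared volume
    command: ['sh', '-c', 'cp /config/*.conf /app/config/']
    volumeMounts:
    - name: config
      mountPath: /config
    - name: app-config
      mountPath: /app/config
  containers:
  - name: myapp
    image: myapp:1.0
    volumeMounts:
    - name: app-config        # same emptyDir, now populated by the init container
      mountPath: /app/config
  volumes:
  - name: config
    configMap:
      name: myapp-config      # hypothetical ConfigMap; must exist in the namespace
  - name: app-config
    emptyDir: {}
```

The emptyDir volume lives as long as the pod, so the files copied by the init container are visible to the main container when it starts.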

Troubleshooting Scenarios

1. CrashLoopBackOff

Symptoms: pod shows CrashLoopBackOff status. Steps:

# View logs
kubectl logs mypod
kubectl logs mypod --previous
# View events
kubectl describe pod mypod | grep -A 20 Events
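# Check the exit code of the last crashed run (during back-off the current state is usually Waiting)
kubectl get pod mypod -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
# Debug with a temporary (ephemeral) container; myapp is the target container's name
# (ephemeral containers are enabled by default from Kubernetes 1.23)
kubectl debug mypod -it --image=busybox --target=myapp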

Common causes: application start failure, overly strict health checks, resource exhaustion (OOMKilled), incorrect ENTRYPOINT.

2. Pod Pending

# Check scheduling events
kubectl describe pod mypod | grep -i "fail\|warn"
# Typical reasons:
# - Insufficient CPU/memory
# - PVC not bound
# - Node affinity mismatch
# - Taints/tolerations issues

3. Pod Stuck in Terminating

# Inspect finalizers
kubectl get pod mypod -o jsonpath='{.metadata.finalizers}'
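# Force delete (use with caution: the API object is removed immediately,
# without waiting for confirmation that the containers have stopped)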
kubectl delete pod mypod --force --grace-period=0

Best Practices

Probe Configuration

Use startupProbe for applications that need a warm‑up period.

Set reasonable livenessProbe thresholds to avoid false restarts.

Configure readinessProbe so new versions don’t receive traffic before they are ready.

Graceful Shutdown

Define a preStop hook to finish in‑flight requests.

Set terminationGracePeriodSeconds longer than the preStop duration plus the maximum request time plus a safety buffer, since the hook runs inside the grace period.

Init Containers

Use lightweight images such as busybox to wait for services or prepare files.

RestartPolicy Choice

Deployments – Always (default).

Jobs – OnFailure or Never depending on desired retry behavior.

Debug pods – Never to prevent automatic restarts.
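For Jobs, the retry behavior also depends on backoffLimit. The sketch below assumes a task that always fails, purely to make the retries visible; the Job name and the specific limits are illustrative assumptions.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: retry-demo              # hypothetical name
spec:
  backoffLimit: 3               # retries before the Job is marked Failed (default is 6)
  activeDeadlineSeconds: 300    # overall time limit for the Job, in seconds
  template:
    spec:
      # Never: each failure produces a new pod, so failed pods and their logs remain for inspection.
      # OnFailure would instead restart the container inside the same pod.
      restartPolicy: Never
      containers:
      - name: task
        image: busybox
        command: ["sh", "-c", "echo attempt && exit 1"]
```

kubectl get pods -l job-name=retry-demo lists the pods created for each attempt.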

Resource Limits

resources:
  requests:
    memory: "256Mi"
    cpu: "500m"
  limits:
    memory: "512Mi"
    cpu: "1000m"

Logging & Monitoring

Export pod restart counts to Prometheus.

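Alert on restart growth rather than the raw counter, for example when kube_pod_container_status_restarts_total (exposed by kube-state-metrics) increases by more than 5 within an hour.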
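One way to express the restart alert above is a Prometheus alerting rule. This is a sketch that assumes kube-state-metrics is being scraped; the group name, alert name, one-hour window, and severity label are illustrative choices, not values from this guide.

```yaml
groups:
- name: pod-restarts                     # hypothetical group name
  rules:
  - alert: PodRestartingFrequently       # hypothetical alert name
    # The metric is a counter, so alert on its growth over a window
    expr: increase(kube_pod_container_status_restarts_total[1h]) > 5
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Container {{ $labels.container }} in {{ $labels.namespace }}/{{ $labels.pod }} restarted more than 5 times in the last hour"
```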

Testing Lifecycle

Delete a pod to verify graceful shutdown behavior.

Manually trigger probe failures to ensure correct responses.

Version Compatibility

startupProbe requires Kubernetes 1.18+; older clusters can emulate it with a large initialDelaySeconds.

Avoid Zombie Processes

Run an init system such as tini or dumb-init as PID 1.
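If the image already contains tini, the pod spec can place it in front of the application. The fragment below is a sketch under two assumptions: tini is installed and on PATH inside the image, and /app/server is a hypothetical application binary.

```yaml
containers:
- name: myapp
  image: myapp:1.0
  # command overrides the image ENTRYPOINT; tini becomes PID 1 and reaps orphaned children
  command: ["tini", "--"]       # assumes tini is on PATH in the image
  # args overrides the image CMD; tini starts this program as its child
  args: ["/app/server"]         # hypothetical application binary
```

tini also forwards signals such as SIGTERM to the child process, which helps the graceful shutdown flow described earlier.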

Documentation

Record probe endpoints, expected responses, and shutdown procedures.
