
Mastering Kubernetes Pod Lifecycle and Restart Policies: A Hands‑On Guide

This guide walks through Kubernetes pod lifecycle phases, container states, restart policies, health‑check probes, lifecycle hooks, init containers, common troubleshooting scenarios, and best‑practice recommendations, providing concrete YAML examples and kubectl commands to help operators manage pods from creation to graceful termination.

MaGe Linux Operations

Kubernetes Pod Lifecycle and Restart Policies

Applicable Scenarios & Prerequisites

Suitable for pod troubleshooting, graceful termination configuration, health check setup, and batch‑type pod management.

Kubernetes 1.20+

kubectl access

Understanding of pods and containers

Environment and Version Matrix

Kubernetes: 1.20‑1.30 (stable lifecycle features)

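Container Runtime: containerd or Docker (from Kubernetes 1.24 onward, Docker Engine needs the cri-dockerd adapter because dockershim was removed)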

Complete Pod Lifecycle Process

1. Pod Phases

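Pending: The pod has been accepted by the cluster, but one or more containers are not yet running (waiting for scheduling or image pull).

Running: The pod is bound to a node and at least one container is running, starting, or restarting.

Succeeded: All containers terminated successfully and will not be restarted (typical for Job/CronJob pods).

Failed: All containers terminated, and at least one exited with a failure.

Unknown: The pod status could not be obtained, usually because the node is unreachable.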

Check pod status:

kubectl get pod mypod
# NAME   READY   STATUS   RESTARTS   AGE
# mypod  1/1     Running   0          5m

kubectl get pod mypod -o jsonpath='{.status.phase}'
# Running

2. Container States

Waiting: The container is not yet running (for example, pulling the image or mounting volumes).

Running: The container is executing normally.

Terminated: The container has finished, either successfully or with a failure.

Check container state:

kubectl describe pod mypod
# State:  Running
# Started: 2025-10-24 10:00:00 +0800 CST

kubectl get pod mypod -o jsonpath='{.status.containerStatuses[0].state}'
# {"running":{"startedAt":"2025-10-24T02:00:00Z"}}

RestartPolicy

1. Three Policies

apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  restartPolicy: Always   # Always | OnFailure | Never
  containers:
  - name: myapp
    image: nginx:1.21

Always: Restart the container whenever it terminates (default). Required by Deployments, StatefulSets, and DaemonSets.

OnFailure: Restart only when the exit code is non‑zero. Commonly used by Jobs.

Never: Never restart. Suitable for one‑off tasks or debugging pods.
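The three policies reduce to a small decision on the container's exit code. The sketch below only illustrates that rule: the function name should_restart is made up for this example, and the kubelet's real logic also applies back-off delays between restarts.

```shell
#!/bin/sh
# Illustrative only: mimic the restartPolicy decision for one container exit.
# Usage: should_restart <Always|OnFailure|Never> <exit-code>
should_restart() {
  policy=$1
  exit_code=$2
  case "$policy" in
    Always)
      echo "restart" ;;
    OnFailure)
      # Restart only when the container exited with a non-zero code.
      if [ "$exit_code" -ne 0 ]; then echo "restart"; else echo "no restart"; fi ;;
    Never)
      echo "no restart" ;;
    *)
      echo "unknown policy: $policy" >&2
      return 1 ;;
  esac
}

should_restart Always 0      # restart
should_restart OnFailure 0   # no restart
should_restart OnFailure 1   # restart
should_restart Never 1       # no restart
```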

2. RestartPolicy Verification

Always example:

apiVersion: v1
kind: Pod
metadata:
  name: test-always
spec:
  restartPolicy: Always
  containers:
  - name: test
    image: busybox
    command: ["sh","-c","echo hello && sleep 10 && exit 1"]

# Save the manifest above as test-always.yaml, then apply and watch
kubectl apply -f test-always.yaml
kubectl get pod test-always -w
# Shows automatic restarts and CrashLoopBackOff delay
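The CrashLoopBackOff delay grows with each failed restart. As a rough sketch, assuming the default kubelet back-off (10 s base, doubling each time, capped at 300 s, and reset once a container has run cleanly for 10 minutes), the waits can be estimated like this. The function name backoff_delay is only for illustration.

```shell
#!/bin/sh
# Estimate the N-th back-off delay in seconds (N is 1-based).
# Assumes the default kubelet back-off: 10 s, doubling, capped at 300 s.
backoff_delay() {
  attempt=$1
  delay=10
  i=1
  while [ "$i" -lt "$attempt" ] && [ "$delay" -lt 300 ]; do
    delay=$((delay * 2))
    i=$((i + 1))
  done
  # Apply the 5-minute cap.
  if [ "$delay" -gt 300 ]; then
    delay=300
  fi
  echo "$delay"
}

for n in 1 2 3 4 5 6 7; do
  echo "back-off $n: about $(backoff_delay "$n") s"
done
```

With these assumptions the delays run 10, 20, 40, 80, 160 and then stay at 300 s, which is why a crashing pod appears to restart less and less often.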

OnFailure example:

apiVersion: batch/v1
kind: Job
metadata:
  name: test-job
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: test
        image: busybox
        command: ["sh","-c","echo job running && exit 0"]

# Save the manifest above as test-job.yaml, then apply and observe
kubectl apply -f test-job.yaml
kubectl get pod -l job-name=test-job
# Completed without restart
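In scripts it is often more convenient to block until the Job finishes instead of polling the pod list. A minimal sketch, assuming the Job name test-job from the example above (the wrapper name wait_for_job is hypothetical):

```shell
#!/bin/sh
# Wait for a Job to reach the Complete condition, then print its logs.
# Usage: wait_for_job <job-name> [timeout, default 60s]
wait_for_job() {
  job=$1
  timeout=${2:-60s}
  kubectl wait --for=condition=complete "job/$job" --timeout="$timeout" &&
    kubectl logs "job/$job"
}

# Example:
# wait_for_job test-job 120s
```

Note that this waits only for the Complete condition, so a Job that fails makes the command run until the timeout expires.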

Health Checks (Probes)

1. Probe Types

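livenessProbe: Restarts the container on failure.

readinessProbe: Removes the pod from Service endpoints on failure; the container is not restarted.

startupProbe: Holds back liveness and readiness checks until the app has started; intended for slow‑starting apps (enabled by default since K8s 1.18, stable in 1.20).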

2. Probe Methods

HTTP GET example:

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
    httpHeaders:
    - name: Custom-Header
      value: Awesome
  initialDelaySeconds: 3
  periodSeconds: 10
  timeoutSeconds: 1
  successThreshold: 1
  failureThreshold: 3

TCP socket example:

livenessProbe:
  tcpSocket:
    port: 3306
  initialDelaySeconds: 15
  periodSeconds: 20

Exec example:

livenessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy
  initialDelaySeconds: 5
  periodSeconds: 5

3. Probe Parameters

initialDelaySeconds: Delay before the first check (default 0; set according to the app's startup time).

periodSeconds: Interval between checks (default 10, recommended 10‑30).

timeoutSeconds: Timeout per check (default 1, recommended 1‑5).

successThreshold: Consecutive successes required (default 1; readiness may use 1‑3, liveness and startup probes must use 1).

failureThreshold: Consecutive failures before marking unhealthy (default 3, recommended 3‑5).

4. Full Probe Example

apiVersion: v1
kind: Pod
metadata:
  name: web-app
spec:
  containers:
  - name: nginx
    image: nginx:1.21   # assumes a config that serves /healthz and /ready; the stock image returns 404 for both
    ports:
    - containerPort: 80
    startupProbe:
      httpGet:
        path: /healthz
        port: 80
      initialDelaySeconds: 0
      periodSeconds: 5
      failureThreshold: 30   # 30*5=150 s startup time
    livenessProbe:
      httpGet:
        path: /healthz
        port: 80
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 2
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /ready
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 5
      timeoutSeconds: 2
      failureThreshold: 3
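The timing implied by these settings is easy to work out by hand. The sketch below does the arithmetic for the example values. It is an approximation that ignores timeoutSeconds and scheduling jitter, and probe_budget is a name invented for this illustration.

```shell
#!/bin/sh
# Rough upper bound, in seconds, before a probe gives up:
# initialDelaySeconds + periodSeconds * failureThreshold.
# Usage: probe_budget <initialDelaySeconds> <periodSeconds> <failureThreshold>
probe_budget() {
  echo $(( $1 + $2 * $3 ))
}

# startupProbe from the example: 0 + 5 * 30
echo "startup budget:      $(probe_budget 0 5 30) s"    # 150
# livenessProbe once the app is running: 10 * 3 (initial delay no longer applies)
echo "liveness detection:  $(probe_budget 0 10 3) s"    # 30
# readinessProbe once the app is running: 5 * 3
echo "readiness detection: $(probe_budget 0 5 3) s"     # 15
```

In other words, the app gets about 150 s to start, a hung container is restarted after roughly 30 s, and an unready pod stops receiving traffic after roughly 15 s.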

Lifecycle Hooks

1. postStart

lifecycle:
  postStart:
    exec:
      command: ["/bin/sh","-c","echo 'Container started' > /tmp/start.log"]

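Note: postStart runs asynchronously with the container ENTRYPOINT, so it is not guaranteed to run before the entrypoint does. If the hook fails, the container is killed and then handled according to restartPolicy.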

2. preStop

lifecycle:
  preStop:
    exec:
      command: ["/bin/sh","-c","nginx -s quit; while killall -0 nginx; do sleep 1; done"]

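Graceful shutdown flow:

The pod is marked Terminating and removed from Service endpoints; the terminationGracePeriodSeconds timer (default 30 s) starts.

The kubelet runs the preStop hook, if one is defined.

When preStop finishes, the kubelet sends SIGTERM to the container's main process.

If the container is still running when the grace period expires, SIGKILL forces termination. The time spent in preStop counts against the grace period.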
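The termination signal only helps if the container's main process reacts to it. When the entrypoint is a shell script, SIGTERM has to be trapped explicitly, because a process running as PID 1 ignores signals it has no handler for. A minimal sketch of such an entrypoint loop follows; graceful_worker and its message are hypothetical and stand in for real application work.

```shell
#!/bin/sh
# Hypothetical entrypoint loop that exits cleanly on SIGTERM.
graceful_worker() {
  # On SIGTERM: log, run any cleanup, then exit with status 0.
  trap 'echo "SIGTERM received, draining"; exit 0' TERM
  while :; do
    # Run sleep in the background and wait on it, so the shell can act
    # on the signal immediately rather than after sleep returns.
    sleep 1 &
    wait $!
  done
}

# As a container entrypoint the function would be called directly:
# graceful_worker
```

Without the trap, a script like this would keep running until the grace period expires and SIGKILL arrives.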

3. Graceful Shutdown Example

apiVersion: v1
kind: Pod
metadata:
  name: graceful-shutdown
spec:
  terminationGracePeriodSeconds: 60
  containers:
  - name: nginx
    image: nginx:1.21
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh","-c","sleep 10 && nginx -s quit"]
    ports:
    - containerPort: 80

# After applying the manifest above, delete the pod and watch
kubectl delete pod graceful-shutdown
kubectl get pod graceful-shutdown -w
# Shows Running → Terminating → stopped after preStop runs

Init Containers

1. Concept

Run sequentially before the main containers.

Each must succeed before the next starts.

If an init container fails, the kubelet restarts that init container until it succeeds; with restartPolicy: Never, the pod is marked Failed instead.

2. Common Use Cases

Wait for dependent service:

apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  initContainers:
  - name: wait-for-db
    image: busybox:1.34
    command: ['sh','-c','until nslookup mysql; do echo waiting for mysql; sleep 2; done']
  containers:
  - name: myapp
    image: myapp:1.0

Initialize configuration files:

initContainers:
- name: init-config
  image: busybox:1.34
  command: ['sh','-c','cp /config/*.conf /app/config/']
  volumeMounts:
  - name: config
    mountPath: /config
  - name: app-config
    mountPath: /app/config

Set permissions:

initContainers:
- name: fix-permissions
  image: busybox:1.34
  command: ['sh','-c','chown -R 1000:1000 /data']
  volumeMounts:
  - name: data
    mountPath: /data
  securityContext:
    runAsUser: 0
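While init containers are still running, kubectl get pod shows a status such as Init:0/1. To see which init container a pod is stuck on and why, something like the following helpers can be used. The function names are made up for this sketch; the pod and container names come from the wait-for-db example above.

```shell
#!/bin/sh
# List each init container of a pod together with its current state.
# Usage: init_status <pod>
init_status() {
  kubectl get pod "$1" -o \
    jsonpath='{range .status.initContainerStatuses[*]}{.name}{"\t"}{.state}{"\n"}{end}'
}

# Show the logs of a single init container.
# Usage: init_logs <pod> <init-container>
init_logs() {
  kubectl logs "$1" -c "$2"
}

# Example:
# init_status myapp
# init_logs myapp wait-for-db
```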

Troubleshooting Scenarios

1. CrashLoopBackOff

# Observe status
kubectl get pod
# NAME   READY   STATUS           RESTARTS   AGE
# mypod  0/1     CrashLoopBackOff 5          5m

Steps:

View logs: kubectl logs mypod and kubectl logs mypod --previous.

Inspect events: kubectl describe pod mypod | grep -A 20 Events.

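Check the exit code of the last crashed run (in CrashLoopBackOff the current state is waiting, so read lastState):

kubectl get pod mypod -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'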

Debug with a temporary (ephemeral) container if the pod exits quickly: kubectl debug mypod -it --image=busybox --target=myapp. Ephemeral containers are enabled by default only from K8s 1.23.

Common causes: application start failure, overly strict health checks, resource exhaustion (OOMKilled), incorrect ENTRYPOINT.
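The exit code often narrows the cause down quickly. Codes above 128 mean the process was killed by a signal (code minus 128), so 137 is SIGKILL, which is what an OOM kill produces, and 143 is SIGTERM. The helper below is a small sketch of that convention; explain_exit_code is a hypothetical name, and the meanings of 126 and 127 follow common shell convention rather than anything specific to Kubernetes.

```shell
#!/bin/sh
# Translate a container exit code into a short hint.
# Usage: explain_exit_code <code>
explain_exit_code() {
  code=$1
  case "$code" in
    0)   echo "success" ;;
    126) echo "command found but not executable" ;;
    127) echo "command not found" ;;
    *)
      if [ "$code" -gt 128 ]; then
        # 128 + N means the process was terminated by signal N.
        echo "killed by signal $((code - 128))"
      else
        echo "application error"
      fi ;;
  esac
}

explain_exit_code 137   # killed by signal 9
explain_exit_code 143   # killed by signal 15
explain_exit_code 1     # application error
```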

2. Pod Pending

# Check scheduling events
kubectl describe pod mypod | grep -i "fail\|warn"
# Typical reasons:
# - Insufficient CPU/memory
# - PVC not bound
# - Node affinity mismatch
# - Taint/toleration issues
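
Grepping the describe output can miss events whose wording differs. As an alternative sketch, the events for a single pod can be listed directly and sorted by time; scheduling problems usually appear with the reason FailedScheduling. The wrapper name pod_events is an assumption made for this example.

```shell
#!/bin/sh
# Show all events recorded for one pod, oldest first.
# Usage: pod_events <pod> [namespace, default "default"]
pod_events() {
  kubectl get events -n "${2:-default}" \
    --field-selector "involvedObject.name=$1" \
    --sort-by=.lastTimestamp
}

# Example:
# pod_events mypod
```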

3. Pod Stuck in Terminating

# Inspect finalizers
kubectl get pod mypod -o jsonpath='{.metadata.finalizers}'
# Force delete (use with caution)
kubectl delete pod mypod --force --grace-period=0

Best Practices

Configure probes wisely:

Use startupProbe for slow‑starting apps.

Set reasonable livenessProbe thresholds to avoid false restarts.

Use readinessProbe to keep traffic away from unhealthy pods.

Implement graceful shutdown:

Define preStop to finish in‑flight requests.

Set terminationGracePeriodSeconds ≥ preStop duration + max request time + 10 s, since the preStop hook runs inside the grace period.

Leverage init containers for dependency checks and lightweight setup tasks.

Choose appropriate restartPolicy:

Deployments → Always (default).

Jobs → OnFailure or Never.

Debug pods → Never.

Define resource requests/limits (e.g., 256Mi/500m request, 512Mi/1000m limit).

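Integrate logging and monitoring. For restart alerts, watch the growth of the kube-state-metrics counter kube_pod_container_status_restarts_total rather than its absolute value, for example increase(kube_pod_container_status_restarts_total[1h]) > 5.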

Test lifecycle behavior by deleting pods, forcing probe failures, and observing events.

Be aware of version compatibility: startupProbe requires K8s 1.18+, otherwise use initialDelaySeconds.

Avoid zombie processes by using an init system like tini or dumb-init as PID 1.

Document probe endpoints, expected responses, and graceful shutdown procedures.
