Why Pods Are the Most Powerful Unit in Kubernetes – A Deep Dive

This article provides a comprehensive, step‑by‑step analysis of Kubernetes Pods, covering their design as a shared‑namespace container group, the role of the pause (infra) container, creation flow, lifecycle phases, resource requests and limits, QoS classes, scheduling mechanics, volume types, and detailed troubleshooting techniques with concrete command‑line examples.

MaGe Linux Operations

Pod Essence and Namespace Sharing

Kubernetes runs workloads inside Pods, which are groups of containers that share the same Linux namespaces (Network, UTS, IPC). This enables localhost communication, a common hostname, and shared IPC while each container keeps its own filesystem.

Typical multi‑container patterns that require a Pod:

Web server + log collector : Nginx writes logs to a shared directory, Filebeat reads and forwards them.

Main service + sidecar : The main application handles business logic, an Envoy sidecar handles network proxy.

Main process + monitoring exporter : A Java process runs in one container, JMX Exporter collects metrics in another.

API gateway + cache : The gateway container serves requests and reads from a cache container in the same Pod over localhost.

Because the containers share the same Network, UTS and IPC namespaces, they can communicate via localhost without extra Services or network bridges.

Namespace Sharing Details

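PID : isolated by default – each container has its own process tree. Setting shareProcessNamespace: true in the Pod spec makes all containers share one PID namespace.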

Network : shared – all containers use the same Pod IP; localhost works across containers.

UTS : shared – the hostname is the Pod name.

IPC : shared – POSIX shared memory and semaphores are usable.

Mount : isolated – each container has its own root filesystem; files are shared only through volumes mounted into more than one container.

User : isolated – user namespaces are usually separate.

Pod Network Model

# Show Pod IPs
kubectl get pods -o wide
# Example output
# NAME        READY   STATUS    RESTARTS   AGE   IP           NODE
# myapp-abc   2/2     Running   0          5m    10.244.1.15  node-2

# Verify hostname inside each container
kubectl exec -it myapp-abc -c nginx -- hostname   # => myapp-abc
kubectl exec -it myapp-abc -c sidecar -- hostname # => myapp-abc

# Both containers see the same listening sockets because they share one network namespace
kubectl exec -it myapp-abc -c nginx -- ss -tlnp
kubectl exec -it myapp-abc -c sidecar -- ss -tlnp

Pause (Infra) Container

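When a Pod starts, the container runtime first launches a special Pause container (e.g., registry.k8s.io/pause:3.6; older clusters pull it from the now-frozen k8s.gcr.io registry). The Pause container creates and holds the Pod's shared Linux namespaces so that the business containers can join them. It uses almost no CPU or memory. With the default isolated PID namespaces it is PID 1 only in its own namespace; when shareProcessNamespace is enabled it becomes PID 1 for the whole Pod and reaps orphaned processes. If the Pause container exits, the shared namespaces are lost and the kubelet has to tear down and recreate the Pod's containers.

Without a Pause container, the shared namespaces would have to be owned by a business container, and they would disappear whenever that container exited or restarted.

The Pause container stays running and keeps the network, UTS, and IPC namespaces alive.

Business containers join the namespaces already held by the Pause container.

When the Pause container exits, the kubelet treats the Pod sandbox as dead and recreates it together with all containers in the Pod.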

Pod Creation Flow and Lifecycle

Creation Process (Source‑Level)

User submits Pod Spec
   ↓
API Server stores the Pod object in etcd
   ↓
Scheduler selects a suitable Node (resource fit, affinity, taints, etc.)
   ↓
Kubelet on the chosen Node sees, through its watch on the API Server, that the Pod is bound to it
   ↓
Kubelet calls the container runtime through CRI (e.g., containerd or CRI-O) to create the sandbox and containers
   ↓
Containers start, Kubelet reports status back to the API Server
   ↓
Pod reaches the Running phase

Key components:

API Server : cluster gateway; all components communicate through it. Pod objects are persisted in etcd.

Scheduler : decides which Node a Pod should run on, considering resources, affinity, taints, etc.

Kubelet : agent on each Node; manages the lifecycle of Pods on that Node and reports status to the API Server.

# View Pod creation events
kubectl describe pod mypod | grep -A 20 "Events:"
# Example events
# Type    Reason      Age   From               Message
# ----    ------      ----  ----               -------
# Normal  Scheduled   10m   default-scheduler  Successfully assigned default/mypod to node-2
# Normal  Pulling     10m   kubelet            Pulling image "nginx:latest"
# Normal  Pulled      9m    kubelet            Successfully pulled image "nginx:latest"
# Normal  Created     9m    kubelet            Created container nginx
# Normal  Started     9m    kubelet            Started container nginx

Pod Phase Status

Pending : Pod accepted but images not yet pulled or scheduling not finished.

Running : Pod bound to a Node, all containers created, at least one is running.

Succeeded : All containers terminated successfully (used by Jobs).

Failed : All containers terminated and at least one exited with non-zero status.

Unknown : Unable to get Pod status, usually due to Node communication failure.

# Show Pod phase and conditions
kubectl get pod mypod -o yaml | grep -A 10 "status:"
# Example output
# phase: Running
# conditions:
# - type: Ready
#   status: "True"
# - type: ContainersReady
#   status: "True"
# - type: Initialized
#   status: "True"
# startTime: 2026-05-15T08:00:00Z

ContainerState

# Show container states in JSONPath
kubectl get pod mypod -o jsonpath='{range .status.containerStatuses[*]}{.name}: {.state}{"\n"}{end}'
# Example output
# nginx: map[running:map[startedAt:2026-05-15T08:00:00Z]]
# sidecar: map[running:map[startedAt:2026-05-15T08:00:00Z]]

Waiting : container not running yet (image pull, dependency wait, etc.).

Running : container is executing.

Terminated : container has finished (either normally or killed).
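
For a Terminated container, the exit code usually tells how it ended. The sketch below assumes the common convention that exit codes above 128 mean "killed by signal (code minus 128)"; 137 therefore points to SIGKILL, which is what an OOM kill produces. The helper name describe_exit_code is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: describe a container exit code.
# Assumes the common convention that codes above 128 encode 128 + signal number.
describe_exit_code() {
  code=$1
  if [ "$code" -eq 0 ]; then
    echo "exited normally"
  elif [ "$code" -gt 128 ]; then
    echo "killed by signal $((code - 128))"
  else
    echo "exited with error $code"
  fi
}

describe_exit_code 0
describe_exit_code 1
describe_exit_code 137   # 137 - 128 = 9 (SIGKILL), typical for an OOM kill
describe_exit_code 143   # 143 - 128 = 15 (SIGTERM)
```

On a live cluster the code of the last termination can be read from .status.containerStatuses[*].lastState.terminated.exitCode.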

Init Containers

Init containers run before the main application containers. Common uses:

Wait for a dependent service to become ready.

Fetch configuration or secrets before the application starts.

Initialize a database schema.

# pod-with-init.yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  initContainers:
  - name: wait-for-db
    image: busybox:1.36
    command: ['sh', '-c', 'until nc -z db-service 5432; do echo waiting for db; sleep 2; done']
  - name: setup
    image: busybox:1.36
    # Write a marker file into the shared volume (the mount point itself is a directory)
    command: ['sh', '-c', 'echo "Init done" > /data/ready']
    volumeMounts:
    - name: shared-data
      mountPath: /data
  containers:
  - name: app
    image: myapp:v1.0
    volumeMounts:
    - name: shared-data
      mountPath: /data
  volumes:
  - name: shared-data
    emptyDir: {}
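
Note : Init containers run one at a time, in order. If one fails (non-zero exit code), the kubelet retries it according to the Pod's restartPolicy, and the main containers do not start until every init container has succeeded. The main containers cannot see an init container's filesystem changes except through shared volumes.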
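
The wait-for-db loop above retries forever. A bounded variant makes the init container exit non-zero after a fixed number of attempts, so the failure becomes visible in the Pod's events. This is a minimal sketch; wait_for is a hypothetical helper, and true/false stand in for a real check such as nc -z db-service 5432.

```shell
#!/bin/sh
# Hypothetical helper: retry a check a limited number of times.
# Usage: wait_for <max_attempts> <delay_seconds> <command...>
wait_for() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0            # dependency is reachable
    fi
    echo "attempt $attempt/$max_attempts failed" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 1                # give up so the init container exits non-zero
}

# Stand-in checks; in an init container this would be: nc -z db-service 5432
wait_for 3 0 true  && echo "ready"
wait_for 2 0 false || echo "gave up"
```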

Pod Resource Management

Requests and Limits

Kubernetes uses requests (minimum resources needed for scheduling) and limits (runtime upper bounds). Exceeding a CPU limit triggers throttling; exceeding a memory limit triggers an OOM kill.

# resource-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
  - name: app
    image: myapp:v1.0
    resources:
      requests:
        memory: "256Mi"
        cpu: "250m"   # 0.25 CPU core
      limits:
        memory: "512Mi"
        cpu: "500m"   # 0.5 CPU core
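
# Observe throttling counters (cgroup v1 path; on cgroup v2 read /sys/fs/cgroup/cpu.stat)
kubectl exec -it myapp -- cat /sys/fs/cgroup/cpu/cpu.stat
# nr_throttled 100   # number of times throttled
# throttled_time 5000000000   # total throttled nanoseconds (throttled_usec, in microseconds, on cgroup v2)

# Observe OOM kill count (cgroup v1 path; on cgroup v2 read /sys/fs/cgroup/memory.events)
kubectl exec -it myapp -- cat /sys/fs/cgroup/memory/memory.oom_control
# oom_kill 0   # OOM kill count for this cgroup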
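
CPU throttling follows from how a CPU limit is turned into a CFS quota: the container may use quota microseconds of CPU time per scheduling period and is paused once that is spent. The sketch below assumes the default CFS period of 100000 microseconds; the helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: convert a Kubernetes CPU quantity to millicores,
# then to a CFS quota. Assumes the default CFS period of 100000 microseconds.
cpu_to_millicores() {
  case $1 in
    *m) echo "${1%m}" ;;                                           # "500m" -> 500
    *)  awk -v c="$1" 'BEGIN { printf "%d\n", c * 1000 + 0.5 }' ;;  # "0.5"  -> 500
  esac
}

cfs_quota_us() {
  millicores=$(cpu_to_millicores "$1")
  period_us=${2:-100000}
  echo $((millicores * period_us / 1000))
}

cfs_quota_us 500m   # limit from resource-pod.yaml: prints 50000
cfs_quota_us 2      # two full cores: prints 200000
```

With the 500m limit from resource-pod.yaml the container gets 50 ms of CPU time in every 100 ms period; each period in which it runs out increments nr_throttled.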

QoS Classes

Guaranteed : every container sets both CPU and memory requests and limits, and each request equals its limit.

Burstable : the Pod does not qualify as Guaranteed, but at least one container sets a CPU or memory request or limit.

BestEffort : no container sets any request or limit.

# Check a Pod's QoS class
kubectl get pod myapp -o jsonpath='{.status.qosClass}'
# Output: Guaranteed / Burstable / BestEffort

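# When a Node runs low on memory, the kubelet generally evicts in this order
# (Pods using more memory than they requested are ranked first, then Pod priority applies):
#   1. BestEffort Pods
#   2. Burstable Pods
#   3. Guaranteed Pods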
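
The classification rules can be sketched as a small function. This is a simplified model for a single-container Pod, not the kubelet's code: qos_class is a hypothetical name, an empty string means "unset", and values are compared as strings, so a request and its limit are assumed to use the same notation.

```shell
#!/bin/sh
# Hypothetical helper: QoS class for a single-container Pod.
# Arguments: cpu_request cpu_limit mem_request mem_limit ("" means unset).
# Assumes request and limit use the same notation (compared as strings).
qos_class() {
  cpu_req=$1; cpu_lim=$2; mem_req=$3; mem_lim=$4

  # When only a limit is set, Kubernetes defaults the request to the limit.
  cpu_req=${cpu_req:-$cpu_lim}
  mem_req=${mem_req:-$mem_lim}

  if [ -z "$cpu_req$cpu_lim$mem_req$mem_lim" ]; then
    echo "BestEffort"
  elif [ -n "$cpu_lim" ] && [ -n "$mem_lim" ] &&
       [ "$cpu_req" = "$cpu_lim" ] && [ "$mem_req" = "$mem_lim" ]; then
    echo "Guaranteed"
  else
    echo "Burstable"
  fi
}

qos_class 250m 500m 256Mi 512Mi   # values from resource-pod.yaml
qos_class 500m 500m 512Mi 512Mi
qos_class "" "" "" ""
```

The three calls print Burstable, Guaranteed, and BestEffort, in that order.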

ResourceQuota and LimitRange

# limitrange.yaml – limit per container
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
spec:
  limits:
  - max:
      memory: 1Gi
      cpu: "1"
    min:
      memory: 64Mi
      cpu: "50m"
    default:
      memory: 256Mi
      cpu: "100m"
    defaultRequest:
      memory: 128Mi
      cpu: "50m"
    type: Container
# View ResourceQuota and LimitRange in a namespace
kubectl get resourcequota -n mynamespace
kubectl get limitrange -n mynamespace

Pod Scheduling Mechanism

Scheduling Flow

Filtering → Scoring → Selection → Binding

Filtering removes Nodes that do not satisfy conditions such as insufficient resources, mismatched NodeSelector/NodeAffinity, taints the Pod cannot tolerate, or unavailable ports/volumes.

Scoring ranks the remaining Nodes. Depending on the configured strategy it prefers Nodes with the least requested resources (spreading load) or the most requested resources (bin packing), and it also weighs topology (rack/zone) and affinity rules.
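
The filter-then-score idea can be illustrated with a toy model. This is not the real scheduler: the input format (node name, free millicores, untolerated taint yes/no) and the pick_node function are made up for the example, and scoring simply prefers the Node with the most CPU left after placement, a "least requested" style strategy.

```shell
#!/bin/sh
# Toy model of Filtering -> Scoring -> Selection; not the real scheduler.
# Input lines: "<node> <free_millicores> <untolerated_taint:yes|no>" (hypothetical format).
pick_node() {
  request=$1
  awk -v req="$request" '
    $3 == "yes"        { next }   # Filtering: Node carries a taint the Pod cannot tolerate
    $2 + 0 < req + 0   { next }   # Filtering: not enough free CPU
    {                             # Scoring: more CPU left after placement scores higher
      score = $2 - req
      if (best == "" || score > best_score) { best = $1; best_score = score }
    }
    END { if (best != "") print best; else print "unschedulable" }
  '
}

printf '%s\n' \
  "node-1 200 no" \
  "node-2 900 no" \
  "node-3 4000 yes" | pick_node 250
```

Here node-1 is filtered out for insufficient CPU and node-3 for its taint, so node-2 is selected.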

Node Affinity & Anti‑Affinity

# node-affinity-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 10
        preference:
          matchExpressions:
          - key: workload-type
            operator: In
            values:
            - production
  containers:
  - name: app
    image: myapp:v1.0
# Label nodes
kubectl label node node-1 disktype=ssd
kubectl label node node-2 disktype=hdd
# View node labels
kubectl get nodes --show-labels
# Remove a label
kubectl label node node-1 disktype-

Pod Affinity & Anti‑Affinity

# pod-affinity.yaml – co‑locate web server with Redis, avoid same‑node replicas
apiVersion: v1
kind: Pod
metadata:
  name: webserver
  labels:
    app: webserver   # matched by the podAntiAffinity selector and by "-l app=webserver"
spec:
  affinity:
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 80
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - redis
          topologyKey: kubernetes.io/hostname
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - webserver
        topologyKey: kubernetes.io/hostname
  containers:
  - name: nginx
    image: nginx:latest
# View scheduling result
kubectl get pod -o wide -l app=webserver
# Each replica should be on a different Node because of anti‑affinity.

Taints and Tolerations

# Add a taint to a node (prevent regular Pods from scheduling)
kubectl taint node node-1 dedicated=gameserver:NoSchedule
# Taint effects:
#   NoSchedule – new Pods without a matching toleration are blocked.
#   PreferNoSchedule – soft block.
#   NoExecute – blocks new Pods and evicts existing ones.

# Pods that need to run on the tainted node must declare a toleration
# (a toleration permits scheduling there; it does not force it):
apiVersion: v1
kind: Pod
metadata:
  name: gameserver
spec:
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "gameserver"
    effect: "NoSchedule"
  containers:
  - name: game
    image: gameserver:v1.0
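
How a toleration is matched against a taint can be sketched as follows. This is a simplified model covering only the Equal and Exists operators; tolerates is a hypothetical helper, not part of kubectl.

```shell
#!/bin/sh
# Hypothetical helper: does one toleration match one taint?
# Usage: tolerates <taint_key> <taint_value> <taint_effect> \
#                  <tol_key> <tol_operator> <tol_value> <tol_effect>
tolerates() {
  t_key=$1; t_val=$2; t_eff=$3
  k=$4; op=$5; v=$6; eff=$7

  # An empty toleration effect matches every taint effect.
  if [ -n "$eff" ] && [ "$eff" != "$t_eff" ]; then return 1; fi

  case $op in
    Exists)
      # An empty key with Exists tolerates every taint; the value is ignored.
      [ -z "$k" ] || [ "$k" = "$t_key" ] ;;
    Equal|"")
      # Equal is the default operator: key and value must both match.
      [ "$k" = "$t_key" ] && [ "$v" = "$t_val" ] ;;
    *)
      return 1 ;;   # other operators are not modeled in this sketch
  esac
}

tolerates dedicated gameserver NoSchedule dedicated Equal gameserver NoSchedule && echo "tolerated"
tolerates dedicated gameserver NoSchedule dedicated Equal webserver  NoSchedule || echo "not tolerated"
```

The first call mirrors the gameserver Pod above; the second shows that a different value under Equal does not match.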

Pod Volumes

emptyDir (Ephemeral Storage)

# emptyDir-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
  - name: nginx
    image: nginx:latest
    volumeMounts:
    - name: shared-cache
      mountPath: /var/cache/nginx
  - name: sync
    image: sync-tool:latest
    volumeMounts:
    - name: shared-cache
      mountPath: /sync
  volumes:
  - name: shared-cache
    emptyDir: {}
    # sizeLimit: 100Mi   # optional size limit

hostPath (Node‑Level Persistent Storage)

# hostPath-pod.yaml (demo only)
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
  - name: app
    image: myapp:v1.0
    volumeMounts:
    - name: host-data
      mountPath: /data
  volumes:
  - name: host-data
    hostPath:
      path: /data/hostpath
      type: DirectoryOrCreate   # create if missing
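
Risk Reminder : hostPath ties the Pod's data to one Node and undermines portability. If the Pod is rescheduled to a different Node, the data is not there: with type DirectoryOrCreate the Pod starts with an empty directory, and with type Directory it fails to start. Production workloads usually use a PersistentVolumeClaim instead.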

PersistentVolumeClaim (Persistent Storage)

# pvc-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
  - name: app
    image: myapp:v1.0
    volumeMounts:
    - name: data
      mountPath: /var/lib/myapp
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: my-pvc
      readOnly: false
# View PVC status
kubectl get pvc my-pvc
# NAME    STATUS   VOLUME                                 CAPACITY   ACCESS
# my-pvc  Bound    pvc-12345678-abcd-1234-5678-abcdef…   10Gi       RWO

Full‑Cycle Pod Troubleshooting

Pod Stuck in Pending

Cause : scheduling failure – no Node satisfies the requirements.

# Step 1: Inspect events
kubectl describe pod myapp | grep -A 10 "Events"
# Example messages:
# "0/3 nodes are available: 1 Insufficient memory, 2 node(s) had taints that the pod didn't tolerate"

# Step 2: Check Node resources
kubectl describe node | grep -A 5 "Allocated resources"

# Step 3: Examine taints
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}: {.spec.taints[*]}{"\n"}{end}'

# Possible remedies:
#   1. Add more Node capacity.
#   2. Lower the Pod's resource requests (scheduling is based on requests, not limits).
#   3. Add tolerations for existing taints.
#   4. Clean up unused Pods.
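
When the FailedScheduling message is long, splitting it into one reason per line makes it easier to read. The sketch below assumes the "N/M nodes are available: reason, reason" shape shown in the example message; real messages can contain extra commas or a trailing preemption note, so treat it as a reading aid only. scheduling_reasons is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: split a FailedScheduling message into one reason per line.
# Assumes the "N/M nodes are available: reason, reason" shape from the example message.
scheduling_reasons() {
  printf '%s\n' "$1" |
    sed -e 's/^[^:]*: *//' -e 's/\.$//' |   # drop the "0/3 nodes are available:" prefix
    tr ',' '\n' |                            # one reason per line
    sed -e 's/^ *//'                         # trim leading spaces
}

scheduling_reasons "0/3 nodes are available: 1 Insufficient memory, 2 node(s) had taints that the pod didn't tolerate"
```

Each output line starts with the number of Nodes rejected for that reason, which shows at a glance whether capacity or taints are the main blocker.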

Pod in Waiting or ContainerCreating

Cause : image pull failure, missing dependencies, or CNI network issues.

# View detailed events
kubectl describe pod myapp | grep -A 30 "Events"
# Example errors:
# "Failed to pull image \"nginx:latest\": rpc error: code = Unknown desc = context deadline exceeded"
# "networkPlugin cni failed to set up pod network"

# Check kubelet logs on the Node
journalctl -u kubelet --no-pager | grep -E "myapp|error|pod" | tail -50

# Verify image existence
crictl images | grep nginx

Pod in Error or CrashLoopBackOff

Cause : container start failure or runtime crash.

# Show container exit status
kubectl describe pod myapp
# Look for events such as:
# Warning  BackOff  2m (x5 over 5m)  kubelet  Back-off restarting failed container

# View previous container logs
kubectl logs myapp --previous

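# Inspect the container's filesystem and environment
# (exec only works while the container is running; between crashes an ephemeral
#  debug container can be used instead, e.g.
#  kubectl debug -it myapp --image=busybox:1.36 --target=<container>)
kubectl exec -it myapp -- sh
#   ps aux
#   cat /etc/passwd
#   env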

# Check if resource limits are too low
kubectl describe pod myapp | grep -E "Limits|Requests"

Typical CrashLoopBackOff reasons:

Application start‑up script errors.

Unable to connect to dependent services (DB, cache).

Missing or malformed configuration files.

Permission problems.

Port conflicts.
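
The "Back-off restarting failed container" event reflects the kubelet's restart delay, which grows exponentially. The sketch below assumes the documented default behaviour (10 s, doubling after each crash, capped at 300 s); restart_delay is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: delay before the Nth restart (N starts at 0).
# Assumes the documented kubelet default: 10s, doubling each time, capped at 300s.
restart_delay() {
  remaining=$1
  delay=10
  while [ "$remaining" -gt 0 ] && [ "$delay" -lt 300 ]; do
    delay=$((delay * 2))
    remaining=$((remaining - 1))
  done
  if [ "$delay" -gt 300 ]; then
    delay=300
  fi
  echo "$delay"
}

for n in 0 1 2 3 4 5 6; do
  printf 'restart %s: wait %ss\n' "$n" "$(restart_delay "$n")"
done
```

The delays come out as 10, 20, 40, 80, 160, 300, 300 seconds, which is why a crash-looping Pod appears to sit idle for minutes between attempts.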

Pod Stuck in Terminating

Cause : finalizers block deletion or attached resources (PVC/PV) are not released.

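# Inspect finalizers
kubectl get pod mypod -o jsonpath='{.metadata.finalizers}'
# A non-empty list blocks deletion until the owning controller removes each entry.
# (kubernetes.io/pv-protection and kubernetes.io/pvc-protection appear on PVs and PVCs, not on Pods.)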

# Force delete (use with caution)
kubectl delete pod mypod --grace-period=0 --force

# If PVC is involved, check its status first
kubectl get pvc
kubectl describe pvc my-pvc

Container Health Checks (Probes)

# probes-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
  - name: app
    image: myapp:v1.0
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 30
      periodSeconds: 10
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
      failureThreshold: 3
    startupProbe:
      httpGet:
        path: /started
        port: 8080
      initialDelaySeconds: 0
      periodSeconds: 5
      failureThreshold: 30   # up to 150 seconds total
# Check probe status
kubectl get pod myapp -o jsonpath='{range .status.containerStatuses[*]}{.name}: ready={.ready}, started={.started}{"\n"}{end}'
# Test endpoints manually
kubectl exec -it myapp -- curl -s http://localhost:8080/healthz
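
The "up to 150 seconds" comment on the startupProbe is plain arithmetic. A minimal sketch, assuming the approximation initialDelaySeconds + failureThreshold × periodSeconds and ignoring timeoutSeconds; probe_budget is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: approximate seconds before a probe gives up.
# Approximation: initialDelaySeconds + failureThreshold * periodSeconds
# (ignores timeoutSeconds and timing jitter).
probe_budget() {
  initial_delay=$1; period=$2; failure_threshold=$3
  echo $((initial_delay + period * failure_threshold))
}

probe_budget 0 5 30   # startupProbe from probes-pod.yaml: prints 150
probe_budget 0 10 3   # livenessProbe failure window once running: prints 30
```

If the application can take longer than the startup budget to boot, raising failureThreshold on the startupProbe is usually safer than loosening the livenessProbe.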

Core Takeaways

Containers inside a Pod share Network, UTS, and IPC namespaces, enabling localhost communication, shared IP, and shared hostname.

PID namespaces are isolated by default; the Pause container keeps the shared namespaces alive.

A Pod is the atomic scheduling unit; the Scheduler places the whole Pod on a single Node.

The Pause (infra) container is a technical mechanism that lets Kubernetes manage the Pod’s lifecycle.

Resource requests determine scheduling placement, while limits enforce runtime safety boundaries (CPU throttling, memory OOM).

QoS classes (Guaranteed, Burstable, BestEffort) affect eviction order under memory pressure.

Affinity, anti‑affinity, taints, and tolerations provide fine‑grained control over Pod distribution for high availability, low latency, and resource optimization.

Troubleshooting proceeds from Pod phase and events to container state, resource limits, and probe configurations.
