Why Pods Get Evicted: Diagnosing DiskPressure in Kubernetes Nodes
This article walks through a real‑world Kubernetes incident where a node’s disk usage exceeded the eviction threshold, causing pods to enter the Evicted state, and details the investigation steps, root‑cause analysis, and practical remediation actions.
Introduction
Previously we discussed NotReady conditions caused by memory shortage; this post shares a case where insufficient disk space on a host led to pod eviction.
Symptom
An alert indicated a large number of pods were in the Evicted state.
Investigation
All evicted pods had been scheduled on the same node, and describing an evicted pod showed the eviction reason was DiskPressure. Host metrics revealed normal CPU, memory, and load, but disk usage was at 84%.
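The checks above can be reproduced with standard kubectl commands (the node name below is a placeholder for your own node):

```shell
# List evicted pods across all namespaces and show which node each ran on
kubectl get pods --all-namespaces --field-selector=status.phase=Failed -o wide | grep Evicted

# Inspect the node's conditions; a node under disk pressure reports DiskPressure=True
kubectl describe node <node-name> | grep -A 6 "Conditions:"
```

If many Evicted entries share one node in the NODE column, the problem is local to that host rather than cluster-wide.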
Viewing the kubelet configuration showed the default eviction thresholds:
<code>cat /etc/kubernetes/kubelet/kubelet-config.json
...
"evictionHard": {
    "memory.available": "100Mi",
    "nodefs.available": "10%",
    "nodefs.inodesFree": "5%"
}
...</code>
The kubelet process list confirmed the node was running the standard AWS EKS components.
<code>[root@ip-10-153-13-121 ~]# ps -ef | grep kube
root  3226     1  2 Nov03 ?  13:56:53 /usr/bin/kubelet ...
root  3683  3385  0 Nov03 ?  00:08:24 kube-proxy ...
...</code>
With these defaults, the kubelet begins evicting pods once disk usage climbs to roughly the 85% mark (the upstream kubelet default for imagefs.available is 15% free), which matched the observed 84-85% usage on this node.
Root Cause
The node’s disk was small, and the kubelet’s default eviction condition (disk usage above roughly 85%) triggered pod eviction.
Because the cluster had few nodes, evicted pods were rescheduled onto the same pressured node, creating an eviction loop.
The container runtime is containerd, so Docker-specific cleanup commands (such as docker system prune) are ineffective, and large container snapshots under containerd's data directory occupied significant space.
Resolution
Inspecting the host disk showed that only two business pods were running and their own disk consumption was minimal; the bulk of the space was held by large container snapshots and unused images, which were removed to free space.
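Since the runtime is containerd, the cleanup goes through crictl rather than the docker CLI. A minimal sketch (the path is containerd's default data directory; adjust for your installation):

```shell
# Find the largest consumers under containerd's data directory
du -sh /var/lib/containerd/* | sort -rh | head

# List images known to the CRI, then remove all images not used by a running container
crictl images
crictl rmi --prune
```

`crictl rmi --prune` only touches unused images; snapshots belonging to running containers are left intact, so it is safe to run on a live node.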
Optimization & Solutions
Customize managed nodes and modify the kubelet eviction condition, raising the disk‑usage threshold from 85% to 95%.
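Because evictionHard is expressed as minimum free space, raising the usage threshold from 85% to 95% means lowering the free-space thresholds to 5%. A sketch of the amended kubelet-config.json fragment (these values are the ones chosen here, not EKS defaults):

```json
"evictionHard": {
    "memory.available": "100Mi",
    "nodefs.available": "5%",
    "nodefs.inodesFree": "5%",
    "imagefs.available": "5%"
}
```

Note that a 95% trigger leaves little headroom on a small disk, so this change only makes sense together with the larger volumes below.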
Update the Karpenter node template to provision nodes with a larger disk (e.g., 200 GB) instead of the default 20 GB.
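In Karpenter, the root-volume size is set on the node template through blockDeviceMappings. A sketch assuming the EC2NodeClass API (the device name and gp3 volume type are common Amazon Linux defaults; verify them for your AMI):

```yaml
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 200Gi
        volumeType: gp3
```

Newly provisioned nodes pick up the larger volume; existing nodes keep their original disks until they are replaced.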
Adjust disk‑monitoring alert thresholds to be lower than the kubelet eviction threshold, allowing early detection of disk pressure.
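For example, with eviction set at 95% usage, an alert firing at 85% leaves time to react before the kubelet intervenes. A sketch of a Prometheus rule using node-exporter metrics (the rule name and labels are illustrative):

```yaml
groups:
  - name: node-disk
    rules:
      - alert: NodeDiskUsageHigh
        expr: |
          (1 - node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}
             / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}) * 100 > 85
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Disk usage above 85% on {{ $labels.instance }}"
```

Keeping the alert threshold comfortably below the eviction threshold is what turns DiskPressure from an incident into a routine capacity task.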
WeiLi Technology Team
Practicing data-driven principles and believing technology can change the world.