How to Fix Disk‑Full Issues in Legacy Kubernetes Clusters Using Docker
This guide explains why old Kubernetes clusters that use Docker can run out of disk space, describes the symptoms such as pods stuck in ContainerCreating, and provides step‑by‑step commands to clean Docker files, prune images, adjust kubelet settings, and prevent future disk‑full problems.
Old Kubernetes clusters that use Docker as the container runtime often run out of disk space after long uptimes. Large files accumulate under /var/lib/docker/overlay2, /var/log, or /var/log/journal, and a full disk leaves the node unhealthy and its Pods stuck.
Disk Full
Container runtime directory full
If the directory used by the container runtime runs out of space, Docker commands may hang, kubelet logs show PLEG unhealthy, and Pods remain in ContainerCreating or Terminating state.
Docker default directories
/var/run/docker – stores container runtime state (configurable via --exec-root).
/var/lib/docker – persists container data such as images, writable layers, logs, and volumes.
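A quick way to see which of these directories sits on a nearly full filesystem is to read the usage percentage from df. The following is a minimal sketch, not a monitoring tool; the function names usage_pct and report_full_dirs are hypothetical helpers introduced here for illustration.

```shell
# Print the used-space percentage (without the % sign) of the filesystem
# that holds the given path. df -P forces the single-line POSIX format,
# where column 5 is the capacity, for example "42%".
usage_pct() {
  df -P "$1" | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

# Print every existing directory whose filesystem usage is at or above
# the threshold. Usage: report_full_dirs <threshold-percent> <dir>...
report_full_dirs() {
  threshold="$1"
  shift
  for dir in "$@"; do
    [ -d "$dir" ] || continue          # skip directories that do not exist
    pct=$(usage_pct "$dir")
    if [ "$pct" -ge "$threshold" ]; then
      echo "$dir ${pct}%"
    fi
  done
}
```

For example, report_full_dirs 85 /var/lib/docker /var/run/docker /var/lib/kubelet /var/log lists only the directories whose filesystem is at least 85 % full.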
Symptoms
Typical pod events include:
# pod startup events
Warning FailedCreatePodSandBox 53m kubelet, 172.22.0.44 Failed create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Warning FailedCreatePodSandBox 2m (x4307 over 16h) kubelet, 10.179.80.31 Failed create pod sandbox: rpc error: code = Unknown desc = failed to create a sandbox for pod "apigateway-...": Error response from daemon: mkdir /var/lib/docker/aufs/mnt/...: no space left on device
# pod deletion events
Normal Killing 39s (x735 over 15h) kubelet, 10.179.80.31 Killing container with id docker://apigateway:Need to kill Pod
Kubelet directory full
Kubelet stores its data under /var/lib/kubelet. When this disk fills, new Pods cannot create directories, causing sandbox creation failures and warnings such as:
Symptoms
Warning UnexpectedAdmissionError 44m kubelet, 172.22.0.44 Update plugin resources failed due to failed to write checkpoint file "kubelet_internal_checkpoint": write /var/lib/kubelet/device-plugins/.728425055: no space left on device, which is unexpected.
Resolution
Manually delete large Docker log or writable‑layer files. Example:
$ cd /var/lib/docker/containers
$ du -sh * # find large directories
$ cd <container-id>
$ cat /dev/null > <container-id>-json.log # truncate the log file in place
Truncate with redirection (cat /dev/null > file) instead of rm: Docker keeps the log file open, so a removed file continues to occupy disk space until the daemon closes it.
Delete older logs first (a higher numeric suffix indicates an older log).
Mark the node unschedulable and drain its Pods:
kubectl drain <node-name>
Restart Docker:
systemctl restart dockerd # or systemctl restart docker, depending on the unit name
After Docker restarts, verify that the Pods have been rescheduled, investigate the root cause, and clean up the remaining data.
Mark the node schedulable again:
kubectl uncordon <node-name>
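The manual log truncation described above can be scripted when many containers are affected. The following is a rough sketch under stated assumptions: the function name truncate_large_logs is hypothetical, the size threshold is an arbitrary choice, and the pattern assumes Docker's json-file log naming (<container-id>-json.log, with rotated copies carrying a numeric suffix).

```shell
# Truncate every Docker JSON log (including rotated copies) larger than
# the given size. Truncating in place, rather than deleting, frees the
# space even while Docker still holds the file open.
# Usage: truncate_large_logs <log-root> <min-size-in-MiB>
truncate_large_logs() {
  log_root="$1"
  min_kb=$(( $2 * 1024 ))
  find "$log_root" -type f -name '*-json.log*' -size +"${min_kb}k" |
  while IFS= read -r log_file; do
    : > "$log_file"                    # same effect as cat /dev/null > file
    echo "truncated $log_file"
  done
}
```

A call such as truncate_large_logs /var/lib/docker/containers 500 would empty every container log above 500 MiB. Run it only after confirming that the log contents are no longer needed.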
Clean Docker images
Regularly prune unused Docker data and trim the systemd journal to free space:
journalctl --vacuum-size=20M # shrink archived journal logs to under 20 MB
docker image prune -a --filter "until=24h" # remove unused images created more than 24 h ago
docker container prune --filter "until=24h" # remove stopped containers created more than 24 h ago
docker volume prune --filter "label!=keep" # remove unused volumes except those labeled keep
docker system prune # remove stopped containers, unused networks, dangling images, and build cache (add -a to remove all unused images)
Kubernetes garbage collection
Kubelet garbage collection automatically removes unused images and containers on a node.
Image reclamation
When disk usage exceeds --image-gc-high-threshold (default 85 %), Kubelet deletes images not referenced by any Pod using an LRU strategy until usage falls below --image-gc-low-threshold (default 80 %). Images younger than --minimum-image-ttl-duration (default 2 min) are retained.
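The arithmetic behind the two thresholds can be illustrated with a small calculation. This is a simplified sketch of the rule described above, not the kubelet's actual code; the function name image_gc_bytes_to_free is hypothetical, and the sketch assumes that reaching the high threshold exactly already triggers reclamation.

```shell
# Estimate how much image data would need to be reclaimed.
# Usage: image_gc_bytes_to_free <capacity> <used> [high-percent] [low-percent]
# capacity and used can be in any unit, as long as both use the same one.
image_gc_bytes_to_free() {
  capacity="$1"
  used="$2"
  high="${3:-85}"                      # --image-gc-high-threshold
  low="${4:-80}"                       # --image-gc-low-threshold
  used_pct=$(( used * 100 / capacity ))
  if [ "$used_pct" -ge "$high" ]; then
    # Free enough to bring usage back down to the low threshold.
    echo $(( used - capacity * low / 100 ))
  else
    echo 0                             # below the high threshold: nothing to do
  fi
}
```

On a 200 GB disk with 180 GB used, image_gc_bytes_to_free 200 180 prints 20: usage is 90 %, above the 85 % trigger, and 20 GB must go to reach the 80 % target of 160 GB.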
Key parameters:
--image-gc-high-threshold: upper disk usage percent (0‑100, default 85)
--image-gc-low-threshold: lower disk usage percent (0‑100, default 80)
--minimum-image-ttl-duration: minimum image age before it can be reclaimed (default 2m)
Example for Kubelet (v1.24): edit /etc/kubernetes/kubelet.env and restart Kubelet.
vim /etc/kubernetes/kubelet.env
systemctl restart kubelet
systemctl status kubelet -l # confirm that kubelet is active; use journalctl -u kubelet -f to follow its logs
Container reclamation
Kubelet also cleans up stopped containers, sandbox containers, and log directories based on LRU and configured limits. Only containers managed by Kubelet (Pod containers) are reclaimed.
Key parameters:
--minimum-container-ttl-duration: time after a container stops before it is eligible for GC (default 0 in the current kubelet reference, meaning eligible immediately; check the default for your version)
--maximum-dead-containers-per-container: max old instances kept per container (default 1 in the current kubelet reference; check the default for your version)
--maximum-dead-containers: max dead containers on a node (default -1, unlimited)
For detailed Kubelet parameter settings, refer to the official Kubernetes documentation.
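To make the per-container limit concrete, the selection can be modeled on a plain list of dead containers. This is a simplified illustration and not how the kubelet is implemented: the function name select_dead_containers_to_remove and the two-column input format (container name, finish time as an epoch number) are assumptions made for this sketch.

```shell
# Read "<container-name> <finish-epoch>" lines on stdin and print the
# entries that exceed the per-container limit, keeping the newest ones.
# Usage: select_dead_containers_to_remove <max-per-container>
select_dead_containers_to_remove() {
  max_per_container="$1"
  # Group by name, newest first within each group, then print every
  # entry past the first <max> entries of its group.
  sort -k1,1 -k2,2nr |
  awk -v max="$max_per_container" '{ if (++seen[$1] > max) print $0 }'
}
```

With a limit of 2, three dead instances of the same container yield one line of output: the oldest instance, which is the one that would be collected first.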
Open Source Linux
Focused on sharing Linux/Unix content, covering fundamentals, system development, network programming, automation/operations, cloud computing, and related professional knowledge.
