How to Fix Disk‑Full Issues in Legacy Kubernetes Clusters Using Docker

This guide explains why old Kubernetes clusters that use Docker can run out of disk space, describes the symptoms such as pods stuck in ContainerCreating, and provides step‑by‑step commands to clean Docker files, prune images, adjust kubelet settings, and prevent future disk‑full problems.

Open Source Linux

Mar 7, 2024

Legacy Kubernetes clusters that use Docker as the container runtime often run out of disk space after long uptimes. Large files accumulate in /var/lib/docker/overlay2, /var/log, or /var/log/journal, and once a disk fills up the node and its Pods start to misbehave.

Disk Full

Container runtime directory full

If the directory used by the container runtime runs out of space, Docker commands may hang, kubelet logs show PLEG unhealthy, and Pods remain in ContainerCreating or Terminating state.

Docker default directories

/var/run/docker – stores container runtime state (configurable via --exec-root).

/var/lib/docker – persists container data such as images, writable layers, logs, and volumes.
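
To see which of these directories is actually consuming space, a quick size report can help. The following is a minimal sketch; the function name dir_sizes and the directory list in the example are illustrative choices, not part of Docker or Kubernetes.

```shell
# Print the size in KiB of each existing directory passed as an argument,
# largest first. Directories that do not exist are skipped silently.
dir_sizes() {
  for d in "$@"; do
    [ -d "$d" ] && du -sk "$d" 2>/dev/null
  done | sort -rn
}

# Example (run as root on the affected node):
#   dir_sizes /var/lib/docker /var/run/docker /var/log /var/log/journal
```

If Docker was started with a non-default --exec-root or data directory, pass those paths instead.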

Symptoms

Typical pod events include:

# pod startup events
Warning  FailedCreatePodSandBox 53m kubelet, 172.22.0.44  Failed create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Warning  FailedCreatePodSandBox  2m (x4307 over 16h)  kubelet, 10.179.80.31  Failed create pod sandbox: rpc error: code = Unknown desc = failed to create a sandbox for pod "apigateway-...": Error response from daemon: mkdir /var/lib/docker/aufs/mnt/...: no space left on device
# pod deletion events
Normal  Killing  39s (x735 over 15h)  kubelet, 10.179.80.31  Killing container with id docker://apigateway:Need to kill Pod
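
To find affected Pods across the cluster rather than reading events one Pod at a time, the Pod list can be filtered by status. This is a rough sketch: the function name stuck_pods is made up for illustration, and it assumes the default column layout of kubectl get pods with --all-namespaces.

```shell
# Read `kubectl get pods --all-namespaces --no-headers` output on stdin and
# print namespace/name for Pods stuck in ContainerCreating or Terminating.
# Expected columns: NAMESPACE NAME READY STATUS RESTARTS AGE
stuck_pods() {
  awk '$4 == "ContainerCreating" || $4 == "Terminating" { print $1 "/" $2 }'
}

# Example:
#   kubectl get pods --all-namespaces --no-headers | stuck_pods
```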

Kubelet directory full

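Kubelet stores its data under /var/lib/kubelet. When the filesystem holding this directory fills up, Kubelet cannot create directories for new Pods or write its checkpoint files, so sandbox creation fails and admission errors appear in the Pod events.

Symptoms

Typical pod events include: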

Warning  UnexpectedAdmissionError  44m kubelet, 172.22.0.44  Update plugin resources failed due to failed to write checkpoint file "kubelet_internal_checkpoint": write /var/lib/kubelet/device-plugins/.728425055: no space left on device, which is unexpected.
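
To confirm that the filesystem holding /var/lib/kubelet is the one that is full, the used percentage can be read from df. A minimal sketch, assuming POSIX-format df output; the helper name used_pct is illustrative.

```shell
# Read `df -P <path>` output on stdin and print the used percentage as a
# bare integer. POSIX format columns:
#   Filesystem 1024-blocks Used Available Capacity Mounted-on
used_pct() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Example:
#   df -P /var/lib/kubelet | used_pct
```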

Resolution

Free space manually by truncating oversized Docker container logs or removing large files from writable layers. For example, to truncate a container log:

$ cd /var/lib/docker/containers
$ du -sh * # find large directories
$ cd <container-id>
$ cat /dev/null > <container-id>-json.log   # truncate log file

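Truncate with redirection (cat /dev/null > file) rather than rm. The Docker daemon keeps the log file open, so a deleted file keeps occupying disk space until the daemon closes it.

If log rotation is enabled, delete rotated logs first (a higher numeric suffix indicates an older log).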
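
When many containers are affected, the truncation step can be applied to every oversized log at once. The sketch below is one possible way to do that, not a Docker feature; the function name truncate_big_logs is hypothetical, and the M size suffix in the example assumes GNU find.

```shell
# Truncate every *-json.log file under a directory that is larger than a
# size threshold. The threshold uses find(1) -size syntax, e.g. 1024k.
# Files are emptied in place, never removed, so Docker keeps a valid handle.
truncate_big_logs() {
  dir=$1
  threshold=$2
  find "$dir" -type f -name '*-json.log' -size +"$threshold" \
    -exec sh -c ': > "$1"' _ {} \;
}

# Example (run as root on the affected node, GNU find):
#   truncate_big_logs /var/lib/docker/containers 500M
```

Only files ending in -json.log are touched, so container configuration files in the same directories are left alone.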

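If Docker is still unresponsive after the cleanup, drain the node and restart Docker:

Mark the node unschedulable and drain its Pods: kubectl drain <node-name>

Restart Docker:

systemctl restart docker   # on some distributions the unit is named dockerd

After Docker restarts, verify that the Pods have been rescheduled, investigate the root cause, and clean up the remaining data.

Mark the node schedulable again: kubectl uncordon <node-name>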
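
The drain, restart, and uncordon sequence can be wrapped in small helpers that only print the commands by default, so the plan can be reviewed before anything is run. This is a sketch under a few assumptions: the names run, recover_node, readmit_node, and DRY_RUN are invented here, the systemd unit is assumed to be docker, and --delete-emptydir-data assumes kubectl 1.20 or later (older clients call it --delete-local-data).

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Drain a node and restart Docker; stops at the first failing step.
# systemctl must run on the node itself.
recover_node() {
  node=$1
  run kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data &&
    run systemctl restart docker
}

# Make the node schedulable again once it has been checked and cleaned up.
readmit_node() {
  run kubectl uncordon "$1"
}

# Example: preview first, then execute.
#   recover_node node-1
#   DRY_RUN=0 recover_node node-1
```

Keeping the uncordon step in a separate helper leaves room for the manual verification and cleanup described above.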

Clean Docker images

Regularly prune unused Docker data, and trim the systemd journal, to free space:

journalctl --vacuum-size=20M   # delete the oldest archived journal files until they total at most 20 MB
docker image prune -a --filter "until=24h"   # remove unused images created more than 24 h ago
docker container prune --filter "until=24h"   # remove stopped containers created more than 24 h ago
docker volume prune --filter "label!=keep"   # remove unused volumes except those labeled keep
docker system prune   # remove stopped containers, unused networks, dangling images, and build cache (add -a to remove all unused images)
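
For a scheduled job, it can be useful to prune only when the disk is actually filling up. The following is a minimal sketch of that idea; the function names and the 80 percent threshold in the example are arbitrary choices, and -f is added so the prune commands do not wait for confirmation.

```shell
# Succeed when the used percentage has reached the threshold.
over_threshold() {
  [ "$1" -ge "$2" ]
}

# Remove stopped containers and unused images older than 24 hours
# without prompting.
prune_docker() {
  docker container prune -f --filter "until=24h" &&
    docker image prune -a -f --filter "until=24h"
}

# Example for a cron job, with the used percentage taken from df:
#   used=$(df -P /var/lib/docker | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
#   if over_threshold "$used" 80; then prune_docker; fi
```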

Kubernetes garbage collection

Kubelet garbage collection automatically removes unused images and containers on a node.

Image reclamation

When disk usage exceeds --image-gc-high-threshold (default 85 %), Kubelet deletes images not referenced by any Pod using an LRU strategy until usage falls below --image-gc-low-threshold (default 80 %). Images younger than --minimum-image-ttl-duration (default 2 min) are retained.

Key parameters:

--image-gc-high-threshold: upper disk usage percent (0‑100, default 85)
--image-gc-low-threshold: lower disk usage percent (0‑100, default 80)
--minimum-image-ttl-duration: minimum image age before it can be reclaimed (default 2m)
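
To get a feel for the thresholds, the amount of space image reclamation aims to free can be worked out by hand: it is the current usage minus the low threshold applied to the disk capacity. The sketch below is a simplified model of the rule described above, not Kubelet code; the function name is invented and its handling of the exact boundary value is a simplification.

```shell
# Print how much disk space image GC would try to free.
# Arguments: capacity used high-threshold-percent low-threshold-percent
# capacity and used must be integers in the same unit (for example GiB).
image_gc_target() {
  capacity=$1 used=$2 high=$3 low=$4
  if [ $((used * 100 / capacity)) -gt "$high" ]; then
    echo $((used - capacity * low / 100))
  else
    echo 0
  fi
}

# Example: a 200 GiB disk with 180 GiB used and the default thresholds.
#   image_gc_target 200 180 85 80
```

With these numbers usage is 90 %, so reclamation targets 180 - 160 = 20 GiB; at 160 GiB used nothing is reclaimed.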

Example for Kubelet (v1.24): set the flags in /etc/kubernetes/kubelet.env, restart Kubelet, and confirm that the service is running.

vim /etc/kubernetes/kubelet.env
systemctl restart kubelet
systemctl status kubelet -l   # -l prints full, untruncated lines
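
What the file looks like depends on how the cluster was installed. As a hypothetical illustration, assuming extra flags are passed through a variable named KUBELET_ARGS (the variable name and the values are assumptions, so check your own file and unit definition), lowering the image thresholds might look like this:

```shell
# Hypothetical fragment of /etc/kubernetes/kubelet.env.
# KUBELET_ARGS is an assumed variable name; the flags are the ones listed above.
KUBELET_ARGS="--image-gc-high-threshold=80 --image-gc-low-threshold=70 --minimum-image-ttl-duration=5m"

# After restarting, confirm the running kubelet picked up the flags:
#   ps -o args= -C kubelet | tr ' ' '\n' | grep -- '--image-gc'
```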

Container reclamation

Kubelet also cleans up stopped containers, sandbox containers, and log directories based on LRU and configured limits. Only containers managed by Kubelet (Pod containers) are reclaimed.

Key parameters:

--minimum-container-ttl-duration: minimum age of a finished container before it is eligible for GC (default 0 in recent Kubelet releases, meaning eligible immediately)
--maximum-dead-containers-per-container: max old instances retained per container (default 1 in recent Kubelet releases)
--maximum-dead-containers: max old container instances retained on a node (default -1, unlimited)
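
To judge whether these limits need tightening, it helps to see how many exited instances each Pod container currently has on the node. A rough sketch, assuming the dockershim container naming scheme k8s_<container>_<pod>_<namespace>_<pod-uid>_<attempt>; the function name count_dead is illustrative.

```shell
# Read container names on stdin and count instances per Pod container.
# Names that do not start with k8s_ (containers not managed by Kubelet)
# are ignored.
# Output: "<count> <namespace>/<pod>/<container>", highest count first.
count_dead() {
  awk -F_ '$1 == "k8s" { n[$4 "/" $3 "/" $2]++ }
           END { for (k in n) print n[k], k }' | sort -rn
}

# Example:
#   docker ps -a --filter status=exited --format '{{.Names}}' | count_dead
```

Sandbox containers show up with the container name POD.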

For detailed Kubelet parameter settings, refer to the official Kubernetes documentation.
