
How to Resolve Disk Full Issues in Legacy Kubernetes Clusters Using Docker

Learn step‑by‑step how to identify and clean up disk‑space exhaustion in older Kubernetes clusters using Docker, including manual log removal, node draining, Docker restart, image pruning, and configuring kubelet garbage‑collection parameters to prevent future outages.


Disk Exhaustion

Older Kubernetes clusters that use Docker as the container runtime often run out of disk space after long periods of operation: writable layers accumulate under /var/lib/docker/overlay2 (up to 70 GB) and log files pile up under /var/log and /var/log/journal. If they are not cleaned up promptly, the cluster becomes unstable.
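A quick way to see where the space has gone (standard coreutils commands; the paths are the defaults named above):

$ df -h /var/lib/docker                   # usage of the filesystem backing Docker
$ du -xh -d 1 /var/lib/docker | sort -h   # which subdirectory (overlay2, containers, ...) dominates
$ du -xh -d 1 /var/log | sort -h          # large logs under /var/log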

Symptoms When Docker Directories Fill Up

When the Docker storage directory fills up, Docker commands may hang, the kubelet logs report PLEG is not healthy, and CRI calls time out, leaving Pods stuck in the ContainerCreating or Terminating state.
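One way to confirm these symptoms, assuming kubectl access and a systemd-managed kubelet:

$ kubectl get pods -A | grep -E 'ContainerCreating|Terminating'    # Pods stuck in transitional states
$ journalctl -u kubelet --since '1 hour ago' | grep -i 'PLEG is not healthy'
$ kubectl describe node <node-name> | grep -i pressure             # look for DiskPressure=True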

Docker default directories

/var/run/docker: stores container runtime state (configurable with --exec-root).

/var/lib/docker: persists images, writable layers, logs, and volumes (configurable with --data-root).
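To confirm which data root a given daemon actually uses, docker info exposes it directly:

$ docker info --format '{{ .DockerRootDir }}'   # prints the active data root, /var/lib/docker by default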

Kubelet Directory Exhaustion

Kubelet stores plugin data, Pod status, and mounted volumes under /var/lib/kubelet (configurable with --root-dir). When this directory fills up, new Pods cannot create the required directories, leading to sandbox creation failures and events such as no space left on device.
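To check how much space the kubelet directory consumes and whether a non-default root is configured:

$ du -sh /var/lib/kubelet
$ ps -ef | grep [k]ubelet | tr ' ' '\n' | grep root-dir   # no output means the default path is in use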

Resolution Steps

Manually delete or truncate large Docker log and writable‑layer files. Truncate log files that are still open rather than deleting them: Docker keeps the file handle open, so rm would not free the space until the daemon restarts. Example:

$ cd /var/lib/docker/containers
$ du -sh *   # find large directories
$ cd <container-id>
$ cat /dev/null > <container-id>-json.log   # truncate log file
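To keep the *-json.log files from regrowing, log rotation for the json-file driver can be configured in /etc/docker/daemon.json. A sketch (the sizes are illustrative; merge with any existing daemon.json settings rather than overwriting them, and note that the options apply only to containers created after the restart):

$ cat >/etc/docker/daemon.json <<'EOF'
{
  "log-driver": "json-file",
  "log-opts": { "max-size": "100m", "max-file": "3" }
}
EOF
$ systemctl restart docker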

Drain the node to evict Pods (kubectl drain refuses to evict DaemonSet-managed Pods unless --ignore-daemonsets is added):

kubectl drain <node-name>

Then restart Docker so it can release the space:

systemctl restart docker   # the systemd unit is normally named docker, even though the daemon binary is dockerd

After Docker restarts, investigate the root cause, clean up the data, and then return the node to service.

Uncordon the node:

kubectl uncordon <node-name>
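To verify that the node rejoined the schedulable pool:

kubectl get node <node-name>   # STATUS should show Ready, without SchedulingDisabled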

Docker Image Cleanup

Regularly vacuum the systemd journal and prune unused Docker objects to avoid disk exhaustion:

journalctl --vacuum-size=20M                 # shrink archived systemd journal files to about 20 MB
docker image prune -a --filter "until=24h"   # remove unused images created more than 24 hours ago
docker container prune --filter "until=24h"  # remove stopped containers older than 24 hours
docker volume prune --filter "label!=keep"   # remove unused volumes that do not carry the keep label
docker system prune                          # remove stopped containers, dangling images, and unused networks
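These commands can be scheduled rather than run by hand; a hypothetical cron.d entry (the path, schedule, and retention filter are all illustrative):

# /etc/cron.d/docker-prune (illustrative)
0 3 * * * root docker system prune -af --filter "until=72h" >>/var/log/docker-prune.log 2>&1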

Kubernetes Garbage Collection

Kubelet’s garbage‑collection component automatically removes unused images and containers when disk usage exceeds configured thresholds.

Image Reclamation

When disk usage surpasses image-gc-high-threshold (default 85 %), Kubelet deletes unused images using an LRU policy until usage drops below image-gc-low-threshold (default 80 %). Images younger than minimum-image-ttl-duration (default 2 min) are retained.

--image-gc-high-threshold: upper disk usage limit (0‑100, default 85)
--image-gc-low-threshold: lower disk usage limit (0‑100, default 80)
--minimum-image-ttl-duration: minimum image age (default 2m)
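For example, on a node whose image filesystem is 100 GiB, the defaults start garbage collection once usage crosses 85 GiB and delete least-recently-used images until usage falls back below 80 GiB.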

Container Reclamation

Kubelet also cleans up dead containers, sandbox containers, and orphaned log directories based on LRU settings:

--minimum-container-ttl-duration: time after stop before a container is eligible for GC (default 1m)
--maximum-dead-containers-per-container: max dead containers kept per Pod (default 2)
--maximum-dead-containers: max dead containers on node (default -1, unlimited)
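With the defaults above, a crash-looping Pod keeps at most two of its dead containers around for post-mortem inspection, while --maximum-dead-containers=-1 places no node-wide cap on the total.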

Configuration Example (K8s 1.24)

Edit /etc/kubernetes/kubelet.env to add the desired flags.
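A minimal sketch, assuming a kubespray-style kubelet.env in which extra flags are appended to a KUBELET_ARGS variable (the variable name and the threshold values here are illustrative, not prescriptive):

KUBELET_ARGS="${KUBELET_ARGS} \
  --image-gc-high-threshold=80 \
  --image-gc-low-threshold=70 \
  --maximum-dead-containers=240"

Then restart kubelet and verify that it is running: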

systemctl restart kubelet
systemctl status kubelet -l   # or follow the logs with: journalctl -u kubelet -f
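To confirm the running kubelet picked up the new flags, one option is to inspect its command line:

ps -ef | grep [k]ubelet | tr ' ' '\n' | grep image-gc   # should print the thresholds set above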

For detailed parameter descriptions, refer to the official Kubernetes documentation.

Docker · Garbage Collection · Kubelet · Disk Cleanup · Node Maintenance
Written by MaGe Linux Operations

Founded in 2009, MaGe Education is a leading high-end IT training brand in China. Its graduates earn salaries of 12K+ RMB, and the school has trained tens of thousands of students. It offers courses in Linux cloud operations, Python full-stack development, automated operations, data analysis, AI, and high-concurrency Go architecture. Thanks to quality courses and a solid reputation, it maintains talent partnerships with numerous internet companies.
