How to Quickly Diagnose and Resolve Disk Space Exhaustion in Production
This guide walks through a step‑by‑step process for identifying the partitions and files that fill a disk, applying temporary fixes to bring usage below critical levels, and implementing long‑term measures to prevent future disk‑full incidents in production environments.
Introduction
Recently I ran into a production issue where a disk was almost full. This post shares the troubleshooting steps.
1. Check Disk Usage and Identify the Culprit
1.1 View Disk Usage
Use df -h to see usage percentages of mounted partitions.
# View usage of all mounted partitions (focus on Use% column)
df -h
Interpret the output, paying attention to the Mounted on and Use% columns. For example, a root partition showing 92% usage is a priority.
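When there are many mount points, the Use% check can be scripted. The following is a minimal sketch, not part of the original procedure: the function name over_threshold and the default of 80 are assumptions, and it expects POSIX-format input from df -P, where the fifth column is the capacity and the sixth is the mount point (mount points containing spaces are not handled).

```shell
# Print mount points whose usage is at or above a threshold (default 80).
# Reads `df -P` output on stdin, so it can also be fed saved output.
over_threshold() {
    threshold="${1:-80}"
    awk -v t="$threshold" 'NR > 1 {
        use = $5
        sub(/%/, "", use)              # strip the percent sign
        if (use + 0 >= t) print $6, use "%"
    }'
}

# Typical use:
#   df -P | over_threshold 90
```

Reading from stdin rather than calling df inside the function keeps the filter easy to try against output captured from another host.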
1.2 Locate Largest Directories
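Navigate to the overloaded partition and run du -h --max-depth=1 to list the size of its top‑level directories. Adding -x keeps du on that filesystem, and piping through sort -rh puts the largest directories first.
# Change to the root of the overloaded partition
cd /
# Show the size of each top‑level directory (human‑readable, depth 1, same filesystem only), largest first
du -xh --max-depth=1 | sort -rh
Example output: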
18G /var
10G /usr
5G /home
1.5G /opt
Drill down further into the biggest directory, e.g., /var:
# Enter /var and list second‑level directories
cd /var
du -h --max-depth=1
# If /var/log is biggest, continue:
cd /var/log
du -h --max-depth=1
1.3 Find Large Files
Search for files larger than 100 MB, especially Java logs or caches.
# Find files >100M under /var, sort by size
find /var -type f -size +100M -exec ls -lh {} \; | sort -rh -k5
2. Analyse Root Causes
2.1 Uncontrolled Log Growth
Logs written without size limits are a common cause. Typical reasons include:
Log level set to DEBUG.
Repeated errors or exception loops causing massive output.
Third‑party component logs (Tomcat, Nginx, MySQL) not being cleaned.
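To check whether one log level dominates a suspect file, a quick count per level can help. This is a rough sketch under assumptions: the function name count_log_levels is made up for illustration, and it assumes levels appear in the log as the bare words DEBUG, INFO, WARN, and ERROR, which depends on the application's log format.

```shell
# Count how often each log level appears in a file, most frequent first.
# -o prints only the matched word, -w matches whole words only.
count_log_levels() {
    grep -owE 'DEBUG|INFO|WARN|ERROR' "$1" | sort | uniq -c | sort -rn
}

# Example (hypothetical path):
#   count_log_levels /var/log/app/app.log
```

A file where DEBUG outnumbers everything else by orders of magnitude points at the log level; a flood of ERROR lines points at an exception loop instead.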
2.2 Java Temporary Files and Caches
Java processes may generate temporary files, cache data, or heap dump files that grow unchecked.
Temporary files in tmp/ (e.g., tomcat‑tmp, jdk‑tmp).
Application cache files such as Redis persistence, Elasticsearch data, local caches.
JVM heap dump files (*.hprof) created after OOM, each can be several gigabytes.
Root causes are missing cleanup code, unattended heap dumps, and poor cache eviction policies.
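Heap dumps are easy to locate because of their *.hprof suffix. The sketch below assumes GNU find, du, and sort; the function name find_heap_dumps is hypothetical, and the directory to search is whatever location the application is suspected to write dumps to.

```shell
# List *.hprof files under a directory with their sizes, largest first.
# -xdev keeps the search on the filesystem that contains the directory.
find_heap_dumps() {
    find "$1" -xdev -type f -name '*.hprof' -exec du -h {} + | sort -rh
}

# Example:
#   find_heap_dumps /var
```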
3. Temporary Mitigation
After identifying the main consumer, reduce usage below 80% to keep the system stable.
If large log files are the culprit, back up if needed, then truncate or delete them.
# Delete app.log larger than 500M
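find /var/log/app/ -name "app.log" -size +500M -exec rm -f {} \;
If usage does not drop after deletion, a running process (e.g., a Java process) may still hold the deleted file open, and the space is only released once that process closes it. List such files with lsof | grep deleted: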
# List deleted but still open files
lsof | grep deleted | awk '{print $2, $7, $9}' | sort -rh -k2
Remove old temporary files, for example files older than 7 days in /tmp:
# Remove tmp files older than 7 days
find /tmp/ -name "*-tmp-*" -mtime +7 -delete
If nothing can be removed quickly, consider attaching an extra disk or expanding the volume.
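For a log that a running process still has open, truncating it in place releases the space right away, whereas deleting it does not until the process closes the file. The following is a sketch of the truncate option mentioned earlier, assuming GNU find and the coreutils truncate command; the function name truncate_large_logs is made up for illustration. Back up the file first if its contents are still needed.

```shell
# Empty every file under DIR whose name matches NAME and whose size
# satisfies SIZE (a find size test such as +500M), without removing it.
truncate_large_logs() {
    dir="$1"; name="$2"; size="$3"
    find "$dir" -type f -name "$name" -size "$size" -exec truncate -s 0 {} +
}

# Example, using the path from the deletion step:
#   truncate_large_logs /var/log/app 'app.log' +500M
```

If the application does not open its log in append mode, it keeps writing at its old offset and the file may become sparse, so it is worth checking the result with du rather than ls.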
4. Root‑Cause Optimisation to Prevent Recurrence
4.1 Log Management
Adjust log level to INFO or WARN.
Deploy log‑rotation and cleanup scripts.
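A cleanup script can be as small as two find commands run from cron. This is only a sketch under assumptions: the function name rotate_cleanup is hypothetical, rotated files are assumed to be named like app.log.1 or app.log.2024-01-01 (the live app.log is never touched), and the day counts are placeholders to tune per service.

```shell
# Compress rotated logs older than COMPRESS_DAYS and delete compressed
# logs older than DELETE_DAYS. Only files named *.log.* are considered.
rotate_cleanup() {
    dir="$1"; compress_days="$2"; delete_days="$3"
    # Compress rotated logs that are not compressed yet.
    find "$dir" -type f -name '*.log.*' ! -name '*.gz' \
        -mtime +"$compress_days" -exec gzip -f {} +
    # Remove compressed logs past the retention period.
    find "$dir" -type f -name '*.log.*.gz' -mtime +"$delete_days" -delete
}

# Example (hypothetical directory and retention):
#   rotate_cleanup /var/log/app 1 14
```

gzip keeps the original modification time on the compressed file, so the retention period is counted from when the log was last written, not from when it was compressed.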
4.2 Code and System Practices
Fix bugs that cause infinite loops or excessive logging.
Ensure temporary files are deleted in finally blocks.
Configure the JVM to write heap dumps to a dedicated large partition (e.g., -XX:HeapDumpPath=/data/heapdump/) and limit how many dumps accumulate there.
Monitor critical directories and trigger alerts when size thresholds are exceeded.
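Where no monitoring agent is available yet, a directory size check can be run from cron as a stopgap. The sketch below is illustrative only: the function name check_dir_size, the message format, and the example limit are assumptions, and the alert is just printed, so it would need to be wired into whatever notification channel the team uses.

```shell
# Print an alert and return 1 when DIR uses more than LIMIT_KB kilobytes.
check_dir_size() {
    dir="$1"; limit_kb="$2"
    used_kb=$(du -sk "$dir" | awk '{print $1}')
    if [ "$used_kb" -gt "$limit_kb" ]; then
        echo "ALERT: $dir uses ${used_kb} KB (limit ${limit_kb} KB)"
        return 1
    fi
    return 0
}

# Example: alert when /var/log grows beyond 10 GB (10485760 KB)
#   check_dir_size /var/log 10485760
```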
5. Post‑mortem and Knowledge‑Sharing
Record the root cause of the disk‑full incident.
Verify that all Java applications have log rotation enabled.
Add monitoring metrics for log file size, temporary directory size, and heap‑dump generation.
Document the troubleshooting steps and incorporate them into the team’s incident‑response handbook.
In the reported case, a forgotten debug log caused the disk to fill; after backing up and deleting the log and fixing the code, the issue was resolved.
Java Tech Enthusiast
