Master Linux Disk Management & I/O Performance: A Hands‑On Guide from Expansion to Tuning
This comprehensive guide walks you through Linux disk space shortage scenarios, prerequisites, a quick checklist, step‑by‑step LVM and partition expansion, I/O scheduler tuning, fio benchmarking, kernel parameter optimization, Prometheus monitoring, security hardening, backup strategies, troubleshooting, and best‑practice recommendations for reliable disk management and performance.
Applicable Scenarios & Prerequisites
Production servers with low free space, I/O bottlenecks, database/storage workloads, or container platforms that need persistent volumes.
Supported OS: RHEL/CentOS 7‑9, Ubuntu 18.04‑24.04.
Root or sudo privileges.
Required tools: parted, lvm2, xfsprogs / e2fsprogs, fio, iostat (sysstat package).
Backup all critical data and take snapshots before any modification.
Perform expansion during a low‑traffic maintenance window.
Environment & Version Matrix
Kernel: 3.10+ (4.18+ recommended) on RHEL/CentOS; 4.15+ (5.4+ recommended) on Ubuntu/Debian.
LVM version 2.02+ on all platforms.
Default filesystem: XFS on RHEL/CentOS, ext4 on Ubuntu (both supported).
Minimum IOPS: HDD ≥100 IOPS, SSD ≥3000 IOPS.
Reserve at least 10 % free space before expansion.
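The matrix above can be checked in one pass before touching anything. A minimal preflight sketch (the 4.18 threshold and the tool list are this guide's recommendations, and `version_ge` is a hypothetical helper built on GNU `sort -V`):

```shell
#!/bin/bash
# Preflight sketch for the version matrix above; warns rather than aborts.
# version_ge A B: succeeds when version A >= version B (GNU sort -V comparison).
version_ge() {
    [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

KERNEL=$(uname -r | cut -d- -f1)
if version_ge "$KERNEL" "4.18"; then
    echo "kernel $KERNEL: OK"
else
    echo "kernel $KERNEL: below recommended 4.18"
fi

# Warn when a tool from the prerequisites list is missing
for tool in parted pvs xfs_growfs resize2fs fio iostat; do
    command -v "$tool" >/dev/null || echo "missing tool: $tool"
done
```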
Quick Checklist
Inspect current partitions and usage.
Identify disk type (HDD/SSD/NVMe) and current I/O scheduler.
Create LVM physical volume, volume group, and logical volume if needed.
Expand the filesystem online (XFS with xfs_growfs, ext4 with resize2fs).
Set an appropriate I/O scheduler (e.g., none for SSD, mq-deadline for HDD).
Run fio benchmarks to verify IOPS and throughput.
Configure Prometheus node_exporter alerts for disk space, inode usage, and I/O utilization.
Apply disk quotas and tighten permission controls.
Implement log cleanup and archiving policies.
Prepare LVM snapshots and rollback plans.
Implementation Steps
Step 1 – Diagnose Disk Layout
# List block devices and filesystems
lsblk -f
fdisk -l | grep "Disk /dev"
# Show usage and inode statistics
df -hT
df -i
# Find large directories
du -sh /* | sort -hr | head -10
du -h --max-depth=2 /var | sort -hr | head -20
Key fields:
FSTYPE – determines whether to use xfs_growfs (XFS) or resize2fs (ext4).
SIZE vs MOUNTPOINT – reveals unallocated or unmounted space.
df -i – inode usage >85 % calls for cleanup of many small files.
Step 2 – Identify Disk Type & I/O Scheduler
# Detect SSD/NVMe (rotational=0 means SSD)
lsblk -d -o NAME,ROTA,DISC-GRAN
cat /sys/block/sda/queue/rotational # 0=SSD, 1=HDD
# Show current scheduler
cat /sys/block/sda/queue/scheduler # brackets indicate active
Recommended scheduler:
HDD (ROTA=1): mq-deadline (deadline or cfq exist only on legacy single-queue kernels).
SSD/NVMe (ROTA=0): none (noop on legacy kernels) or mq-deadline.
Temporary change:
echo none > /sys/block/nvme0n1/queue/scheduler
Persist via udev (RHEL/CentOS example):
cat > /etc/udev/rules.d/60-ioscheduler.rules <<'EOF'
# SSD/NVMe use none
ACTION=="add|change", KERNEL=="nvme[0-9]n[0-9]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="none"
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="none"
# HDD use mq-deadline
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="mq-deadline"
EOF
udevadm control --reload-rules && udevadm trigger
Step 3 – LVM Expansion (Online, No Downtime)
Scenario: add a new 100 GB disk /dev/sdb to extend /var.
# Show existing physical volumes
pvdisplay
# Create PV on the new disk
pvcreate /dev/sdb
pvdisplay /dev/sdb # verify size
# Extend the volume group
vgdisplay
vgextend vg0 /dev/sdb
vgdisplay vg0 # free PE should increase
# Extend the logical volume (example +50 GB)
lvextend -L +50G /dev/vg0/var # or -l +100%FREE
# Grow the filesystem
# XFS
xfs_growfs /var
# ext4
resize2fs /dev/vg0/var
# Verify
df -h /var
Rollback (create snapshot before expansion):
# Snapshot
lvcreate -L 10G -s -n var-snapshot /dev/vg0/var
# If needed, merge back
lvconvert --merge /dev/vg0/var-snapshot # if the origin is mounted, the merge is deferred until the next activation
Step 4 – Partition Expansion (Non‑LVM)
Scenario: cloud VM system disk /dev/sda3 needs to be enlarged.
# Install growpart tool
# RHEL/CentOS
yum install -y cloud-utils-growpart
# Ubuntu
apt install -y cloud-guest-utils
# Grow partition 3 without data loss
growpart /dev/sda 3
partprobe /dev/sda
# Expand filesystem
# XFS
xfs_growfs /
# ext4
resize2fs /dev/sda3
# Manual method with parted (dangerous – backup first)
parted /dev/sda
(parted) print free
(parted) resizepart 3 100%
(parted) quit
partprobe /dev/sda
# Then run the appropriate filesystem grow command
Step 5 – I/O Performance Benchmark & Tuning
# Install fio
# RHEL/CentOS
yum install -y fio
# Ubuntu
apt install -y fio
Sequential write (4 MiB block, 10 GiB file):
fio --name=seqwrite --rw=write --bs=4M --size=10G \
--numjobs=1 --runtime=60 --time_based \
--directory=/var/fio-test --ioengine=libaio --iodepth=16 \
--direct=1 --group_reporting
Sequential read:
fio --name=seqread --rw=read --bs=4M --size=10G \
--numjobs=1 --runtime=60 --time_based \
--directory=/var/fio-test --ioengine=libaio --iodepth=16 \
--direct=1 --group_reporting
Random read/write (4 KiB block, 4 jobs):
# Random write
fio --name=randwrite --rw=randwrite --bs=4K --size=10G \
--numjobs=4 --runtime=60 --time_based \
--directory=/var/fio-test --ioengine=libaio --iodepth=32 \
--direct=1 --group_reporting
# Random read
fio --name=randread --rw=randread --bs=4K --size=10G \
--numjobs=4 --runtime=60 --time_based \
--directory=/var/fio-test --ioengine=libaio --iodepth=32 \
--direct=1 --group_reporting
Target metrics:
HDD – sequential 100‑200 MB/s, random 100‑300 IOPS.
SATA SSD – sequential 500‑550 MB/s, random 50K‑90K IOPS.
NVMe SSD – sequential 2‑7 GB/s, random 200K‑1M IOPS.
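A measured fio result can be checked against the target table above mechanically. A sketch using the table's lower bounds as thresholds (`check_iops` is a hypothetical helper; feed it the random-IOPS number fio reports):

```shell
#!/bin/bash
# Sketch: compare a measured random-IOPS figure with the target table above.
check_iops() { # usage: check_iops <hdd|sata|nvme> <measured_random_iops>
    local class=$1 iops=$2 target
    case $class in
        hdd)  target=100 ;;
        sata) target=50000 ;;
        nvme) target=200000 ;;
        *) echo "unknown class $class"; return 2 ;;
    esac
    if [ "$iops" -ge "$target" ]; then
        echo "$class random IOPS $iops: OK (target >= $target)"
    else
        echo "$class random IOPS $iops: BELOW target $target"
    fi
}
# Example: check_iops sata 62000
```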
# Cleanup test files
rm -rf /var/fio-test
Step 6 – Kernel Parameter Tuning
# Append to /etc/sysctl.conf
cat >> /etc/sysctl.conf <<'EOF'
# Reduce swap usage (DB servers often set to 10)
vm.swappiness = 10
# Dirty page ratios (higher for SSD)
vm.dirty_ratio = 15
vm.dirty_background_ratio = 5
# Faster writeback
vm.dirty_expire_centisecs = 500
vm.dirty_writeback_centisecs = 100
# Max file handles for high concurrency
fs.file-max = 2097152
EOF
sysctl -p
Mount options for XFS (reduce metadata writes):
# Edit /etc/fstab
# Original line (example):
# /dev/mapper/vg0-var /var xfs defaults 0 0
# Optimized line:
/dev/mapper/vg0-var /var xfs defaults,noatime,nodiratime 0 0
mount -o remount /var
mount | grep /var # should show noatime,nodiratime
Step 7 – Disk Cleanup & Capacity Management
# Clean systemd journal (keep last 7 days, max 1 GB)
journalctl --vacuum-time=7d
journalctl --vacuum-size=1G
# Remove old kernels (RHEL/CentOS 7)
yum install -y yum-utils
package-cleanup --oldkernels --count=2
# RHEL/CentOS 8+: dnf remove $(dnf repoquery --installonly --latest-limit=-2 -q)
# Clean apt cache (Ubuntu)
apt clean
apt autoclean
apt autoremove --purge
# Clean Docker (if used)
docker system prune -af --volumes
Find large files:
# Files >1 GB
find /var -type f -size +1G -exec ls -lh {} \; | sort -k5 -hr
# Files >100 MB not accessed in 7 days
find /var/log -type f -size +100M -atime +7
Set XFS quota for a user (example appuser limited to 50 GB):
# Enable quota in /etc/fstab
/dev/mapper/vg0-var /var xfs defaults,uquota,gquota 0 0
# Note: XFS quotas cannot be enabled via remount; unmount and mount again
umount /var && mount /var
# Apply quota
xfs_quota -x -c 'limit bsoft=45G bhard=50G appuser' /var
xfs_quota -x -c 'report -h' /var
Monitoring & Alerts
Prometheus Metrics
# Download and start node_exporter (v1.8.2 example)
wget https://github.com/prometheus/node_exporter/releases/download/v1.8.2/node_exporter-1.8.2.linux-amd64.tar.gz
tar xf node_exporter-*.tar.gz
cd node_exporter-*/
./node_exporter &
Key PromQL alerts (thresholds shown):
# Disk usage >85%
(1 - node_filesystem_avail_bytes{mountpoint=~"/|/var"} / node_filesystem_size_bytes{mountpoint=~"/|/var"}) * 100 > 85
# Inode usage >90%
(1 - node_filesystem_files_free{mountpoint=~"/|/var"} / node_filesystem_files{mountpoint=~"/|/var"}) * 100 > 90
# I/O utilization >80%
rate(node_disk_io_time_seconds_total[5m]) * 100 > 80
# Average read latency >100 ms (node_disk_read_time_seconds_total is a counter, not a histogram)
rate(node_disk_read_time_seconds_total[5m]) / rate(node_disk_reads_completed_total[5m]) > 0.1
Native Monitoring Commands
# I/O stats every 2 s
iostat -xm 2
# Show only processes doing I/O (requires root)
iotop -o
# Check I/O wait in top (wa% >20% indicates bottleneck)
top
Suggested alert thresholds:
Disk usage >85 % → start cleanup.
Inode usage >90 % → delete small files.
I/O wait >20 % → investigate scheduler & application.
Average queue depth >10 → I/O saturation.
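The last two thresholds above can be checked straight from `iostat -x` output. A sketch (`check_iostat` is a hypothetical helper; it resolves column positions from the header line because sysstat versions move the `%util` and `aqu-sz`/`avgqu-sz` columns around):

```shell
#!/bin/bash
# Sketch: flag devices whose %util or average queue depth exceed the thresholds above.
check_iostat() {
    awk '
    /^Device/ { for (i=1;i<=NF;i++){ if ($i=="%util") u=i; if ($i=="aqu-sz"||$i=="avgqu-sz") q=i } inblock=1; next }
    inblock && NF && u {
        if ($u+0 > 80)       printf "%s: util %s%% > 80%% (saturated)\n", $1, $u
        if (q && $q+0 > 10)  printf "%s: queue depth %s > 10\n", $1, $q
    }'
}
# Live usage: iostat -x 1 1 | check_iostat
```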
Performance & Capacity
Parameter Tuning Summary
I/O scheduler – SSD: none or mq-deadline; HDD: mq-deadline.
vm.swappiness – SSD: 10, HDD: 60 (reduce swap for DB workloads).
vm.dirty_ratio – SSD: 15, HDD: 10 (higher dirty pages on SSD).
Readahead – SSD: 256‑512 KB; HDD: 1024‑2048 KB.
Mount options – SSD: noatime,nodiratime; HDD: defaults.
Set readahead (example for /dev/sda):
# Show current value
blockdev --getra /dev/sda
# Set to 512 sectors (256 KB)
blockdev --setra 512 /dev/sda
# Persist via /etc/rc.local
echo 'blockdev --setra 512 /dev/sda' >> /etc/rc.local
chmod +x /etc/rc.local
Capacity Planning
OS disk – keep 15 % free.
Database disk – keep 20 % free for temp sorting & backups.
Log disk – rotate logs, retain 30 days.
Container storage – auto‑clean unused images & volumes.
Expansion triggers:
Disk usage reaches 80 % → start expansion request.
Projected 90 % within 30 days → urgent expansion.
IOPS sustained >80 % utilization → upgrade disk tier.
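The first two triggers above reduce to simple arithmetic once a daily growth rate is known (e.g. from Prometheus history). A sketch with a hypothetical `check_trigger` helper; it takes integer percentage points, so round your measured growth:

```shell
#!/bin/bash
# Sketch of the expansion triggers above: current usage (%) and observed
# daily growth (integer percentage points per day).
check_trigger() { # usage: check_trigger <used_pct> <growth_pct_per_day>
    local used=$1 growth=$2 days
    if [ "$used" -ge 80 ]; then
        echo "usage ${used}%: file an expansion request"
    fi
    if [ "$growth" -gt 0 ]; then
        days=$(( (90 - used) / growth ))
        if [ "$days" -le 30 ]; then
            echo "projected to hit 90% in ~${days} days: urgent expansion"
        fi
    fi
}
# Example: check_trigger 82 1
```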
Security & Compliance
Permission hardening:
# Restrict MySQL data directory
chmod 700 /var/lib/mysql
chown -R mysql:mysql /var/lib/mysql
# Restrict log directory
chmod 750 /var/log
chown root:adm /var/log
Auditd monitoring (example):
# Watch critical paths
auditctl -w /var/lib/mysql -p wa -k mysql_data_change
auditctl -w /etc/fstab -p wa -k fstab_change
# Query audit logs
ausearch -k mysql_data_change
Data‑at‑rest encryption with LUKS (new disk example):
# Create encrypted LUKS container
cryptsetup luksFormat /dev/sdb
cryptsetup luksOpen /dev/sdb encrypted_disk
mkfs.xfs /dev/mapper/encrypted_disk
Backup strategy:
Create LVM snapshots before any expansion.
Full weekly backups retained 4 weeks.
Daily incremental backups retained 7 days.
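The weekly-full / daily-incremental scheme above can be wired up with GNU tar's `--listed-incremental` (`-g`) snapshot file. A sketch with a hypothetical `backup_run` helper; the paths and the Sunday-as-full-day choice are illustrative:

```shell
#!/bin/bash
# Sketch of the full + incremental policy above (GNU tar required for -g).
backup_run() { # usage: backup_run <src-dir> <dest-dir>
    local src=$1 dest=$2 snap stamp
    snap="$dest/tar.snar"
    stamp=$(date +%F)
    mkdir -p "$dest"
    if [ "$(date +%u)" -eq 7 ]; then
        # Sunday: fresh full backup; deleting the snapshot file resets to level 0
        rm -f "$snap"
        tar -czg "$snap" -f "$dest/full-$stamp.tar.gz" -C "$src" .
    else
        # Other days: incremental relative to the recorded snapshot state
        tar -czg "$snap" -f "$dest/incr-$stamp.tar.gz" -C "$src" .
    fi
    # Retention per the policy above: fulls 4 weeks, incrementals 7 days
    find "$dest" -name 'full-*.tar.gz' -mtime +28 -delete
    find "$dest" -name 'incr-*.tar.gz' -mtime +7 -delete
}
```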
Common Failures & Troubleshooting
Disk full (df 100 %) – Diagnose with du -sh /* | sort -hr; clean logs via journalctl --vacuum-size=1G; configure log rotation to prevent recurrence.
Inode exhaustion – Check with df -i and locate directories containing many small files; delete caches or adjust filesystem layout.
High I/O wait – Use iostat -x 1 and iotop -o to identify offending processes; rate‑limit or pause heavy jobs, then tune the scheduler.
XFS expansion failure – Run xfs_info /mount to verify; if corruption suspected, run xfs_repair -n /dev/vg0/var (dry‑run) then repair on unmounted filesystem.
LVM snapshot full – Check with lvs -a; extend snapshot size via lvextend -L +5G /dev/vg0/snap or allocate a larger snapshot initially.
Filesystem mounted read‑only – Inspect kernel messages with dmesg | grep -i error; remount read‑write with mount -o remount,rw / and address underlying disk errors.
Urgent full‑disk handling example:
# 1. Locate large directories
du -sh /* | sort -hr | head -5
# 2. Clean old logs
journalctl --vacuum-time=1d
find /var/log -name "*.log" -mtime +7 -delete
# 3. Prune Docker (if present)
docker system prune -af
# 4. Temporary LV extension if free space exists
lvextend -L +10G /dev/vg0/var && xfs_growfs /var
# 5. Verify
df -h /var
Change & Rollback Playbooks
Pre‑Change Checklist
# 1. Backup critical data
tar czf /backup/var-$(date +%F).tar.gz /var/important-data
# 2. Create LVM snapshot (if applicable)
lvcreate -L 10G -s -n var-snapshot-$(date +%F) /dev/vg0/var
# 3. Record current state
df -h > /root/df-before.txt
lsblk > /root/lsblk-before.txt
pvs && vgs && lvs > /root/lvm-before.txt
# 4. Check disk health (SMART)
smartctl -H /dev/sda
smartctl -A /dev/sda | grep -i "reallocated\|pending\|uncorrectable"
Expansion Execution Script (bash, idempotent)
#!/bin/bash
set -euo pipefail
NEW_DISK="/dev/sdb"
VG_NAME="vg0"
LV_NAME="var"
EXTEND_SIZE="+50G"
# Create PV if missing
if ! pvdisplay "$NEW_DISK" &>/dev/null; then
echo "Creating PV $NEW_DISK"
pvcreate "$NEW_DISK"
else
echo "PV $NEW_DISK already exists"
fi
# Extend VG if the disk is not yet a member (vgdisplay lists PVs only with -v)
if ! pvs --noheadings -o vg_name "$NEW_DISK" 2>/dev/null | grep -qw "$VG_NAME"; then
echo "Extending VG $VG_NAME with $NEW_DISK"
vgextend "$VG_NAME" "$NEW_DISK"
else
echo "VG $VG_NAME already contains $NEW_DISK"
fi
# Extend LV
echo "Extending LV /dev/$VG_NAME/$LV_NAME by $EXTEND_SIZE"
lvextend -L "$EXTEND_SIZE" "/dev/$VG_NAME/$LV_NAME" # note: re-running extends again; gate on target size in production
# Grow filesystem based on type
MOUNT_POINT=$(findmnt -n -o TARGET --source "/dev/$VG_NAME/$LV_NAME")
FS_TYPE=$(findmnt -n -o FSTYPE --source "/dev/$VG_NAME/$LV_NAME")
if [[ "$FS_TYPE" == "xfs" ]]; then
echo "Growing XFS on $MOUNT_POINT"
xfs_growfs "$MOUNT_POINT"
elif [[ "$FS_TYPE" == "ext4" ]]; then
echo "Growing ext4 on /dev/$VG_NAME/$LV_NAME"
resize2fs "/dev/$VG_NAME/$LV_NAME"
fi
# Verify
df -h "$MOUNT_POINT"
echo "Expansion completed"
Rollback Scenarios
Filesystem expansion failure – Unmount, merge snapshot, remount:
umount /var
lvconvert --merge /dev/vg0/var-snapshot
mount /var
Accidental PV removal – Restore LVM metadata from backup:
vgcfgrestore -l vg0 # list backups
vgcfgrestore -f /etc/lvm/archive/vg0_XXXXX.vg vg0
vgchange -ay vg0
Disk failure – Migrate data off the failed PV and remove it:
pvmove /dev/sdb # move data to other PVs
vgreduce vg0 /dev/sdb
pvremove /dev/sdb
Best Practices
Use LVM as the default storage layout for new servers – simplifies future expansion and snapshotting.
Plan separate partitions (e.g., /var, /var/log, /home) to avoid a single point of exhaustion.
Monitor disk metrics before and after any change; ensure Prometheus alerts return to normal.
Always create an LVM snapshot before modifications; size the snapshot at least twice the expected write volume during the operation.
Match I/O scheduler to media: SSD → none, HDD → mq-deadline.
Automate log cleanup via cron (e.g., journalctl --vacuum-time=30d).
Set capacity thresholds: 80 % warning, 85 % alert, 90 % urgent.
Prefer XFS for databases (large files, high throughput); ext4 is acceptable for general workloads.
Avoid online LV shrinking; migrate data to a new LV instead.
Test snapshot restore and backup recovery at least quarterly.
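The cleanup automation recommended above might be wired up through a drop-in cron file; a sketch (the schedule and retention values are illustrative, not prescriptive):

```shell
# Sketch: cron entries for the cleanup automation above
cat > /etc/cron.d/disk-housekeeping <<'EOF'
# Daily 03:00 – cap the systemd journal at 30 days
0 3 * * * root journalctl --vacuum-time=30d
# Sunday 04:00 – prune unused Docker images (only if Docker is in use)
0 4 * * 0 root docker image prune -af
EOF
```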
Appendix
A. Idempotent LVM Expansion Script
#!/bin/bash
# Usage: ./lvm_extend.sh /dev/sdb vg0 var +50G
set -euo pipefail
NEW_DISK=$1
VG_NAME=$2
LV_NAME=$3
EXTEND_SIZE=$4
# Ensure PV exists
if pvdisplay "$NEW_DISK" &>/dev/null; then
echo "$NEW_DISK already a PV"
else
pvcreate "$NEW_DISK"
fi
# Ensure VG contains the PV (vgdisplay lists PVs only with -v)
if pvs --noheadings -o vg_name "$NEW_DISK" 2>/dev/null | grep -qw "$VG_NAME"; then
echo "$NEW_DISK already in VG $VG_NAME"
else
vgextend "$VG_NAME" "$NEW_DISK"
fi
# Extend LV
lvextend -L "$EXTEND_SIZE" "/dev/$VG_NAME/$LV_NAME"
# Detect mount point and FS type
MOUNT_POINT=$(findmnt -n -o TARGET --source "/dev/$VG_NAME/$LV_NAME")
FS_TYPE=$(findmnt -n -o FSTYPE --source "/dev/$VG_NAME/$LV_NAME")
if [[ "$FS_TYPE" == "xfs" ]]; then
xfs_growfs "$MOUNT_POINT"
elif [[ "$FS_TYPE" == "ext4" ]]; then
resize2fs "/dev/$VG_NAME/$LV_NAME"
fi
df -h "$MOUNT_POINT"
echo "LVM expansion completed"
B. fio Test Configuration (fio-test.ini)
[global]
ioengine=libaio
direct=1
iodepth=32
time_based
runtime=60
group_reporting
directory=/var/fio-test
[seqwrite]
rw=write
bs=4M
numjobs=1
stonewall
[seqread]
rw=read
bs=4M
numjobs=1
stonewall
[randwrite]
rw=randwrite
bs=4K
numjobs=4
stonewall
[randread]
rw=randread
bs=4K
numjobs=4
stonewall
C. Prometheus Alert Rules (prometheus-disk-alerts.yml)
groups:
- name: disk_alerts
interval: 30s
rules:
- alert: DiskSpaceHigh
expr: (1 - node_filesystem_avail_bytes{mountpoint=~"/|/var"} / node_filesystem_size_bytes{mountpoint=~"/|/var"}) * 100 > 85
for: 5m
labels:
severity: warning
annotations:
summary: "Disk usage > 85% (instance: {{ $labels.instance }})"
description: "{{ $labels.mountpoint }} usage is {{ $value }}%"
- alert: DiskSpaceCritical
expr: (1 - node_filesystem_avail_bytes{mountpoint=~"/|/var"} / node_filesystem_size_bytes{mountpoint=~"/|/var"}) * 100 > 90
for: 2m
labels:
severity: critical
annotations:
summary: "Disk usage > 90% (instance: {{ $labels.instance }})"
- alert: InodeUsageHigh
expr: (1 - node_filesystem_files_free / node_filesystem_files) * 100 > 90
for: 5m
labels:
severity: warning
annotations:
summary: "Inode usage > 90% (instance: {{ $labels.instance }})"
- alert: DiskIOHigh
expr: rate(node_disk_io_time_seconds_total[5m]) * 100 > 80
for: 10m
labels:
severity: warning
annotations:
summary: "Disk I/O utilization > 80% (instance: {{ $labels.instance }})"
D. Disk Health Check Script
#!/bin/bash
# SMART health check for all SATA and NVMe disks
for disk in /dev/sd? /dev/nvme?n?; do
[[ -e $disk ]] || continue
echo "=== Checking $disk ==="
# Overall health
smartctl -H $disk | grep -i "SMART overall-health"
# Key attributes
smartctl -A $disk | grep -E "Reallocated_Sector|Current_Pending_Sector|Offline_Uncorrectable"
if [[ $disk == *nvme* ]]; then
smartctl -A $disk | grep -E "Media and Data Integrity Errors|Percentage Used"
fi
echo ""
done
Raymond Ops
Linux ops automation, cloud-native, Kubernetes, SRE, DevOps, Python, Golang and related tech discussions.