
Master Linux Disk Management & I/O Performance: A Hands‑On Guide from Expansion to Tuning

This comprehensive guide walks you through Linux disk space shortage scenarios, prerequisites, a quick checklist, step‑by‑step LVM and partition expansion, I/O scheduler tuning, fio benchmarking, kernel parameter optimization, Prometheus monitoring, security hardening, backup strategies, troubleshooting, and best‑practice recommendations for reliable disk management and performance.

Raymond Ops

Applicable Scenarios & Prerequisites

Production servers with low free space, I/O bottlenecks, database/storage workloads, or container platforms that need persistent volumes.

Supported OS: RHEL/CentOS 7‑9, Ubuntu 18.04‑24.04.

Root or sudo privileges.

Required tools: parted, lvm2, xfsprogs / e2fsprogs, fio, iostat (sysstat package).

Backup all critical data and take snapshots before any modification.

Perform expansion during a low‑traffic maintenance window.
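Before touching any disk, it helps to confirm the required tools are actually installed. A minimal pre-flight sketch (the `check_tools` name and the tool list are illustrative; adjust the list to your distribution's package names):

```shell
#!/bin/bash
# Verify each required tool is on PATH; print what is missing.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "MISSING: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Example invocation (tools from the prerequisites above):
# check_tools parted pvcreate mkfs.xfs resize2fs fio iostat || echo "install the missing packages first"
```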

Environment & Version Matrix

Kernel: 3.10+ on RHEL/CentOS (4.18+ recommended); 4.15+ on Ubuntu/Debian (5.4+ recommended).

LVM version 2.02+ on all platforms.

Default filesystem: XFS on RHEL/CentOS, ext4 on Ubuntu (both supported).

Minimum IOPS: HDD ≥100, SSD ≥3000.

Reserve at least 10 % free space before expansion.

Quick Checklist

Inspect current partitions and usage.

Identify disk type (HDD/SSD/NVMe) and current I/O scheduler.

Create LVM physical volume, volume group, and logical volume if needed.

Expand the filesystem online (XFS with xfs_growfs, ext4 with resize2fs).

Set an appropriate I/O scheduler (e.g., none for SSD, mq-deadline for HDD).

Run fio benchmarks to verify IOPS and throughput.

Configure Prometheus node_exporter alerts for disk space, inode usage, and I/O utilization.

Apply disk quotas and tighten permission controls.

Implement log cleanup and archiving policies.

Prepare LVM snapshots and rollback plans.

Implementation Steps

Step 1 – Diagnose Disk Layout

# List block devices and filesystems
lsblk -f
fdisk -l | grep "Disk /dev"

# Show usage and inode statistics
df -hT
df -i

# Find large directories
du -sh /* | sort -hr | head -10
du -h --max-depth=2 /var | sort -hr | head -20

Key fields:

FSTYPE – determines whether to use xfs_growfs (XFS) or resize2fs (ext4).

SIZE vs MOUNTPOINT – reveals unallocated or unmounted space.

df -i – inode usage >85 % means many small files need cleanup.
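The FSTYPE decision can be scripted. A small sketch (the `grow_cmd_for` name is illustrative; note that XFS grows by mount point while ext4 grows by device):

```shell
# Return the correct grow command for a filesystem type.
grow_cmd_for() {
  local fstype=$1 device=$2 mountpoint=$3
  case "$fstype" in
    xfs)  echo "xfs_growfs $mountpoint" ;;
    ext4) echo "resize2fs $device" ;;
    *)    echo "unsupported filesystem: $fstype" >&2; return 1 ;;
  esac
}

# Example: grow_cmd_for "$(findmnt -n -o FSTYPE /var)" /dev/vg0/var /var
```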

Step 2 – Identify Disk Type & I/O Scheduler

# Detect SSD/NVMe (rotational=0 means SSD)
lsblk -d -o NAME,ROTA,DISC-GRAN
cat /sys/block/sda/queue/rotational   # 0=SSD, 1=HDD

# Show current scheduler
cat /sys/block/sda/queue/scheduler   # brackets indicate active

Recommended scheduler:

HDD (ROTA=1): mq-deadline (or deadline/cfq on older, single-queue kernels).

SSD/NVMe (ROTA=0): none (noop on older kernels) or mq-deadline.

Temporary change:

echo none > /sys/block/nvme0n1/queue/scheduler

Persist via udev (RHEL/CentOS example):

cat > /etc/udev/rules.d/60-ioscheduler.rules <<'EOF'
# SSD/NVMe use none
ACTION=="add|change", KERNEL=="nvme[0-9]n[0-9]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="none"
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="none"
# HDD use mq-deadline
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="mq-deadline"
EOF
udevadm control --reload-rules && udevadm trigger

Step 3 – LVM Expansion (Online, No Downtime)

Scenario: add a new 100 GB disk /dev/sdb to extend /var.

# Show existing physical volumes
pvdisplay

# Create PV on the new disk
pvcreate /dev/sdb
pvdisplay /dev/sdb   # verify size

# Extend the volume group
vgdisplay
vgextend vg0 /dev/sdb
vgdisplay vg0   # free PE should increase

# Extend the logical volume (example +50 GB)
lvextend -L +50G /dev/vg0/var   # or -l +100%FREE

# Grow the filesystem
# XFS
xfs_growfs /var
# ext4
resize2fs /dev/vg0/var

# Verify
df -h /var

Rollback (create snapshot before expansion):

# Snapshot
lvcreate -L 10G -s -n var-snapshot /dev/vg0/var
# If needed, merge back
lvconvert --merge /dev/vg0/var-snapshot
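A snapshot that fills to 100 % is invalidated and can no longer be merged, so watch its fill level during the change. A hedged sketch (the 80 % threshold and the `snap_needs_extend` name are illustrative; `lvs -o data_percent` reports snapshot usage):

```shell
# Decide whether a snapshot needs extending, given its data_percent from lvs.
snap_needs_extend() {  # args: usage_percent threshold
  local usage=${1%.*} threshold=$2   # drop the decimal part for integer compare
  [ "${usage:-0}" -ge "$threshold" ] && echo "extend" || echo "ok"
}

# Real wiring (names from the example above):
# usage=$(lvs --noheadings -o data_percent vg0/var-snapshot | tr -d ' ')
# [ "$(snap_needs_extend "$usage" 80)" = extend ] && lvextend -L +5G /dev/vg0/var-snapshot
```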

Step 4 – Partition Expansion (Non‑LVM)

Scenario: cloud VM system disk /dev/sda3 needs to be enlarged.

# Install growpart tool
# RHEL/CentOS
yum install -y cloud-utils-growpart
# Ubuntu
apt install -y cloud-guest-utils

# Grow partition 3 without data loss
growpart /dev/sda 3
partprobe /dev/sda

# Expand filesystem
# XFS
xfs_growfs /
# ext4
resize2fs /dev/sda3

# Manual method with parted (dangerous – backup first)
parted /dev/sda
(parted) print free
(parted) resizepart 3 100%
(parted) quit
partprobe /dev/sda
# Then run the appropriate filesystem grow command

Step 5 – I/O Performance Benchmark & Tuning

# Install fio
# RHEL/CentOS
yum install -y fio
# Ubuntu
apt install -y fio

Sequential write (4 MiB block, 10 GiB file):

fio --name=seqwrite --rw=write --bs=4M --size=10G \
    --numjobs=1 --runtime=60 --time_based \
    --directory=/var/fio-test --ioengine=libaio --iodepth=16 \
    --direct=1 --group_reporting

Sequential read:

fio --name=seqread --rw=read --bs=4M --size=10G \
    --numjobs=1 --runtime=60 --time_based \
    --directory=/var/fio-test --ioengine=libaio --iodepth=16 \
    --direct=1 --group_reporting

Random read/write (4 KiB block, 4 jobs):

# Random write
fio --name=randwrite --rw=randwrite --bs=4K --size=10G \
    --numjobs=4 --runtime=60 --time_based \
    --directory=/var/fio-test --ioengine=libaio --iodepth=32 \
    --direct=1 --group_reporting
# Random read
fio --name=randread --rw=randread --bs=4K --size=10G \
    --numjobs=4 --runtime=60 --time_based \
    --directory=/var/fio-test --ioengine=libaio --iodepth=32 \
    --direct=1 --group_reporting

Target metrics:

HDD – sequential 100‑200 MB/s, random 100‑300 IOPS.

SATA SSD – sequential 500‑550 MB/s, random 50K‑90K IOPS.

NVMe SSD – sequential 2‑7 GB/s, random 200K‑1M IOPS.
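These targets can be checked mechanically against fio's reported numbers. A sketch using the lower bounds of the ranges above (the `meets_target` name is illustrative; values are whole numbers):

```shell
# Rough pass/fail of a measured value against the per-media targets above.
meets_target() {  # args: disk_type metric value   (metric: seq_mbps | rand_iops)
  local type=$1 metric=$2 value=$3 min
  case "$type:$metric" in
    hdd:seq_mbps)   min=100 ;;
    hdd:rand_iops)  min=100 ;;
    sata:seq_mbps)  min=500 ;;
    sata:rand_iops) min=50000 ;;
    nvme:seq_mbps)  min=2000 ;;
    nvme:rand_iops) min=200000 ;;
    *) echo "unknown"; return 1 ;;
  esac
  [ "$value" -ge "$min" ] && echo "pass" || echo "below target"
}

# Example: meets_target nvme rand_iops 350000
```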

# Cleanup test files
rm -rf /var/fio-test

Step 6 – Kernel Parameter Tuning

# Append to /etc/sysctl.conf
cat >> /etc/sysctl.conf <<'EOF'
# Reduce swap usage (DB servers often set to 10)
vm.swappiness = 10
# Dirty page ratios (higher for SSD)
vm.dirty_ratio = 15
vm.dirty_background_ratio = 5
# Faster writeback
vm.dirty_expire_centisecs = 500
vm.dirty_writeback_centisecs = 100
# Max file handles for high concurrency
fs.file-max = 2097152
EOF
sysctl -p

Mount options for XFS (reduce metadata writes):

# Edit /etc/fstab
# Original line (example):
# /dev/mapper/vg0-var /var xfs defaults 0 0
# Optimized line:
/dev/mapper/vg0-var /var xfs defaults,noatime,nodiratime 0 0
mount -o remount /var
mount | grep /var   # should show noatime,nodiratime

Step 7 – Disk Cleanup & Capacity Management

# Clean systemd journal (keep last 7 days, max 1 GB)
journalctl --vacuum-time=7d
journalctl --vacuum-size=1G

# Remove old kernels (RHEL/CentOS)
yum install -y yum-utils
package-cleanup --oldkernels --count=2

# Clean apt cache (Ubuntu)
apt clean
apt autoclean
apt autoremove --purge

# Clean Docker (if used)
docker system prune -af --volumes

Find large files:

# Files >1 GB
find /var -type f -size +1G -exec ls -lh {} \; | sort -k5 -hr
# Files >100 MB not accessed in 7 days
find /var/log -type f -size +100M -atime +7

Set XFS quota for a user (example appuser limited to 50 GB):

# Enable quota in /etc/fstab
/dev/mapper/vg0-var /var xfs defaults,uquota,gquota 0 0
# XFS quota mount options cannot be changed by remount – unmount and mount again
umount /var && mount /var
# Apply quota
xfs_quota -x -c 'limit bsoft=45G bhard=50G appuser' /var
xfs_quota -x -c 'report -h' /var

Monitoring & Alerts

Prometheus Metrics

# Download and start node_exporter (v1.8.2 example)
wget https://github.com/prometheus/node_exporter/releases/download/v1.8.2/node_exporter-1.8.2.linux-amd64.tar.gz
tar xf node_exporter-*.tar.gz
cd node_exporter-*/
./node_exporter &

Key PromQL alerts (thresholds shown):

# Disk usage >85%
(1 - node_filesystem_avail_bytes{mountpoint=~"/|/var"} / node_filesystem_size_bytes{mountpoint=~"/|/var"}) * 100 > 85
# Inode usage >90%
(1 - node_filesystem_files_free{mountpoint=~"/|/var"} / node_filesystem_files{mountpoint=~"/|/var"}) * 100 > 90
# I/O utilization >80%
rate(node_disk_io_time_seconds_total[5m]) * 100 > 80
# Average read latency >100 ms
rate(node_disk_read_time_seconds_total[5m]) / rate(node_disk_reads_completed_total[5m]) > 0.1

Native Monitoring Commands

# I/O stats every 2 s
iostat -xm 2
# Show only processes doing I/O (requires root)
iotop -o
# Check I/O wait in top (wa% >20% indicates bottleneck)
top

Suggested alert thresholds:

Disk usage >85 % → start cleanup.

Inode usage >90 % → delete small files.

I/O wait >20 % → investigate scheduler & application.

Average queue depth >10 → I/O saturation.
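The first two thresholds are easy to check locally with GNU df's `--output` columns. A minimal sketch (the `disk_alerts` name is illustrative):

```shell
# Print an alert line when disk or inode usage crosses the thresholds above.
disk_alerts() {  # arg: mountpoint
  local mp=$1 used inode
  used=$(df --output=pcent "$mp" | tail -1 | tr -dc '0-9')
  inode=$(df --output=ipcent "$mp" | tail -1 | tr -dc '0-9')
  [ -n "$used" ]  && [ "$used" -gt 85 ]  && echo "disk usage ${used}% on $mp – start cleanup"
  [ -n "$inode" ] && [ "$inode" -gt 90 ] && echo "inode usage ${inode}% on $mp – delete small files"
  return 0
}

disk_alerts /
```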

Performance & Capacity

Parameter Tuning Summary

I/O scheduler – SSD: none or mq-deadline; HDD: mq-deadline.

vm.swappiness – SSD: 10, HDD: 60 (reduce swap for DB workloads).

vm.dirty_ratio – SSD: 15, HDD: 10 (higher dirty pages on SSD).

Readahead – SSD: 256‑512 KB; HDD: 1024‑2048 KB.

Mount options – SSD: noatime,nodiratime; HDD: defaults.

Set readahead (example for /dev/sda):

# Show current value
blockdev --getra /dev/sda
# Set to 512 sectors (256 KB)
blockdev --setra 512 /dev/sda
# Persist via /etc/rc.local
echo 'blockdev --setra 512 /dev/sda' >> /etc/rc.local
chmod +x /etc/rc.local

Capacity Planning

OS disk – keep 15 % free.

Database disk – keep 20 % free for temp sorting & backups.

Log disk – rotate logs, retain 30 days.

Container storage – auto‑clean unused images & volumes.

Expansion triggers:

Disk usage reaches 80 % → start expansion request.

Projected 90 % within 30 days → urgent expansion.

IOPS sustained >80 % utilization → upgrade disk tier.
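The 30-day projection can be computed from the observed daily growth rate. A pure-arithmetic sketch (the `days_to_threshold` name and the example numbers are illustrative):

```shell
# Days until usage reaches a threshold, given size/used in GB and daily growth.
days_to_threshold() {  # args: size_gb used_gb daily_growth_gb threshold_pct
  local size=$1 used=$2 growth=$3 thr=$4
  local target=$(( size * thr / 100 ))
  if [ "$used" -ge "$target" ]; then echo 0; return; fi
  echo $(( (target - used + growth - 1) / growth ))   # ceiling division
}

# Example: a 500 GB volume at 350 GB, growing 5 GB/day, hits 90% in:
days_to_threshold 500 350 5 90   # → 20
```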

Security & Compliance

Permission hardening:

# Restrict MySQL data directory
chmod 700 /var/lib/mysql
chown -R mysql:mysql /var/lib/mysql
# Restrict log directory
chmod 750 /var/log
chown root:adm /var/log

Auditd monitoring (example):

# Watch critical paths
auditctl -w /var/lib/mysql -p wa -k mysql_data_change
auditctl -w /etc/fstab -p wa -k fstab_change
# Query audit logs
ausearch -k mysql_data_change

Data‑at‑rest encryption with LUKS (new disk example):

# Create encrypted LUKS container
cryptsetup luksFormat /dev/sdb
cryptsetup luksOpen /dev/sdb encrypted_disk
mkfs.xfs /dev/mapper/encrypted_disk

Backup strategy:

Create LVM snapshots before any expansion.

Full weekly backups retained 4 weeks.

Daily incremental backups retained 7 days.
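The retention half of this policy can be scripted. A sketch assuming backups are named full-YYYY-MM-DD.tar.gz and incr-YYYY-MM-DD.tar.gz under /backup (names and paths are examples):

```shell
# Delete full backups older than 4 weeks and incrementals older than 7 days.
prune_backups() {  # arg: backup directory
  local dir=$1
  find "$dir" -name 'full-*.tar.gz' -mtime +28 -delete
  find "$dir" -name 'incr-*.tar.gz' -mtime +7  -delete
}

# Matching cron entries (install under /etc/cron.d; % must be escaped in cron):
# 0 2 * * 0   root tar czf /backup/full-$(date +\%F).tar.gz /var/important-data
# 0 3 * * 1-6 root tar czf /backup/incr-$(date +\%F).tar.gz --newer-mtime='1 day ago' /var/important-data
# 30 3 * * *  root /usr/local/sbin/prune-backups.sh   # wraps prune_backups /backup
```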

Common Failures & Troubleshooting

Disk full (df 100 %) – Diagnose with du -sh /* | sort -hr; clean logs via journalctl --vacuum-size=1G; configure log rotation to prevent recurrence.

Inode exhaustion – Check with df -i and locate directories containing many small files; delete caches or adjust filesystem layout.

High I/O wait – Use iostat -x 1 and iotop -o to identify offending processes; rate‑limit or pause heavy jobs, then tune the scheduler.

XFS expansion failure – Run xfs_info /mount to verify; if corruption suspected, run xfs_repair -n /dev/vg0/var (dry‑run) then repair on unmounted filesystem.

LVM snapshot full – Check with lvs -a; extend snapshot size via lvextend -L +5G /dev/vg0/snap or allocate a larger snapshot initially.

Filesystem mounted read‑only – Inspect kernel messages with dmesg | grep -i error; remount read‑write with mount -o remount,rw / and address underlying disk errors.

Urgent full‑disk handling example:

# 1. Locate large directories
du -sh /* | sort -hr | head -5
# 2. Clean old logs
journalctl --vacuum-time=1d
find /var/log -name "*.log" -mtime +7 -delete
# 3. Prune Docker (if present)
docker system prune -af
# 4. Temporary LV extension if free space exists
lvextend -L +10G /dev/vg0/var && xfs_growfs /var
# 5. Verify
df -h /var

Change & Rollback Playbooks

Pre‑Change Checklist

# 1. Backup critical data
tar czf /backup/var-$(date +%F).tar.gz /var/important-data

# 2. Create LVM snapshot (if applicable)
lvcreate -L 10G -s -n var-snapshot-$(date +%F) /dev/vg0/var

# 3. Record current state
df -h > /root/df-before.txt
lsblk > /root/lsblk-before.txt
{ pvs; vgs; lvs; } > /root/lvm-before.txt

# 4. Check disk health (SMART)
smartctl -H /dev/sda
smartctl -A /dev/sda | grep -i "reallocated\|pending\|uncorrectable"

Expansion Execution Script (bash, idempotent)

#!/bin/bash
set -euo pipefail

NEW_DISK="/dev/sdb"
VG_NAME="vg0"
LV_NAME="var"
EXTEND_SIZE="+50G"

# Create PV if missing
if ! pvdisplay "$NEW_DISK" &>/dev/null; then
  echo "Creating PV $NEW_DISK"
  pvcreate "$NEW_DISK"
else
  echo "PV $NEW_DISK already exists"
fi

# Extend VG if the PV is not yet a member (vgdisplay without -v does not list PVs)
if ! pvs --noheadings -o vg_name "$NEW_DISK" 2>/dev/null | grep -qw "$VG_NAME"; then
  echo "Extending VG $VG_NAME with $NEW_DISK"
  vgextend "$VG_NAME" "$NEW_DISK"
else
  echo "VG $VG_NAME already contains $NEW_DISK"
fi

# Extend LV
echo "Extending LV /dev/$VG_NAME/$LV_NAME by $EXTEND_SIZE"
lvextend -L "$EXTEND_SIZE" "/dev/$VG_NAME/$LV_NAME"

# Grow filesystem based on type
MOUNT_POINT=$(findmnt -n -o TARGET --source "/dev/$VG_NAME/$LV_NAME")
FS_TYPE=$(findmnt -n -o FSTYPE --source "/dev/$VG_NAME/$LV_NAME")
if [[ "$FS_TYPE" == "xfs" ]]; then
  echo "Growing XFS on $MOUNT_POINT"
  xfs_growfs "$MOUNT_POINT"
elif [[ "$FS_TYPE" == "ext4" ]]; then
  echo "Growing ext4 on /dev/$VG_NAME/$LV_NAME"
  resize2fs "/dev/$VG_NAME/$LV_NAME"
fi

# Verify
df -h "$MOUNT_POINT"
echo "Expansion completed"

Rollback Scenarios

Filesystem expansion failure – Unmount, merge snapshot, remount:

umount /var
lvconvert --merge /dev/vg0/var-snapshot
mount /var

Accidental PV removal – Restore LVM metadata from backup:

vgcfgrestore -l vg0   # list backups
vgcfgrestore -f /etc/lvm/archive/vg0_XXXXX.vg vg0
vgchange -ay vg0

Disk failure – Migrate data off the failed PV and remove it:

pvmove /dev/sdb   # move data to other PVs
vgreduce vg0 /dev/sdb
pvremove /dev/sdb

Best Practices

Use LVM as the default storage layout for new servers – simplifies future expansion and snapshotting.

Plan separate partitions (e.g., /var, /var/log, /home) to avoid a single point of exhaustion.

Monitor disk metrics before and after any change; ensure Prometheus alerts return to normal.

Always create an LVM snapshot before modifications; size the snapshot at least twice the expected write volume during the operation.

Match I/O scheduler to media: SSD → none, HDD → mq-deadline.

Automate log cleanup via cron (e.g., journalctl --vacuum-time=30d).

Set capacity thresholds: 80 % warning, 85 % alert, 90 % urgent.

Prefer XFS for databases (large files, high throughput); ext4 is acceptable for general workloads.

Avoid online LV shrinking; migrate data to a new LV instead.

Test snapshot restore and backup recovery at least quarterly.
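The 80/85/90 capacity thresholds map naturally onto a small helper (the `capacity_tier` name is illustrative):

```shell
# Map a usage percentage onto the warning tiers above.
capacity_tier() {  # arg: usage percent
  local pct=$1
  if   [ "$pct" -ge 90 ]; then echo "urgent"
  elif [ "$pct" -ge 85 ]; then echo "alert"
  elif [ "$pct" -ge 80 ]; then echo "warning"
  else echo "ok"
  fi
}

# Example wiring: capacity_tier "$(df --output=pcent /var | tail -1 | tr -dc '0-9')"
```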

Appendix

A. Idempotent LVM Expansion Script

#!/bin/bash
# Usage: ./lvm_extend.sh /dev/sdb vg0 var +50G
set -euo pipefail
NEW_DISK=$1
VG_NAME=$2
LV_NAME=$3
EXTEND_SIZE=$4

# Ensure PV exists
if pvdisplay "$NEW_DISK" &>/dev/null; then
  echo "$NEW_DISK already a PV"
else
  pvcreate "$NEW_DISK"
fi

# Ensure VG contains the PV (vgdisplay without -v does not list PVs)
if pvs --noheadings -o vg_name "$NEW_DISK" 2>/dev/null | grep -qw "$VG_NAME"; then
  echo "$NEW_DISK already in VG $VG_NAME"
else
  vgextend "$VG_NAME" "$NEW_DISK"
fi

# Extend LV
lvextend -L "$EXTEND_SIZE" "/dev/$VG_NAME/$LV_NAME"

# Detect mount point and FS type
MOUNT_POINT=$(findmnt -n -o TARGET --source "/dev/$VG_NAME/$LV_NAME")
FS_TYPE=$(findmnt -n -o FSTYPE --source "/dev/$VG_NAME/$LV_NAME")

if [[ "$FS_TYPE" == "xfs" ]]; then
  xfs_growfs "$MOUNT_POINT"
elif [[ "$FS_TYPE" == "ext4" ]]; then
  resize2fs "/dev/$VG_NAME/$LV_NAME"
fi

df -h "$MOUNT_POINT"
echo "LVM expansion completed"

B. fio Test Configuration (fio-test.ini)

[global]
ioengine=libaio
direct=1
iodepth=32
size=10G
time_based
runtime=60
group_reporting
directory=/var/fio-test

[seqwrite]
rw=write
bs=4M
numjobs=1
stonewall

[seqread]
rw=read
bs=4M
numjobs=1
stonewall

[randwrite]
rw=randwrite
bs=4K
numjobs=4
stonewall

[randread]
rw=randread
bs=4K
numjobs=4
stonewall

C. Prometheus Alert Rules (prometheus-disk-alerts.yml)

groups:
- name: disk_alerts
  interval: 30s
  rules:
  - alert: DiskSpaceHigh
    expr: (1 - node_filesystem_avail_bytes{mountpoint=~"/|/var"} / node_filesystem_size_bytes{mountpoint=~"/|/var"}) * 100 > 85
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Disk usage > 85% (instance: {{ $labels.instance }})"
      description: "{{ $labels.mountpoint }} usage is {{ $value }}%"
  - alert: DiskSpaceCritical
    expr: (1 - node_filesystem_avail_bytes{mountpoint=~"/|/var"} / node_filesystem_size_bytes{mountpoint=~"/|/var"}) * 100 > 90
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "Disk usage > 90% (instance: {{ $labels.instance }})"
  - alert: InodeUsageHigh
    expr: (1 - node_filesystem_files_free / node_filesystem_files) * 100 > 90
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Inode usage > 90% (instance: {{ $labels.instance }})"
  - alert: DiskIOHigh
    expr: rate(node_disk_io_time_seconds_total[5m]) * 100 > 80
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Disk I/O utilization > 80% (instance: {{ $labels.instance }})"

D. Disk Health Check Script

#!/bin/bash
# SMART health check for all SATA and NVMe disks
for disk in /dev/sd? /dev/nvme?n?; do
  [[ -e $disk ]] || continue
  echo "=== Checking $disk ==="
  # Overall health
  smartctl -H $disk | grep -i "SMART overall-health"
  # Key attributes
  smartctl -A $disk | grep -E "Reallocated_Sector|Current_Pending_Sector|Offline_Uncorrectable"
  if [[ $disk == *nvme* ]]; then
    smartctl -A $disk | grep -E "Percentage Used|Media and Data Integrity Errors"
  fi
  echo ""
done