
Master Linux Disk Management & I/O Optimization: Expand and Tune

This comprehensive guide walks you through Linux disk expansion, LVM provisioning, filesystem resizing, I/O scheduler tuning, performance benchmarking with fio, monitoring with Prometheus, security hardening, troubleshooting common disk issues, and best‑practice recommendations for reliable production environments.

MaGe Linux Operations

Applicable Scenarios & Prerequisites

Applicable scenarios : Production environments with insufficient disk space, I/O bottlenecks, database/storage servers, containerized platforms.

Prerequisites :

OS: RHEL/CentOS 7‑9, Ubuntu 18.04‑24.04

Permissions: root or sudo

Tools: parted, lvm2, xfsprogs / e2fsprogs, fio, iostat (sysstat package)

Backup: complete data backup or snapshot before any operation

Maintenance window: perform expansion/migration during low‑traffic periods

Quick Checklist

Check disk partitions and usage

Identify disk type (HDD/SSD/NVMe) and I/O scheduler

Create LVM physical volume, volume group, logical volume

Online expand filesystem (XFS/ext4)

Configure I/O scheduler and read/write policies

Run fio benchmarks to verify IOPS/throughput

Set up Prometheus disk monitoring and alerts

Configure disk quotas and permission controls

Implement disk cleanup and archiving strategies

Prepare snapshot and rollback plan

Implementation Steps

Step 1 – Diagnose Disk Space and Partition Layout

View partitions and mount points :

# View disk partitions (HDD sd*, NVMe nvme*)
lsblk -f
fdisk -l | grep "Disk /dev"

# View filesystem usage (including inodes)
df -hT
df -i

# Find large directories
du -sh /* | sort -hr | head -10
du -h --max-depth=2 /var | sort -hr | head -20

Key parameters :

FSTYPE: confirm the filesystem type (use xfs_growfs for XFS, resize2fs for ext4)

SIZE vs MOUNTPOINT: detect unallocated space or unmounted partitions

df -i: inode usage above 85% requires cleanup

Idempotency : read‑only commands can be re‑executed safely.
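The 85% rule of thumb above can be checked mechanically. The following is a minimal sketch rather than part of the original procedure: flag_full is a hypothetical helper name, and it assumes POSIX-format df -P output and mount points that contain no spaces.

```shell
#!/usr/bin/env bash
# flag_full (hypothetical helper): read `df -P` or `df -Pi` output on stdin and
# print every mount point whose usage column (5th field) is at or above a threshold.
flag_full() {
    local threshold="${1:-85}"
    awk -v t="$threshold" '
        NR > 1 {                      # skip the header line
            use = $5
            sub(/%/, "", use)         # "92%" -> "92"; "-" evaluates to 0
            if (use + 0 >= t) print $6 " " use "%"
        }'
}
```

Run it as df -P | flag_full 85 for block usage and df -Pi | flag_full 85 for inode usage. Both commands are read-only.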

Step 2 – Identify Disk Type and I/O Scheduler

Detect disk type :

# Check if SSD/NVMe (rotational=0 means SSD)
lsblk -d -o NAME,ROTA,DISC-GRAN
cat /sys/block/sda/queue/rotational   # 0=SSD, 1=HDD

# View current I/O scheduler
cat /sys/block/sda/queue/scheduler   # e.g., [mq-deadline] none

Recommended scheduler :

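HDD (ROTA=1): mq-deadline or bfq (deadline or cfq on legacy single-queue kernels such as RHEL/CentOS 7)

SSD/NVMe (ROTA=0): none (called noop on legacy kernels) or mq-deadline

Set scheduler temporarily (lost on reboot) :

echo none > /sys/block/nvme0n1/queue/scheduler

Persistently set with udev rule (RHEL/CentOS) :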

cat > /etc/udev/rules.d/60-ioscheduler.rules <<'EOF'
# SSD/NVMe
ACTION=="add|change", KERNEL=="nvme[0-9]n[0-9]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="none"
# HDD
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="mq-deadline"
EOF
udevadm control --reload-rules && udevadm trigger

Verify with cat /sys/block/nvme0n1/queue/scheduler (should show [none]).
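To audit every disk at once instead of one device at a time, a read-only report along these lines may help. It is a sketch under assumptions: the helper names active_scheduler and recommend_scheduler are hypothetical, and the recommendation simply encodes the SSD → none, HDD → mq-deadline guidance from this step.

```shell
#!/usr/bin/env bash
# active_scheduler: extract the bracketed (active) entry from the contents of
# /sys/block/<dev>/queue/scheduler, e.g. "[mq-deadline] kyber none" -> mq-deadline
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# recommend_scheduler: map the rotational flag to the scheduler suggested above
recommend_scheduler() {
    case "$1" in
        0) echo "none" ;;          # SSD/NVMe
        1) echo "mq-deadline" ;;   # HDD
        *) echo "unknown" ;;
    esac
}

# Read-only report over all block devices that expose a scheduler file
for dev in /sys/block/*; do
    [ -r "$dev/queue/scheduler" ] || continue
    rota=$(cat "$dev/queue/rotational")
    current=$(active_scheduler < "$dev/queue/scheduler")
    echo "$(basename "$dev"): current=${current:-n/a} recommended=$(recommend_scheduler "$rota")"
done
```

Devices whose scheduler file lists no bracketed entry are reported as n/a. Virtual devices such as loop or device-mapper nodes also appear in the list and can be ignored.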

Step 3 – LVM Disk Expansion (Online, No Downtime)

Scenario : Add a new 100 GB disk /dev/sdb and extend the /var logical volume.

Create Physical Volume (PV) :

# List existing PVs
pvdisplay
# Add new disk to LVM
pvcreate /dev/sdb
pvdisplay /dev/sdb   # verify size

Extend Volume Group (VG) :

# Show current VGs
vgdisplay
# Extend vg0 with the new PV
vgextend vg0 /dev/sdb
vgdisplay vg0   # confirm free PE increased

Extend Logical Volume (LV) :

# Show LV path
lvdisplay /dev/vg0/var
# Extend by 50 GB (or use -l +100%FREE for all free space)
lvextend -L +50G /dev/vg0/var
# Or: lvextend -l +100%FREE /dev/vg0/var

Resize Filesystem :

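# XFS (online; takes the mount point)
xfs_growfs /var
# ext4 (online growth is supported; takes the device)
resize2fs /dev/vg0/var
# Alternatively, lvextend -r (--resizefs) extends the LV and grows the filesystem in one step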

Verification (before and after):

# Before
df -h /var   # 49G 45G 4G 92%
# After
df -h /var   # 99G 45G 54G 46%

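Rollback note : Expansion is effectively one‑way (XFS has no general shrink support, and ext4 can only be shrunk offline), so create a snapshot before expanding:

# Create snapshot (10 GB of copy-on-write space)
lvcreate -L 10G -s -n var-snapshot /dev/vg0/var
# Roll back if needed; if /var is mounted, the merge is deferred until the LV is next activated
lvconvert --merge /dev/vg0/var-snapshot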
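A classic LVM snapshot becomes invalid once its copy-on-write space fills up, so it is worth watching the fill level while the snapshot exists. The sketch below assumes the lvs data_percent report field; snapshots_over is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# snapshots_over (hypothetical helper): read "name data_percent" pairs on stdin,
# as printed by `lvs --noheadings -o lv_name,data_percent`, and list every LV
# whose fill level is at or above the limit. LVs without a data_percent value
# (plain volumes) have only one field and are skipped.
snapshots_over() {
    local limit="${1:-80}"
    awk -v l="$limit" 'NF >= 2 && $2 + 0 >= l { print $1 " " $2 "%" }'
}
```

A typical call is lvs --noheadings -o lv_name,data_percent vg0 | snapshots_over 80. Any hit means the snapshot should be extended with lvextend or removed.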

Step 4 – Partition Expansion (Non‑LVM)

Scenario : Cloud VM system disk /dev/sda3 needs to be enlarged.

Online expand with growpart :

# Install tools
yum install -y cloud-utils-growpart   # RHEL/CentOS
apt install -y cloud-guest-utils      # Ubuntu
# Expand partition 3 without data loss
growpart /dev/sda 3
partprobe /dev/sda   # reread partition table
# Expand filesystem
xfs_growfs /   # XFS
resize2fs /dev/sda3   # ext4

Manual expand with parted (dangerous, backup first):

# Start parted
parted /dev/sda
(parted) print free   # view free space
(parted) resizepart 3 100%   # extend to end of disk
(parted) quit
partprobe /dev/sda
# Expand filesystem as above
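growpart takes the disk and the partition number as separate arguments, which is easy to get wrong on NVMe devices where the partition carries a p separator. The helper below is a sketch for scripting that split; split_partition is a hypothetical name, and only the common sdX, vdX, xvdX, nvme and mmcblk naming patterns are assumed.

```shell
#!/usr/bin/env bash
# split_partition (hypothetical helper): turn a partition path into
# "<disk> <partition number>", the argument form that growpart expects.
split_partition() {
    local part="$1"
    local re_p='^(/dev/(nvme|mmcblk)[0-9]+(n[0-9]+)?)p([0-9]+)$'   # nvme0n1p3, mmcblk0p1
    local re_plain='^(/dev/[a-z]+)([0-9]+)$'                       # sda3, vda1, xvda2
    if [[ "$part" =~ $re_p ]]; then
        echo "${BASH_REMATCH[1]} ${BASH_REMATCH[4]}"
    elif [[ "$part" =~ $re_plain ]]; then
        echo "${BASH_REMATCH[1]} ${BASH_REMATCH[2]}"
    else
        echo "not a recognized partition path: $part" >&2
        return 1
    fi
}

split_partition /dev/sda3        # prints: /dev/sda 3
split_partition /dev/nvme0n1p3   # prints: /dev/nvme0n1 3
```

Running growpart with --dry-run first shows what would change without touching the partition table.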

Step 5 – I/O Performance Benchmarking & Tuning

Install fio :

# RHEL/CentOS
yum install -y fio
# Ubuntu
apt install -y fio

Sequential write test (4 MiB block, 10 GiB file) :

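# Make sure the test directory exists and has enough free space:
# each fio job lays out its own file of --size
mkdir -p /var/fio-test
fio --name=seqwrite --rw=write --bs=4M --size=10G \
    --numjobs=1 --runtime=60 --time_based \
    --directory=/var/fio-test --ioengine=libaio --iodepth=16 \
    --direct=1 --group_reporting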

Sequential read test :

fio --name=seqread --rw=read --bs=4M --size=10G \
    --numjobs=1 --runtime=60 --time_based \
    --directory=/var/fio-test --ioengine=libaio --iodepth=16 \
    --direct=1 --group_reporting

Random read/write (database workload) :

# Random write (4 KiB)
fio --name=randwrite --rw=randwrite --bs=4K --size=10G \
    --numjobs=4 --runtime=60 --time_based \
    --directory=/var/fio-test --ioengine=libaio --iodepth=32 \
    --direct=1 --group_reporting
# Random read (4 KiB)
fio --name=randread --rw=randread --bs=4K --size=10G \
    --numjobs=4 --runtime=60 --time_based \
    --directory=/var/fio-test --ioengine=libaio --iodepth=32 \
    --direct=1 --group_reporting

Target metrics :

HDD: sequential 100‑200 MB/s, random IOPS 100‑300

SATA SSD: sequential 500‑550 MB/s, random IOPS 50K‑90K

NVMe SSD: sequential 2‑7 GB/s, random IOPS 200K‑1M

After testing, clean up:

rm -rf /var/fio-test
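The sequential figures are quoted in MB/s and the random figures in IOPS, but the two are related by the block size. The conversion below is a small sketch (iops_to_mibs is a hypothetical helper) that makes it easier to compare a 4 KiB random result against a bandwidth limit.

```shell
#!/usr/bin/env bash
# iops_to_mibs IOPS BLOCK_SIZE_KIB
# Throughput in MiB/s = IOPS * block size in KiB / 1024 (integer result).
iops_to_mibs() {
    local iops="$1" bs_kib="$2"
    echo $(( iops * bs_kib / 1024 ))
}

iops_to_mibs 50000 4   # prints 195: 50K IOPS at 4 KiB is roughly 195 MiB/s
iops_to_mibs 200 4     # prints 0: an HDD at 200 IOPS moves less than 1 MiB/s of 4 KiB I/O
```

This is why a disk can look idle in throughput terms while being fully saturated on small random I/O.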

Step 6 – Kernel Parameter Tuning

Adjust virtual memory and I/O settings (append to /etc/sysctl.conf and apply):

# Reduce swap usage (databases)
vm.swappiness = 10
# Dirty page ratios (SSD can be higher)
vm.dirty_ratio = 15
vm.dirty_background_ratio = 5
# Faster writeback
vm.dirty_expire_centisecs = 500
vm.dirty_writeback_centisecs = 100
# Increase file handles for high concurrency
fs.file-max = 2097152

sysctl -p
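Because the dirty ratios are percentages, their absolute effect grows with RAM. The sketch below estimates the thresholds in MiB; dirty_limit_mib is a hypothetical helper, and it uses MemTotal as an approximation. The kernel actually applies the ratio to available (free plus reclaimable) memory, so real limits are somewhat lower.

```shell
#!/usr/bin/env bash
# dirty_limit_mib MEM_KIB RATIO_PCT
# Approximate dirty-page threshold in MiB for a given memory size and ratio.
dirty_limit_mib() {
    local mem_kib="$1" ratio="$2"
    echo $(( mem_kib * ratio / 100 / 1024 ))
}

# Estimate for this host (upper bound, based on MemTotal)
if [ -r /proc/meminfo ]; then
    mem_kib=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
    echo "dirty_background_ratio=5 -> ~$(dirty_limit_mib "$mem_kib" 5) MiB before background writeback starts"
    echo "dirty_ratio=15           -> ~$(dirty_limit_mib "$mem_kib" 15) MiB before writers are throttled"
fi
```

On a 16 GiB host, 15% corresponds to roughly 2.4 GiB of dirty data, which is a lot to flush to a slow disk in one burst.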

Optimize XFS mount options :

# Edit /etc/fstab and add noatime (it already implies nodiratime)
/dev/mapper/vg0-var /var xfs defaults,noatime 0 0
mount -o remount /var   # apply without reboot

Step 7 – Disk Cleanup & Capacity Management

Log and temporary file cleanup :

# Reduce systemd journal (keep last 7 days)
journalctl --vacuum-time=7d
journalctl --vacuum-size=1G
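# Remove old kernels (RHEL/CentOS 7)
yum install -y yum-utils
package-cleanup --oldkernels --count=2
# RHEL/CentOS 8+: dnf remove $(dnf repoquery --installonly --latest-limit=-2 -q)
# Clean apt cache (Ubuntu)
apt clean && apt autoclean && apt autoremove --purge
# Docker cleanup (if used). Caution: removes ALL unused images, stopped containers, and unused volumes
docker system prune -af --volumes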

Find large files :

# Files >1 GB
find /var -type f -size +1G -exec ls -lh {} \; | sort -k5 -hr
# Large files not modified in 7 days (access times are unreliable once noatime is set)
find /var/log -type f -size +100M -mtime +7

Set disk quotas (XFS example) :

# Enable user and group quotas in /etc/fstab
/dev/mapper/vg0-var /var xfs defaults,noatime,uquota,gquota 0 0
# XFS reads quota options only at mount time, so a remount is not enough
umount /var && mount /var
# Limit user "appuser" to 50 GB (soft limit 45 GB)
xfs_quota -x -c 'limit bsoft=45g bhard=50g appuser' /var
xfs_quota -x -c 'report -h' /var

Monitoring & Alerts

Prometheus Metrics

Install node_exporter :

wget https://github.com/prometheus/node_exporter/releases/download/v1.8.2/node_exporter-1.8.2.linux-amd64.tar.gz
tar xf node_exporter-*.tar.gz
cd node_exporter-*/
./node_exporter &

Key PromQL queries :

# Disk usage >85%
(1 - node_filesystem_avail_bytes{mountpoint=~"/|/var"} / node_filesystem_size_bytes{mountpoint=~"/|/var"}) * 100 > 85
# Inode usage >90%
(1 - node_filesystem_files_free{mountpoint=~"/|/var"} / node_filesystem_files{mountpoint=~"/|/var"}) * 100 > 90
# Disk I/O utilization >80%
rate(node_disk_io_time_seconds_total[5m]) * 100 > 80
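# Average read latency >100 ms (node_exporter exposes counters, not a latency histogram, so a percentile cannot be computed)
rate(node_disk_read_time_seconds_total[5m]) / rate(node_disk_reads_completed_total[5m]) > 0.1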

Grafana panels : Dashboard ID 1860 (Node Exporter Full) – focus on Disk Space Used, Disk I/O, Disk Latency, IOPS.
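To turn the disk usage and inode queries into alerts, they can be written into a Prometheus rules file. The snippet below is a sketch in the same heredoc style used earlier in this guide. The file name disk-alerts.yml, the group name, the alert names, the for durations and the severity labels are all assumptions to adapt. The latency rule uses an average computed from node_exporter counters, since no latency histogram is exported.

```shell
#!/usr/bin/env bash
# Write an example alerting rules file. RULES_FILE is a hypothetical path;
# point it at a directory listed under rule_files in prometheus.yml.
RULES_FILE="${RULES_FILE:-./disk-alerts.yml}"

# Quoted 'EOF' keeps {{ $labels... }} from being expanded by the shell
cat > "$RULES_FILE" <<'EOF'
groups:
  - name: disk
    rules:
      - alert: DiskSpaceHigh
        expr: (1 - node_filesystem_avail_bytes{mountpoint=~"/|/var"} / node_filesystem_size_bytes{mountpoint=~"/|/var"}) * 100 > 85
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Disk usage above 85% on {{ $labels.instance }} {{ $labels.mountpoint }}"
      - alert: InodeUsageHigh
        expr: (1 - node_filesystem_files_free{mountpoint=~"/|/var"} / node_filesystem_files{mountpoint=~"/|/var"}) * 100 > 90
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Inode usage above 90% on {{ $labels.instance }} {{ $labels.mountpoint }}"
      - alert: DiskReadLatencyHigh
        expr: rate(node_disk_read_time_seconds_total[5m]) / rate(node_disk_reads_completed_total[5m]) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Average read latency above 100 ms on {{ $labels.instance }} {{ $labels.device }}"
EOF
echo "wrote $RULES_FILE"
```

Validate the file with promtool check rules before reloading Prometheus.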

Native Monitoring Commands

Real‑time I/O :

# iostat every 2 s
iostat -xm 2
# iotop (requires root) – show only processes doing I/O
iotop -o
# Check I/O wait in top (wa% >20% indicates bottleneck)
top

Threshold recommendations :

Disk usage >85% → start cleanup

Inode usage >90% → delete small files

I/O wait >20% → examine scheduler and application

Average queue depth >10 → I/O saturation

Performance & Capacity

Key tuning parameters (recommended values):

I/O scheduler: SSD → none / mq-deadline; HDD → mq-deadline

vm.swappiness: SSD → 10, HDD → 60

vm.dirty_ratio: SSD → 15, HDD → 10

readahead: SSD → 256‑512 KB, HDD → 1024‑2048 KB

Mount options: add noatime for XFS/SSD (noatime already implies nodiratime)

Set readahead :

# Show current value
blockdev --getra /dev/sda
# Set to 512 sectors (256 KB)
blockdev --setra 512 /dev/sda
# Persist via /etc/rc.local if needed
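blockdev --setra counts in 512-byte sectors, while the recommendations above are given in KB. A short conversion sketch (the helper names kb_to_sectors and sectors_to_kb are hypothetical) avoids off-by-a-factor-of-two mistakes:

```shell
#!/usr/bin/env bash
# blockdev --getra/--setra work in 512-byte sectors: 1 KB = 2 sectors.
kb_to_sectors() { echo $(( $1 * 1024 / 512 )); }
sectors_to_kb() { echo $(( $1 * 512 / 1024 )); }

kb_to_sectors 256     # prints 512  (SSD lower bound from the list above)
kb_to_sectors 2048    # prints 4096 (HDD upper bound from the list above)
sectors_to_kb 512     # prints 256
```

For example, blockdev --setra "$(kb_to_sectors 1024)" /dev/sda sets a 1024 KB readahead on an HDD.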

Capacity planning :

OS disk: reserve 15% free

Database disk: reserve 20% for temp files and backups

Log disk: rotate logs, keep 30 days

Container storage: auto‑clean unused images/volumes

Expansion triggers :

Disk usage reaches 80% → start expansion request

Projected 90% usage within 30 days → urgent expansion

IOPS sustained >80% utilization → upgrade disk tier
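The "90% within 30 days" trigger needs a projection. The sketch below assumes simple linear growth measured in whole GB per day; days_until_pct is a hypothetical helper, and real growth is rarely this regular, so treat the result as a rough estimate.

```shell
#!/usr/bin/env bash
# days_until_pct SIZE_GB USED_GB GROWTH_GB_PER_DAY TARGET_PCT
# Prints the number of whole days until usage reaches TARGET_PCT of SIZE_GB,
# 0 if it is already there, or "never" if usage is not growing.
days_until_pct() {
    local size="$1" used="$2" growth="$3" target="$4"
    local limit=$(( size * target / 100 ))
    if (( growth <= 0 )); then echo "never"; return; fi
    if (( used >= limit )); then echo 0; return; fi
    echo $(( (limit - used) / growth ))
}

# 500 GB volume, 400 GB used, growing 2 GB/day:
days_until_pct 500 400 2 90   # prints 25, inside the 30-day urgent window
```

The growth rate can be taken from two df readings a week apart, or from the Prometheus history of node_filesystem_avail_bytes.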

Security & Compliance

Permission control (example for MySQL data):

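chown -R mysql:mysql /var/lib/mysql
chmod 700 /var/lib/mysql
# Caution: services that log as non-root users must be able to traverse /var/log;
# verify that nothing breaks before tightening its default mode
chmod 750 /var/log
chown root:adm /var/log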

Audit logging with auditd:

auditctl -w /var/lib/mysql -p wa -k mysql_data_change
auditctl -w /etc/fstab -p wa -k fstab_change
ausearch -k mysql_data_change

Data encryption (LUKS example):

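# WARNING: luksFormat destroys all existing data on the device; use an empty disk, not one already in LVM
cryptsetup luksFormat /dev/sdb
cryptsetup open /dev/sdb encrypted_disk   # same as the older luksOpen syntax
mkfs.xfs /dev/mapper/encrypted_disk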

Backup strategy :

LVM snapshot before any expansion

Full backup weekly (retain 4 weeks)

Incremental backup daily (retain 7 days)

Common Faults & Troubleshooting

| Symptom | Diagnostic Command | Possible Root Cause | Quick Fix | Permanent Fix |
|---|---|---|---|---|
| Disk full (df 100%) | du -sh /* \| sort -hr | Log explosion or large files | journalctl --vacuum-size=1G | Configure log rotation, cleanup scripts |
| Inode exhaustion | df -i | Too many small files | Delete cache/temp files | Adjust filesystem layout or migrate data |
| High I/O wait | iostat -xm 2 / iotop -o | Slow queries or backup jobs | Throttle or pause heavy tasks | Optimize SQL and I/O scheduler |
| XFS resize failure | xfs_info /mount | Filesystem corruption | xfs_repair -n /dev/vg0/var | Unmount and run full xfs_repair |
| LVM snapshot full | lvs -a | Insufficient COW space | lvextend -L +5G /dev/vg0/snap | Increase initial snapshot size |
| Filesystem read‑only | dmesg \| grep -i error | Filesystem error or disk failure | mount -o remount,rw / | Repair filesystem or replace disk |

Change & Rollback Playbooks

Pre‑change Checklist

# Backup critical data
tar czf /backup/var-$(date +%F).tar.gz /var/important-data
# Create LVM snapshot (if applicable)
lvcreate -L 10G -s -n var-snapshot-$(date +%F) /dev/vg0/var
# Record current state
df -h > /root/df-before.txt
lsblk > /root/lsblk-before.txt
{ pvs; vgs; lvs; } > /root/lvm-before.txt   # group the commands so all three outputs are saved
# Check disk health (SMART)
smartctl -H /dev/sda
smartctl -A /dev/sda | grep -i "reallocated\|pending\|uncorrectable"

Expansion Execution Script

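#!/bin/bash
set -euo pipefail
NEW_DISK="/dev/sdb"
VG_NAME="vg0"
LV_NAME="var"
MOUNT_POINT="/var"
EXTEND_SIZE="+50G"
# Step 1: Create PV
pvcreate "$NEW_DISK"
# Step 2: Extend VG
vgextend "$VG_NAME" "$NEW_DISK"
# Step 3: Extend LV
lvextend -L "$EXTEND_SIZE" "/dev/$VG_NAME/$LV_NAME"
# Step 4: Grow filesystem (check the type of this mount, not of any mount on the system)
FSTYPE=$(findmnt -n -o FSTYPE "$MOUNT_POINT")
case "$FSTYPE" in
    xfs)  xfs_growfs "$MOUNT_POINT" ;;
    ext4) resize2fs "/dev/$VG_NAME/$LV_NAME" ;;
    *)    echo "Unsupported filesystem: $FSTYPE" >&2; exit 1 ;;
esac
# Step 5: Verify
df -h "$MOUNT_POINT"
echo "Expansion completed, current $MOUNT_POINT usage shown above"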

Rollback Scenarios

Filesystem expansion failure – merge previously created snapshot:

umount /var
lvconvert --merge /dev/vg0/var-snapshot
mount /var

Accidental PV deletion – restore LVM metadata from backup:

vgcfgrestore -l vg0   # list backups
vgcfgrestore -f /etc/lvm/archive/vg0_XXXXX.vg vg0
vgchange -ay vg0

Disk failure – move data off the faulty PV and remove it:

pvmove /dev/sdb   # migrate data; requires enough free extents on the other PVs in vg0
vgreduce vg0 /dev/sdb
pvremove /dev/sdb

Best Practices

Use LVM as the default layout for easy expansion and snapshots.

Allocate dedicated partitions (e.g., /var, /var/log, /home) to avoid single‑point saturation.

Monitor disks before and after changes with Prometheus alerts.

Create snapshots before any expansion; size snapshots at least twice the expected write volume.

Match I/O scheduler to disk type: SSD → none, HDD → mq-deadline.

Automate log cleanup via cron (e.g., journalctl --vacuum-time=30d).

Set capacity warning thresholds: 80% warning, 85% alert, 90% urgent.

Prefer XFS for databases (large files, high performance); ext4 works for general use.

Avoid online LVM shrinkage; migrate data to a new LV instead.

Test snapshot restores and backup recoverability quarterly.
