
Linux Disk Partitioning, Mounting & Read/Write Issue Troubleshooting Guide

This article provides a comprehensive, step‑by‑step guide to Linux disk fundamentals, partitioning tools, mounting options, filesystem choices, LVM management, performance tuning, common error diagnostics, and five real‑world troubleshooting cases, enabling sysadmins to confidently manage and resolve disk‑related problems.

MaGe Linux Operations

Overview

Disks are the most failure-prone component in Linux servers, and newcomers tend to make one of two opposite mistakes: deleting data in a panic as soon as they see No space left on device, or replacing a disk without first checking the partition table or the kernel's dirty-page settings.

Fundamentals

Linux exposes each physical disk as a block device: a device node under /dev (for example /dev/sda), with its attributes published under /sys/block/.

Each disk can have one or more partitions defined by either MBR (max 4 primary partitions, 2 TiB limit with 512-byte sectors) or GPT (128 partitions by default, up to 8 ZiB with 512-byte sectors).
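The 2 TiB MBR figure follows from its 32-bit sector addresses. A minimal sketch of the arithmetic, assuming the sector size is passed in by the caller (the helper name mbr_max_tib is illustrative, not a standard tool):

```shell
# Largest disk size (in TiB) addressable by MBR's 32-bit sector count.
# $1 = logical sector size in bytes (512 on most disks, 4096 on 4Kn drives).
mbr_max_tib() {
    # 4294967296 = 2^32 sectors, 1099511627776 = bytes per TiB
    echo $(( 4294967296 * $1 / 1099511627776 ))
}

mbr_max_tib 512     # 2^32 sectors x 512 B
mbr_max_tib 4096    # 2^32 sectors x 4096 B
```

With 512-byte sectors this prints 2, which is why disks larger than 2 TiB generally call for GPT.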

Partitioning Tools

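# List block devices
lsblk
# Show the partition table type (see the "Disklabel type" line)
fdisk -l /dev/sda
# Create a GPT label and one partition spanning the disk
# (destroys any existing partition table on /dev/sdb; -s runs without prompts)
parted -s /dev/sdb mklabel gpt
parted -s /dev/sdb mkpart primary xfs 0% 100%
# fdisk interactive example
fdisk /dev/sdb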

For GPT, gdisk is recommended; for both MBR and GPT, parted offers scriptable operations.

Filesystem Choices

ext4 : mature, supports up to 1 EiB, default on many distributions.

xfs : default on CentOS 7+, better for large files and high concurrency.

btrfs : B‑tree design, supports snapshots and compression, but less stable for production.

Typical mount options include defaults, noatime (which already implies nodiratime), ro/rw, noexec, nosuid, _netdev, and nofail.
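Because a missing nofail on a data disk can stall boot when that disk is absent, it can be worth auditing /etc/fstab for it. A minimal sketch, assuming the root filesystem and swap entries should be skipped (the function name fstab_missing_nofail is hypothetical):

```shell
# Print mount points from fstab-formatted stdin that lack the nofail option.
# The root filesystem and swap entries are skipped (an assumption of this sketch).
fstab_missing_nofail() {
    awk '
        /^[ \t]*(#|$)/ { next }                  # skip comments and blank lines
        $2 == "/" || $3 == "swap" { next }       # root must mount; swap has no mount point
        $4 !~ /(^|,)nofail(,|$)/ { print $2 }    # option list is comma-separated
    '
}
```

Typical usage would be fstab_missing_nofail < /etc/fstab; every mount point it prints is a candidate for adding nofail.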

LVM Basics

# Physical Volume (PV)
pvcreate /dev/sdb
# Volume Group (VG)
vgcreate data-vg /dev/sdb
# Logical Volume (LV)
lvcreate -L 200G -n data-lv data-vg
# Format and mount (create the mount point first)
mkfs.xfs /dev/data-vg/data-lv
mkdir -p /data
mount /dev/data-vg/data-lv /data

LVM enables online expansion, multi-disk aggregation, and snapshotting. Expansion workflow: add a new PV, run vgextend, run lvextend, then resize the filesystem (xfs_growfs for XFS, resize2fs for ext4).
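The expansion workflow can be sketched as a dry-run planner that only prints the commands, so the sequence can be reviewed before anything touches a disk. The device /dev/sdc, the 100G increment, and the function name lvm_expand_plan are assumptions for illustration:

```shell
# Print (do not execute) the commands for growing an LV onto a new disk.
# Arguments: new PV device, VG name, LV name, size increment, fs type, mount point.
lvm_expand_plan() {
    pv=$1 vg=$2 lv=$3 size=$4 fstype=$5 mnt=$6
    echo "pvcreate $pv"
    echo "vgextend $vg $pv"
    echo "lvextend -L +$size /dev/$vg/$lv"
    case $fstype in
        xfs)  echo "xfs_growfs $mnt" ;;           # XFS is grown via its mount point
        ext4) echo "resize2fs /dev/$vg/$lv" ;;    # ext4 is grown via the device
        *)    echo "unsupported filesystem: $fstype" >&2; return 1 ;;
    esac
}

lvm_expand_plan /dev/sdc data-vg data-lv 100G xfs /data
```

Printing rather than executing is a deliberate choice here: each step changes on-disk metadata, so a reviewed plan is safer than a one-shot script.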

Swap Management

# Create a swap file
dd if=/dev/zero of=/swapfile bs=1M count=8192
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
# Persist by adding this line to /etc/fstab (a swap file is referenced by path, not UUID)
/swapfile none swap sw 0 0

Adjust vm.swappiness (default 60) to control how aggressively the kernel uses swap.

IO Diagnostic Tools

iostat – per-device statistics (%util, await, etc.).

iotop – process-level IO usage.

vmstat – overall system counters (blocked processes, wa).

dstat – combined view of CPU, disk, network, memory.

pidstat -d – per-process IO.

smartctl and smartd – SMART health monitoring.
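The %util figure that iostat reports can be approximated from /proc/diskstats, where field 13 is the number of milliseconds the device spent doing I/O. A minimal sketch under that assumption (the helper names io_ticks and util_pct are illustrative):

```shell
# Extract io_ticks (field 13: milliseconds spent doing I/O) for one device
# from /proc/diskstats-formatted text on stdin. $1 = device name, e.g. sda.
io_ticks() {
    awk -v dev="$1" '$3 == dev { print $13 }'
}

# %util over an interval = delta(io_ticks) / interval_ms * 100
# $1 = io_ticks before, $2 = io_ticks after, $3 = interval in milliseconds
util_pct() {
    LC_ALL=C awk -v a="$1" -v b="$2" -v ms="$3" \
        'BEGIN { printf "%.1f\n", (b - a) / ms * 100 }'
}
```

In practice one would sample twice, for example a=$(io_ticks sda < /proc/diskstats), sleep 1, then b=$(io_ticks sda < /proc/diskstats), and call util_pct "$a" "$b" 1000.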

Common Problems and Solutions

1. Disk Full

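Use df -h (blocks) and df -i (inodes) to find the full filesystem, then du -sh to locate large directories. Truncate logs in place (e.g., > /var/log/messages) rather than deleting them, since a deleted file that a process still holds open does not free its space, and configure logrotate to keep them in check.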
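The df check is easy to script. A minimal sketch that reads df -P output and reports filesystems at or above a threshold; the 80 % default and the function name df_over_threshold are assumptions, and mount points containing spaces are not handled:

```shell
# Read `df -P` output on stdin and print mount points at or above a usage threshold.
# $1 = threshold in percent (default 80).
df_over_threshold() {
    awk -v limit="${1:-80}" '
        NR == 1 { next }                   # skip the header line
        {
            pct = $5; sub(/%/, "", pct)    # "85%" -> 85
            if (pct + 0 >= limit + 0) print $6, $5
        }
    '
}
```

Typical usage would be df -P | df_over_threshold 80, followed by du -sh on the mount points it reports.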

2. High IO Utilization

Run iostat -xz 1 to identify the busy device, then iotop -ao to find the offending process (e.g., MySQL). Possible remedies include killing long‑running transactions, adjusting innodb_io_capacity, or adding SSD capacity.

3. Read‑Only Filesystem

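Check mount output and dmesg for EXT4/XFS errors. If the kernel remounted the filesystem read-only due to errors, unmount it (or boot into rescue mode for the root filesystem) before repairing it: run fsck.ext4 -y /dev/sda1 for ext4 or xfs_repair for XFS, and never run either on a mounted filesystem. Then mount it again read-write. Replace the disk if SMART reports many reallocated sectors.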

4. Inodes Exhausted

Run df -i. If 100 % used, locate the directory with many small files (e.g., /data/cache) and delete or archive them, then set up a cron job to prune old files.
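Locating the directory that holds the most files can be sketched with find and uniq. The function name top_inode_dirs is hypothetical, and the sketch assumes file names do not contain newlines:

```shell
# List the directories under $1 that directly contain the most regular files.
# $2 = how many directories to show (default 10). -xdev stays on one filesystem.
top_inode_dirs() {
    find "$1" -xdev -type f |
        sed 's|/[^/]*$||' |     # strip the file name, keep its directory
        sort | uniq -c |        # count files per directory
        sort -rn | head -n "${2:-10}"
}
```

For example, top_inode_dirs /data 5 would print the five busiest directories with their file counts, which is usually enough to spot a runaway cache or session directory.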

5. LVM Expansion Failure

Even when vgs or vgdisplay reports free space in the VG, lvextend can fail if the request needs more free Physical Extents (PE) than remain. Check the PE size and the Free PE count with vgdisplay. Solutions: shrink another LV to release extents (high risk) or add another PV and extend the VG.
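Whether a request fits can be checked by converting the requested size into extents and comparing it with the Free PE count. A minimal sketch of that arithmetic, assuming sizes are given in MiB (4 MiB is the LVM default PE size; the helper names are illustrative):

```shell
# Number of physical extents needed for a given size, rounding up.
# $1 = requested size in MiB, $2 = PE size in MiB.
extents_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Succeeds only if the VG's free extents cover the request.
# $1 = free PEs (from vgdisplay), $2 = requested size in MiB, $3 = PE size in MiB.
can_extend() {
    [ "$1" -ge "$(extents_needed "$2" "$3")" ]
}

extents_needed 204800 4    # extents for a 200 GiB request with 4 MiB PEs
```

A 200 GiB request with 4 MiB extents needs 51200 free PEs; if vgdisplay shows fewer, lvextend will refuse the request.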

Performance Baseline & Monitoring

# Sequential write test
dd if=/dev/zero of=/data/testfile bs=1M count=1024 oflag=direct
# Random read/write test with fio
fio --filename=/data/testfile --rw=randrw --bs=4k --size=1G --runtime=60 --name=randrw

Deploy node_exporter and Prometheus to collect node_filesystem_* and node_disk_* metrics, and define alert rules for disk usage, inode usage, high %util, and SMART reallocated sectors.

Kernel Tuning

# Dirty page settings
sysctl vm.dirty_ratio=20
sysctl vm.dirty_background_ratio=10
# IO scheduler (e.g., deadline; listed as mq-deadline on multi-queue kernels)
echo deadline > /sys/block/sda/queue/scheduler
# Read‑ahead size
echo 2048 > /sys/block/sda/queue/read_ahead_kb

Raising the dirty-page thresholds can smooth write latency, but it also increases the amount of unwritten data at risk on power failure.
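To get a feel for the exposure, note that vm.dirty_ratio is a percentage of free plus reclaimable memory, so the same ratio allows much more unflushed data on a large-memory host. A rough sketch of the upper bound, treating the memory figure passed in as an approximation (the function name dirty_limit_mib is hypothetical):

```shell
# Approximate ceiling on dirty (not yet written) page cache, in MiB.
# $1 = available memory in MiB, $2 = vm.dirty_ratio in percent.
dirty_limit_mib() {
    echo $(( $1 * $2 / 100 ))
}

dirty_limit_mib 65536 20    # 64 GiB host at dirty_ratio=20
```

On a 64 GiB host, a ratio of 20 permits roughly 13 GiB of dirty pages before writers are throttled, which is the amount that could be lost if power fails at that moment.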

Capacity Planning & Lifecycle

Plan total capacity as business data × 1.5 × 1.5 (growth + backups).

Keep individual disks below 80 % usage.

Use hot, warm, and cold storage tiers (SSD for hot data, object storage for cold).
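The capacity rule above can be written as a one-line calculation. A minimal sketch, assuming the input is the current business data size in any consistent unit (the function name planned_capacity is illustrative):

```shell
# Planned capacity = business data x 1.5 x 1.5 (growth and backups).
# $1 = current business data size, in any unit; the result is in the same unit.
planned_capacity() {
    LC_ALL=C awk -v d="$1" 'BEGIN { printf "%.2f\n", d * 1.5 * 1.5 }'
}

planned_capacity 10    # e.g. 10 TB of business data
```

The two factors multiply to 2.25, so 10 TB of business data calls for about 22.5 TB of provisioned capacity, before the 80 % per-disk ceiling is taken into account.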

Operational Checklist & Runbook

Record disk model, size, RAID, partition table, filesystem, and UUID.

Run mount -a after every /etc/fstab change to verify it before rebooting.

Configure noatime, nofail, and appropriate LVM layout.

Ensure logrotate, monitoring, and alert thresholds are in place.

Perform expansion and backup drills regularly.

Advanced Tuning

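Filesystem mount options: data=writeback (ext4) for maximum speed at the cost of possible stale data after power loss; barrier=0 (ext4) only when the storage has a battery-backed or otherwise non-volatile write cache. Being an SSD does not by itself make disabling barriers safe.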

Defragmentation with e4defrag (ext4) or xfs_fsr (xfs).

Page‑cache management: sync; echo 1 > /proc/sys/vm/drop_caches (use with caution).

Special Scenarios

Large directories (>1 M files) – use ls -f or find instead of plain ls.

Soft vs. hard links – ln -s vs. ln.

Bulk file deletion – prefer find -delete or rsync --delete over rm -rf.
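Bulk deletion with find -delete can be wrapped in a small function suitable for a cron job. A minimal sketch, assuming age is judged by modification time and that -delete is available (GNU and BSD find support it); the function name prune_old_files is hypothetical:

```shell
# Delete regular files under $1 whose mtime is more than $2 days old.
# -xdev keeps find on one filesystem; -delete avoids spawning rm per file.
prune_old_files() {
    find "$1" -xdev -type f -mtime +"$2" -delete
}
```

For example, prune_old_files /data/cache 7 removes cache files older than a week. Running the same find without -delete first shows what would be removed.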

Backup & Recovery

# Full disk backup with dd
dd if=/dev/sda of=/backup/sda.img bs=4M status=progress
# Incremental backup with rsync and hard‑link copies
rsync -aAXv --delete --link-dest=/backup/prev /data/ /backup/$(date +%F)/
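# LVM snapshot (nouuid is required for XFS: the snapshot carries the same UUID as the mounted origin)
lvcreate -L 10G -s -n data-snap /dev/data-vg/data-lv
mkdir -p /mnt/snap
mount -o ro,nouuid /dev/data-vg/data-snap /mnt/snap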

Follow the 3‑2‑1 rule (3 copies, 2 media, 1 off‑site) and test restores regularly.

Cross‑Host Storage

NFS – simple file sharing, not suitable for database workloads.

iSCSI – block‑level remote disks, useful for shared storage.

Ceph / GlusterFS – distributed file systems for large‑scale deployments.

FAQ

Buffer I/O error – usually indicates bad sectors; check SMART and replace the disk.

No space left on device but df shows free space – likely inode exhaustion or reserved blocks.

Input/output error – use dd conv=noerror,sync to salvage data.

ext4-fs warning: mounting fs with errors – unmount the filesystem, run fsck.ext4, and mount it again.

Integration with Other Components

Databases (MySQL, PostgreSQL) should use dedicated SSD/NVMe disks with noatime and appropriate filesystem (xfs preferred).

Elasticsearch benefits from low vm.dirty_ratio and ample inodes.

Docker/K8s workloads should use local PVs for stateful sets and avoid placing logs on the same volume as application data.

Future Technologies

NVMe‑over‑Fabrics for low‑latency remote storage.

Persistent Memory (PMem) for near‑RAM performance.

ZNS SSDs for reduced write amplification.

CXL for next‑generation CPU‑to‑device interconnects.

Risk Management & Rollback

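Always back up before destructive commands (writing a partition table with fdisk's w command, mkfs, lvreduce). Use extundelete for accidental file deletions on ext3/ext4, ideally with the filesystem unmounted so the freed blocks are not overwritten, and keep LVM snapshots for quick rollback.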

Conclusion

The guide emphasizes a systematic approach: understand block devices, choose the right partition table, select a suitable filesystem, configure mount options, leverage LVM for flexibility, monitor with Prometheus, and apply kernel tunings. With these practices, sysadmins can prevent most disk‑related outages and resolve issues efficiently.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
