
Mastering Linux Disk Performance: I/O Schedulers, Metrics, and Tuning

This article explains Linux disk hardware basics, the block device workflow, I/O scheduler options, how to view and change the scheduler, and key performance metrics such as utilization, saturation, IOPS, throughput, and response time using tools like iostat.

Tech Stroll Journey

In the previous article we covered file systems and disks, focusing on directory entries, inodes, logical blocks, and the virtual file system that bridges various file systems. This piece dives deeper into how disks work with file systems and which metrics reveal disk performance.

1. Understanding Disks

Disks are broadly classified into mechanical drives and solid‑state drives (SSDs). Mechanical drives store data in 512‑byte sectors and require head movement for non‑sequential access, while SSDs use 4 KB or 8 KB pages that can serve as logical blocks directly or be aggregated into larger units. Disks also differ by interface (IDE, SATA, SCSI, SAS), which is reflected in device names like hda or sda.
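One quick way to see which category a device falls into is the kernel's rotational flag in sysfs. A minimal sketch (device names depend on what is attached; some containers expose no block devices at all):

```shell
#!/bin/sh
# Print each block device and whether the kernel treats it as rotational
# (1 = mechanical drive, 0 = SSD).
for f in /sys/block/*/queue/rotational; do
    [ -e "$f" ] || continue            # no block devices visible
    dev=${f#/sys/block/}; dev=${dev%/queue/rotational}
    printf '%s: rotational=%s\n' "$dev" "$(cat "$f")"
done
```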

2. Disk Workflow

The data path from an application to the physical disk passes through several layers:

Application → System Interface → Virtual File System → Specific File System → Block Device Layer → Device Driver → Disk Hardware. The block device layer sits between file systems and device drivers: it unifies the various disk drivers behind a consistent interface, and its I/O scheduler merges and reorders requests to turn many small random operations into larger sequential ones.
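The merging work is observable: the kernel keeps cumulative per-device counters in /sys/block/&lt;dev&gt;/stat, where field 2 counts read requests and field 6 counts write requests merged before dispatch. A small sketch that reads them directly:

```shell
#!/bin/sh
# Show how many read/write requests the block layer has merged per device
# (fields 2 and 6 of /sys/block/<dev>/stat, per Documentation/block/stat).
for st in /sys/block/*/stat; do
    [ -e "$st" ] || continue
    dev=${st#/sys/block/}; dev=${dev%/stat}
    read -r _ rmerge _ _ _ wmerge _ < "$st"
    printf '%s: read-merges=%s write-merges=%s\n' "$dev" "$rmerge" "$wmerge"
done
```

These same counters feed the rrqm/s and wrqm/s columns of iostat -x.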

3. I/O Scheduler Algorithms

none: No scheduler; requests pass straight to the driver. Used mainly for fast NVMe SSDs, where the device handles ordering itself and scheduling overhead outweighs any benefit.

noop: Simple FIFO queue with minimal request merging.

CFQ: Maintains a separate queue per process, aiming for fair I/O distribution.

deadline: Separate read and write queues with per-request expiry times, improving latency and ensuring deadline-sensitive requests are prioritized (multi-queue kernels call it mq-deadline).

The active scheduler can be inspected and switched per device through sysfs. Example output on a high-performance node:

# cat /sys/block/nvme0n1/queue/scheduler
[none]
# cat /sys/block/sda/queue/scheduler
noop [deadline] cfq

Here sda is a regular SSD using the deadline scheduler, while nvme0n1 is a high-speed NVMe device running with none, i.e. no scheduling at all.

To change the scheduler temporarily:

# echo deadline > /sys/block/sda/queue/scheduler

For a permanent change on kernels that still honor it, edit /etc/default/grub, append elevator=deadline to GRUB_CMDLINE_LINUX, and run update-grub.
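On modern multi-queue kernels the elevator= boot parameter is ignored, and a udev rule is the usual way to make the choice persistent instead. A sketch of such a rule (the filename and the sd[a-z] match are illustrative; adjust them for your devices):

```
# /etc/udev/rules.d/60-ioscheduler.rules (filename illustrative)
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="mq-deadline"
```

After saving the rule, udevadm control --reload and udevadm trigger apply it without a reboot.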

4. Performance Metrics

Disk performance varies with workload, so monitoring several metrics is essential.

Utilization (%util) : Percentage of time the disk is busy. View with iostat -d -x 1.

Saturation (avgqu-sz) : Average request queue length (newer sysstat versions label it aqu-sz); lower values indicate less contention.

IOPS (r/s, w/s) : Number of read/write operations per second; important for workloads with many small files.

Throughput (rkB/s, wkB/s) : Data transferred per second, reported in kB/s (often summarized as MB/s or GB/s); important for workloads moving large files.

Response Time (await) : Total time from request issuance to completion, including queue wait.

Additional tools such as pidstat and iotop can provide per‑process I/O insights.
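Both tools ultimately read per-process counters from /proc/&lt;pid&gt;/io; read_bytes and write_bytes count what actually reached the block layer, so you can sample them directly when the tools are not installed. A minimal sketch, inspecting the current shell itself:

```shell
#!/bin/sh
# Per-process I/O counters, as consumed by pidstat -d and iotop.
# read_bytes/write_bytes reflect traffic that hit the block layer,
# not bytes satisfied from the page cache.
grep -E '^(read_bytes|write_bytes)' /proc/self/io
```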

These metrics are interrelated; for example, high utilization often leads to increased queue length and longer await times.
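The queue-length relationship can be sanity-checked with Little's law: average queue length is roughly the request rate times the average response time. A back-of-the-envelope sketch with illustrative numbers (200 reads/s and 50 writes/s, each waiting about 4 ms):

```shell
# Little's law: aqu-sz ≈ (r/s + w/s) * await / 1000  (await is in ms).
# The rates below are made-up illustrative figures, not measurements.
awk 'BEGIN { rps = 200; wps = 50; await_ms = 4
             printf "expected aqu-sz = %.2f\n", (rps + wps) * await_ms / 1000 }'
# prints: expected aqu-sz = 1.00
```

If iostat reports a queue length far above this estimate, requests are arriving faster than the device retires them.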

Disk workflow diagram
Tags: Performance Tuning, Linux, I/O scheduler, Disk Performance, iostat
Written by

Tech Stroll Journey

The philosophy behind "Stroll": continuous learning, curiosity‑driven, and practice‑focused.
