Why Is Disk Latency Rising? Deep Dive into Linux I/O Stack & Metrics
This article explains the Linux storage I/O stack, breaks down disk latency into transmission, queue, and processing time, shows how to measure each part with tools like iostat and disktool, and provides a step‑by‑step checklist for diagnosing high disk latency.
System I/O Stack
When an application writes data to disk, the request first passes through the virtual file system (VFS), then the block device layer, where it is sorted and merged before finally reaching the physical disk. The disk itself is typically the slowest link in this path, so Linux employs several caching layers for the file system (page cache, inode cache, directory entry cache) and a separate buffer cache for block devices to improve performance.
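To see how much memory these caches are currently using, the standard free command and /proc/meminfo can be consulted; a minimal sketch:
# Combined buffer/page-cache usage appears in the "buff/cache" column
free -h
# Separate counters for the block-device buffer cache and the page cache
grep -E '^(Buffers|Cached):' /proc/meminfo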
Why Disk Latency Increases
Observed latency (e.g., system stalls) corresponds to the I/O response time, which can be expressed as:
Response Time = Transmission Time (round‑trip) + Request Queue Time + Disk Processing Time
Each component can be examined individually to pinpoint the root cause.
Transmission Time
For network- or fabric-attached storage, this component is dominated by network factors such as the physical medium, bandwidth, NIC configuration, traffic shaping, and protocol settings. A detailed discussion is deferred to a future networking article.
Request Queue Time
When the load is moderate and the hardware is healthy, the queue length should stay low. High-concurrency I/O (e.g., massive database batch jobs or big-data tasks) can exceed the storage's IOPS or throughput limits, causing queues to build up. Run iostat -d -x 1 and monitor the avgqu-sz column to see the average queue size.
root@node:~# iostat -d -x 1
Linux 4.15.0-58-generic (cs1anr01n02) 11/14/2025 _x86_64_ (48 CPU)
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 1.05 0.00 1.04 0.00 9.60 18.40 0.00 3.95 0.29 3.95 1.58 0.16
sdb 0.00 0.04 0.26 203.44 10.18 1663.53 16.43 0.03 0.15 1.81 0.14 0.06 1.17
sdc 0.00 0.04 0.33 154.60 14.10 1250.00 16.32 0.02 0.15 2.00 0.15 0.07 1.08
nvme0n1 20.88 70.25 367.48 691.60 3127.79 6237.64 17.69 0.02 0.01 0.05 0.03 0.01 0.77
...
A high avgqu-sz indicates that the request queue is the bottleneck.
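When avgqu-sz stays high, the next question is usually which processes are generating the load. Per-process I/O counters from pidstat (part of the sysstat package) or iotop can answer that; a minimal sketch:
# Per-process read/write throughput, refreshed every second
pidstat -d 1
# Interactive view of only the processes currently doing I/O (run as root)
iotop -o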
Disk Processing Time
Factors that affect the disk's own processing time include hardware and firmware faults (bad sectors, read/write errors), media wear-out over time, temperature, cable quality, and enclosure conditions. Use disktool or smartctl to quickly inspect SMART data and temperature.
Example command:
# disktool show -l all
Typical output (truncated for brevity):
-- Controller Information --
| controller_id | locate | enclosures | virtual_drives | disks | pci_path |
| 0 | c0 | 1 | 1 | 2 | 0000:5e:00.0 |
-- Enclosure Information --
| enclosure_id | locate | status | number_of_physical_drives | number_of_slots |
| 8 | c1e8 | OK | 12 | 16 |
...
-- Nvme device Information --
| name | path | sn | percent_used | temperature | critical_warning | media_errors | num_err_log_entries |
| nvme0n1 | /dev/nvme0n1 | BTYF01830HFC480CGN | 5 | 43 | 0 | 0 | 0 |
If SMART reports errors or high temperature, replace the drive or adjust cooling.
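disktool is a vendor-specific utility; on hosts without it, the same health data can be read with smartctl (from smartmontools) and nvme-cli. Device names below are examples:
# Overall SMART health, error counters, and temperature for a SATA/SAS drive
smartctl -a /dev/sda
# NVMe health log: percentage used, temperature, media errors (the columns shown above)
nvme smart-log /dev/nvme0n1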
Disk Scheduler Configuration
Linux offers several I/O schedulers that influence how requests are ordered:
none: No scheduler; requests pass through the block layer without reordering. Typically used for fast direct-attached SSDs/NVMe devices.
noop: Simple FIFO queue with minimal request merging (the legacy single-queue counterpart of none).
CFQ: Completely Fair Queuing; maintains a separate queue per process for fair bandwidth distribution.
deadline (or mq-deadline): Separate read and write queues; requests approaching their deadline are prioritised, improving latency for mixed workloads.
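The active scheduler can be inspected and switched per device through sysfs; the device name below is an example, and the change does not persist across reboots:
# List the available schedulers; the active one is shown in brackets
cat /sys/block/sda/queue/scheduler
# Switch this device to mq-deadline (run as root; reverts on reboot)
echo mq-deadline > /sys/block/sda/queue/scheduler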
Other Influencing Factors
Write-cache policies (write-through vs. write-back) and RAID configuration also affect latency. With write-back caching, a write is acknowledged once it reaches the controller or drive cache, which lowers latency at the cost of data-loss risk on power failure, whereas write-through waits for the media. Different RAID levels likewise have distinct read/write characteristics; RAID 5, for example, incurs a read-modify-write penalty on small random writes, while RAID 10 trades capacity for lower write latency.
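Whether a SATA drive's volatile write cache is enabled can be checked with hdparm; drives behind a RAID controller are governed by the controller's own cache policy instead. The device name is an example:
# Report the drive's write-caching setting (1 = write-back enabled)
hdparm -W /dev/sda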
Troubleshooting Checklist
High-concurrency I/O bursts (e.g., database batch jobs, big-data tasks). Use fio to benchmark the I/O ceiling; a sketch follows this list.
Frequent small I/O operations (log writes, random DB queries). Examine iostat metrics (r/s, w/s, rkB/s, wkB/s) and estimate the average request size by dividing data volume by request count, i.e. (rkB/s + wkB/s) / (r/s + w/s); the avgrq-sz column reports the same quantity in 512-byte sectors.
Inefficient application I/O patterns (excessive sync calls, frequent open/close). Diagnose with strace; see the sketch after this list.
System resource contention (CPU, memory pressure, other processes consuming I/O). Monitor resource utilisation with tools like top or htop.
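A few concrete starting points for the items above; the test file path, sizes, run time, and target PID are illustrative and should be adapted to the system under test:
# Probe the random-write IOPS ceiling with fio (writes 1 GiB to the given test file)
fio --name=randwrite --filename=/data/fio.test --rw=randwrite --bs=4k \
    --ioengine=libaio --iodepth=32 --direct=1 --size=1G \
    --runtime=60 --time_based --group_reporting
# Summarise the system calls a running process issues; high fsync/fdatasync or
# open/close counts point to inefficient application I/O patterns
strace -c -f -p <PID>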
By following this structured approach—examining the I/O stack, measuring each latency component, checking hardware health, reviewing scheduler settings, and evaluating application behaviour—you can systematically identify and resolve the causes of high disk latency.