Understanding iostat Metrics: From /proc/diskstats to Real‑World Disk Performance
This article explains the meaning of iostat fields, clarifies common misconceptions about svctm and await, details each /proc/diskstats column, shows how to calculate key performance metrics, and illustrates practical analysis with real disk examples.
iostat(1) is the basic Linux tool for viewing I/O performance, but many users misinterpret its fields, especially svctm, which is deprecated and unreliable. Unlike HP‑UX's avserv, Linux’s svctm should not be trusted, as warned in the iostat and sar man pages.
On Linux, the average I/O latency is reported as await, which mixes actual device service time with time spent waiting in the kernel queue, so it does not directly reflect disk speed. The kernel’s /proc/diskstats provides the raw counters needed to interpret iostat output.
/proc/diskstats fields
(rd_ios) – number of read operations.
(rd_merges) – number of merged reads (adjacent reads combined by the I/O scheduler).
(rd_sectors) – sectors read.
(rd_ticks) – time spent on reads (ms), including queue wait.
(wr_ios) – number of write operations.
(wr_merges) – number of merged writes.
(wr_sectors) – sectors written.
(wr_ticks) – time spent on writes (ms), including queue wait.
(in_flight) – current number of I/O requests in flight (incremented on queue entry, decremented on completion).
(io_ticks) – time (ms) during which the device had at least one I/O in flight.
(time_in_queue) – weighted busy time: on each update the kernel adds the elapsed time multiplied by the number of in‑flight I/Os, so this counter is the time integral of queue depth (ms).
Because /proc/diskstats does not separate queue wait from device service time, any tool built on it cannot provide a pure disk service time metric.
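The fields above can be pulled out of a /proc/diskstats line with a short parser. The sketch below follows the field order documented in the kernel's Documentation/admin-guide/iostats.rst; newer kernels append discard and flush counters, which it simply ignores. The sample line and its values are invented for illustration.

```python
# Field names as used in this article, in /proc/diskstats column order
# (columns 4-14; columns 1-3 are major, minor, device name).
FIELDS = ("rd_ios", "rd_merges", "rd_sectors", "rd_ticks",
          "wr_ios", "wr_merges", "wr_sectors", "wr_ticks",
          "in_flight", "io_ticks", "time_in_queue")

def parse_diskstats_line(line):
    """Parse one /proc/diskstats line into (device_name, counter_dict)."""
    parts = line.split()
    dev = parts[2]
    counters = dict(zip(FIELDS, (int(x) for x in parts[3:3 + len(FIELDS)])))
    return dev, counters

# Hypothetical line for a device "sda" (values invented):
sample = "8 0 sda 120000 300 9600000 45000 80000 500 6400000 52000 0 30000 97000"
dev, c = parse_diskstats_line(sample)
```

On a real system the same function can be mapped over the lines of `open("/proc/diskstats")`; taking two such snapshots a known interval apart yields the deltas that iostat's derived metrics are built from.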
Derived iostat metrics
tps = (Δrd_ios + Δwr_ios) / Δt – I/O operations per second.
r/s = Δrd_ios / Δt – reads per second.
w/s = Δwr_ios / Δt – writes per second.
rkB/s = Δrd_sectors / Δt * 512 / 1024 – kilobytes read per second.
wkB/s = Δwr_sectors / Δt * 512 / 1024 – kilobytes written per second.
rrqm/s = Δrd_merges / Δt – merged reads per second.
wrqm/s = Δwr_merges / Δt – merged writes per second.
avgrq‑sz = (Δrd_sectors + Δwr_sectors) / (Δrd_ios + Δwr_ios) – average request size.
avgqu‑sz = Δtime_in_queue / Δt – average queue length (number of I/Os in flight averaged over the interval; both terms must be in the same time unit, and time_in_queue is in ms).
await = (Δrd_ticks + Δwr_ticks) / (Δrd_ios + Δwr_ios) – average time per I/O (includes queue wait).
r_await = Δrd_ticks / Δrd_ios – average read latency.
w_await = Δwr_ticks / Δwr_ios – average write latency.
%util = Δio_ticks / Δt – proportion of the interval during which the device had at least one request in flight (does not indicate saturation).
svctm – deprecated metric, effectively %util divided by tps; it mixes busy time with throughput, is meaningless on devices that process I/Os concurrently, and the man page warns it will be removed.
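The formulas above can be sketched as a single function that takes two counter snapshots Δt seconds apart. This is a minimal illustration, assuming snapshots are dicts keyed by the field names used in this article; the tick counters are in milliseconds, hence the `dt * 1000` conversions, and the demo deltas are invented.

```python
SECTOR_SIZE = 512  # diskstats counts 512-byte sectors regardless of device

def iostat_metrics(prev, cur, dt):
    """Derive iostat-style metrics from two diskstats snapshots dt seconds apart."""
    d = {k: cur[k] - prev[k] for k in prev}          # per-interval deltas
    ios = d["rd_ios"] + d["wr_ios"]
    return {
        "tps":      ios / dt,
        "rkB/s":    d["rd_sectors"] * SECTOR_SIZE / 1024 / dt,
        "wkB/s":    d["wr_sectors"] * SECTOR_SIZE / 1024 / dt,
        "avgrq-sz": (d["rd_sectors"] + d["wr_sectors"]) / ios if ios else 0.0,
        "avgqu-sz": d["time_in_queue"] / (dt * 1000),   # ms -> queue depth
        "await":    (d["rd_ticks"] + d["wr_ticks"]) / ios if ios else 0.0,
        "%util":    d["io_ticks"] / (dt * 1000) * 100,  # ms busy vs ms elapsed
    }

# Invented 1-second interval: 200 I/Os, 1 MB each way, device busy half the time.
keys = ("rd_ios", "wr_ios", "rd_sectors", "wr_sectors",
        "rd_ticks", "wr_ticks", "io_ticks", "time_in_queue")
prev = {k: 0 for k in keys}
cur = dict(prev, rd_ios=100, wr_ios=100, rd_sectors=1600, wr_sectors=1600,
           rd_ticks=400, wr_ticks=600, io_ticks=500, time_in_queue=2000)
m = iostat_metrics(prev, cur, 1.0)   # tps 200.0, await 5.0 ms, %util 50.0
```

Note that await here, exactly as in iostat, divides total ticks (which include queue wait) by completed I/Os, so nothing in this computation can recover a pure device service time.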
Example analysis shows that disks with non‑zero rrqm/s benefit from I/O merging, often due to a different I/O scheduler. In a test, disk sdb had higher rrqm/s, higher r/s and rkB/s, and used a distinct scheduler, explaining its superior performance.
The %util field reflects the fraction of time the device was handling any I/O, ignoring how many concurrent requests were processed. Modern disks can handle many parallel I/Os, so %util can reach 100 % without the device being saturated. No single iostat metric directly measures saturation.
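A small worked example, with invented numbers, makes the distinction concrete: a device that keeps four requests in flight for an entire one-second interval shows 100 % utilization, yet avgqu‑sz reveals it is running at queue depth 4, far below what a modern SSD can sustain.

```python
# Hypothetical 1-second interval; all values invented for illustration.
dt_ms = 1000
io_ticks = 1000        # device had >= 1 request in flight during every ms
time_in_queue = 4000   # 4 in-flight requests x 1000 ms (queue-depth integral)

util = io_ticks / dt_ms * 100      # 100.0 % -- "busy" the whole interval
avgqu_sz = time_in_queue / dt_ms   # 4.0 concurrent I/Os on average
```

An NVMe device comfortable at queue depth 64 would report the same 100 % utilization here while having sixteenfold headroom, which is why avgqu‑sz and await, not %util, are the better saturation signals.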
Await values depend on storage type: SSDs typically show sub‑millisecond to a few milliseconds; a 10 k RPM HDD averages around 8.38 ms (seek + rotational latency + transfer). However, actual acceptable await depends on workload characteristics—random, high‑load workloads increase queue wait, while sequential, single‑process loads keep await low.
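The ~8.38 ms figure decomposes into the three mechanical components named above. The sketch below recomputes it for a 10 k RPM drive; the rotational term follows directly from the spindle speed (half a revolution on average), while the seek and transfer times are drive-specific assumptions chosen here to match the quoted total.

```python
rpm = 10_000
rotational_ms = 0.5 * 60_000 / rpm  # half a revolution on average = 3.0 ms
seek_ms = 5.0                       # assumed average seek time; varies by drive
transfer_ms = 0.38                  # assumed transfer time for a small request

avg_service_ms = rotational_ms + seek_ms + transfer_ms  # ~8.38 ms
```

Under queueing, await on such a disk can be far larger than this service time, since each request also waits behind the ones ahead of it.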
For RAID arrays with cache, write service times appear faster because writes complete once cached; reads still depend on physical disk speed.
ITPUB