How to Monitor Server I/O Performance Using top, iostat, and iotop
This article explains the 2019 Alibaba Cloud IO HANG incident, defines IO HANG, and provides step‑by‑step guidance on using the Linux commands top, iostat, and iotop (including examples and key options) to monitor and troubleshoot server disk I/O performance.
On March 3, 2019, Alibaba Cloud suffered an IO HANG incident in the North China region. Many ECS instances became unresponsive, leading to widespread service outages. The provider later confirmed that slow disk read/write operations caused the problem.
IO HANG refers to extremely slow disk I/O that makes threads and processes hang, which can bring down servers, especially database services such as RDS and HybridDB where I/O speed directly impacts SQL execution.
To monitor server I/O, three common Linux commands are introduced: top, iostat, and iotop.
The top command provides real‑time monitoring of CPU, memory, and process information. Running top displays the columns PID, USER, PR, NI, VIRT, RES, SHR, S, %CPU, %MEM, TIME+, and COMMAND, which are explained in the table below:
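PID      Process ID
USER     User name of the process owner
PR       Scheduling priority
NI       Nice value
VIRT     Total virtual memory used by the process
RES      Resident (non-swapped) physical memory used by the process
SHR      Shared memory size
S        Process state (S = sleeping, T = stopped or traced, R = running, Z = zombie, D = uninterruptible sleep, usually waiting on I/O)
%CPU     Share of CPU time used since the last refresh
%MEM     Share of physical memory used
TIME+    Total CPU time used by the process, shown to hundredths of a second
COMMAND  Command name or command line
Common command-line options include -d (refresh interval in seconds), -p (monitor specific PIDs), -S (cumulative time mode), -s (secure mode), -i (hide idle processes), and -c (show the full command line). Inside top, press q to quit.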
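Because the D state is the one most closely tied to stalled I/O, it can help to pull just those tasks out of a single top snapshot. The following is a minimal sketch, not a definitive tool: the helper name d_state_tasks is hypothetical, and it assumes the default column order listed above (S in field 8, COMMAND in field 12).

```shell
# Print PID, state, and command name for tasks in uninterruptible sleep (D).
# Assumes the default top layout: S is field 8 and COMMAND is field 12.
d_state_tasks() {
    awk '$1 ~ /^[0-9]+$/ && $8 == "D" { print $1, $8, $12 }'
}

# Take one batch-mode snapshot (-b) and exit after one iteration (-n 1).
# Skipped when top is not installed.
if command -v top >/dev/null 2>&1; then
    top -b -n 1 | d_state_tasks
fi
```

An empty result simply means no task was in the D state at that instant. Tasks that show up repeatedly across snapshots are the ones worth investigating.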
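The iostat command monitors device‑level I/O load. A typical usage is $ iostat -d -k 2, where -d shows device statistics, -k reports values in kilobytes, and 2 sets a 2‑second refresh interval. The first report covers the time since boot; each later report covers the preceding interval. The output includes metrics such as tps, kB_read/s, and kB_wrtn/s. Adding -x ($ iostat -d -x -k 2) enables extended statistics, including %util, which indicates how busy a disk is (values near 100 % mean the device is saturated). If the command is missing, install the sysstat package, for example with yum install sysstat on CentOS or RHEL.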
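To turn the %util figure into something actionable, the extended report can be filtered for busy devices. This is a sketch under stated assumptions: the helper name busy_devices and the default threshold of 80 are illustrative choices, and the parsing assumes %util is the last column of the extended (-x) report, as it is in sysstat.

```shell
# Print devices whose %util meets or exceeds a threshold (default 80).
# Assumes %util is the last column of the extended (-x) iostat report.
busy_devices() {
    awk -v limit="${1:-80}" '
        $1 ~ /^Device/ { in_table = 1; next }   # header row starts a report
        NF == 0        { in_table = 0; next }   # blank line ends it
        in_table && $NF + 0 >= limit { print $1, $NF }
    '
}

# Two extended reports, 2 seconds apart; the first covers the time since boot.
# Skipped when iostat (sysstat) is not installed.
if command -v iostat >/dev/null 2>&1; then
    iostat -d -x -k 2 2 | busy_devices 80
fi
```

A device that stays above the threshold over several intervals is a stronger signal than a single spike, so in practice the interval count would be raised.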
The iotop command is the I/O‑focused counterpart of top. Running iotop (which usually needs root privileges) shows per‑process I/O usage, indicating which processes are reading or writing and how much data they transfer. An alternative is pidstat -d, part of the sysstat package, which also reports per‑process I/O statistics.
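For unattended use, both tools can run non-interactively. The sketch below filters pidstat -d output down to the processes that are actually writing. The helper name writers is hypothetical, and the column positions are read from the header row rather than hard-coded, since a 12-hour timestamp adds an extra AM/PM field.

```shell
# Print PID, kB written per second, and command for each process that wrote
# data, reading `pidstat -d` output on stdin.
writers() {
    awk '
        $1 == "Average:" { next }               # skip the summary section
        /kB_wr\/s/ {                            # header row: locate columns
            for (i = 1; i <= NF; i++) {
                if ($i == "PID") pid = i
                if ($i == "kB_wr/s") wr = i
            }
            next
        }
        wr && NF >= wr && $wr + 0 > 0 { print $pid, $wr, $NF }
    '
}

# One report covering a 2-second interval. Skipped when pidstat is missing.
if command -v pidstat >/dev/null 2>&1; then
    pidstat -d 2 1 | writers
fi

# The iotop equivalent, which usually needs root:
#   sudo iotop -o -b -n 3    # -o only active processes, -b batch, -n 3 iterations
```

Reading the column index from the header keeps the filter working across locales and sysstat versions that print the timestamp differently.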
In production environments, real‑time monitoring of server I/O is crucial, especially for database servers, because degraded I/O can slow down reads/writes, cause SQL latency, and ultimately lead to process hangs, database congestion, and server crashes.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
