Why Is CPU IO Wait So High? A Step‑by‑Step Linux IO Bottleneck Investigation
The article walks through a real‑world Linux alert showing a 243% CPU IO Wait, explains how to interpret the metric, gathers system basics, uses iostat, iotop, pidstat and mpstat to probe the IO subsystem, checks disk health, and ultimately discovers the root cause was a mis‑configured Prometheus alert rule.
This piece is part of a new "Performance Optimization in Practice" series that focuses on troubleshooting production‑level issues. An alert from the monitoring platform reported a CPU IO Wait of 243.23%, which is impossible for a single CPU core and indicates a severe IO wait problem.
First, the raw alert JSON is shown:
{
  "message": "Node node1 CPU IO Wait is currently 243.2330043859647%",
  "suggestion_i18n": {
    "en": "Check CPU load",
    "zh": "检查CPU负载"
  }
}

A high IO Wait suggests that the system spends much of its time waiting on disk or network IO while the CPU itself sits idle. The investigation therefore shifts to the IO subsystem.
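Although the article does not show it, the metric itself comes from the kernel: tools like iostat and mpstat derive %iowait from two samples of the aggregate "cpu" line in /proc/stat. A minimal sketch, assuming a standard Linux /proc/stat field layout:

```shell
#!/bin/sh
# Sketch: compute system-wide %iowait from two samples of /proc/stat.
# After the "cpu" label the fields are: user nice system idle iowait
# irq softirq steal (guest fields can be ignored here).
sample() { awk '/^cpu / {print $2,$3,$4,$5,$6,$7,$8,$9}' /proc/stat; }
s1=$(sample); sleep 1; s2=$(sample)
iowait_pct=$(echo "$s1 $s2" | awk '{
    t1 = $1+$2+$3+$4+$5+$6+$7+$8         # total jiffies, first sample
    t2 = $9+$10+$11+$12+$13+$14+$15+$16  # total jiffies, second sample
    printf "%.2f", 100 * ($13 - $5) / (t2 - t1)  # delta iowait / delta total
}')
echo "iowait: ${iowait_pct}%"
```

Computed this way the value can never exceed 100%, which is why a reading of 243% points at the alerting pipeline rather than the kernel.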
Gather Basic System Information
# free -wh
total used free shared buffers cache available
Mem: 125G 28G 14G 34G 352M 82G 79G
Swap: 29G 29G 0B
# df -h
Filesystem Size Used Avail Use% Mounted on
udev 63G 63G 0 100% /dev
tmpfs 13G 1.3G 12G 11% /run
/dev/sda1 46G 19G 26G 42% /
... (other mounts omitted)
# uptime
 20:18:42 up 1620 days, 3:34, 2 users, load average: 0.66, 0.84, 1.32

The load average is low, disk space is ample, and 14 GB of memory remains free, yet the 29 GB swap partition is completely used. This is not actually a contradiction: it typically means pages were swapped out during an earlier memory-pressure event and have never been needed (and therefore never swapped back in) since.
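To see which processes own the swapped-out pages, one option (not used in the article) is to read the VmSwap field from /proc/&lt;pid&gt;/status, present on kernels since 2.6.34:

```shell
#!/bin/sh
# Sketch: rank processes by swap usage via the VmSwap field.
# 2>/dev/null guards against processes exiting mid-scan.
top_swap=$(for f in /proc/[0-9]*/status; do
    awk '/^Name:/ {n=$2} /^VmSwap:/ {print $2, n}' "$f" 2>/dev/null
done | sort -rn | head -5)
echo "$top_swap"
```

Each output line is a swap figure in kB followed by the process name, largest first.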
Probe IO and CPU Usage
Using iostat to examine device statistics:
# iostat -x 2
Linux 4.15.0-58-generic (zw1bnr02n00) 11/10/2025 _x86_64_ (64 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
1.31 0.00 0.10 0.04 0.00 98.54
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 9.80 0.49 35.70 29.74 994.55 56.60 0.01 0.29 2.62 0.26 0.07 0.27
avg-cpu: %user %nice %system %iowait %steal %idle
1.07 0.00 0.07 0.00 0.00 98.86
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 3.00 0.00 21.50 0.00 98.00 9.12 0.00 0.00 0.00 0.00 0.00 0.00

The %util values are low, so the disk is nowhere near saturation, and await stays close to svctm, which means requests spend almost no time waiting in the queue.
Next, iotop -o is suggested to spot processes with active IO, followed by pidstat -d 1 to view per‑process read/write rates:
# pidstat -d 1
Linux 4.15.0-58-generic (zw1bnr02n00) 11/10/2025 _x86_64_ (64 CPU)
09:26:29 PM UID PID kB_rd/s kB_wr/s kB_ccwr/s iodelay Command
09:26:30 PM 0 854 0.00 19.23 0.00 0 jbd2/sda1-8
09:26:30 PM 110 5399 0.00 7.69 0.00 0 mysqld
09:26:30 PM 0 21441 0.00 19.23 0.00 0 prometheus
09:26:30 PM 0 40237 0.00 76.92 0.00 0 java
... (subsequent lines omitted)

These outputs show that MySQL, Prometheus, and a Java process are generating write traffic, but the overall IO load remains modest.
To verify CPU activity across all cores, mpstat -P ALL 1 1 is run:
# mpstat -P ALL 1 1
Linux 4.15.0-58-generic (zw1bnr02n00) 11/10/2025 _x86_64_ (64 CPU)
09:36:15 PM all %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
09:36:16 PM all 0.05 0.00 0.02 0.02 0.00 0.00 0.00 0.00 0.00 99.92
... (per-CPU lines omitted)

The CPU usage is negligible, confirming that the system is not CPU-bound.
Check for Kernel IO Errors and Disk Health
# dmesg -T | grep -i error # Look for kernel I/O errors
# smartctl --all /dev/sda    # Show SMART data for the disk

No error messages or SMART warnings appear, suggesting the hardware is healthy.
Conclusion and Follow‑Up
All indicators point to the IO subsystem functioning normally. The persistent alert was later traced to an incorrect Prometheus alert rule; fixing the rule cleared the warning. The earlier observation that swap was fully used despite ample free memory is noted for a future deep‑dive on safe swap reclamation.
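The article does not reproduce the faulty rule, but one common way to produce an impossible value like 243% is to sum the per-core iowait rates of node_exporter's node_cpu_seconds_total instead of averaging them; on a 64-core host the sum can legitimately exceed 100. A corrected rule might look like the sketch below (the alert name and the 30% threshold are illustrative placeholders):

```yaml
groups:
  - name: node-io
    rules:
      - alert: HighCpuIoWait            # hypothetical rule name
        expr: |
          # avg (not sum) across cores keeps the value within 0-100%
          avg by (instance) (
            rate(node_cpu_seconds_total{mode="iowait"}[5m])
          ) * 100 > 30
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Node {{ $labels.instance }} CPU IO Wait is {{ $value }}%"
```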
Directly clearing swap cache is not recommended because swap holds pages that were evicted when memory was scarce; forcibly dropping them can cause unnecessary page‑ins later and may trigger further swapping if physical memory runs low again.
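If reclamation is ever attempted anyway, the usual approach is swapoff -a && swapon -a, which forces everything in swap back into RAM and is therefore only safe with ample free memory. A minimal read-only precheck sketch (it only prints a recommendation; the 2x headroom margin is an arbitrary assumption):

```shell
#!/bin/sh
# Sketch: decide whether `swapoff -a && swapon -a` is likely safe by
# comparing available RAM with the amount of swap currently in use.
mem_avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
swap_total_kb=$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo)
swap_free_kb=$(awk '/^SwapFree:/ {print $2}' /proc/meminfo)
swap_used_kb=$((swap_total_kb - swap_free_kb))
echo "available RAM: ${mem_avail_kb} kB, swap in use: ${swap_used_kb} kB"
# Require at least 2x headroom before suggesting reclamation.
if [ "$mem_avail_kb" -gt $((swap_used_kb * 2)) ]; then
    echo "headroom OK: swapoff -a && swapon -a should be safe"
else
    echo "insufficient headroom: do not force swap reclamation"
fi
```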