Why Is CPU IO Wait So High? A Step‑by‑Step Linux IO Bottleneck Investigation
The article walks through a real‑world Linux alert showing a 243% CPU IO Wait, explains how to interpret the metric, gathers system basics, uses iostat, iotop, pidstat and mpstat to probe the IO subsystem, checks disk health, and ultimately discovers the root cause was a mis‑configured Prometheus alert rule.
This piece is part of a new "Performance Optimization in Practice" series that focuses on troubleshooting production‑level issues. An alert from the monitoring platform reported a CPU IO Wait of 243.23%, which is impossible for a single CPU core and indicates a severe IO wait problem.
First, the raw alert JSON is shown:
{
  "message": "Node node1 CPU IO Wait is currently 243.2330043859647%",
  "suggestion_i18n": {
    "en": "Check CPU load",
    "zh": "检查CPU负载"
  }
}

A high IO Wait suggests that the system spends much of its time waiting on disk or network IO while the CPU itself sits idle. The investigation therefore shifts to the IO subsystem.
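Although the article does not show it, the metric itself comes from the kernel: tools like iostat and mpstat derive %iowait from two samples of the aggregate "cpu" line in /proc/stat. A minimal sketch, assuming a standard Linux /proc/stat field layout:

```shell
#!/bin/sh
# Sketch: compute system-wide %iowait from two samples of /proc/stat.
# After the "cpu" label the fields are: user nice system idle iowait
# irq softirq steal (guest fields can be ignored here).
sample() { awk '/^cpu / {print $2,$3,$4,$5,$6,$7,$8,$9}' /proc/stat; }
s1=$(sample); sleep 1; s2=$(sample)
iowait_pct=$(echo "$s1 $s2" | awk '{
    t1 = $1+$2+$3+$4+$5+$6+$7+$8         # total jiffies, first sample
    t2 = $9+$10+$11+$12+$13+$14+$15+$16  # total jiffies, second sample
    printf "%.2f", 100 * ($13 - $5) / (t2 - t1)  # delta iowait / delta total
}')
echo "iowait: ${iowait_pct}%"
```

Computed this way the value can never exceed 100%, which is why a reading of 243% points at the alerting pipeline rather than the kernel.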
Gather Basic System Information
# free -wh
total used free shared buffers cache available
Mem: 125G 28G 14G 34G 352M 82G 79G
Swap: 29G 29G 0B
# df -h
Filesystem Size Used Avail Use% Mounted on
udev 63G 63G 0 100% /dev
tmpfs 13G 1.3G 12G 11% /run
/dev/sda1 46G 19G 26G 42% /
... (other mounts omitted)
# uptime
 20:18:42 up 1620 days, 3:34, 2 users, load average: 0.66, 0.84, 1.32

The load average is low, disk space is ample, and 14 GB of memory remains free, yet the 29 GB swap partition is completely used. This is not actually a contradiction: it typically means pages were swapped out during an earlier memory-pressure event and have never been needed (and therefore never swapped back in) since.
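To see which processes own the swapped-out pages, one option (not used in the article) is to read the VmSwap field from /proc/&lt;pid&gt;/status, present on kernels since 2.6.34:

```shell
#!/bin/sh
# Sketch: rank processes by swap usage via the VmSwap field.
# 2>/dev/null guards against processes exiting mid-scan.
top_swap=$(for f in /proc/[0-9]*/status; do
    awk '/^Name:/ {n=$2} /^VmSwap:/ {print $2, n}' "$f" 2>/dev/null
done | sort -rn | head -5)
echo "$top_swap"
```

Each output line is a swap figure in kB followed by the process name, largest first.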
Probe IO and CPU Usage
Using iostat to examine device statistics:
# iostat -x 2
Linux 4.15.0-58-generic (zw1bnr02n00) 11/10/2025 _x86_64_ (64 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
1.31 0.00 0.10 0.04 0.00 98.54
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 9.80 0.49 35.70 29.74 994.55 56.60 0.01 0.29 2.62 0.26 0.07 0.27
avg-cpu: %user %nice %system %iowait %steal %idle
1.07 0.00 0.07 0.00 0.00 98.86
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 3.00 0.00 21.50 0.00 98.00 9.12 0.00 0.00 0.00 0.00 0.00 0.00

The %util values are low, so the disk is nowhere near saturation, and await stays close to svctm, which means requests spend almost no time waiting in the queue.
Next, iotop -o is suggested to spot processes with active IO, followed by pidstat -d 1 to view per‑process read/write rates:
# pidstat -d 1
Linux 4.15.0-58-generic (zw1bnr02n00) 11/10/2025 _x86_64_ (64 CPU)
09:26:29 PM UID PID kB_rd/s kB_wr/s kB_ccwr/s iodelay Command
09:26:30 PM 0 854 0.00 19.23 0.00 0 jbd2/sda1-8
09:26:30 PM 110 5399 0.00 7.69 0.00 0 mysqld
09:26:30 PM 0 21441 0.00 19.23 0.00 0 prometheus
09:26:30 PM 0 40237 0.00 76.92 0.00 0 java
... (subsequent lines omitted)

These outputs show that MySQL, Prometheus, and a Java process are generating write traffic, but the overall IO load remains modest.
To verify CPU activity across all cores, mpstat -P ALL 1 1 is run:
# mpstat -P ALL 1 1
Linux 4.15.0-58-generic (zw1bnr02n00) 11/10/2025 _x86_64_ (64 CPU)
09:36:15 PM all %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
09:36:16 PM all 0.05 0.00 0.02 0.02 0.00 0.00 0.00 0.00 0.00 99.92
... (per-CPU lines omitted)

The CPU usage is negligible, confirming that the system is not CPU-bound.
Check for Kernel IO Errors and Disk Health
# dmesg -T | grep -i error # Look for kernel I/O errors
# smartctl --all /dev/sda    # Show SMART data for the disk

No error messages or SMART warnings appear, suggesting the hardware is healthy.
Conclusion and Follow‑Up
All indicators point to the IO subsystem functioning normally. The persistent alert was later traced to an incorrect Prometheus alert rule; fixing the rule cleared the warning. The earlier observation that swap was fully used despite ample free memory is noted for a future deep‑dive on safe swap reclamation.
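The article does not reproduce the faulty rule, but one common way to produce an impossible value like 243% is to sum the per-core iowait rates of node_exporter's node_cpu_seconds_total instead of averaging them; on a 64-core host the sum can legitimately exceed 100. A corrected rule might look like the sketch below (the alert name and the 30% threshold are illustrative placeholders):

```yaml
groups:
  - name: node-io
    rules:
      - alert: HighCpuIoWait            # hypothetical rule name
        expr: |
          # avg (not sum) across cores keeps the value within 0-100%
          avg by (instance) (
            rate(node_cpu_seconds_total{mode="iowait"}[5m])
          ) * 100 > 30
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Node {{ $labels.instance }} CPU IO Wait is {{ $value }}%"
```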
Directly clearing swap cache is not recommended because swap holds pages that were evicted when memory was scarce; forcibly dropping them can cause unnecessary page‑ins later and may trigger further swapping if physical memory runs low again.
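If reclamation is ever attempted anyway, the usual approach is swapoff -a && swapon -a, which forces everything in swap back into RAM and is therefore only safe with ample free memory. A minimal read-only precheck sketch (it only prints a recommendation; the 2x headroom margin is an arbitrary assumption):

```shell
#!/bin/sh
# Sketch: decide whether `swapoff -a && swapon -a` is likely safe by
# comparing available RAM with the amount of swap currently in use.
mem_avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
swap_total_kb=$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo)
swap_free_kb=$(awk '/^SwapFree:/ {print $2}' /proc/meminfo)
swap_used_kb=$((swap_total_kb - swap_free_kb))
echo "available RAM: ${mem_avail_kb} kB, swap in use: ${swap_used_kb} kB"
# Require at least 2x headroom before suggesting reclamation.
if [ "$mem_avail_kb" -gt $((swap_used_kb * 2)) ]; then
    echo "headroom OK: swapoff -a && swapon -a should be safe"
else
    echo "insufficient headroom: do not force swap reclamation"
fi
```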