Why High Load Doesn’t Always Mean High CPU: Linux Performance Deep Dive
This guide explains Linux load average and CPU utilization: process states, how load is calculated, how load differs from CPU usage, common bottlenecks, and practical troubleshooting steps that use tools such as top, vmstat, pidstat, iostat, and perf to pinpoint performance issues.
Background Knowledge
Since kernel 2.6, Linux processes have seven basic states: D (uninterruptible sleep), R (running), S (interruptible sleep), T (stopped), t (traced), X (dead), and Z (zombie). These letters appear in the STAT column of ps output (for example, ps aux).
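D (TASK_UNINTERRUPTIBLE): Uninterruptible sleep, usually caused by waiting for I/O. The task does not react to signals, not even SIGKILL (kill -9), until it wakes up, and it does not consume CPU while sleeping.
R (TASK_RUNNING): Running on a CPU or runnable and waiting in the run queue.
S (TASK_INTERRUPTIBLE): Interruptible sleep, waiting for an event such as data on a socket or a semaphore; does not consume CPU.
T/t (__TASK_STOPPED & __TASK_TRACED): Stopped by a job-control signal such as SIGSTOP (T), or stopped under a debugger via ptrace (t). The task is not scheduled until it is resumed, but it keeps its memory and other resources.
Z (EXIT_ZOMBIE): The process has exited, but its parent has not yet reaped it with wait(); only its process-table entry remains.
X (EXIT_DEAD): Final dead state while the task is being torn down; rarely observed.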
Load Average & CPU Utilization
Load average and CPU usage are the two most direct performance metrics, but they are calculated differently and are not equivalent.
Load average is often taken to mean the number of processes running on or waiting for a CPU. On Linux, however, the count also includes tasks in uninterruptible sleep (D state, usually waiting for I/O). The following function from older kernels (kernel/timer.c) shows that tasks in both TASK_RUNNING and TASK_UNINTERRUPTIBLE contribute to the load count:
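```c
/* Older kernels, kernel/timer.c: count the tasks that contribute to load. */
static unsigned long count_active_tasks(void) {
	struct task_struct *p;
	unsigned long nr = 0;

	read_lock(&tasklist_lock);       /* keep the task list stable while walking it */
	for_each_task(p) {
		/* Runnable tasks (R) and uninterruptible sleepers (D) both count. */
		if ((p->state == TASK_RUNNING) || (p->state & TASK_UNINTERRUPTIBLE))
			nr += FIXED_1;   /* FIXED_1 is 1.0 in the kernel's 11-bit fixed point */
	}
	read_unlock(&tasklist_lock);
	return nr;
}
```

Thus, the Linux load average measures overall demand on the system: tasks running on or waiting for a CPU plus tasks blocked in uninterruptible waits, most often disk I/O but also network-filesystem (e.g., NFS) I/O, other device I/O, and some kernel lock waits. It cannot be equated with CPU utilization (unlike some other Unix systems, where load represents only CPU demand). The kernel samples this count about every five seconds and smooths it into the 1-, 5-, and 15-minute averages reported by uptime and top; modern kernels compute the same count per run queue (nr_running plus nr_uninterruptible) in kernel/sched/loadavg.c.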
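CPU utilization is typically reported as the share of time spent in user and system (kernel) mode. At the coarsest level, CPU time divides into user, system, idle, and steal (time a virtual CPU waited while the hypervisor ran other guests). top refines this into eight fields: us (user), sy (system), ni (niced user processes), id (idle), wa (idle while I/O is outstanding), hi (hardware interrupts), si (software interrupts), and st (steal). Because wa is a form of idle time, a system can show high load and high wa while its CPUs are mostly unused.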
Resource & Bottleneck Analysis
Different combinations of high/low load and CPU indicate distinct bottlenecks:
High Load & High CPU: The load increase is driven by CPU demand. Sub-cases:
CPU sys high: The kernel consumes most CPU time; check context switches.
CPU si high: Software interrupts dominate (e.g., NET_TX, NET_RX, SCHED in /proc/softirqs).
CPU us high: User-space processes dominate (e.g., CPU-bound loops, garbage collection triggered by memory pressure, saturated thread pools).
High Load & Low CPU: Many tasks are in uninterruptible sleep (I/O wait). Identify whether they are waiting on local disk or on network storage.
Investigation Strategy
The troubleshooting workflow consists of four stages:
Resource Bottleneck Location: Use system-wide tools (top, vmstat, tsar) and inspect interrupts (/proc/softirqs, /proc/interrupts) and I/O (iostat, dstat).
Hot Process Identification: After locating the bottleneck, find the specific processes consuming the resource. Tools: pidstat -w (context switches), pidstat -u (CPU), iotop or pidstat -d (I/O), and ps (zombie and D-state processes).
Thread & Process Internal Resource Location: Drill down into a particular PID with pidstat -w -p [pid] and pidstat -u -p [pid] (add -t for per-thread figures), and use lsof to see its open files and sockets.
Hot Event & Method Analysis: Capture stack traces or dumps of the hot threads. Tools: perf for CPU profiling; for Java processes, jstack (find the hot thread ID with ps -Lp [pid] or pidstat -t -p [pid], convert it to hexadecimal, and match it against the nid field in the jstack output); strace for system calls; and tcpdump for network I/O.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
