Why Is Linux Load High? Decoding Load Average, CPU Usage & Process States
This article explains Linux process states, how the kernel calculates the load average, and how load relates to CPU utilization. It then gives a systematic approach, with tools and commands, for identifying resource bottlenecks, telling high-load/high-CPU cases apart from high-load/low-CPU ones, and pinpointing problematic processes and threads.
Background
During load testing, alerts about high CPU usage or high load are common. Understanding why these metrics spike, and how to diagnose them, requires knowing Linux process states, how the load average is calculated, and how CPU utilization breaks down.
Linux Process States
Since Linux 2.6, the kernel has defined seven basic process states, shown in the STAT column of ps:
D (TASK_UNINTERRUPTIBLE) : Uninterruptible sleep, usually waiting for I/O. Cannot be killed with kill -9 and does not consume CPU.
R (TASK_RUNNING) : Runnable or running on the CPU.
S (TASK_INTERRUPTIBLE) : Interruptible sleep, waiting for events such as sockets or semaphores. No CPU usage.
T / t : Stopped by a job-control signal (T) or stopped by a tracer such as a debugger (t). The process is not scheduled, but it keeps its memory and other resources.
Z (EXIT_ZOMBIE) : Process has exited but the parent has not yet reaped it.
X (EXIT_DEAD) : Fully dead; normally not observable.
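The state letters above can be captured in a small lookup table that also records whether each state feeds the load average, as described in the next section. The following is a minimal C sketch. The struct and function names are hypothetical, and the table covers only the seven states listed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One row per state letter as printed in the STAT column of ps */
struct proc_state {
    char code;
    const char *meaning;
    bool counts_in_load; /* true if the load average counts tasks in this state */
};

static const struct proc_state PROC_STATES[] = {
    {'R', "running or runnable", true},
    {'D', "uninterruptible sleep, usually waiting for I/O", true},
    {'S', "interruptible sleep, waiting for an event", false},
    {'T', "stopped by a signal", false},
    {'t', "stopped by a tracer (debugger)", false},
    {'Z', "zombie: exited, not yet reaped by its parent", false},
    {'X', "dead", false},
};

/* Return the table row for a state letter, or NULL if the letter is unknown. */
static const struct proc_state *lookup_state(char code)
{
    for (size_t i = 0; i < sizeof PROC_STATES / sizeof PROC_STATES[0]; i++) {
        if (PROC_STATES[i].code == code)
            return &PROC_STATES[i];
    }
    return NULL;
}

/* Count how many of the given state letters would contribute to the load average. */
static int count_load_contributors(const char *states)
{
    assert(states != NULL);
    int n = 0;
    for (; *states != '\0'; states++) {
        const struct proc_state *s = lookup_state(*states);
        if (s != NULL && s->counts_in_load)
            n++;
    }
    return n;
}
```

For example, the letters "RSDSRZ" (two running tasks, one blocked on disk, two sleepers and a zombie) give 3. Only R and D contribute.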
Load Average and CPU Utilization
Load Average
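The Linux load average counts tasks that are running or runnable (state R) plus tasks in uninterruptible sleep (state D, typically I/O wait). The code below is a simplified excerpt from older 2.4-era kernels. Newer kernels maintain these counts per CPU, but they still include both states: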
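```c
/* Returns the number of active tasks in fixed point (FIXED_1 == 1.0) */
static unsigned long count_active_tasks(void)
{
	struct task_struct *p;
	unsigned long nr = 0;

	read_lock(&tasklist_lock);
	for_each_task(p) {
		/* Count running/runnable tasks and tasks in uninterruptible sleep */
		if ((p->state == TASK_RUNNING) || (p->state & TASK_UNINTERRUPTIBLE))
			nr += FIXED_1;
	}
	read_unlock(&tasklist_lock);
	return nr;
}

/* Called from the timer tick; samples the task count every LOAD_FREQ ticks (about 5 s) */
static inline void calc_load(unsigned long ticks)
{
	unsigned long active_tasks; /* fixed-point */
	static int count = LOAD_FREQ;

	count -= ticks;
	if (count < 0) {
		count += LOAD_FREQ;
		active_tasks = count_active_tasks();
		/* Exponentially damped moving averages over 1, 5 and 15 minutes */
		CALC_LOAD(avenrun[0], EXP_1, active_tasks);
		CALC_LOAD(avenrun[1], EXP_5, active_tasks);
		CALC_LOAD(avenrun[2], EXP_15, active_tasks);
	}
}
```

The load average therefore reflects overall system load, not just the CPU. It covers tasks running on or waiting for a CPU, plus tasks blocked in uninterruptible waits on disk I/O, network I/O (for example, NFS) or other devices. It is not equivalent to CPU utilization.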
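To see how CALC_LOAD turns an instantaneous task count into the familiar 1-, 5- and 15-minute figures, the sketch below reimplements one step in user space. It uses the kernel's fixed-point constants: FSHIFT = 11, EXP_1 = 1884, EXP_5 = 2014 and EXP_15 = 2037. The helper names are hypothetical. The sketch assumes one sample every 5 seconds and a constant number of active tasks, which a real system will not have.

```c
#include <assert.h>

#define FSHIFT 11               /* bits of fractional precision */
#define FIXED_1 (1UL << FSHIFT) /* 1.0 in fixed point */
#define EXP_1 1884UL            /* 2048 / e^(5s/1min)  */
#define EXP_5 2014UL            /* 2048 / e^(5s/5min)  */
#define EXP_15 2037UL           /* 2048 / e^(5s/15min) */

/* One step of CALC_LOAD: load = load * decay + active * (1 - decay), in fixed point. */
static unsigned long calc_load_step(unsigned long load, unsigned long decay,
                                    unsigned long active)
{
    assert(decay < FIXED_1);
    load *= decay;
    load += active * (FIXED_1 - decay);
    return load >> FSHIFT;
}

/* Feed `samples` 5-second samples of a constant task count into avenrun[]. */
static void simulate_load(unsigned long avenrun[3], unsigned long nr_active,
                          int samples)
{
    unsigned long active = nr_active * FIXED_1; /* convert to fixed point */
    for (int i = 0; i < samples; i++) {
        avenrun[0] = calc_load_step(avenrun[0], EXP_1, active);
        avenrun[1] = calc_load_step(avenrun[1], EXP_5, active);
        avenrun[2] = calc_load_step(avenrun[2], EXP_15, active);
    }
}

/* Convert a fixed-point load value to a floating-point number for display. */
static double load_to_double(unsigned long load)
{
    return (double)load / FIXED_1;
}
```

Starting from an idle system with two CPU-bound tasks, 12 samples (one minute) raise the 1-minute value to about 1.26, roughly 2 × (1 − 1/e). Over the same minute the 15-minute value reaches only about 0.12. This lag is why a short burst moves the first figure much more than the third.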
CPU Utilization
CPU time is divided into four main categories:
User time (us)
System time (sy)
Idle time (id)
Steal time (st)
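Tools such as top split CPU time further into eight fields: us (user), sy (system), ni (nice), id (idle), wa (I/O wait), hi (hardware interrupts), si (software interrupts) and st (steal). The fields us, sy, ni, hi and si are time the CPU spends doing work. The fields id and wa are idle time; wa is idle time during which some I/O was still outstanding. The st field is time a virtual CPU was ready to run while the hypervisor was running something else.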
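The same breakdown can be computed from the aggregate cpu line of /proc/stat. Its counters, in clock ticks, appear in the order user, nice, system, idle, iowait, irq, softirq, steal. The C sketch below takes two snapshots and reports busy, I/O-wait, steal and idle percentages over the interval, grouping the fields as described above. The function and struct names are hypothetical. The sketch ignores the trailing guest fields, because guest time is already included in user and nice. It also treats a counter that went backwards as zero, since proc(5) notes that iowait can decrease.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Cumulative counters from the aggregate "cpu" line of /proc/stat (clock ticks) */
struct cpu_times {
    unsigned long long user, nice, system, idle, iowait, irq, softirq, steal;
};

struct cpu_pct {
    double busy;   /* us + ni + sy + hi + si */
    double iowait; /* wa */
    double steal;  /* st */
    double idle;   /* id */
};

/* Parse the aggregate line, e.g. "cpu  100 0 50 800 ...". Returns 0 on success. */
static int parse_cpu_line(const char *line, struct cpu_times *t)
{
    if (strncmp(line, "cpu ", 4) != 0) /* rejects per-CPU lines such as "cpu0" */
        return -1;
    int n = sscanf(line + 4, "%llu %llu %llu %llu %llu %llu %llu %llu",
                   &t->user, &t->nice, &t->system, &t->idle,
                   &t->iowait, &t->irq, &t->softirq, &t->steal);
    return n == 8 ? 0 : -1;
}

/* Read the first (aggregate) line of /proc/stat. Linux only; returns 0 on success. */
static int read_cpu_times(struct cpu_times *t)
{
    char line[512];
    FILE *f = fopen("/proc/stat", "r");
    if (f == NULL)
        return -1;
    int rc = fgets(line, sizeof line, f) != NULL ? parse_cpu_line(line, t) : -1;
    fclose(f);
    return rc;
}

/* Counter difference that treats a counter going backwards as zero. */
static unsigned long long delta(unsigned long long before, unsigned long long after)
{
    return after > before ? after - before : 0;
}

/* Percentage of CPU time in each category between snapshots a and b. */
static struct cpu_pct cpu_usage(const struct cpu_times *a, const struct cpu_times *b)
{
    assert(a != NULL && b != NULL);
    unsigned long long busy = delta(a->user, b->user) + delta(a->nice, b->nice) +
                              delta(a->system, b->system) + delta(a->irq, b->irq) +
                              delta(a->softirq, b->softirq);
    unsigned long long wa = delta(a->iowait, b->iowait);
    unsigned long long st = delta(a->steal, b->steal);
    unsigned long long id = delta(a->idle, b->idle);
    unsigned long long total = busy + wa + st + id;
    struct cpu_pct p = {0.0, 0.0, 0.0, 0.0};
    if (total == 0)
        return p;
    p.busy = 100.0 * (double)busy / (double)total;
    p.iowait = 100.0 * (double)wa / (double)total;
    p.steal = 100.0 * (double)st / (double)total;
    p.idle = 100.0 * (double)id / (double)total;
    return p;
}
```

In practice you would call read_cpu_times, sleep for the sampling interval, call it again, and pass both snapshots to cpu_usage. A high busy share points to the high-CPU cases below. A high iowait share with low busy time hints at the high-load, low-CPU case.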
Resource and Bottleneck Analysis
High Load & High CPU
When both load and CPU are high, the load increase is driven by CPU consumption. Sub‑cases:
CPU sys high : Kernel time dominates; check context-switch rates. A high rate of involuntary (non-voluntary) switches means tasks are being preempted frequently, which often indicates more runnable threads than the CPUs can serve.
CPU si high : Soft‑interrupts dominate; common sources are network I/O (NET_TX, NET_RX) or scheduler activity (SCHED).
CPU us high : User-space work dominates. Causes include CPU-bound loops, frequent Full GC triggered by memory pressure, thread-pool saturation, etc.
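To check context-switch rates for a single process, pidstat -w reports cswch/s (voluntary) and nvcswch/s (non-voluntary) per process. Cumulative totals of the same two kinds are exposed as voluntary_ctxt_switches and nonvoluntary_ctxt_switches in /proc/<pid>/status. The C sketch below parses those two fields and turns two samples into a per-second rate. The function names are hypothetical, and it assumes a Linux /proc layout.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct ctxt_switches {
    unsigned long long voluntary;    /* task blocked or yielded, e.g. on I/O or a lock */
    unsigned long long nonvoluntary; /* task was preempted by the scheduler */
};

/* Read /proc/<pid>/status into buf (NUL-terminated). Linux only; returns 0 on success. */
static int read_status(int pid, char *buf, size_t len)
{
    assert(len > 0);
    char path[64];
    snprintf(path, sizeof path, "/proc/%d/status", pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, len - 1, f);
    buf[n] = '\0';
    fclose(f);
    return 0;
}

/* Parse both counters from the status text. Returns 0 if both were found. */
static int parse_ctxt_switches(const char *status, struct ctxt_switches *cs)
{
    assert(status != NULL && cs != NULL);
    /* The leading '\n' keeps "voluntary" from matching inside "nonvoluntary" */
    const char *v = strstr(status, "\nvoluntary_ctxt_switches:");
    const char *nv = strstr(status, "\nnonvoluntary_ctxt_switches:");
    if (v == NULL || nv == NULL)
        return -1;
    if (sscanf(v + 1, "voluntary_ctxt_switches: %llu", &cs->voluntary) != 1 ||
        sscanf(nv + 1, "nonvoluntary_ctxt_switches: %llu", &cs->nonvoluntary) != 1)
        return -1;
    return 0;
}

/* Involuntary switches per second between two samples taken `seconds` apart. */
static double nonvoluntary_rate(const struct ctxt_switches *before,
                                const struct ctxt_switches *after, double seconds)
{
    assert(seconds > 0.0);
    return (double)(after->nonvoluntary - before->nonvoluntary) / seconds;
}
```

For example, sample a process with read_status and parse_ctxt_switches, wait a second, and sample it again. A steadily high nonvoluntary_rate alongside high sy time fits the preemption-heavy case described above.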
High Load & Low CPU
Load is high while CPU usage is low when many processes are stuck in uninterruptible sleep (TASK_UNINTERRUPTIBLE), typically waiting on disk or network I/O.
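In this case the first question is which threads are in D state. The C sketch below walks /proc/<pid>/task/<tid>/stat and prints every thread whose state is D. The state is read from just after the last ')' in the line, because the command name in parentheses can itself contain spaces or ')'. The function names are hypothetical. It assumes a Linux /proc layout and takes a single snapshot, so threads that enter D state only briefly may be missed.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Extract comm and the state letter from one stat line. Returns 0 on failure. */
static char parse_stat_state(const char *line, char *comm, size_t comm_len)
{
    assert(comm_len > 0);
    const char *open = strchr(line, '(');
    const char *close = strrchr(line, ')'); /* last ')', since comm may contain ')' */
    if (open == NULL || close == NULL || close < open || close[1] != ' ')
        return 0;
    size_t n = (size_t)(close - open - 1);
    if (n >= comm_len)
        n = comm_len - 1;
    memcpy(comm, open + 1, n);
    comm[n] = '\0';
    return close[2];
}

/* True for non-empty, purely numeric names such as "1234" (pid/tid directories). */
static int all_digits(const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++)
        if (!isdigit((unsigned char)*s))
            return 0;
    return 1;
}

/* Print every thread under proc_root (normally "/proc") that is in state D.
 * Returns the number found, or -1 if proc_root cannot be opened. */
static int list_d_state_threads(const char *proc_root)
{
    DIR *procs = opendir(proc_root);
    if (procs == NULL)
        return -1;
    int found = 0;
    struct dirent *p;
    while ((p = readdir(procs)) != NULL) {
        if (!all_digits(p->d_name))
            continue;
        char task_dir[512];
        snprintf(task_dir, sizeof task_dir, "%s/%s/task", proc_root, p->d_name);
        DIR *tasks = opendir(task_dir);
        if (tasks == NULL)
            continue; /* the process exited while we were scanning */
        struct dirent *t;
        while ((t = readdir(tasks)) != NULL) {
            if (!all_digits(t->d_name))
                continue;
            char path[1024], line[1024], comm[64];
            snprintf(path, sizeof path, "%s/%s/stat", task_dir, t->d_name);
            FILE *f = fopen(path, "r");
            if (f == NULL)
                continue;
            if (fgets(line, sizeof line, f) != NULL &&
                parse_stat_state(line, comm, sizeof comm) == 'D') {
                printf("pid %s tid %s (%s) in D state\n", p->d_name, t->d_name, comm);
                found++;
            }
            fclose(f);
        }
        closedir(tasks);
    }
    closedir(procs);
    return found;
}
```

Running list_d_state_threads("/proc") several times in a row, and looking for the same threads stuck in D on every pass, is usually more informative than a single scan.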
Troubleshooting Strategy
1. Locate Resource Bottleneck
Global performance overview: top, vmstat; tsar for historical data
Interrupt statistics: /proc/softirqs, /proc/interrupts
I/O statistics: iostat, dstat
2. Identify Hot Processes
Context switches per process: pidstat -w
CPU usage per process: pidstat -u
I/O per process: iotop, pidstat -d
Zombie processes: ps (look for state Z)
3. Thread & Process Internal Analysis
Per-thread context switches: pidstat -w -t -p [pid]
Per-thread CPU usage: pidstat -u -t -p [pid]
Open file descriptors and I/O: lsof
4. Hot Event & Method Analysis
Performance sampling: perf
Java stack traces: jstack
System call tracing: strace
Network capture: tcpdump
References
Linux Load Averages: Solving the Mystery – http://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html
What exactly is a load average? – http://linuxtechsupport.blogspot.com/2008/10/what-exactly-is-load-average.html
This article has been distilled and summarized from source material, then republished for learning and reference.
dbaplus Community