Why Is Linux Load High? Decoding Load Average, CPU Usage & Process States

This article explains Linux process states, how the kernel calculates load average, the relationship between load and CPU utilization, and provides a systematic approach with tools and commands to identify resource bottlenecks, differentiate high‑load/high‑CPU scenarios from high‑load/low‑CPU cases, and pinpoint problematic processes and threads.

dbaplus Community

Jan 10, 2022

Background

During load testing, alerts about high CPU usage or high load are common. Understanding why these metrics spike, and how to diagnose them, requires knowledge of Linux process states, how load average is calculated, and how CPU utilization breaks down.

Linux Process States

Since Linux 2.6, the kernel has defined seven basic process states, which ps shows as single-letter codes:

D (TASK_UNINTERRUPTIBLE) : Uninterruptible sleep, usually waiting for I/O. The process consumes no CPU and does not respond to signals, so even kill -9 has no effect until the wait completes.

R (TASK_RUNNING) : Runnable or running on the CPU.

S (TASK_INTERRUPTIBLE) : Interruptible sleep, waiting for events such as sockets or semaphores. No CPU usage.

T / t : Stopped by a signal (T) or traced by a debugger (t). The process is suspended and uses no CPU, but it keeps its memory and other resources until it is resumed or terminated.

Z (EXIT_ZOMBIE) : Process has exited but the parent has not yet reaped it.

X (EXIT_DEAD) : Fully dead; normally not observable.

Load Average and CPU Utilization

Load Average

Linux load average counts tasks that are runnable or running (R) as well as tasks in uninterruptible sleep (D, typically waiting for I/O). Simplified code from an older kernel shows the idea:

/* Count tasks that are runnable (R) or in uninterruptible sleep (D). */
static unsigned long count_active_tasks(void)
{
    struct task_struct *p;
    unsigned long nr = 0;
    read_lock(&tasklist_lock);
    for_each_task(p) {
        if ((p->state == TASK_RUNNING) || (p->state & TASK_UNINTERRUPTIBLE))
            nr += FIXED_1;  /* add 1.0 in fixed-point representation */
    }
    read_unlock(&tasklist_lock);
    return nr;
}

/* Every LOAD_FREQ ticks, fold the current task count into the three averages. */
static inline void calc_load(unsigned long ticks)
{
    unsigned long active_tasks; /* fixed-point */
    static int count = LOAD_FREQ;
    count -= ticks;
    if (count < 0) {
        count += LOAD_FREQ;
        active_tasks = count_active_tasks();
        CALC_LOAD(avenrun[0], EXP_1,  active_tasks);  /* 1-minute average */
        CALC_LOAD(avenrun[1], EXP_5,  active_tasks);  /* 5-minute average */
        CALC_LOAD(avenrun[2], EXP_15, active_tasks);  /* 15-minute average */
    }
}

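Load average therefore reflects overall demand on the system: tasks using or waiting for a CPU, plus tasks blocked in uninterruptible waits, which are mostly disk I/O but also include other device waits and network file systems. It is not equivalent to CPU utilization.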
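The CALC_LOAD macro itself is not shown in the excerpt. As a rough illustration, the sketch below reimplements the exponentially damped average in Python. It assumes the fixed-point constants found in mainline kernel headers (FSHIFT = 11, EXP_1 = 1884, EXP_5 = 2014, EXP_15 = 2037) and one sample every 5 seconds; the simulate helper is hypothetical and exists only for this example.

```python
FSHIFT = 11                 # bits of fractional precision
FIXED_1 = 1 << FSHIFT       # 1.0 in fixed point (2048)
# Decay factors exp(-5s/1min), exp(-5s/5min), exp(-5s/15min) in fixed point
# (values assumed from the kernel headers, not from the excerpt above).
EXP_1, EXP_5, EXP_15 = 1884, 2014, 2037


def calc_load(load, exp, active):
    """One CALC_LOAD step: load = load*exp + active*(1 - exp), in fixed point."""
    return (load * exp + active * (FIXED_1 - exp)) >> FSHIFT


def simulate(active_tasks, samples):
    """Feed a constant number of active tasks for `samples` 5-second intervals.

    Returns the 1-, 5- and 15-minute averages as floats.
    """
    avenrun = [0, 0, 0]
    active = active_tasks * FIXED_1
    for _ in range(samples):
        avenrun = [calc_load(load, exp, active)
                   for load, exp in zip(avenrun, (EXP_1, EXP_5, EXP_15))]
    return [load / FIXED_1 for load in avenrun]
```

With 4 tasks constantly active, simulate(4, 12) (one minute of samples) yields a 1-minute average of roughly 2.5, while the 5- and 15-minute figures lag well behind. A short spike therefore shows up in the 1-minute value first.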

CPU Utilization

CPU time is divided into four main categories:

User time (us)

System time (sy)

Idle time (id)

Steal time (st)

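Tools such as top split CPU time further into eight fields (us, sy, ni, id, wa, hi, si, st). Of these, us, sy, ni, hi, and si mean the CPU is doing work for this system; id and wa (I/O wait) are both idle time, and st is time the hypervisor spent running other guests.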
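These fields come from the aggregate cpu line of /proc/stat, which holds cumulative tick counters. The following is a minimal sketch of how percentages could be derived from two samples. It assumes a kernel that reports at least eight counters on that line; the function names are hypothetical.

```python
# Names as shown by top, in the order the counters appear in /proc/stat:
# user nice system idle iowait irq softirq steal
FIELDS = ("us", "ni", "sy", "id", "wa", "hi", "si", "st")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of tick counts."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return dict(zip(FIELDS, map(int, parts[1:9])))


def cpu_percentages(before, after):
    """Return each category's share of the ticks elapsed between two samples."""
    delta = {k: after[k] - before[k] for k in FIELDS}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("samples are identical or out of order")
    return {k: 100.0 * v / total for k, v in delta.items()}
```

On a live system the two samples would be the first line of /proc/stat read a second or so apart. A large wa with a small us + sy points toward I/O rather than CPU as the bottleneck.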

Resource and Bottleneck Analysis

High Load & High CPU

When both load and CPU are high, the load increase is driven by CPU consumption. Sub‑cases:

CPU sys high : Kernel time dominates; check context-switch rates. A high rate of involuntary switches suggests that tasks are being preempted because more of them are runnable than there are CPUs to run them.

CPU si high : Soft‑interrupts dominate; common sources are network I/O (NET_TX, NET_RX) or scheduler activity (SCHED).

CPU us high : User‑space work dominates. Causes include CPU‑bound loops, memory‑induced Full GC, thread‑pool saturation, etc.
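For the context-switch check, the kernel exposes per-process counters in /proc/[pid]/status as voluntary_ctxt_switches and nonvoluntary_ctxt_switches. The sketch below, with hypothetical helper names, parses that text and computes the involuntary share between two samples.

```python
COUNTERS = ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")


def parse_ctxt_switches(status_text):
    """Extract both context-switch counters from /proc/<pid>/status text."""
    counters = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in COUNTERS:
            counters[key] = int(value)  # int() ignores surrounding whitespace
    return counters


def involuntary_ratio(before, after):
    """Fraction of switches between two samples that were involuntary."""
    vol = after[COUNTERS[0]] - before[COUNTERS[0]]
    invol = after[COUNTERS[1]] - before[COUNTERS[1]]
    total = vol + invol
    return invol / total if total else 0.0
```

A ratio close to 1 means the process is mostly being preempted rather than giving up the CPU on its own, which fits the CPU-contention case described above.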

High Load & Low CPU

Load is high while CPU usage is low when many processes are stuck in uninterruptible sleep (TASK_UNINTERRUPTIBLE, state D), typically waiting on disk I/O or on a network file system. These processes count toward load but consume no CPU.
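One way to confirm this case is to count processes per state and list those in state D. The sketch below assumes its input is the text printed by ps -eo state,pid,comm, including the header line; count_states is a hypothetical helper.

```python
from collections import Counter


def count_states(ps_output):
    """Count processes per state letter and collect those in state D.

    Expects the output of `ps -eo state,pid,comm`, header line included.
    """
    counts = Counter()
    blocked = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue
        state, pid, comm = fields
        counts[state] += 1
        if state == "D":
            blocked.append((int(pid), comm.strip()))
    return counts, blocked
```

If the number of D entries is close to the load average while the R count stays small, the load is coming from blocked I/O rather than from CPU demand.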

Troubleshooting Strategy

1. Locate Resource Bottleneck

Global overview: top, vmstat, tsar (historical data)

Interrupt statistics: /proc/softirqs, /proc/interrupts

I/O statistics: iostat, dstat
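The counters in /proc/softirqs are cumulative, so what matters is how fast each one grows. A minimal sketch, with hypothetical function names, that sums each softirq type across CPUs and ranks the types by increase between two snapshots:

```python
def parse_softirqs(text):
    """Sum each softirq type across all CPUs from /proc/softirqs text."""
    totals = {}
    for line in text.splitlines()[1:]:  # first line is the CPU header
        name, sep, counts = line.partition(":")
        if not sep:
            continue
        totals[name.strip()] = sum(int(c) for c in counts.split())
    return totals


def busiest_softirqs(before, after, top=3):
    """Return the `top` softirq types with the largest increase."""
    deltas = {name: after[name] - before.get(name, 0) for name in after}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

If NET_RX or NET_TX leads the ranking while si is high, network traffic is the likely driver.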

2. Identify Hot Processes

Context switches per process: pidstat -w

CPU usage per process: pidstat -u

I/O per process: iotop, pidstat -d

Zombies: ps
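Per-process CPU figures of the kind pidstat -u reports can also be derived from /proc/[pid]/stat, where fields 14 and 15 hold user and system time in clock ticks. The sketch below is an approximation with hypothetical helper names; it assumes 100 ticks per second unless told otherwise.

```python
def parse_pid_stat(line):
    """Extract pid, command, state and CPU ticks from a /proc/<pid>/stat line."""
    # The command name is in parentheses and may itself contain spaces or
    # parentheses, so split at the last ')' instead of on whitespace.
    head, _, tail = line.rpartition(")")
    pid_str, _, comm = head.partition("(")
    rest = tail.split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return {
        "pid": int(pid_str),
        "comm": comm,
        "state": rest[0],
        "ticks": int(rest[11]) + int(rest[12]),
    }


def cpu_percent(before, after, interval_s, clk_tck=100):
    """CPU usage over the interval, where 100 means one full core."""
    return 100.0 * (after["ticks"] - before["ticks"]) / (clk_tck * interval_s)
```

On a real system the tick rate can be read with os.sysconf("SC_CLK_TCK") and passed as clk_tck.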

3. Thread & Process Internal Analysis

Per-thread context switches: pidstat -w -t -p [pid]

Per-thread CPU: pidstat -u -t -p [pid]

Open file descriptors and I/O: lsof

4. Hot Event & Method Analysis

Performance sampling: perf

Java stack traces: jstack

System call tracing: strace

Network capture: tcpdump
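For Java processes, a common way to connect a hot thread to its stack is to convert the thread ID reported by the OS into hexadecimal and look for it as nid=0x... in the jstack output. The sketch below assumes that hex format, which many JDK versions print (some newer ones may differ), and assumes thread entries are separated by blank lines; find_java_thread is a hypothetical helper.

```python
import re


def find_java_thread(jstack_text, tid):
    """Return the jstack entry whose native thread ID matches `tid`, or None."""
    nid = "nid=" + hex(tid)  # e.g. 6699 -> "nid=0x1a2b"
    pattern = re.compile(re.escape(nid) + r"\b")
    for block in re.split(r"\n\s*\n", jstack_text):
        block = block.strip()
        if not block:
            continue
        # The nid appears on the first line of each thread entry.
        if pattern.search(block.splitlines()[0]):
            return block
    return None
```

The returned entry contains the thread name and its stack frames, which usually narrows the search to a specific method.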

