Why Does Linux Load Spike? Deep Dive into Load Average Calculation & Troubleshooting
During high-traffic events such as Double-11, the load average on Linux systems often surges, slowing response times and even command execution. This article explains what load averages represent, how the kernel computes them with an exponentially weighted moving average, and which common causes and systematic methods to consider for root-cause analysis.
What Is Load
Linux load averages track the number of tasks (processes or threads) that are running, waiting to run, or blocked in uninterruptible sleep, so they reflect demand on CPU, I/O, and other resources rather than on CPU alone. They are averaged over 1, 5, and 15 minutes, exposed in /proc/loadavg, and read by tools such as uptime and top.
If the load is close to 0, the system is idle.
If the 1‑minute average exceeds the 5‑ or 15‑minute averages, load is increasing.
If the 1‑minute average is lower than the 5‑ or 15‑minute averages, load is decreasing.
When any average exceeds the number of CPU cores, performance problems are likely.
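The rules of thumb above can be sketched in a few lines of C. This is a minimal illustration, not a monitoring tool: the names `struct loadavg`, `parse_loadavg`, `read_loadavg`, `classify_trend`, and `load_exceeds_cores` are hypothetical, and the trend check compares only the 1-minute and 15-minute averages as a simplifying assumption.

```c
#include <stdio.h>

/* The three averages from /proc/loadavg (hypothetical helper type). */
struct loadavg {
    double one, five, fifteen;
};

enum load_trend { LOAD_FALLING = -1, LOAD_STEADY = 0, LOAD_RISING = 1 };

/* Parse the first three fields of a /proc/loadavg line such as
 * "0.42 0.30 0.25 1/123 4567". Returns 0 on success, -1 on failure. */
int parse_loadavg(const char *line, struct loadavg *out)
{
    if (sscanf(line, "%lf %lf %lf", &out->one, &out->five, &out->fifteen) != 3)
        return -1;
    return 0;
}

/* Read and parse /proc/loadavg itself. Returns 0 on success, -1 on failure. */
int read_loadavg(struct loadavg *out)
{
    char line[128];
    int rc = -1;
    FILE *f = fopen("/proc/loadavg", "r");

    if (!f)
        return -1;
    if (fgets(line, sizeof line, f))
        rc = parse_loadavg(line, out);
    fclose(f);
    return rc;
}

/* Simplifying assumption: compare only the 1- and 15-minute averages. */
enum load_trend classify_trend(const struct loadavg *la)
{
    if (la->one > la->fifteen)
        return LOAD_RISING;
    if (la->one < la->fifteen)
        return LOAD_FALLING;
    return LOAD_STEADY;
}

/* Nonzero if any of the three averages exceeds the given core count. */
int load_exceeds_cores(const struct loadavg *la, long ncpu)
{
    return la->one > ncpu || la->five > ncpu || la->fifteen > ncpu;
}
```

A caller would typically fill a `struct loadavg` with `read_loadavg`, obtain the core count (for example from `sysconf(_SC_NPROCESSORS_ONLN)`), and then apply the two checks.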
How Load Is Calculated
Core Algorithm
The kernel uses an exponentially weighted moving average (EWMA):
#define EXP_1 1884 /* 1/exp(5sec/1min) */
#define EXP_5 2014 /* 1/exp(5sec/5min) */
#define EXP_15 2037 /* 1/exp(5sec/15min) */
For each interval, the kernel updates the load with:
/*
 * a1 = a0 * e + a * (1 - e)
 */
static inline unsigned long calc_load(unsigned long load, unsigned long exp, unsigned long active)
{
    unsigned long newload;

    /* FIXED_1 = 2048 represents 1.0 in fixed point */
    newload = load * exp + active * (FIXED_1 - exp);
    /* Round up while the load is rising so it can actually reach `active` */
    if (active >= load)
        newload += FIXED_1 - 1;
    return newload / FIXED_1;
}
Here a0 is the previous load, a1 the new load, e the decay factor exp(-5 s / T) for the 1-, 5-, or 15-minute window T (stored as EXP_1, EXP_5, or EXP_15, scaled by FIXED_1), and a the number of active (runnable + uninterruptible) tasks, also scaled by FIXED_1.
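To get a feel for how the average responds over time, the same arithmetic can be replayed in user space. The sketch below copies the update rule quoted above. `simulate_load` and `load_to_double` are hypothetical helpers added for illustration, and the parameter is named `decay` rather than `exp` only to avoid confusion with the C library function.

```c
#define FSHIFT   11                  /* bits of fractional precision */
#define FIXED_1  (1UL << FSHIFT)     /* 1.0 in fixed point (2048) */
#define EXP_1    1884UL              /* decay factor, 1-minute average */
#define EXP_5    2014UL              /* decay factor, 5-minute average */
#define EXP_15   2037UL              /* decay factor, 15-minute average */

/* Same arithmetic as the kernel helper; `active` is pre-scaled by FIXED_1. */
unsigned long calc_load(unsigned long load, unsigned long decay,
                        unsigned long active)
{
    unsigned long newload = load * decay + active * (FIXED_1 - decay);

    if (active >= load)
        newload += FIXED_1 - 1;      /* round up while the load is rising */
    return newload / FIXED_1;
}

/* Hypothetical helper: apply `ticks` consecutive 5-second updates with a
 * constant number of active tasks; returns the fixed-point result. */
unsigned long simulate_load(unsigned long load, unsigned long decay,
                            unsigned long ntasks, unsigned int ticks)
{
    unsigned long active = ntasks * FIXED_1;

    while (ticks--)
        load = calc_load(load, decay, active);
    return load;
}

/* Convert a fixed-point load value to a double for display. */
double load_to_double(unsigned long load)
{
    return (double)load / (double)FIXED_1;
}
```

Starting from zero with one permanently active task, twelve updates (one minute) bring the 1-minute average to roughly 0.63, about 1 - 1/e. That is why a sudden burst takes a while to show up in full, and why the 5- and 15-minute values lag even further behind.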
Calculation Process
The kernel performs two steps periodically:
Each CPU updates a global counter with its runnable and uninterruptible tasks.
A designated CPU (the timer CPU) computes the three load values from that counter.
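The two steps can be modeled in user space as follows. This is a simplified, single-threaded sketch under stated assumptions: the structure and function names (`struct cpu_rq`, `fold_cpu`, `update_global_load`, `global_active`, `avenrun`) are illustrative, and the real kernel uses atomic operations and per-CPU timing that are omitted here.

```c
#define FSHIFT   11
#define FIXED_1  (1UL << FSHIFT)     /* 1.0 in fixed point */
#define EXP_1    1884UL
#define EXP_5    2014UL
#define EXP_15   2037UL

/* Per-CPU view (illustrative): current task counts plus the value this
 * CPU last contributed to the global counter. */
struct cpu_rq {
    long nr_running;
    long nr_uninterruptible;
    long folded;
};

long global_active;                  /* global counter of active tasks */
unsigned long avenrun[3];            /* 1-, 5-, 15-minute loads, fixed point */

unsigned long calc_load(unsigned long load, unsigned long decay,
                        unsigned long active)
{
    unsigned long newload = load * decay + active * (FIXED_1 - decay);

    if (active >= load)
        newload += FIXED_1 - 1;
    return newload / FIXED_1;
}

/* Step 1: a CPU folds the change in its active-task count into the
 * global counter, so only the delta is ever added. */
void fold_cpu(struct cpu_rq *rq)
{
    long now = rq->nr_running + rq->nr_uninterruptible;

    global_active += now - rq->folded;
    rq->folded = now;
}

/* Step 2: one CPU turns the global counter into the three averages. */
void update_global_load(void)
{
    unsigned long active =
        global_active > 0 ? (unsigned long)global_active * FIXED_1 : 0;

    avenrun[0] = calc_load(avenrun[0], EXP_1, active);
    avenrun[1] = calc_load(avenrun[1], EXP_5, active);
    avenrun[2] = calc_load(avenrun[2], EXP_15, active);
}
```

Folding only deltas keeps the global counter cheap to maintain: a CPU whose task count has not changed contributes nothing, and the averaging step never has to walk every run queue.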
[Figure: flow of the two-step load calculation]
Common Causes of High Load
1. Periodic Spikes
Sometimes a kernel bug related to the load sampling frequency (LOAD_FREQ) causes regular spikes; this was fixed in kernels ali2016, ali3000, and ali4000.
2. I/O Issues
Disk Bottlenecks
High IOPS or bandwidth demand can saturate a disk and leave many threads blocked in uninterruptible state. iostat -dx 1 shows per-device utilization and queueing, while vmstat shows elevated b (blocked tasks) and wa (I/O wait) values.
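The blocked-task count can also be read programmatically. The sketch below assumes the `procs_running` and `procs_blocked` lines of /proc/stat are the source for the running and blocked counts. `stat_field` and `read_proc_counts` are hypothetical helper names.

```c
#include <stdio.h>
#include <string.h>

/* Look for a line in `text` that starts with "<key> " and parse the integer
 * that follows. Returns 0 on success, -1 if the key is missing or malformed. */
int stat_field(const char *text, const char *key, long *value)
{
    size_t klen = strlen(key);
    const char *p = text;

    while (p && *p) {
        if (strncmp(p, key, klen) == 0 && p[klen] == ' ')
            return sscanf(p + klen, "%ld", value) == 1 ? 0 : -1;
        p = strchr(p, '\n');         /* advance to the next line, if any */
        if (p)
            p++;
    }
    return -1;
}

/* Scan /proc/stat for the running and blocked task counts.
 * Returns 0 if both were found, -1 otherwise. */
int read_proc_counts(long *running, long *blocked)
{
    char line[256];
    int found = 0;
    FILE *f = fopen("/proc/stat", "r");

    if (!f)
        return -1;
    /* Overlong lines arrive in several chunks; chunks that do not start
     * with one of the keys are simply skipped. */
    while (fgets(line, sizeof line, f)) {
        if (stat_field(line, "procs_running", running) == 0)
            found |= 1;
        else if (stat_field(line, "procs_blocked", blocked) == 0)
            found |= 2;
    }
    fclose(f);
    return found == 3 ? 0 : -1;
}
```

Sampling `read_proc_counts` once per second during a spike gives a quick indication of whether the load is dominated by runnable or by blocked tasks.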
Cloud Disk Anomalies
Cloud disks may show 100 % I/O utilization, indicating a persistent queue of unfinished requests, which can stall both kernel and application threads.
JBD2 Bugs
Failures in the ext4 journal daemon (jbd2) can block all disk I/O, pushing many tasks into uninterruptible state.
3. Memory Issues
Memory Reclamation
Aggressive memory reclaim can stall tasks until reclamation finishes, raising load and CPU usage.
Memory Bandwidth Contention
Beyond capacity, memory bandwidth can become a bottleneck; specialized tools (e.g., aprof) are needed to observe it.
4. Locks
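Lock contention inflates load in two ways. Contended spin-locks in critical kernel paths (especially networking) keep tasks spinning on the CPU, so they count as running tasks, while tasks waiting on a held mutex sleep in D (uninterruptible) state.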
5. User‑Space CPU
When load spikes are driven by legitimate user‑space work, you’ll see high user CPU, increased run queue length, and higher scheduler delay.
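Scheduler delay for a single task can be estimated from /proc/<pid>/schedstat, assuming the kernel was built with scheduler statistics so that the file exists. Its three fields are time spent on a CPU, time spent waiting on a run queue (both in nanoseconds), and the number of timeslices run. The helper names below are hypothetical.

```c
#include <stdio.h>

/* The three fields of /proc/<pid>/schedstat. */
struct schedstat {
    unsigned long long run_ns;       /* time spent on a CPU */
    unsigned long long wait_ns;      /* time spent waiting on a run queue */
    unsigned long long slices;       /* number of timeslices run */
};

/* Parse one schedstat line such as "1000 3000 5".
 * Returns 0 on success, -1 on failure. */
int parse_schedstat(const char *line, struct schedstat *out)
{
    if (sscanf(line, "%llu %llu %llu",
               &out->run_ns, &out->wait_ns, &out->slices) != 3)
        return -1;
    return 0;
}

/* Between two samples of the same task, the share of its runnable time
 * that was spent waiting for a CPU rather than executing (0.0 to 1.0). */
double wait_share(const struct schedstat *before,
                  const struct schedstat *after)
{
    unsigned long long run = after->run_ns - before->run_ns;
    unsigned long long wait = after->wait_ns - before->wait_ns;

    if (run + wait == 0)
        return 0.0;
    return (double)wait / (double)(run + wait);
}
```

A wait share that climbs together with user CPU usage is consistent with CPU saturation from legitimate work rather than with blocked I/O.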
Root‑Cause Analysis Techniques
Runnable‑Type Load
Runnable-type load is usually tied to increased business traffic or code bugs (e.g., infinite loops). On-CPU profiling tools like perf or Alibaba’s ali-diagnose help locate hot spots.
Uninterruptible‑Type Load
Identify tasks stuck in D state via the third field of /proc/${pid}/stat, then read /proc/${pid}/stack (which typically requires root) to see where in the kernel each task is waiting.
[Example screenshots]
If tasks enter D state only briefly, sampling /proc will usually miss them, and latency analysis with kernel probes (systemtap, kprobe, eBPF) is required; Alibaba’s ali-diagnose provides such latency analyses.
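The /proc-based check for D-state tasks described above can be sketched as follows. Because the command name in the second field may itself contain spaces or parentheses, the sketch locates the state by searching for the last closing parenthesis. `stat_state` and `read_task_state` are hypothetical helper names.

```c
#include <stdio.h>
#include <string.h>

/* Extract the state character (third field) from a /proc/<pid>/stat line.
 * Returns '\0' if the line is malformed. */
char stat_state(const char *line)
{
    /* The fields after the command name are numeric, so the last ')' on
     * the line is the one that closes the command name. */
    const char *rp = strrchr(line, ')');

    if (!rp || rp[1] != ' ' || rp[2] == '\0')
        return '\0';
    return rp[2];
}

/* Read the state of one task; 'D' means uninterruptible sleep.
 * Returns '\0' if the task has exited or the file cannot be read. */
char read_task_state(long pid)
{
    char path[64];
    char line[512];
    char state = '\0';
    FILE *f;

    snprintf(path, sizeof path, "/proc/%ld/stat", pid);
    f = fopen(path, "r");
    if (!f)
        return '\0';
    if (fgets(line, sizeof line, f))
        state = stat_state(line);
    fclose(f);
    return state;
}
```

Looping over the numeric entries of /proc and keeping the PIDs for which `read_task_state` returns 'D' yields the list of tasks whose /proc/${pid}/stack is worth inspecting.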
Conclusion
The Linux kernel’s load average is a concise indicator of runnable and uninterruptible task pressure. By examining both dimensions, checking I/O, memory, locking, and using appropriate tracing tools, you can reliably pinpoint the root cause of load spikes and restore system stability.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
