Understanding Linux CPU Usage, Scheduling, and Performance Monitoring
This article explains how Linux reports CPU usage with tools like top, the meaning of the fields in /proc/stat, how utilization percentages are calculated, the concepts of run queues, load average, context switching, multi‑core scheduling, and how to use perf and taskset for deeper performance analysis.
During the campus recruitment season, many interviewees lack a solid understanding of Linux CPU usage details, so this article compiles common interview questions about Linux CPU scheduling and provides explanations.
Starting with the top command
Interviewers often ask what the numbers shown by top represent and how they are calculated.
How CPU utilization is calculated
The top output shows two percentages: the aggregate usage across all CPU cores and the usage of each process. The kernel accounts CPU time in ticks, and the file /proc/stat records how many ticks each CPU has spent in each state since boot. The values are in units of USER_HZ, typically 1/100 of a second.
# cat /proc/stat
cpu 8772231776 3 2071565293 180450735957 98308067 485278 400223124 0 0 0
cpu0 378973514 0 97049915 5442776752 21309455 2138 20192322 0 0 0
cpu1 385703831 0 96690059 5459587181 7568642 2528 13523191 0 0 0
...
Each column corresponds to time spent in user, nice, system, idle, iowait, irq, softirq, steal, guest, and guest_nice states.
The calculation formula is:
CPU total time = user + nice + system + idle + iowait + irq + softirq + steal
CPU idle time = idle + iowait
CPU usage time = CPU total time – CPU idle time
CPU utilization = usage time / total time
When a process uses multiple cores, its reported CPU usage can exceed 100% because the denominator is the time of a single core.
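Because the counters in /proc/stat are cumulative since boot, a tool has to take two samples and apply the formula to the differences between them to get utilization over an interval. The following Python is a minimal sketch of that idea, not the way top itself is implemented; the function names are illustrative, and columns missing on older kernels are assumed to be zero.

```python
from typing import Dict, Tuple

# Column order of a "cpu" line in /proc/stat, as listed above.
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice")


def parse_cpu_line(line: str) -> Dict[str, int]:
    """Turn one 'cpu' or 'cpuN' line into a dict of named tick counters."""
    values = [int(v) for v in line.split()[1:]]
    # Assumption: columns absent on older kernels are treated as zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def total_and_idle(sample: Dict[str, int]) -> Tuple[int, int]:
    """Apply the formula above: total excludes guest and guest_nice."""
    total = sum(sample[k] for k in ("user", "nice", "system", "idle",
                                    "iowait", "irq", "softirq", "steal"))
    idle = sample["idle"] + sample["iowait"]
    return total, idle


def utilization(before: Dict[str, int], after: Dict[str, int]) -> float:
    """Fraction of non-idle ticks between two samples (0.0 to 1.0)."""
    total0, idle0 = total_and_idle(before)
    total1, idle1 = total_and_idle(after)
    delta_total = total1 - total0
    if delta_total <= 0:
        return 0.0  # no ticks elapsed between the samples
    return (delta_total - (idle1 - idle0)) / delta_total


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Read the aggregate 'cpu' line (the first line of the file)."""
    with open(path) as f:
        return f.readline()
```

On a Linux host, calling parse_cpu_line(read_cpu_line()) twice with a pause in between and passing both results to utilization() gives the utilization for that interval.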
Different kinds of CPU usage
The proc(5) man page defines each field (user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice). Interviewers may ask for real‑world scenarios that cause specific fields to increase.
Run queue
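A run queue holds the processes that are ready to run and waiting for a CPU. A process that is not running is either runnable (on a run queue) or sleeping; a process in uninterruptible sleep (state D, typically waiting for I/O) is not on a run queue but still counts toward the Linux load average. The vmstat command reports both: column r is the number of runnable processes and column b is the number blocked in uninterruptible sleep.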
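The same two counts can be read directly from /proc/stat, which contains the lines procs_running and procs_blocked. The sketch below parses them; it assumes these lines correspond to what vmstat reports as r and b (vmstat may adjust the running count, for example to exclude itself), and the function names are illustrative.

```python
from typing import Dict


def run_queue_counts(stat_text: str) -> Dict[str, int]:
    """Extract procs_running and procs_blocked from /proc/stat contents."""
    counts = {}
    for line in stat_text.splitlines():
        key, _, value = line.partition(" ")
        if key in ("procs_running", "procs_blocked"):
            counts[key] = int(value)
    return counts


def read_run_queue(path: str = "/proc/stat") -> Dict[str, int]:
    """Read the live counters on a Linux host."""
    with open(path) as f:
        return run_queue_counts(f.read())
```

A procs_running value that stays well above the number of cores suggests that processes are queuing for CPU time.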
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 565716 80270304 415776 3862448 0 0 0 4 0 5 94 0 0
...
Load average
For a deeper dive on load average, see the linked article about Linux loadavg.
Context switching
When a core handles multiple processes, it repeatedly saves the current context and loads the next one. Excessive switches degrade performance because each switch incurs memory reads/writes and CPU cycles.
A switch is triggered either by the scheduler (the time slice expires, or the process blocks or yields) or by a hardware or software interrupt.
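To see how often a particular process is switched out, Linux exposes per‑process counters in /proc/<pid>/status: voluntary_ctxt_switches (the process gave up the CPU, for example by blocking) and nonvoluntary_ctxt_switches (the scheduler preempted it). The following is a small sketch that parses those two lines; the function names are illustrative.

```python
from typing import Dict

WANTED = ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")


def context_switches(status_text: str) -> Dict[str, int]:
    """Extract the two context-switch counters from /proc/<pid>/status text."""
    result = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in WANTED:
            result[key] = int(value)  # int() ignores the surrounding tab
    return result


def read_context_switches(pid: str = "self") -> Dict[str, int]:
    """Read the counters for a pid ('self' means the calling process)."""
    with open(f"/proc/{pid}/status") as f:
        return context_switches(f.read())
```

Sampling these counters twice and comparing the deltas shows whether a process is mostly blocking on its own or mostly being preempted.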
Multi‑core performance issues
The scheduler may migrate a process between cores. After a migration the process no longer finds its data in the new core's caches, which hurts performance. CPU affinity, set with taskset, restricts the cores a process may run on.
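taskset can show affinity either as a hexadecimal bitmask or as a CPU list; bit N of the mask stands for CPU N. The sketch below converts between the two forms and reads the affinity of a process through Python's os.sched_getaffinity, which is available on Linux only. The helper names are illustrative.

```python
import os
from typing import Iterable, List


def mask_to_cpus(mask: int) -> List[int]:
    """Expand an affinity bitmask into the list of CPU numbers it selects."""
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


def cpus_to_mask(cpus: Iterable[int]) -> int:
    """Build the bitmask that selects the given CPU numbers."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


def current_affinity(pid: int = 0) -> List[int]:
    """CPUs the process may run on; pid 0 means the calling process.

    Uses os.sched_getaffinity, which exists only on Linux.
    """
    return sorted(os.sched_getaffinity(pid))
```

With these helpers, the mask f corresponds to CPUs 0 to 3 and ffffffff to CPUs 0 to 31. From Python, os.sched_setaffinity(0, {0, 1, 2, 3}) would pin the calling process to the first four cores.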
$ taskset -c -p 18232
pid 18232's current affinity list: 0-31
# taskset -p f 18232
pid 18232's current affinity mask: ffffffff
pid 18232's new affinity mask: f
# taskset -pc 18232
pid 18232's current affinity list: 0-3
Using perf sched for deeper analysis
Record scheduler events:
[root@stretch:~]# perf sched record -- sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.190 MB perf.data (146 samples) ]
Show per‑thread run‑queue latency:
perf sched latency
---------------------------------------------------------------
Task | Runtime ms | Switches | Average delay ms | Maximum delay ms | Maximum delay at
---------------------------------------------------------------
watchdog/0:11 | 0.000 ms | 1 | avg: 0.051 ms | 0.051 ms | 413269.258775 s
...
Map scheduler events to processes:
# perf sched map
*A0 993552.887633 secs A0 => perf:26596
*. A0 993552.887781 secs . => swapper:0
...
Misconceptions about CPU utilization
High CPU usage (e.g., 90%) does not necessarily mean the CPU spends 90% of its time executing instructions; a large share of those "busy" cycles may be stalls in which the core waits for data from memory. Brendan Gregg’s article "CPU Utilization is Wrong" illustrates this with a breakdown of busy, idle, and stalled cycles.
Evaluating CPU stalls
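Run perf stat -a -- sleep 10 and look at the "instructions per cycle" (IPC) metric, which the output labels "insns per cycle". A higher IPC means the cores spend more of their cycles retiring instructions rather than stalling. For a 4‑wide CPU (one that can retire up to four instructions per cycle), an IPC of 0.78 means roughly 19.5% (0.78 / 4) of the CPU’s peak capacity is used.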
perf stat -a -- sleep 10
641398.723351 task-clock (msec) # 64.116 CPUs utilized (100.00%)
379,651 context-switches # 0.592 K/sec (100.00%)
...
1,433,972,173 cycles # 2.236 GHz (75.02%)
1,118,336,688 instructions # 0.78 insns per cycle (75.01%)
As a rule of thumb, when IPC > 1.0 the workload is likely CPU‑bound, and optimization should focus on code efficiency or faster CPUs. When IPC < 1.0 the workload is likely memory‑bound, and optimization should focus on cache friendliness or faster memory.
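The IPC arithmetic can be reproduced from the raw counters. The sketch below pulls the cycles and instructions counts out of perf stat text shaped like the sample above and computes both the IPC and the share of peak capacity. It assumes the counter name is the second whitespace‑separated field, which may differ between perf versions, and the 4‑wide default is taken from the example rather than from any particular CPU.

```python
from typing import Dict


def parse_perf_counters(output: str) -> Dict[str, int]:
    """Extract the 'cycles' and 'instructions' counts from perf stat text."""
    counters = {}
    for line in output.splitlines():
        fields = line.split()
        # Assumption: lines look like "<count> <event name> # ...".
        if len(fields) >= 2 and fields[1] in ("cycles", "instructions"):
            counters[fields[1]] = int(fields[0].replace(",", ""))
    return counters


def ipc(counters: Dict[str, int]) -> float:
    """Instructions retired per CPU cycle."""
    return counters["instructions"] / counters["cycles"]


def fraction_of_peak(ipc_value: float, issue_width: int = 4) -> float:
    """Share of the theoretical maximum IPC for a CPU of the given width."""
    return ipc_value / issue_width
```

Feeding in the two counter lines from the sample output reproduces the 0.78 IPC that perf prints, and fraction_of_peak(0.78) gives 0.195, the 19.5% figure quoted above.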
Conclusion
Linux CPU scheduling is a complex system; this article provides an introductory overview to help readers explore deeper topics and apply the knowledge in real‑world performance troubleshooting.
NetEase Game Operations Platform
The NetEase Game Automated Operations Platform delivers stable services for thousands of NetEase titles, focusing on efficient ops workflows, intelligent monitoring, and virtualization.