Understanding Linux CPU Usage, Scheduling, and Performance Monitoring

This article explains how Linux reports CPU usage with tools like top, the meaning of the fields in /proc/stat, how utilization percentages are calculated, the concepts of run queues, load average, context switching, multi‑core scheduling, and how to use perf and taskset for deeper performance analysis.

NetEase Game Operations Platform

Nov 2, 2019

During the campus recruitment season, many interviewees lack a solid understanding of Linux CPU usage details, so this article compiles common interview questions about Linux CPU scheduling and provides explanations.

Starting with the top command

Interviewers often ask what the numbers shown by top represent and how they are calculated.

How CPU utilization is calculated

The top output shows two kinds of percentages: usage aggregated across all CPU cores, and per-process usage. The kernel accounts CPU time in ticks, and the file /proc/stat records how many ticks each CPU has spent in each state since boot, in units of USER_HZ (typically 1/100 of a second).

# cat /proc/stat
cpu  8772231776 3 2071565293 180450735957 98308067 485278 400223124 0 0 0
cpu0 378973514 0 97049915 5442776752 21309455 2138 20192322 0 0 0
cpu1 385703831 0 96690059 5459587181 7568642 2528 13523191 0 0 0
...

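Each column corresponds to time spent in the user, nice, system, idle, iowait, irq, softirq, steal, guest, and guest_nice states. Guest time is already counted in user, and guest_nice in nice, which is why neither appears in the total below.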

The calculation formula is:

CPU total time = user + nice + system + idle + iowait + irq + softirq + steal

CPU idle time = idle + iowait

CPU usage time = CPU total time – CPU idle time

CPU utilization = usage time / total time
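
The counters in /proc/stat are cumulative since boot, so a tool like top has to apply the formula to the difference between two samples rather than to a single reading. The following is a minimal sketch of that idea in Python, not top's actual implementation; the helper names parse_cpu_line and utilization are made up for illustration, and the second sample line is a hypothetical reading (300 more user ticks, 700 more idle ticks) rather than real data.

```python
FIELDS = ["user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice"]

def parse_cpu_line(line):
    """Turn one 'cpu ...' line from /proc/stat into a dict of tick counts."""
    parts = line.split()
    return dict(zip(FIELDS, (int(v) for v in parts[1:])))

def utilization(prev, curr):
    """Apply the article's formula to the delta between two samples."""
    def total(sample):
        # guest and guest_nice are left out, as in the formula above
        keys = ("user", "nice", "system", "idle",
                "iowait", "irq", "softirq", "steal")
        return sum(sample.get(k, 0) for k in keys)

    def idle(sample):
        return sample.get("idle", 0) + sample.get("iowait", 0)

    total_delta = total(curr) - total(prev)
    idle_delta = idle(curr) - idle(prev)
    if total_delta <= 0:
        return 0.0  # no ticks elapsed between the two samples
    return (total_delta - idle_delta) / total_delta

# First sample: the aggregate line shown above.
prev = parse_cpu_line(
    "cpu  8772231776 3 2071565293 180450735957 98308067 485278 400223124 0 0 0")
# Second sample: hypothetical, +300 user ticks and +700 idle ticks.
curr = parse_cpu_line(
    "cpu  8772232076 3 2071565293 180450736657 98308067 485278 400223124 0 0 0")

print(f"{utilization(prev, curr):.1%}")
```

With these two samples, 1000 ticks elapsed and 700 of them were idle, so the sketch reports 30.0%. On a live system the two lines would come from reading /proc/stat twice, a second or so apart.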

A multi-threaded process running on several cores can report more than 100% CPU usage, because top by default (Irix mode) divides by the time of a single core rather than by the time of all cores.

Different kinds of CPU usage

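The proc(5) man page defines each field (user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice). Interviewers may ask for real-world scenarios that cause specific fields to increase; for example, heavy network traffic tends to raise softirq, and contention with other virtual machines on the same host raises steal.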

Run queue

A run queue holds the processes that are ready to run and waiting to be scheduled. A process that is not runnable may instead be in uninterruptible sleep (e.g., waiting for I/O, shown as state D). The vmstat command reports both counts: column r is the number of runnable processes, and column b is the number of processes blocked in uninterruptible sleep.

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0 565716 80270304 415776 3862448   0    0     0     4    0   5 94  0  0
 ...
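
The same two counts can also be read straight from /proc/stat, which exposes them as the procs_running and procs_blocked lines. The sketch below parses them from a block of text; the function name run_queue_counts and the sample values are illustrative assumptions, and the numbers are not expected to match vmstat's r and b columns exactly, since each tool samples at its own moment.

```python
def run_queue_counts(stat_text):
    """Extract procs_running and procs_blocked from /proc/stat contents."""
    counts = {}
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] in ("procs_running", "procs_blocked"):
            counts[fields[0]] = int(fields[1])
    return counts

# Hypothetical excerpt; on a live system use open("/proc/stat").read().
sample = """\
ctxt 123456789
btime 1572652800
processes 987654
procs_running 3
procs_blocked 1
"""

print(run_queue_counts(sample))
```

A procs_running value that stays well above the number of cores suggests tasks are queuing for CPU, while a persistently non-zero procs_blocked points toward I/O waits.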

Load average

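Load average is the exponentially damped average, over 1, 5, and 15 minutes, of the number of tasks that are runnable or in uninterruptible sleep. For a deeper dive on load average, see the linked article about Linux loadavg.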

Context switching

When a core handles multiple processes, it repeatedly saves the current context and loads the next one. Excessive switches degrade performance because each switch incurs memory reads/writes and CPU cycles.

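A switch is triggered either by the kernel scheduler (the running process blocks, uses up its time slice, or is preempted by a higher-priority task) or by a hardware or software interrupt.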
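
To see how often a particular process is switched out, one option is /proc/<pid>/status, which reports voluntary_ctxt_switches (the process gave up the CPU, typically by blocking) and nonvoluntary_ctxt_switches (the scheduler preempted it). The sketch below parses those two lines; the function name ctxt_switches and the sample numbers are assumptions for illustration only.

```python
def ctxt_switches(status_text):
    """Extract the two context-switch counters from /proc/<pid>/status."""
    wanted = ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")
    result = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in wanted:
            result[key] = int(value.strip())
    return result

# Hypothetical excerpt; on a live system use open("/proc/self/status").read().
sample_status = """\
Name:\tpython3
State:\tR (running)
Threads:\t1
voluntary_ctxt_switches:\t150
nonvoluntary_ctxt_switches:\t545
"""

print(ctxt_switches(sample_status))
```

A process with many nonvoluntary switches is usually competing for CPU, whereas a high voluntary count tends to indicate frequent blocking on I/O or locks.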

Multi‑core performance issues

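Processes may migrate between cores. After a migration the process leaves its warm cache contents behind on the old core and has to refill the caches of the new one, which hurts performance. CPU affinity (set with taskset) controls which cores a process may run on.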

$ taskset -c -p 18232
pid 18232's current affinity list: 0-31
# taskset -p f 18232
pid 18232's current affinity mask: ffffffff
pid 18232's new affinity mask: f
# taskset -pc 18232
pid 18232's current affinity list: 0-3
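
taskset prints affinity either as a hexadecimal bit mask, where bit N stands for CPU N, or as a CPU list when -c is given. The following sketch shows how the two forms in the transcript relate; mask_to_cpus and cpus_to_list_string are hypothetical helper names, not part of taskset.

```python
def mask_to_cpus(mask_hex):
    """Return the CPU numbers whose bits are set in a hex affinity mask."""
    mask = int(mask_hex, 16)
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]

def cpus_to_list_string(cpus):
    """Collapse CPU numbers into a list string such as '0-3,8'."""
    ranges = []
    for cpu in sorted(cpus):
        if ranges and cpu == ranges[-1][1] + 1:
            ranges[-1][1] = cpu          # extend the current range
        else:
            ranges.append([cpu, cpu])    # start a new range
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)

for mask in ("ffffffff", "f"):
    print(mask, "->", cpus_to_list_string(mask_to_cpus(mask)))
```

For the two masks in the transcript this gives 0-31 for ffffffff and 0-3 for f, matching what taskset -c reports before and after the change.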

Using perf sched for deeper analysis

Record scheduler events:

[root@stretch:~]# perf sched record -- sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.190 MB perf.data (146 samples) ]

Show per‑thread run‑queue latency:

# perf sched latency
---------------------------------------------------------------
Task                | Runtime ms | Switches | Average delay ms | Maximum delay ms | Maximum delay at
---------------------------------------------------------------
watchdog/0:11       |   0.000 ms |    1     | avg: 0.051 ms    | 0.051 ms         | 413269.258775 s
...

Show which task was running on each CPU at every context switch (one column per CPU):

# perf sched map
*A0           993552.887633 secs A0 => perf:26596
*.            A0           993552.887781 secs .  => swapper:0
...

Misconceptions about CPU utilization

High CPU utilization (e.g., 90%) does not necessarily mean the CPU is busy executing instructions; a large share of those cycles may be stalled waiting on memory. Brendan Gregg’s article "CPU Utilization is Wrong" illustrates this with a breakdown of busy, idle, and stalled cycles.

Evaluating CPU stalls

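Run perf stat -a -- sleep 10 and look at the instructions per cycle (IPC) figure, shown as "insns per cycle" in the output. A higher IPC means the CPU spends more of its cycles retiring instructions rather than stalling. For a 4-wide CPU, one that can retire up to four instructions per cycle, an IPC of 0.78 means roughly 19.5% (0.78 / 4) of the CPU’s peak capacity is used.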

# perf stat -a -- sleep 10
 641398.723351 task-clock (msec) # 64.116 CPUs utilized (100.00%)
 379,651 context-switches # 0.592 K/sec (100.00%)
 ...
 1,433,972,173 cycles # 2.236 GHz (75.02%)
 1,118,336,688 instructions # 0.78 insns per cycle (75.01%)

As a rule of thumb, an IPC above 1.0 suggests the workload is likely CPU-bound, so optimization focuses on code efficiency or faster CPUs. An IPC below 1.0 suggests it is likely memory-bound, so optimization focuses on cache friendliness or faster memory.
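
The arithmetic behind these figures can be sketched as follows, using the cycles and instructions counts exactly as printed in the perf stat output above. The function name ipc_report and the default issue width of 4 are assumptions for illustration; the real width depends on the CPU model, and the 1.0 threshold is a rough heuristic rather than a hard boundary.

```python
def ipc_report(instructions, cycles, issue_width=4):
    """Compute IPC, the share of peak issue width, and a rough classification."""
    ipc = instructions / cycles
    if ipc > 1.0:
        bound = "likely CPU-bound"
    elif ipc < 1.0:
        bound = "likely memory-bound"
    else:
        bound = "borderline"
    return {
        "ipc": ipc,
        "fraction_of_peak": ipc / issue_width,
        "classification": bound,
    }

report = ipc_report(instructions=1_118_336_688, cycles=1_433_972_173)
print(f"IPC {report['ipc']:.2f}, "
      f"{report['fraction_of_peak']:.1%} of peak, "
      f"{report['classification']}")
```

For the sample counters this reproduces the 0.78 IPC and roughly 19.5% of peak quoted above, and classifies the workload as likely memory-bound.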

Conclusion

Linux CPU scheduling is a complex system; this article provides an introductory overview to help readers explore deeper topics and apply the knowledge in real‑world performance troubleshooting.
