Diagnosing Linux CPU Context Switch Problems with vmstat and pidstat
This article explains how excessive Linux CPU context switches affect system performance and shows step‑by‑step how to monitor and analyze them using vmstat, pidstat, and sysbench, including interpreting voluntary versus involuntary switches and interrupt statistics.
In the previous article I discussed how Linux CPU context switches work, covering process, thread, and interrupt switches. This follow‑up explains how to analyze context‑switch issues.
Checking CPU Context Switches
Too many context switches waste CPU time saving and restoring registers, program counters, kernel stacks, and virtual memory, which can noticeably degrade performance. To inspect them you can use the vmstat tool.
vmstat
vmstat is a common system‑performance utility that reports memory usage and also CPU context‑switch and interrupt counts.
Example command:
# Output interval is 5 seconds
$ vmstat 5
The output columns of interest are:
cs (context switch): number of context switches per second.
in (interrupt): number of interrupts per second.
r (running | runnable): length of the run queue, that is, the number of processes running or waiting for a CPU.
b (blocked): number of processes in uninterruptible sleep.
In the example the system shows 33 context switches, 25 interrupts, and both the run‑queue and blocked counts are 0, indicating an idle system.
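To make the cs and in columns less opaque, here is a minimal sketch of how such per-second rates can be derived from the cumulative ctxt and intr counters in /proc/stat, which is where vmstat is generally understood to get them. The function names are illustrative, and the two snapshots are invented so that the result matches the idle figures quoted above (33 switches and 25 interrupts per second).

```python
import time


def parse_proc_stat(text):
    """Return the cumulative 'ctxt' and 'intr' totals from /proc/stat content."""
    totals = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] in ("ctxt", "intr"):
            # On the 'intr' line the first number is the total over all interrupts.
            totals[fields[0]] = int(fields[1])
    return totals


def per_second(before, after, elapsed):
    """Turn two cumulative snapshots into per-second rates."""
    return {key: (after[key] - before[key]) / elapsed for key in before}


def sample_rates(interval=5.0, path="/proc/stat"):
    """Read /proc/stat twice, `interval` seconds apart (Linux only)."""
    with open(path) as f:
        before = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_proc_stat(f.read())
    return per_second(before, after, interval)


# Invented snapshots taken 5 seconds apart (assumption, not real output).
first = "cpu  10 0 5 100 0 0 0 0 0 0\nintr 1000 5 0 0\nctxt 2000\nbtime 1537690000\n"
second = "cpu  11 0 6 110 0 0 0 0 0 0\nintr 1125 6 0 0\nctxt 2165\nbtime 1537690000\n"

rates = per_second(parse_proc_stat(first), parse_proc_stat(second), 5.0)
print(rates)  # {'intr': 25.0, 'ctxt': 33.0}
```

On a Linux machine, calling sample_rates(5) should give numbers comparable to one line of vmstat 5 output.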
pidstat
While vmstat gives a system‑wide view, pidstat provides per‑process details. Adding the -w option shows each process’s context‑switch statistics.
# Output interval is 5
$ pidstat -w 5
Linux 4.15.0 (ubuntu) 09/23/18 _x86_64_ (2 CPU)
08:18:26 UID PID cswch/s nvcswch/s Command
08:18:31 0 1 0.20 0.00 systemd
08:18:31 0 8 5.40 0.00 rcu_sched
...
The two columns to note are cswch/s (voluntary context switches per second) and nvcswch/s (involuntary context switches per second).
Voluntary context switch: occurs when a process gives up the CPU because a resource it needs is unavailable, for example while it waits on I/O or memory.
Involuntary context switch: occurs when a process's time slice expires and the scheduler forces a switch; this is common under heavy CPU contention.
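The two counters can also be read directly for a single process: on Linux, /proc/<pid>/status exposes cumulative voluntary_ctxt_switches and nonvoluntary_ctxt_switches fields. The sketch below parses them. The helper names and the sample text are illustrative assumptions, not output from the system in this article.

```python
def parse_switch_counters(status_text):
    """Extract the cumulative switch counters from /proc/<pid>/status content."""
    counters = {}
    for line in status_text.splitlines():
        name, _, value = line.partition(":")
        if name in ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches"):
            counters[name] = int(value.strip())
    return counters


def read_switch_counters(pid="self"):
    """Read the counters for a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_switch_counters(f.read())


# Invented sample resembling a mostly idle daemon (assumption).
sample = (
    "Name:\tsystemd\n"
    "Pid:\t1\n"
    "voluntary_ctxt_switches:\t1520\n"
    "nonvoluntary_ctxt_switches:\t37\n"
)
print(parse_switch_counters(sample))
```

These values are totals since the process started, so a rate like cswch/s requires two readings and a division by the elapsed time.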
Case Study
To see what constitutes a normal switch rate, we use sysbench (a multithreaded benchmark) to generate load. First, run vmstat on an idle system:
The idle output shows 35 context switches, 19 interrupts, and both r and b at 0.
Next, run a sysbench test with ten threads for 300 seconds:
$ sysbench --threads=10 --max-time=300 threads run
After the load starts, vmstat shows a dramatic increase:
The cs column jumps from 35 to 139,000 switches per second. Other observations:
r: the run‑queue length rises to 8, far more than the machine's 2 CPUs.
us + sy: user and system CPU usage together reach 100%, with system usage at 84%, so most CPU time is spent in the kernel.
in: interrupts climb to about 10,000 per second, indicating interrupt‑handling pressure.
These metrics reveal a long run‑queue and heavy CPU usage caused by the benchmark.
Further analysis with pidstat -w -u 1 shows that sysbench consumes 100% CPU, but many context switches also come from other processes such as kernel workers (kworker) and sshd, especially involuntary switches.
# 1: output interval of 1 second
# -w: show context-switch statistics
# -u: show CPU usage statistics
$ pidstat -w -u 1
08:06:33 UID PID %usr %system %guest %wait %CPU CPU Command
08:06:34 0 10488 30.00 100.00 0.00 0.00 100.00 0 sysbench
...
Note: By default pidstat shows process‑level switches; add -t to see thread‑level switches.
Interrupts
To investigate the high interrupt count, examine /proc/interrupts:
# -d: highlight the values that change between updates
$ watch -d cat /proc/interrupts
CPU0 CPU1
...
RES: 2450431 5279697 Rescheduling interrupts
...
The fastest‑changing entry is the RES (rescheduling) interrupt, which wakes idle CPUs to schedule new tasks, confirming that excessive scheduling is the root cause.
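Spotting the fastest-changing row by eye works, but the same comparison can be scripted. The sketch below parses two snapshots of /proc/interrupts, sums each row across CPUs, and ranks rows by how much they grew. The first RES values come from the output above; every other number, and the function names, are invented for illustration.

```python
def parse_interrupts(text):
    """Return {label: total count across CPUs} from /proc/interrupts content."""
    lines = text.splitlines()
    cpu_count = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        counts = []
        for token in rest.split()[:cpu_count]:
            if not token.isdigit():
                break  # reached the description text
            counts.append(int(token))
        totals[label.strip()] = sum(counts)
    return totals


def fastest_changing(before, after, top=3):
    """Rank interrupt labels by growth between two parsed snapshots."""
    deltas = {label: after[label] - before.get(label, 0) for label in after}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)[:top]


snapshot_1 = "\n".join([
    "           CPU0       CPU1",
    "  0:         36          0   IO-APIC   2-edge      timer",
    "LOC:      90000      88000   Local timer interrupts",
    "RES:    2450431    5279697   Rescheduling interrupts",
    "ERR:          0",
])
# Second snapshot: invented values (assumption).
snapshot_2 = "\n".join([
    "           CPU0       CPU1",
    "  0:         36          0   IO-APIC   2-edge      timer",
    "LOC:      90500      88500   Local timer interrupts",
    "RES:    2500431    5329697   Rescheduling interrupts",
    "ERR:          0",
])

ranking = fastest_changing(parse_interrupts(snapshot_1), parse_interrupts(snapshot_2), top=2)
print(ranking)  # [('RES', 100000), ('LOC', 1000)]
```

On a live system the two snapshots would be read from /proc/interrupts a second or so apart, and a dominant RES delta would match what watch -d highlights.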
What Is a Normal Switch Rate?
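What counts as normal depends on the system's CPU performance. As a rule of thumb, a stable system sees anywhere from a few hundred to ten thousand context switches per second. Values consistently above 10,000, or a sudden jump by an order of magnitude, usually signal a performance problem.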
Conclusion
By examining the type and frequency of context switches and interrupts, you can pinpoint whether the issue stems from resource‑waiting (voluntary switches), CPU contention (involuntary switches), or interrupt overload.
Many voluntary switches suggest processes are blocked on I/O or other resources.
Many involuntary switches indicate CPU bottlenecks and heavy contention.
Rising interrupt counts mean the CPU is spending time in interrupt handlers; inspect /proc/interrupts to identify which interrupt type is responsible.
MaGe Linux Operations
