How to Diagnose and Reduce CPU Context Switches on Linux
This guide explains when CPU context switches happen, shows how to monitor them with vmstat and pidstat, interprets key metrics such as cswch and nvcswch, and walks through step‑by‑step analysis techniques for identifying and troubleshooting CPU performance issues on Linux systems.
When does a CPU context switch occur?
The scheduler’s time slice expires; the running process is pre‑empted and another process is scheduled.
A process voluntarily yields the CPU, e.g., by calling sleep() or waiting for I/O.
A higher‑priority task pre‑empts a lower‑priority one.
The process becomes blocked because required resources are unavailable.
Hardware events such as interrupts or exceptions invoke the kernel and cause a switch.
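The difference between a voluntary yield and a pre‑emption can be observed from inside a process. The following is a minimal sketch, assuming a Linux host and Python's standard resource module; the helper names (switch_counts, measure, sleeper, spinner) are made up for illustration. It compares the context-switch counters before and after a workload runs.

```python
import resource
import time


def switch_counts():
    """Return (voluntary, involuntary) context-switch totals for this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_nvcsw, usage.ru_nivcsw


def measure(workload):
    """Run workload() and return the (voluntary, involuntary) deltas it caused."""
    vol_before, invol_before = switch_counts()
    workload()
    vol_after, invol_after = switch_counts()
    return vol_after - vol_before, invol_after - invol_before


def sleeper():
    # Sleeping blocks in the kernel, so the process gives up the CPU voluntarily.
    for _ in range(20):
        time.sleep(0.001)


def spinner():
    # A busy loop never blocks; any switches counted here come from pre-emption.
    total = 0
    for i in range(500_000):
        total += i
    return total
```

Calling measure(sleeper) and measure(spinner) on a typical host should show the sleeper accumulating mostly voluntary switches, while the spinner accumulates few or none unless other tasks compete for the same core. Exact numbers depend on the machine and its load.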
Observing CPU context switches
The vmstat utility reports system‑wide context‑switch activity. Running it with a short interval (for example, three seconds) provides a live view:
# vmstat 3
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 477767936 883712 9970296 0 0 0 2 0 0 0 0 99 0 0
0 0 0 477766688 883716 9970120 0 0 0 12 380 754 0 0 100 0 0
0 0 0 477770016 883716 9970292 0 0 0 0 1956 4706 1 1 98 0 0
r: Number of processes in running or runnable state.
b: Number of processes in uninterruptible sleep.
in: Interrupts per second.
cs: Context switches per second.
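The in and cs rates can also be derived by hand from the cumulative intr and ctxt counters in /proc/stat. The sketch below assumes that file's usual layout on Linux; the function names are hypothetical, and this is an illustration of the idea rather than how vmstat itself is implemented.

```python
import time


def parse_proc_stat(text):
    """Extract the cumulative interrupt and context-switch totals from /proc/stat text."""
    totals = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "ctxt":
            totals["cs"] = int(fields[1])
        elif fields[0] == "intr":
            # The first number after "intr" is the total across all interrupt sources.
            totals["in"] = int(fields[1])
    return totals


def per_second(before, after, interval):
    """Turn two cumulative snapshots into per-second rates."""
    return {key: (after[key] - before[key]) / interval for key in before}


def sample(interval=3.0, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart and return the rates."""
    with open(path) as f:
        before = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_proc_stat(f.read())
    return per_second(before, after, interval)
```

On a Linux host, sample(3) returns a dictionary with "in" and "cs" keys that should be comparable to one row of vmstat 3 output.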
For per‑process details, use pidstat:
# pidstat -w 3
Linux 4.15.0-58-generic (host) 11/26/2025 _x86_64_ (64 CPU)
09:20:52 PM UID PID cswch/s nvcswch/s Command
09:20:55 PM 0 8 0.33 0.00 ksoftirqd/0
09:20:55 PM 0 9 12.83 0.00 rcu_sched
09:20:55 PM 0 12 0.33 0.00 watchdog/0
cswch – voluntary switches (process yields CPU, e.g., waiting for I/O or sleeping).
nvcswch – involuntary switches caused by pre‑emption, such as an expired time slice or a higher‑priority task; a persistently high value means tasks are competing for CPU time.
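The same two counters are exposed per process as cumulative totals in /proc/<pid>/status, under the fields voluntary_ctxt_switches and nonvoluntary_ctxt_switches. A minimal sketch of turning them into per-second rates follows; the function names are assumptions chosen for illustration.

```python
def parse_status_switches(text):
    """Return (voluntary, involuntary) totals from the text of /proc/<pid>/status."""
    counts = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key in ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches"):
            counts[key] = int(value)
    return counts["voluntary_ctxt_switches"], counts["nonvoluntary_ctxt_switches"]


def read_switches(pid):
    """Read the current totals for one process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_status_switches(f.read())


def switch_rates(before, after, interval):
    """Convert two (voluntary, involuntary) snapshots into per-second rates."""
    return (
        (after[0] - before[0]) / interval,
        (after[1] - before[1]) / interval,
    )
```

Taking read_switches(pid) twice, a few seconds apart, and passing both results to switch_rates gives figures in the same units as the cswch/s and nvcswch/s columns.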
Analyzing CPU performance problems
Begin with uptime or top to detect abnormal load average or %CPU spikes. Then drill down:
Inspect vmstat. If the r column exceeds the number of CPU cores, processes are contending for CPU, which typically raises the cs counter and increases user‑mode (us) and kernel‑mode (sy) CPU usage.
Identify the offending task with pidstat. Look for processes with unusually high cswch or nvcswch. To see thread‑level activity, add the -t flag:
# pidstat -wt 3
Linux 4.15.0-58-generic (host) 11/27/2025 _x86_64_ (64 CPU)
05:17:45 PM UID TGID TID cswch/s nvcswch/s Command
05:17:48 PM 0 8 - 10.79 0.00 ksoftirqd/0
05:17:48 PM 0 9 - 107.62 0.00 rcu_sched
05:17:48 PM 109 4695 - 1.27 0.00 ntpd
TID is the thread ID; TGID matches the process ID.
Monitor the interrupt rate (in) from vmstat for deeper insight.
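The first check in the steps above, comparing the r column with the core count, is easy to automate. The sketch below parses captured vmstat text; it assumes the default column layout shown earlier, and the helper names are hypothetical.

```python
import os


def parse_vmstat(output):
    """Parse `vmstat` text into a list of dicts keyed by column name."""
    lines = [line for line in output.strip().splitlines() if line.strip()]
    # The first line is the group banner; the second names the columns.
    columns = lines[1].split()
    rows = []
    for line in lines[2:]:
        values = line.split()
        # Skip repeated headers and anything that is not a purely numeric row.
        if len(values) == len(columns) and all(v.isdigit() for v in values):
            rows.append(dict(zip(columns, (int(v) for v in values))))
    return rows


def cpu_contended(row, cores=None):
    """True when more tasks are runnable than there are cores to run them."""
    cores = cores or os.cpu_count() or 1
    return row["r"] > cores
```

Rows for which cpu_contended returns True are the intervals worth correlating with the cs, us, and sy columns and with pidstat output. The first row vmstat prints reports averages since boot, so it is usually best to ignore it.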
Investigating interrupt anomalies
The in column of vmstat shows total interrupts per second. To watch the per‑source counters change in real time, run:
# watch -d cat /proc/interrupts
Key interrupt lines to interpret:
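NMI – non‑maskable interrupts; an unexpected jump can signal a hardware fault (check dmesg). The NMI watchdog and perf sampling also raise NMIs, so a slowly growing count is normal on many systems.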
LOC – local timer interrupts; large per‑CPU disparities may indicate scheduler or timer problems.
RES – rescheduling interrupts; spikes suggest heavy load and frequent process migrations.
TLB – TLB shootdowns; abnormal growth can be caused by processes that frequently allocate or free memory (inspect with ps aux --sort=-%mem).
Other counters are typically tied to specific hardware devices; cross‑check any unexpected growth against the kernel logs.
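To rank which interrupt lines grew the most between two moments, two snapshots of /proc/interrupts can be compared. The sketch below assumes the usual layout (a header row of CPU names, then one row per interrupt with per-CPU counts followed by a description); the function names are made up for illustration.

```python
def parse_interrupts(text):
    """Map each interrupt label to its count summed over all CPUs."""
    lines = text.splitlines()
    cpu_count = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        # Per-CPU counts come first; anything after them is a description.
        counts = [int(tok) for tok in rest.split()[:cpu_count] if tok.isdigit()]
        totals[label.strip()] = sum(counts)
    return totals


def growth(before, after):
    """Return (label, delta) pairs for counters that grew, largest first."""
    deltas = {label: after[label] - before.get(label, 0) for label in after}
    grown = [(label, delta) for label, delta in deltas.items() if delta > 0]
    return sorted(grown, key=lambda item: item[1], reverse=True)
```

Reading /proc/interrupts twice, a few seconds apart, and passing both parsed results to growth puts lines such as RES or TLB at the top when they are the ones spiking.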
By correlating load averages, vmstat metrics, per‑process pidstat data, and interrupt statistics, you can pinpoint the root cause of CPU performance degradation and apply targeted remediation.
