Understanding Linux Software Performance Events and Using perf for Profiling

This article explains how software performance events in the Linux kernel affect application performance, demonstrates how to list, count, and trace these events with perf, and shows how to visualize the results using FlameGraph to pinpoint bottlenecks such as cache misses, context switches, and page faults.

Refining Core Development Skills

Our applications run on top of language runtimes, operating system kernels, and hardware, and performance bottlenecks may stem from the application code or lower‑level software and hardware layers.

Key hardware metrics like CPI (cycles per instruction) and cache hit rate influence performance, but the kernel also provides software performance events that developers can observe.
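As a small worked example of the CPI metric, the sketch below divides a cycle count by an instruction count. The helper name cpi and the counter values are hypothetical and only illustrate the arithmetic; on a real system the two numbers would typically come from a run such as perf stat -e cycles,instructions.

```shell
# cpi CYCLES INSTRUCTIONS: print cycles per instruction to two decimals.
# Hypothetical helper, not part of perf.
cpi() {
    awk -v c="$1" -v i="$2" 'BEGIN {
        if (i == 0) exit 1          # avoid dividing by zero
        printf "%.2f\n", c / i
    }'
}

# Made-up counter values, for illustration only.
cpi 3000000 2000000   # prints 1.50
```

A lower CPI means the CPU retires more instructions per cycle. Stalls caused by cache misses push the value up.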

1. Software Performance Event List

Running # perf list sw lists the predefined software events that the system supports, such as alignment-faults, context-switches, cpu-migrations, emulation-faults, major-faults, minor-faults, page-faults, and task-clock. The most relevant ones are described below.

alignment-faults

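An alignment fault occurs when a program accesses memory at an address that is not a multiple of the access size and the hardware cannot complete the access on its own. The kernel then traps and fixes up the access, which is much slower than an aligned access. This event is architecture-dependent: x86 handles unaligned accesses in hardware, so the counter normally stays at zero there.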

context-switches

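Each process context switch has a direct cost of roughly 3 to 5 µs of CPU time. It also disturbs the CPU caches, because the incoming task's working set displaces the outgoing task's, which reduces cache locality and raises CPI.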
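Even without perf, Linux exposes per-process context-switch counters in /proc/<pid>/status. The sketch below is one way to read them. The function name ctxt_switches is hypothetical, and it reads the status text from standard input so that it can also be tried on saved copies.

```shell
# Read /proc/<pid>/status text on stdin and print both counters.
# voluntary    = the task gave up the CPU itself (for example, blocked on I/O)
# nonvoluntary = the scheduler preempted the task
ctxt_switches() {
    awk '$1 == "voluntary_ctxt_switches:"    { v = $2 }
         $1 == "nonvoluntary_ctxt_switches:" { n = $2 }
         END { printf "voluntary=%d involuntary=%d\n", v, n }'
}

# Show the counters for the current shell when /proc is available (Linux).
if [ -r "/proc/$$/status" ]; then
    ctxt_switches < "/proc/$$/status"
fi
```

A high involuntary count suggests that the task is competing for CPU time, while a high voluntary count usually points at blocking calls such as I/O or lock waits.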

cpu-migrations

Frequent migration of a task between CPUs hurts cache affinity; the kernel’s wake_affine mechanism tries to keep a task on the same core.

emulation-faults

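Emulation faults are counted when the kernel traps on an instruction that the CPU does not implement and emulates it on behalf of user space. Emulating an instruction in software is far slower than executing it natively, so a high count can hurt performance.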

page-faults

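A page fault happens when a process accesses a virtual address that is not yet mapped to a physical page. A minor fault can be resolved without disk I/O, for example by allocating a fresh page or mapping one that is already in memory. A major fault requires the page to be read from disk, so it has a much larger performance impact. The page-faults event counts both kinds.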
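The kernel also keeps cumulative minor and major fault counts for every process in /proc/<pid>/stat (fields 10 and 12, according to the proc documentation). The following sketch extracts them. The function name page_faults is hypothetical, and it reads the stat line from standard input.

```shell
# Read one /proc/<pid>/stat line on stdin and print the fault counters.
# The command name (field 2) may contain spaces, so everything up to the
# last ") " is stripped first. After that, minflt is field 8 and majflt
# is field 10 of the remaining text.
page_faults() {
    awk '{ sub(/^.*\) /, ""); printf "minor=%s major=%s\n", $8, $10 }'
}

# Show the counters for the current shell when /proc is available (Linux).
if [ -r "/proc/$$/stat" ]; then
    page_faults < "/proc/$$/stat"
fi
```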

2. Counting Software Performance Events

To see how many times each event occurs, use perf stat:

# perf stat -e alignment-faults,context-switches,cpu-migrations,emulation-faults,page-faults,major-faults,minor-faults sleep 5

As written, perf stat counts events only for the command it launches (here, sleep 5). Add -a to count across the whole system for those 5 seconds, or use -p PID to attach to an existing process.
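For scripting, perf stat can emit machine-readable output with the -x option, which takes a field separator. The sketch below assumes the CSV layout described in the perf-stat documentation, where the first field is the counter value and the third is the event name. The function name perf_csv_counts and the sample lines are hypothetical.

```shell
# Turn "perf stat -x," CSV lines on stdin into "event count" pairs.
# Lines whose first field is not a number (for example "<not supported>")
# are skipped.
perf_csv_counts() {
    awk -F, 'NF >= 3 && $1 ~ /^[0-9.]+$/ { printf "%s %s\n", $3, $1 }'
}

# Made-up sample lines, for illustration only.
printf '%s\n' \
    '12,,context-switches,5001234567,100.00,,' \
    '0,,cpu-migrations,5001234567,100.00,,' \
    '<not supported>,,emulation-faults,0,100.00,,' | perf_csv_counts
```

perf stat writes its results to standard error, so a real invocation would look something like perf stat -x, -e context-switches,cpu-migrations -- sleep 5 2>&1 >/dev/null | perf_csv_counts.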

3. Tracing Event Call Stacks

When an event occurs frequently, you may want to know which call paths cause it. # perf record -a -g -e context-switches sleep 30 samples the specified event on all CPUs (-a) for 30 seconds and records the call graph (-g), including both user and kernel stacks.

Adjust the sampling frequency with # perf record -F 100 … to limit overhead and file size.

After recording, # perf script displays the sampled stacks, and # perf report provides a summary.

4. Visualizing with FlameGraph

Brendan Gregg’s FlameGraph project can turn the perf data into an intuitive SVG flame graph. Clone the repository and run:

# git clone https://github.com/brendangregg/FlameGraph.git
# perf script | ./FlameGraph/stackcollapse-perf.pl | ./FlameGraph/flamegraph.pl > out.svg

The stackcollapse-perf.pl script collapses each call stack into a single line with a sample count, and flamegraph.pl renders the SVG.
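To make the folded format concrete, here is a minimal sketch of the collapsing step. It does not reproduce the logic of stackcollapse-perf.pl: it assumes a simplified, hypothetical input with one frame name per line, leaf frame first, and a blank line between stacks. Real perf script output also carries headers, addresses, and module names. The function name collapse_stacks is made up for this example.

```shell
# Fold stacks read from stdin into "root;...;leaf count" lines.
collapse_stacks() {
    awk '
        NF == 0 {                       # blank line ends the current stack
            if (stack != "") count[stack]++
            stack = ""
            next
        }
        {
            # Input is leaf-first, so prepend each frame to get root-first.
            stack = (stack == "" ? $1 : $1 ";" stack)
        }
        END {
            if (stack != "") count[stack]++
            for (s in count) print s, count[s]
        }' | LC_ALL=C sort
}

# Made-up sample: two identical stacks and one different stack.
printf 'schedule\nfutex_wait\nmain\n\nschedule\nfutex_wait\nmain\n\nschedule\npipe_read\nmain\n' |
    collapse_stacks
```

Identical stacks collapse into a single line whose trailing number is the sample count. flamegraph.pl draws that count as the width of the corresponding frames.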

By examining the flame graph for events like context‑switches, you can quickly identify the hot call paths responsible for the most overhead, and the same approach applies to other events such as page faults or CPU migrations.
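When a text summary is more convenient than an SVG, the folded output can also be ranked directly. The sketch below assumes folded lines of the form "frames count", as produced by stackcollapse-perf.pl, and prints the stacks with the highest counts. The function name top_stacks is hypothetical.

```shell
# top_stacks [N]: read folded stacks on stdin, print the N hottest (default 10).
top_stacks() {
    # Copy the trailing count to the front, sort on it, then drop it again.
    awk '{ print $NF "\t" $0 }' | sort -rn | head -n "${1:-10}" | cut -f2-
}

# Made-up folded lines, for illustration only.
printf 'main;a 3\nmain;b 10\nmain;c 1\n' | top_stacks 2
```

In practice the input could come from something like perf script | ./FlameGraph/stackcollapse-perf.pl | top_stacks 20.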

Sharing these low‑level performance tuning techniques helps developers improve the efficiency of their applications.

Written by

Refining Core Development Skills

Fei has over 10 years of development experience at Tencent and Sogou. Through this account, he shares his deep insights on performance.
