Using Flame Graphs for CPU Performance Analysis with perf
This article explains how to generate and interpret flame graphs for CPU performance profiling on Linux, covering the use of perf for sampling, the underlying kernel mechanisms, and the processing steps with Brendan Gregg's FlameGraph scripts to visualize hot functions.
The article introduces flame graphs as a tool for locating CPU hotspots. It starts with a simple C demo program, compiles it with gcc -o main main.c, and records performance data with perf record -g ./main.
It then guides the reader to clone Brendan Gregg's FlameGraph repository and process the generated perf.data file with the pipeline perf script | ./FlameGraph/stackcollapse-perf.pl | ./FlameGraph/flamegraph.pl > out.svg, which produces an SVG flame graph.
The next section details perf sampling, explaining perf record options such as event selection (-e cache-misses), sampling frequency (-F 100), CPU core selection (-C 0,1), and system-wide collection (-a -g). It also shows how to inspect the raw data with perf script and perf report -n --stdio.
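As a rough illustration of how these options combine, the sketch below assembles a perf record command line from a few parameters. The helper name build_perf_record_cmd and its parameters are hypothetical and do not come from the article. Only the flags named in the text (-e, -F, -C, -a, -g) are used, and the command is only built, not executed.

```python
from typing import List, Optional, Sequence


def build_perf_record_cmd(
    target: Optional[Sequence[str]] = None,
    event: Optional[str] = None,
    frequency: Optional[int] = None,
    cpus: Optional[Sequence[int]] = None,
    system_wide: bool = False,
    call_graph: bool = True,
) -> List[str]:
    """Assemble an argv list for `perf record` (hypothetical helper)."""
    cmd = ["perf", "record"]
    if event:
        cmd += ["-e", event]  # which event to sample, e.g. cache-misses
    if frequency is not None:
        cmd += ["-F", str(frequency)]  # samples per second
    if cpus:
        cmd += ["-C", ",".join(str(c) for c in cpus)]  # restrict to these CPUs
    if system_wide:
        cmd.append("-a")  # collect from all CPUs rather than one command
    if call_graph:
        cmd.append("-g")  # record call chains, needed for flame graphs
    if target:
        cmd += ["--", *target]  # the program to launch and profile
    return cmd


if __name__ == "__main__":
    # Profile the demo binary at 100 samples per second with call chains.
    print(" ".join(build_perf_record_cmd(target=["./main"], frequency=100)))
    # System-wide cache-miss sampling restricted to CPUs 0 and 1.
    print(" ".join(build_perf_record_cmd(event="cache-misses", cpus=[0, 1], system_wide=True)))
```

Returning the arguments as a list rather than a single string means the result could be handed to subprocess.run without shell quoting concerns, should one choose to execute it.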
It then describes the kernel side of perf, focusing on the perf_event_open system call, the registration of callbacks, and the handling of samples in functions like perf_event_nmi_handler, perf_event_output_forward, and __perf_event_output. The article explains how call-chain sampling is performed when PERF_SAMPLE_CALLCHAIN is enabled.
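To make the idea of call-chain sampling more concrete, here is a small user-space simulation of frame-pointer unwinding, one common way a call chain can be recovered at sample time. This is not kernel code and is not taken from the article. The memory layout (saved frame pointer at fp, return address at fp + 8), the addresses, and the function name walk_frame_pointers are all assumptions chosen for illustration.

```python
from typing import Dict, List

WORD = 8  # assumed pointer size on a 64-bit target


def walk_frame_pointers(
    memory: Dict[int, int], ip: int, fp: int, max_depth: int = 127
) -> List[int]:
    """Return the sampled call chain as a list of addresses, innermost first.

    `memory` is a toy model of the stack: address -> 64-bit value.
    Each frame is assumed to store the caller's frame pointer at [fp]
    and the return address at [fp + WORD].
    """
    chain = [ip]  # the interrupted instruction is the leaf of the chain
    while fp != 0 and len(chain) < max_depth:
        saved_fp = memory.get(fp)
        return_addr = memory.get(fp + WORD)
        if saved_fp is None or return_addr is None:
            break  # unreadable stack memory: stop unwinding
        chain.append(return_addr)
        if saved_fp != 0 and saved_fp <= fp:
            break  # caller frames must sit at higher addresses; avoid loops
        fp = saved_fp
    return chain


if __name__ == "__main__":
    # Toy stack for main -> foo -> bar, sampled while executing bar.
    stack = {
        0x7000: 0x7020, 0x7008: 0x401150,  # bar's frame: returns into foo
        0x7020: 0x7040, 0x7028: 0x401200,  # foo's frame: returns into main
        0x7040: 0x0000, 0x7048: 0x401300,  # main's frame: outermost
    }
    chain = walk_frame_pointers(stack, ip=0x401100, fp=0x7000)
    print([hex(addr) for addr in chain])
```

The resulting list of addresses is leaf-first, which matches the order in which perf script prints each sample's frames once the addresses have been resolved to symbols.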
After sampling, the article explains the FlameGraph processing steps. The stackcollapse-perf.pl script aggregates call stacks into a collapsed format in which each distinct stack becomes a single line with a sample count, which dramatically reduces the data size. The flamegraph.pl script then renders the aggregated data into an SVG where the width of each block reflects the number of samples in which that function appears.
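The two processing steps can be sketched in a few lines of Python. This is a simplified illustration, not a port of Brendan Gregg's scripts: it assumes each sample is already parsed into a leaf-first list of function names, and the names collapse_stacks, format_folded, and frame_widths are hypothetical. The real scripts handle more detail, such as parsing perf script output and SVG layout.

```python
from collections import Counter
from typing import Dict, Iterable, List


def collapse_stacks(samples: Iterable[List[str]]) -> Counter:
    """Fold samples into 'root;...;leaf' -> count.

    Each sample is a list of frames, leaf first (innermost function first).
    """
    folded: Counter = Counter()
    for frames in samples:
        folded[";".join(reversed(frames))] += 1
    return folded


def format_folded(folded: Counter) -> List[str]:
    """Render the folded stacks as 'stack count' lines, sorted by stack."""
    return [f"{stack} {count}" for stack, count in sorted(folded.items())]


def frame_widths(folded: Counter, image_width: float = 1200.0) -> Dict[str, float]:
    """Width of each block, keyed by its root-to-frame path.

    A block's width is proportional to the number of samples whose stack
    passes through that frame (its inclusive sample count).
    """
    total = sum(folded.values())
    if total == 0:
        return {}
    inclusive: Counter = Counter()
    for stack, count in folded.items():
        frames = stack.split(";")
        for depth in range(1, len(frames) + 1):
            inclusive[";".join(frames[:depth])] += count
    return {path: image_width * n / total for path, n in inclusive.items()}


if __name__ == "__main__":
    samples = (
        [["bar", "foo", "main"]] * 3  # main -> foo -> bar, seen 3 times
        + [["foo", "main"]] * 1       # main -> foo, seen once
        + [["baz", "main"]] * 4       # main -> baz, seen 4 times
    )
    folded = collapse_stacks(samples)
    print("\n".join(format_folded(folded)))
    for path, width in sorted(frame_widths(folded, image_width=800.0).items()):
        print(f"{path}: {width:.0f}px")
```

In this toy input eight samples collapse into three lines, and main spans the full width because every sample passes through it.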
Finally, the article summarizes that flame graphs provide an intuitive visual representation of CPU usage, relying on kernel perf events for data collection and Brendan Gregg's scripts for rendering. It notes that because flame graphs are built from samples they are statistical approximations rather than exact measurements, but they are sufficient for most performance-optimization tasks.
Refining Core Development Skills
Fei has over 10 years of development experience at Tencent and Sogou. Through this account, he shares his deep insights on performance.