Master Linux perf: From Basics to Advanced Profiling and Flame Graphs
This comprehensive guide introduces Linux perf, explains its core components, walks through essential commands, demonstrates real‑world use cases such as CPU, memory, and I/O analysis, shows how to generate flame graphs, and provides advanced tips and troubleshooting for accurate performance profiling on Linux systems.
Introduction
Linux perf is the performance analysis tool that ships with the Linux kernel source tree and builds on the kernel's perf_events subsystem. It can monitor hardware counters, kernel events, and user-space activity, helping developers and administrators locate bottlenecks, improve stability, and understand system behavior.
Overview of perf
Perf events – the basic measurement units (hardware, software, or tracepoints).
Perf counters – hardware PMU registers and kernel software counters that count event occurrences.
Perf command-line interface – sub-commands such as stat, record, report, top, bench, and trace.
Perf data storage – files (default perf.data) that hold collected samples.
Perf analyzer – generates detailed reports and visualizations.
Basic commands and usage
perf list – show all available events.
$ perf list
perf stat – collect and display counter statistics for a command or the whole system.
$ perf stat -e cycles,instructions,cache-references,cache-misses,branches,branch-misses -- ./my_program
perf record – record events to perf.data for later analysis.
$ perf record -g ./my_program
perf report – read perf.data and present a searchable report.
$ perf report
perf top – real-time view of the hottest functions.
$ perf top
perf bench – built-in benchmarks (e.g., memory bandwidth).
$ perf bench mem memcpy
perf trace – trace system calls and I/O events.
$ perf trace -e block:block_rq_issue,block:block_rq_complete -- ./my_program
Practical applications and cases
CPU analysis: perf stat reveals cycles, instructions, cache misses, and branch mispredictions.
Memory analysis: perf mem record and perf mem report sample memory accesses to uncover access patterns (this needs CPU support for memory-access sampling).
I/O profiling: perf trace tracks block-layer requests.
Software tuning: perf record combined with perf report produces call-graph reports.
Real-time hotspots: perf top shows the hottest user-space and kernel functions as they run.
Hardware evaluation: perf bench runs built-in benchmarks.
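The CPU counters above are usually interpreted as ratios. perf stat can emit machine-readable CSV with -x, (and write it to a file with -o), so a small awk filter can derive instructions per cycle (IPC) and miss rates. This is a minimal sketch, not part of perf: perf_ratios is a hypothetical helper name, and it assumes the value,unit,event,... field layout that recent perf versions print. On hybrid CPUs, event names may carry a PMU prefix (e.g. cpu_core/cycles/) that this sketch does not handle.

```sh
# Hypothetical helper (not part of perf): derive common CPU metrics from
# `perf stat -x,` CSV output read on stdin.
# Assumed field layout (verify against your perf version): value,unit,event,...
perf_ratios() {
  awk -F, '
    # Skip comment/header lines, blank lines, and counters perf could not
    # read (e.g. "<not supported>" or "<not counted>").
    /^#/ || $1 !~ /^[0-9.]+$/ { next }
    {
      ev = $3
      sub(/:.*/, "", ev)          # drop modifiers such as ":u" or ":k"
      count[ev] = $1
    }
    END {
      # Print each ratio only when both of its counters were read.
      if (count["cycles"] > 0 && ("instructions" in count))
        printf "IPC: %.2f\n", count["instructions"] / count["cycles"]
      if (count["cache-references"] > 0 && ("cache-misses" in count))
        printf "cache-miss rate: %.2f%%\n", 100 * count["cache-misses"] / count["cache-references"]
      if (count["branches"] > 0 && ("branch-misses" in count))
        printf "branch-miss rate: %.2f%%\n", 100 * count["branch-misses"] / count["branches"]
    }'
}
```

Usage, assuming the function has been sourced into the current shell:
$ perf stat -x, -o stat.csv -e cycles,instructions,cache-references,cache-misses,branches,branch-misses -- ./my_program
$ perf_ratios < stat.csv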
Generating flame graphs
Install the FlameGraph scripts:
$ git clone https://github.com/brendangregg/FlameGraph
Record samples at 99 Hz for 30 seconds, collapse the stacks, and render the SVG (replace PID with the target process ID):
$ sudo perf record -F 99 -p PID -g -- sleep 30
$ sudo perf script > out.perf
$ FlameGraph/stackcollapse-perf.pl out.perf > out.folded
$ FlameGraph/flamegraph.pl out.folded > out.svg
Open out.svg in a web browser to explore function-level hotspots.
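The SVG is the usual way to read the results, but out.folded is plain text that can also be queried directly: each line is one call stack, with frames separated by semicolons from root to leaf, followed by a space and a sample count. The sketch below sums the samples of each leaf frame, which approximates where the CPU spent its own ("self") time. top_leaf_functions is a hypothetical helper name, not part of the FlameGraph scripts.

```sh
# Hypothetical helper (not part of FlameGraph): rank functions by self samples
# in a folded-stack file such as out.folded from stackcollapse-perf.pl.
# Folded format: root;caller;...;leaf <sample-count>, one stack per line.
# $1: folded-stack file; $2: number of rows to print (default 10)
top_leaf_functions() {
  awk 'NF {
    cnt = $NF                      # last field is the sample count
    stack = $0
    sub(/ [0-9]+$/, "", stack)     # strip the count; frame names may contain spaces
    n = split(stack, frames, ";")
    self[frames[n]] += cnt         # attribute samples to the leaf frame
    total += cnt
  }
  END {
    # Columns: self samples, share of all samples, function name
    for (f in self) printf "%d\t%.1f%%\t%s\n", self[f], 100 * self[f] / total, f
  }' "$1" | sort -k1,1nr | head -n "${2:-10}"
}
```

Usage: $ top_leaf_functions out.folded 5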
Advanced techniques and practices
Custom (raw) events: perf stat -e rNNN -- ./my_program, where NNN is a CPU-specific hexadecimal event code taken from the processor vendor's documentation.
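Event groups: events inside braces are scheduled onto the PMU together, so ratios within a group (such as instructions per cycle) are measured over the same intervals:
perf stat -e '{cycles,instructions},{cache-references,cache-misses}' -- ./my_program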
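Multi-core analysis: -C restricts counting to the listed CPUs, e.g., perf stat -C 0-3 -e cycles,instructions -- ./my_program. This counts everything that runs on CPUs 0-3 while my_program executes, not only my_program itself; add -A to print per-CPU counts instead of a single aggregate.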
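Long-term monitoring: perf record -a -F 100 -g -- sleep 86400 records system-wide for 24 h. With call graphs enabled over this duration, perf.data can grow very large, so check free disk space first.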
Container usage: run perf with privileged access, e.g.,
docker run --privileged -v /usr/bin/perf:/usr/bin/perf -it my_image /bin/bash.
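Integration with FlameGraph for interactive visualizations (perf record writes to perf.data rather than stdout, so perf script is needed to feed the stack collapser):
perf record -g -- ./my_program && perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > flamegraph.svg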
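When combining -C with a per-CPU breakdown (-A), a short filter can turn the CSV output into IPC per CPU. This is a sketch under two assumptions: per_cpu_ipc is a hypothetical helper name, and with -A -x, each line is assumed to start with a CPU label such as CPU0, followed by value,unit,event,...; verify this layout against your perf version before relying on it.

```sh
# Hypothetical helper (not part of perf): instructions per cycle for each CPU
# from `perf stat -A -x, -e cycles,instructions` output read on stdin.
# Assumed layout with -A (verify for your perf version): cpu,value,unit,event,...
per_cpu_ipc() {
  awk -F, '
    /^#/ || $2 !~ /^[0-9.]+$/ { next }   # skip comments and uncounted events
    {
      ev = $4
      sub(/:.*/, "", ev)                 # drop modifiers such as ":u"
      if (ev == "cycles")       cyc[$1] = $2
      if (ev == "instructions") ins[$1] = $2
    }
    END {
      # Only report CPUs where both counters were read.
      for (cpu in cyc)
        if (cyc[cpu] > 0 && (cpu in ins))
          printf "%s %.2f\n", cpu, ins[cpu] / cyc[cpu]
    }' | sort -k1.4,1n                   # numeric sort on the digits after "CPU"
}
```

Usage, assuming the function has been sourced:
$ perf stat -C 0-3 -A -x, -o cpu.csv -e cycles,instructions -- sleep 5
$ per_cpu_ipc < cpu.csv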
Common problems and solutions
Installation: on Debian-based systems install linux-tools-$(uname -r) (or linux-tools-common and linux-tools-generic for WSL).
Unsupported events: check perf list and verify kernel and hardware support; if necessary, lower the sampling rate or choose a different event set.
Unreadable reports: compile programs with -g to include debug symbols and provide the appropriate vmlinux or symbol files; for complete call stacks, also build with -fno-omit-frame-pointer or record with --call-graph dwarf.
Virtualization / container compatibility: ensure the host kernel exposes perf counters and grant the container privileged access or appropriate capabilities.
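Permission errors when running perf as a normal user usually trace back to the kernel.perf_event_paranoid sysctl. The sketch below summarizes the upstream meaning of each level as described in the kernel's sysctl documentation: levels are cumulative, CAP_PERFMON exists since Linux 5.8 (older kernels use CAP_SYS_ADMIN instead), and values above 2 are distribution-specific additions. describe_perf_paranoid is a hypothetical helper name.

```sh
# Hypothetical helper: explain the kernel.perf_event_paranoid setting.
# $1: level to explain; defaults to the running kernel's current value.
describe_perf_paranoid() {
  level=${1:-$(cat /proc/sys/kernel/perf_event_paranoid 2>/dev/null)}
  # Reject empty or non-integer input (only an optional leading "-" allowed).
  case $level in
    ''|-|*[!0-9-]*|?*-*)
      echo "perf_event_paranoid unavailable or not a number" >&2
      return 1 ;;
  esac
  # Each level adds restrictions on top of the lower ones.
  if [ "$level" -le -1 ]; then
    msg="(almost) all events allowed for all users"
  elif [ "$level" -eq 0 ]; then
    msg="raw tracepoint access needs CAP_PERFMON or root"
  elif [ "$level" -eq 1 ]; then
    msg="per-CPU (system-wide) events also need CAP_PERFMON or root"
  elif [ "$level" -eq 2 ]; then
    msg="kernel profiling also needs CAP_PERFMON or root"
  else
    msg="distribution-specific level; unprivileged use may be blocked entirely"
  fi
  printf '%s: %s\n' "$level" "$msg"
}
```

Usage: run describe_perf_paranoid with no argument to check the current host; to relax the setting until the next reboot, an administrator can run sudo sysctl kernel.perf_event_paranoid=1.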
Summary and outlook
Perf provides direct hardware counter access, a rich set of events, and extensive analysis capabilities for CPU, memory, and I/O profiling. Future directions include richer visual interfaces, broader architecture support (ARM, RISC‑V), AI‑assisted anomaly detection, and deeper integration with cloud‑native environments.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Liangxu Linux
Liangxu, a self-taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge: fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc.
