Master Linux perf: From Basics to Advanced Profiling and Flame Graphs

This comprehensive guide introduces Linux perf, explains its core components, walks through essential commands, demonstrates real‑world use cases such as CPU, memory, and I/O analysis, shows how to generate flame graphs, and provides advanced tips and troubleshooting for accurate performance profiling on Linux systems.

Liangxu Linux

Jul 23, 2023

Introduction

Linux perf is a performance analysis tool that is developed in the kernel source tree and built on the kernel's perf_events subsystem. It can monitor hardware counters, kernel events, and user‑space activity, helping developers and administrators locate bottlenecks, improve stability, and understand system behavior.

Overview of perf

Perf events – the basic measurement units (hardware, software, or tracepoints).

Perf counters – hardware registers in the CPU's performance monitoring unit (PMU), together with kernel software counters, that count event occurrences.

Perf command‑line interface – sub‑commands such as stat, record, report, top, bench, trace, etc.

Perf data storage – files (default perf.data) that hold collected samples.

Perf analyzer – perf report and perf annotate turn the recorded samples into per‑function and per‑instruction reports.

Basic commands and usage

perf list – show all available events.

$ perf list

perf stat – collect and display counter statistics for a command or the whole system.

$ perf stat -e cycles,instructions,cache-references,cache-misses,branches,branch-misses -- ./my_program
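The raw counts are easier to read as ratios. The following is a minimal sketch, not taken from the original article, that post-processes the CSV mode of perf stat (-x,). The helper name perf_stat_summary is hypothetical, and the sketch assumes the count sits in field 1 and the event name in field 3 of each CSV line, which may differ between perf versions (for example when event names carry a modifier suffix).

```shell
# Print instructions-per-cycle and cache-miss ratio from `perf stat -x,` CSV on stdin.
# Assumption: field 1 is the counter value and field 3 is the event name.
perf_stat_summary() {
  awk -F, '
    # Adding 0 forces a numeric value, so "<not counted>" becomes 0.
    $3 == "cycles"           { cycles = $1 + 0 }
    $3 == "instructions"     { instr  = $1 + 0 }
    $3 == "cache-references" { refs   = $1 + 0 }
    $3 == "cache-misses"     { miss   = $1 + 0 }
    END {
      if (cycles > 0) printf "IPC %.2f\n", instr / cycles
      if (refs > 0)   printf "cache-miss %.1f%%\n", 100 * miss / refs
    }'
}
```

One way to feed it: run perf stat -x, -e cycles,instructions,cache-references,cache-misses -o stat.csv -- ./my_program, then perf_stat_summary < stat.csv.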

perf record – record events to perf.data for later analysis.

$ perf record -g ./my_program

perf report – read perf.data and present a searchable report.

$ perf report

perf top – real‑time view of hottest functions.

$ perf top

perf bench – built‑in benchmarks (e.g., memory bandwidth).

$ perf bench mem memcpy

perf trace – trace system calls and I/O events.

$ perf trace -e block:block_rq_issue,block:block_rq_complete -- ./my_program

Practical applications and cases

CPU analysis with perf stat reveals cycles, instructions, cache misses, and branch mispredictions. Memory analysis with perf mem record and perf mem report uncovers access patterns. I/O profiling via perf trace tracks block requests. Combining perf record and perf report produces call‑graph reports for software tuning. perf top helps locate hot functions in real time. Benchmarks with perf bench exercise specific subsystems, such as memory copy throughput.

Generating flame graphs

Install the FlameGraph scripts:

$ git clone https://github.com/brendangregg/FlameGraph

Record data and collapse stacks (replace pid with the target process ID):

sudo perf record -F 99 -p pid -g -- sleep 30

sudo perf script > out.perf

FlameGraph/stackcollapse-perf.pl out.perf > out.folded

FlameGraph/flamegraph.pl out.folded > out.svg

Open out.svg to explore function‑level hotspots.
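The folded file is also useful on its own. Each line produced by stackcollapse-perf.pl has the form "comm;frame;...;leaf count", so a short script can rank functions by self time without opening the SVG. This is a minimal sketch under that assumption; top_leaf_frames and its row-limit argument are hypothetical names, not part of FlameGraph.

```shell
# Sum sample counts per leaf (innermost) frame from folded stacks on stdin.
# Assumed input format: "comm;frame;...;leaf count", one stack per line.
top_leaf_frames() {
  limit="${1:-10}"   # number of rows to print (hypothetical parameter)
  awk '{
         count = $NF                     # last field is the sample count
         stack = $0
         sub(/ [0-9]+$/, "", stack)      # drop the trailing count
         n = split(stack, frames, ";")   # frames[n] is the leaf frame
         total[frames[n]] += count
       }
       END { for (f in total) print total[f], f }' |
    sort -k1,1nr -k2 | head -n "$limit"
}
```

For example, top_leaf_frames 5 < out.folded lists the five functions with the most samples taken directly in them. A flame graph's box width also includes time spent in callees, so the two views answer slightly different questions.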

Advanced techniques and practices

Custom events: perf stat -e rNNN -- ./my_program, where NNN is the raw hardware event code in hexadecimal, taken from the CPU vendor's documentation.
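As an illustration of how such a code is put together, on x86 the raw value is commonly the unit mask followed by the event select, both in hexadecimal. The sketch below assumes that layout; raw_event is a hypothetical helper, and other architectures encode raw events differently.

```shell
# Build a raw event string for x86: r<umask><event>, two hex digits each.
# Assumption: simple events that need only an event select and a unit mask.
raw_event() {
  event="$1"   # event select, e.g. 0x2E
  umask="$2"   # unit mask, e.g. 0x41
  printf 'r%02x%02x\n' "$umask" "$event"
}

# Intel's architectural "LLC misses" event: event select 0x2E, unit mask 0x41.
raw_event 0x2E 0x41   # prints r412e
```

The result can then be passed to perf stat -e. perf also accepts the more readable form cpu/event=0x2e,umask=0x41/ for the same event.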

Event groups:

perf stat -e '{cycles,instructions},{cache-references,cache-misses}' -- ./my_program

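Multi‑core analysis: use -C to restrict counting to specific CPUs, e.g., perf stat -C 0-3 -e cycles,instructions -- ./my_program. Note that -C counts everything that runs on those CPUs while the command executes; it does not pin the program to them (use taskset for that).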

Long‑term monitoring: perf record -a -F 100 -g -- sleep 86400 (records system‑wide for 24 h).
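Before starting a recording of that length, it helps to estimate how many samples it can produce, since perf.data grows with every sample and call graphs make each one larger. The arithmetic is frequency times duration times CPU count; the sketch below treats the result as an upper bound, because the real number depends on the event and on how busy the machine is. estimate_samples is a hypothetical helper.

```shell
# Upper bound on sample count: frequency (Hz) x duration (s) x number of CPUs.
estimate_samples() {
  freq="$1"; seconds="$2"; cpus="$3"
  echo $(( freq * seconds * cpus ))
}

# 100 Hz for 24 h on an assumed 8-CPU machine.
estimate_samples 100 86400 8   # prints 69120000
```

With tens of millions of stack samples, the output file can reach many gigabytes, so a lower frequency or a shorter window is often a better starting point.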

Container usage: run perf with privileged access, e.g.,

docker run --privileged -v /usr/bin/perf:/usr/bin/perf -it my_image /bin/bash

Integration with FlameGraph for interactive visualizations:

perf record -g -- ./my_program && perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > flamegraph.svg

Common problems and solutions

Installation: on Ubuntu, install linux-tools-common and linux-tools-$(uname -r); on Debian the package is linux-perf. Under WSL, where no package matches the running kernel, use linux-tools-common and linux-tools-generic.

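Unsupported events: run perf list to check what the kernel and hardware expose. In virtual machines hardware counters are often unavailable and perf stat reports them as not supported; in that case choose a different event set, for example software events such as cpu-clock or task-clock.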

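Unreadable reports: compile programs with -g to include debug symbols, keep frame pointers (-fno-omit-frame-pointer) or record with --call-graph dwarf so that call stacks unwind correctly, and provide the appropriate vmlinux or symbol files for kernel frames.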

Virtualization / container compatibility: ensure the host kernel exposes perf counters to the guest, and grant the container privileged access or a narrower capability such as CAP_PERFMON (Linux 5.8 and later) or CAP_SYS_ADMIN.
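Permission errors outside containers usually trace back to the kernel.perf_event_paranoid sysctl, which the original text does not cover. The sketch below, offered as an assumption-laden aid rather than a definitive check, maps a level to the restriction described in the kernel's sysctl documentation. describe_paranoid is a hypothetical helper, and some distributions patch in stricter levels above 2.

```shell
# Describe what a kernel.perf_event_paranoid level restricts for unprivileged users.
# Levels follow the upstream kernel sysctl documentation; distributions may add more.
describe_paranoid() {
  level="$1"
  if [ "$level" -ge 2 ]; then
    echo "level $level: kernel profiling disallowed (stricter on some distributions)"
  elif [ "$level" -ge 1 ]; then
    echo "level $level: CPU event access disallowed, kernel profiling allowed"
  elif [ "$level" -ge 0 ]; then
    echo "level $level: raw tracepoint access disallowed, CPU events allowed"
  else
    echo "level $level: almost all events allowed"
  fi
}

# Report the current setting if the sysctl file is readable on this system.
f=/proc/sys/kernel/perf_event_paranoid
if [ -r "$f" ]; then
  describe_paranoid "$(cat "$f")"
fi
```

Lowering the value with sysctl requires root and weakens isolation between users, so running perf with sudo for a single session is often the simpler choice.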

Summary and outlook

Perf provides direct hardware counter access, a rich set of events, and extensive analysis capabilities for CPU, memory, and I/O profiling. Future directions include richer visual interfaces, broader architecture support (ARM, RISC‑V), AI‑assisted anomaly detection, and deeper integration with cloud‑native environments.
