Master Linux Performance: Using perf for Profiling and Optimization
Linux perf is a powerful, flexible profiling tool that lets developers and system administrators monitor hardware and software events, analyze CPU, memory, and I/O performance, generate flame graphs, and troubleshoot bottlenecks across single processes, containers, and multi‑core systems. This article walks through its core commands and several advanced techniques.
Introduction
Brief Introduction to Linux perf
Linux perf (performance analysis tool) is a powerful and flexible utility that ships with the Linux kernel source and builds on the kernel's perf_events subsystem. It can detect and debug a wide range of performance problems by probing kernel performance events, hardware counters, and user‑space application events.
perf can profile applications to find bottlenecks and optimization points, improving system performance and stability. It supports many statistical and view modes, providing deep performance insight for developers and administrators.
Why Understanding perf Is Crucial for Linux Users
perf is essential because it enables performance optimization, system monitoring, problem localization, and deeper understanding of Linux internals. It helps developers locate hot spots, administrators monitor resources in real time, and both groups gain insight into kernel behavior.
Overview of the perf Tool
Origin and Development
perf originated from the growing need for performance analysis among kernel developers. In 2009 Ingo Molnar introduced perf, merging it into kernel 2.6.31. Since then it has been continuously improved, gaining support from the kernel community and hardware vendors.
Core Components of perf
perf consists of several core components:
perf events – the basic measurement units representing hardware, kernel, or user‑space events.
perf counters – hardware registers or kernel‑maintained counters that record how many times an event occurred.
perf command‑line interface – the main way users interact with perf (sub‑commands such as stat, record, report, etc.).
perf data storage – files that hold collected performance data for later analysis.
perf analyzer – generates detailed reports that reveal performance bottlenecks.
Basic Commands and Usage
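perf list – view available events
$ perf list [hw|sw|cache|tracepoint|pmu|metric|event_glob]
With no argument, perf list prints every event the current kernel and CPU expose. Pass an event class or a glob to narrow the output, e.g. perf list sw or perf list 'sched:*'.
Common options:
-d or --desc: print extra event descriptions.
-v or --long-desc: print longer event descriptions.
--details: show how named events are resolved internally.
--help: show help.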
perf stat – view performance statistics
$ perf stat [options] [command]
Common options:
-e or --event: specify events (e.g., cache-misses).
-p or --pid: monitor a specific PID.
-t or --tid: monitor a specific TID.
-a or --all-cpus: monitor all CPUs.
-C or --cpu: restrict counting to a list of CPUs.
-r or --repeat: run the command several times and report the mean and deviation.
-d or --detailed: show detailed stats.
-D or --delay: wait the given number of milliseconds after the command starts before counting.
-I or --interval-print: print counts at a fixed interval, in milliseconds.
-n or --null: run the command without starting any counters.
-o or --output: write the results to a file.
-x or --field-separator: print machine‑readable output with the given separator (e.g., -x,).
-A or --no-aggr: do not aggregate counts across CPUs.
--metric-only, -M or --metrics: metric‑related options.
--per-socket, --per-core, --per-thread: aggregation scopes.
--no-merge: do not merge results from the same PMUs.
perf record – record performance data
$ perf record [options] [command]
Data is saved to perf.data by default; an existing perf.data is renamed to perf.data.old.
Common options:
-e or --event: events to record.
-p or --pid: target PID.
-t or --tid: target TID.
-a or --all-cpus: all CPUs.
-C or --cpu: CPU list.
-c or --count: sampling period (one sample every N events).
-F or --freq: sampling frequency in Hz.
-r or --realtime: collect data with the given SCHED_FIFO real‑time priority.
-o or --output: output file.
-g: record call graphs with the default method.
--call-graph: record call graphs with an explicit method (fp, dwarf, or lbr).
Other options for context‑switch events, buffering, dry‑run, etc.
perf report – generate performance reports
$ perf report [options]
Reads perf.data and presents analysis in various formats.
Useful options:
-i or --input: input file (default perf.data).
-F or --fields: fields to display.
-s or --sort: sort keys (e.g., comm,dso,symbol).
-T or --threads: show per‑thread event counters.
-m or --modules: load module symbols.
-k or --vmlinux: path to the vmlinux file with kernel symbols.
-f or --force: skip the ownership check on the input file.
-c or --comms: filter by command name.
-d or --dsos, -S or --symbols: filter by DSO or symbol.
--percent-limit: hide low‑percentage entries.
--pretty: printing style (normal or raw).
-P or --full-paths: do not shorten path names.
--stdio, --tui, or --gtk: output mode.
-g or --call-graph: control how call graphs are displayed.
--no-children, --no-demangle, --demangle, --max-stack: additional controls.
perf annotate – source‑level analysis
$ perf annotate [options] [symbol]
Shows per‑instruction sample data for a function, interleaved with source lines when debug information is available, helping locate hot spots.
perf top – real‑time hot‑function view
$ perf top [options]
Continuously displays the functions consuming the most CPU, aiding quick bottleneck identification.
perf bench – built‑in benchmarks
$ perf bench [options] [subcommand]
Provides benchmark suites for the scheduler, memory operations, NUMA, futexes, and more. Example:
$ perf bench mem memcpy
measures memcpy throughput.
Practical Applications and Cases
CPU Performance Analysis
$ perf stat -e cycles,instructions,cache-references,cache-misses,branches,branch-misses -- ./my_program
Collects CPU‑level metrics for a program.
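The raw counts become easier to read once they are turned into ratios. The following is a minimal sketch, assuming the counts were written in the machine‑readable form produced by perf stat -x, (count in field 1, event name in field 3, as described in perf-stat(1)). The function name derived_metrics and the output labels are illustrative, not part of perf, and event names on hybrid CPUs (such as cpu_core/cycles/) would need extra handling.

```shell
# derived_metrics: read `perf stat -x,` output on stdin and print ratios.
# Assumption: field 1 is the count and field 3 is the event name.
derived_metrics() {
  awk -F, '
    /^#/ || NF < 3 { next }            # skip the comment header and blank lines
    {
      name = $3
      sub(/:.*$/, "", name)            # drop modifiers such as ":u"
      count[name] = $1 + 0             # "<not counted>" becomes 0
    }
    END {
      if (count["cycles"] > 0)
        printf "ipc %.2f\n", count["instructions"] / count["cycles"]
      if (count["cache-references"] > 0)
        printf "cache_miss_pct %.2f\n", 100 * count["cache-misses"] / count["cache-references"]
      if (count["branches"] > 0)
        printf "branch_miss_pct %.2f\n", 100 * count["branch-misses"] / count["branches"]
    }'
}
```

One possible way to use it: run perf stat -x, -o stat.csv with the same -e list as above, then call derived_metrics < stat.csv.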
Memory Performance Analysis
$ perf mem record ./my_program && perf mem report
Records and reports memory‑access events.
I/O Performance Analysis
$ perf trace -e block:block_rq_issue,block:block_rq_complete -- ./my_program
Traces block‑device requests to analyze I/O latency.
Software Performance Tuning
$ perf record -g ./my_program && perf report
Generates a call‑graph report highlighting hot functions.
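A quick first question about such a report is how the samples split between user code and the kernel. The sketch below assumes the text layout of perf report --stdio --no-children --sort symbol, where each row looks like "62.50%  [.] symbol" and the marker is [.] for user space or [k] for the kernel. The function name user_kernel_split is made up for this example, and rows in any other shape are simply ignored.

```shell
# user_kernel_split: sum the Overhead column of `perf report --stdio` text
# per privilege marker. Assumption: rows look like "  62.50%  [.] symbol".
user_kernel_split() {
  awk '
    $1 ~ /^[0-9.]+%$/ && ($2 == "[.]" || $2 == "[k]") {
      pct = $1
      sub(/%/, "", pct)                # "62.50%" -> "62.50"
      total[$2] += pct
    }
    END {
      printf "user %.2f\nkernel %.2f\n", total["[.]"], total["[k]"]
    }'
}
```

For example: perf report --stdio --no-children --sort symbol | user_kernel_split. A large kernel share suggests looking at system calls or I/O before tuning application code.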
System Bottleneck Localization
$ perf top
Real‑time view of CPU‑intensive functions.
Hardware Performance Evaluation
$ perf bench mem memcpy
Runs a memory copy benchmark to evaluate hardware.
Generating Flame Graphs
Steps:
Run perf to collect data:
$ sudo perf record -F 99 -p <pid> -g -- sleep 30
Convert to script output:
$ perf script > out.perf
Clone the FlameGraph repo:
$ git clone https://github.com/brendangregg/FlameGraph
Collapse stacks and generate the SVG:
$ FlameGraph/stackcollapse-perf.pl out.perf > out.folded
$ FlameGraph/flamegraph.pl out.folded > out.svg
Open out.svg to view the flame graph. The width of each frame reflects its share of the samples, and the vertical axis shows call‑stack depth. By default the colors are picked only to tell neighbouring frames apart and carry no meaning.
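The intermediate out.folded file is plain text with one line per unique stack: frames joined by semicolons, then a space and the sample count. That makes it easy to check numerically what the SVG shows as width. The following sketch, with the illustrative name leaf_totals, adds up samples per leaf frame, which is where the CPU was actually executing.

```shell
# leaf_totals: read folded stacks ("frame1;frame2;leaf count") on stdin and
# print the total sample count per leaf frame, highest first.
leaf_totals() {
  awk '
    {
      n = $NF                          # sample count is the last field
      stack = $0
      sub(/ [0-9]+$/, "", stack)       # strip the trailing count
      k = split(stack, frames, ";")
      total[frames[k]] += n            # frames[k] is the leaf frame
    }
    END { for (f in total) print total[f], f }
  ' | sort -k1,1nr -k2
}
```

For example: leaf_totals < out.folded | head prints the ten leaf functions with the most samples.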
Advanced Techniques and Practices
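Customizing Performance Events
$ perf stat -e rNNN -- ./my_program
Uses the raw hardware event code NNN (hexadecimal). Multiple events can be grouped, e.g.:
$ perf stat -e '{cycles,instructions},{cache-references,cache-misses}' -- ./my_program
Events inside a group are scheduled onto the counters together.
Combining perf with Other Tools
$ perf record -g -- ./my_program
$ perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > flamegraph.svg
Generates an interactive flame graph. perf record writes its samples to perf.data rather than standard output, so perf script is needed to turn them into text for the FlameGraph scripts.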
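Multi‑Core Performance Analysis
$ perf stat -C 0-3 -e cycles,instructions -- ./my_program
Counts events on cores 0‑3 only. On current perf versions -C makes the counting CPU‑wide for those cores, so other tasks scheduled there are included; pinning the program to the same cores (for example with taskset -c 0-3) keeps its work inside the measured set.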
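Adding -A (--no-aggr) keeps the counts separate per core, which helps spot an imbalance between cores. The sketch below assumes the -A -x, output layout in which each line starts with a CPU label such as CPU0, followed by the count, the unit, and the event name. That layout and the function name per_cpu_ipc are assumptions of this example and may need adjusting for a given perf version.

```shell
# per_cpu_ipc: read `perf stat -A -x,` output on stdin, print IPC per CPU.
# Assumption: field 1 is the CPU label, field 2 the count, field 4 the event.
per_cpu_ipc() {
  awk -F, '
    $1 ~ /^CPU[0-9]+$/ {
      name = $4
      sub(/:.*$/, "", name)            # drop modifiers such as ":u"
      if (name == "cycles")       cycles[$1] = $2 + 0
      if (name == "instructions") insns[$1]  = $2 + 0
    }
    END {
      for (cpu in cycles)
        if (cycles[cpu] > 0)
          printf "%s %.2f\n", cpu, insns[cpu] / cycles[cpu]
    }
  ' | sort -t U -k2,2n                 # order by the number after "CPU"
}
```

For example: perf stat -C 0-3 -A -x, -e cycles,instructions -o percpu.csv -- ./my_program, followed by per_cpu_ipc < percpu.csv.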
Long‑Term Monitoring
$ perf record -a -F 100 -g -- sleep 86400
Records system‑wide data for a full day.
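A recording this long can grow very large, so it helps to estimate the sample count before starting. As a rough ceiling, each CPU contributes about the sampling frequency in samples per second. The helper below (max_samples is a hypothetical name) just multiplies the three factors; real counts are usually lower because idle CPUs produce few samples, and the size per sample depends on stack depth.

```shell
# max_samples FREQ_HZ NUM_CPUS SECONDS
# Approximate ceiling on samples for a frequency-based, system-wide
# recording: each CPU yields about FREQ_HZ samples per second at most.
max_samples() {
  echo $(( $1 * $2 * $3 ))
}
```

For example, max_samples 100 "$(nproc)" 86400 gives the ceiling for the command above on the current machine. If the number looks too high, a lower -F value or a shorter duration reduces it proportionally.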
Performance in Virtualization and Containers
$ docker run --privileged -v /usr/bin/perf:/usr/bin/perf -it my_image /bin/bash
Runs perf inside a container with access to host counters.
Common Problems and Solutions
Installation Issues
On Ubuntu and its derivatives install with (Debian itself ships perf in the linux-perf package):
# Packages matching the running kernel
sudo apt-get install linux-tools-common linux-tools-$(uname -r)
# WSL (no linux-tools package matches the WSL kernel)
sudo apt-get update
sudo apt-get install linux-tools-common linux-tools-generic
Data Collection Issues
Ensure the kernel supports the desired events; try different event sets or lower the sampling rate if data looks inaccurate.
Report Interpretation Issues
Compile programs with debugging symbols (-g) and point perf to the correct symbol files. Learn the various output formats to customize reports.
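Before blaming perf for missing source lines, it is worth checking that the binary really carries debug information. One simple approach, sketched here under the assumption that the binary is an ELF file with DWARF data embedded (not split into a separate debug file), is to look for a .debug_info section in the readelf -S listing. The function name has_debug_info is illustrative.

```shell
# has_debug_info: succeed if the section-header text on stdin lists a
# .debug_info section (embedded DWARF); fail otherwise.
has_debug_info() {
  grep -q '\.debug_info'
}
```

For example: readelf -S ./my_program | has_debug_info && echo "debug info present".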
Compatibility Issues
Check kernel and hardware support for perf. In virtualized or container environments, enable privileged access to performance counters. For non‑x86 architectures, consider building perf from source.
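On many systems the first thing to check is the kernel.perf_event_paranoid sysctl, which limits what users without extra capabilities may measure. The sketch below summarises the levels as described in the kernel's sysctl documentation; the function name describe_paranoid and the wording of the messages are this example's own, and some distributions add stricter levels above 2.

```shell
# describe_paranoid LEVEL: explain a kernel.perf_event_paranoid value for
# users without extra capabilities (summary of the kernel sysctl docs).
describe_paranoid() {
  level=$1
  if [ "$level" -ge 2 ]; then
    echo "user-space measurements only"
  elif [ "$level" -ge 1 ]; then
    echo "user and kernel measurements, no CPU-wide events"
  elif [ "$level" -ge 0 ]; then
    echo "CPU-wide events allowed, no raw tracepoint access"
  else
    echo "no restrictions"
  fi
}
```

For example: describe_paranoid "$(cat /proc/sys/kernel/perf_event_paranoid)". An administrator can lower the level with sysctl, or run perf as root, when broader access is needed.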
Summary and Outlook
perf is a versatile Linux performance analysis tool offering direct hardware counter access, support for many event types, comprehensive CPU/memory/I/O analysis, multi‑core and cloud‑native capabilities, and integration with other tools such as FlameGraph.
Future directions may include richer visualisation, broader architecture support (ARM, RISC‑V), AI‑driven automated optimisation suggestions, and deeper cloud‑native integration for micro‑services and serverless workloads.
MaGe Linux Operations
