
Master Linux Performance: Using perf for Profiling and Optimization

Linux perf is a flexible profiling tool that lets developers and system administrators count and sample hardware and software events, analyze CPU, memory, and I/O performance, generate flame graphs, and troubleshoot bottlenecks in single processes, containers, and multi‑core systems. This article covers its core commands and several advanced techniques.

MaGe Linux Operations

Jul 2, 2023

Introduction

Brief Introduction to Linux perf

Linux perf (performance analysis tool) is a powerful and flexible utility for detecting and debugging a wide range of performance problems. Its source lives in the Linux kernel tree and it is built on the kernel's perf_events subsystem. It can count and sample hardware counters, kernel events such as tracepoints, and user‑space application events.

perf can profile applications to find bottlenecks and optimization points, improving system performance and stability. It supports many statistical and view modes, providing deep performance insight for developers and administrators.

Why Understanding perf Is Crucial for Linux Users

perf is essential because it enables performance optimization, system monitoring, problem localization, and deeper understanding of Linux internals. It helps developers locate hot spots, administrators monitor resources in real time, and both groups gain insight into kernel behavior.

Overview of the perf Tool

Origin and Development

perf grew out of kernel developers' need for a built‑in performance analysis facility. In 2009 Ingo Molnar introduced perf, and it was merged into kernel 2.6.31. Since then it has been continuously improved, with support from the kernel community and hardware vendors.

Core Components of perf

perf consists of several core components:

perf events – the basic measurement units representing hardware, kernel, or user‑space events.

perf counters – hardware registers (or software counters maintained by the kernel) that record how many times an event occurred.

perf command‑line interface – the main way users interact with perf (sub‑commands such as stat, record, report, etc.).

perf data storage – files that hold collected performance data for later analysis.

perf analysis commands – sub‑commands such as report and annotate that turn the collected data into detailed reports revealing performance bottlenecks.

Basic Commands and Usage

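perf list – view available events

$ perf list [hw|sw|cache|tracepoint|pmu|event_glob]

With no argument, perf list prints every event that the current kernel and hardware expose. Common filters and options:
hw, sw, cache, tracepoint, pmu: restrict the listing to one event type.
event_glob: restrict the listing to events whose names match a glob (e.g., perf list 'sched:*').
-d or --desc: print event descriptions.
-v or --long-desc: print longer event descriptions.
--help: show help.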

perf stat – view performance statistics

$ perf stat [options] [command]

Common options:
-e or --event: events to count (e.g., cache-misses).
-p or --pid: monitor a specific PID.
-t or --tid: monitor a specific TID.
-a or --all-cpus: monitor all CPUs.
-C or --cpu: restrict counting to a CPU list.
-r or --repeat: repeat the command and report the mean and variation.
-d or --detailed: show more detailed statistics.
-D or --delay: wait the given number of milliseconds after the program starts before measuring.
-I or --interval-print: print counts at a fixed interval, in milliseconds.
-n or --null: run the command without starting any counters.
-o or --output: write the results to a file.
-x or --field-separator: print CSV‑style output using the given separator.
-A or --no-aggr: do not aggregate counts across CPUs.
-M or --metrics, --metric-only: metric‑related options.
--per-socket, --per-core, --per-thread: aggregation scopes.
--no-merge: do not merge results from PMUs of the same type.

perf record – record performance data

$ perf record [options] [command]

Data is saved to perf.data by default. If a perf.data file already exists, perf renames it to perf.data.old before writing the new one.

Common options:
-e or --event: events to record.
-p or --pid: target PID.
-t or --tid: target TID.
-a or --all-cpus: all CPUs.
-C or --cpu: CPU list.
-c or --count: sampling period (take one sample every N events).
-F or --freq: sampling frequency, in samples per second.
-r or --realtime: collect data with the given real‑time (SCHED_FIFO) priority.
-o or --output: output file.
-g: record call graphs; --call-graph selects the unwinding method (e.g., fp or dwarf).

Other options for context‑switch events, buffering, dry‑run, etc.

perf report – generate performance reports

$ perf report [options]

Reads perf.data and presents the analysis in various formats. Useful options:
-i or --input: input file (default perf.data).
-F or --fields: fields to display.
-s or --sort: sort keys.
-T or --threads: show per‑thread event counters.
-m or --modules: load module symbols.
-k or --vmlinux: path to the kernel vmlinux file used for symbols.
-f or --force: read the file even if the ownership check fails.
-c or --comms: only consider the listed command names.
-d or --dsos, -S or --symbols: only consider the listed DSOs or symbols.
--percent-limit: hide entries below the given percentage.
--pretty: output style (normal or raw).
--stdio, --tui, --gtk: output mode.
-g or --call-graph: control how call graphs are displayed.
--no-children, --no-demangle, --demangle, --max-stack: additional controls.

perf annotate – source‑level analysis

$ perf annotate [options] [symbol]

Shows instruction‑level performance data for each function, helping locate hot spots.

perf top – real‑time hot‑function view

$ perf top [options]

Continuously displays the functions consuming the most CPU, aiding quick bottleneck identification.

perf bench – built‑in benchmarks

$ perf bench [options] [subsystem] [benchmark]

Provides benchmark suites for subsystems such as memory (mem), scheduling (sched), NUMA (numa), and futexes (futex). Example: $ perf bench mem memcpy measures memcpy throughput.

Practical Applications and Cases

CPU Performance Analysis

$ perf stat -e cycles,instructions,cache-references,cache-misses,branches,branch-misses -- ./my_program

Collects CPU‑level metrics for a program.
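The raw counts are usually easier to judge as ratios. The following is a minimal sketch, assuming perf stat is run with -x, so that it prints CSV, and assuming the fields arrive in the order value, unit, event name. The helper name perf_ratios is hypothetical and not part of perf.

```shell
# Derive ratios from `perf stat -x,` CSV read on stdin.
# Assumed field layout: value,unit,event,run-time,percent-running,...
# Event names are matched exactly, so names with modifiers (cycles:u)
# would need the keys below adjusted.
perf_ratios() {
    awk -F, '
        { count[$3] = $1 + 0 }   # "+ 0" turns "<not supported>" into 0
        END {
            if (count["cycles"] > 0)
                printf "IPC %.2f\n", count["instructions"] / count["cycles"]
            if (count["cache-references"] > 0)
                printf "cache-miss%% %.2f\n", 100 * count["cache-misses"] / count["cache-references"]
            if (count["branches"] > 0)
                printf "branch-miss%% %.2f\n", 100 * count["branch-misses"] / count["branches"]
        }'
}

# Possible usage, writing the counters to a file with -o:
#   perf stat -x, -o stat.csv -e cycles,instructions,cache-references,cache-misses,branches,branch-misses -- ./my_program
#   perf_ratios < stat.csv
```

A low IPC together with a high cache‑miss percentage often points at memory‑bound code, but the thresholds depend on the CPU and the workload.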

Memory Performance Analysis

$ perf mem record ./my_program && perf mem report

Records and reports memory‑access samples. This relies on hardware support for memory‑access sampling, so it is not available on every CPU.

I/O Performance Analysis

$ perf trace -e block:block_rq_issue,block:block_rq_complete -- ./my_program

Traces block‑device requests to analyze I/O latency.

Software Performance Tuning

$ perf record -g ./my_program && perf report

Generates a call‑graph report highlighting hot functions.

System Bottleneck Localization

$ perf top

Real‑time view of CPU‑intensive functions.

Hardware Performance Evaluation

$ perf bench mem memcpy

Runs a memory copy benchmark to evaluate hardware.

Generating Flame Graphs

Steps:

Run perf to collect data:

sudo perf record -F 99 -p <pid> -g -- sleep 30

Convert to script output (use sudo here as well if perf.data was written by root):

perf script > out.perf

Clone the FlameGraph repo:

git clone https://github.com/brendangregg/FlameGraph

Collapse stacks and generate the SVG:

FlameGraph/stackcollapse-perf.pl out.perf > out.folded

FlameGraph/flamegraph.pl out.folded > out.svg

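Open out.svg in a browser to view the flame graph. The width of each frame reflects its share of the samples, and the vertical axis shows call‑stack depth. With the default palette the colors are picked at random and carry no meaning.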
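The folded file is plain text, one line per unique stack in the form frame1;frame2;...;leaf COUNT, so it can also be summarized without rendering an SVG. The sketch below sums the sample counts per leaf frame, that is, the function that was actually on the CPU. The helper name top_leaves is hypothetical.

```shell
# Sum sample counts per leaf frame from folded stacks on stdin.
# Each folded line looks like "frame1;frame2;...;leaf COUNT".
# The optional argument is how many entries to print (default 10).
top_leaves() {
    awk '
        {
            n = $NF                    # last field is the sample count
            sub(/ [0-9]+$/, "")        # drop the count; $0 is now the stack
            k = split($0, frames, ";") # frame names may contain spaces
            total[frames[k]] += n
        }
        END { for (f in total) print total[f], f }
    ' | sort -k1,1nr -k2 | head -n "${1:-10}"
}

# Possible usage:
#   top_leaves 5 < out.folded
```

This gives a quick text ranking of self time, which can be handy on a server without a browser.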

Advanced Techniques and Practices

Customizing Performance Events

$ perf stat -e rNNN -- ./my_program

Uses the raw hardware event code NNN, written in hexadecimal. Multiple events can be grouped so that the events in each group are counted together, e.g.:

$ perf stat -e '{cycles,instructions},{cache-references,cache-misses}' -- ./my_program
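For the raw form, the hexadecimal code has to be assembled from the vendor's event tables. A minimal sketch, assuming the x86 layout in which the unit mask occupies bits 8-15 and the event select bits 0-7; the helper name raw_event is hypothetical, and the codes for a given CPU should be taken from its vendor manual:

```shell
# Build a perf raw-event string from an event-select code and a unit mask.
# Assumes the x86 encoding: umask in bits 8-15, event select in bits 0-7.
#   $1 = event select, $2 = unit mask (decimal or 0x-prefixed hex)
raw_event() {
    printf 'r%x\n' $(( ($2 << 8) | $1 ))
}

# Event select 0x2e with unit mask 0x41 (Intel's architectural
# last-level-cache miss event):
raw_event 0x2e 0x41    # prints r412e
```

The resulting string can be passed directly to -e, for example perf stat -e r412e -- ./my_program.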

Combining perf with Other Tools

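$ perf record -g -- ./my_program && perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > flamegraph.svg

perf record writes its samples to perf.data rather than to standard output, so perf script is what feeds the FlameGraph scripts. The result is an interactive SVG flame graph.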

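Multi‑Core Performance Analysis

$ perf stat -C 0-3 -e cycles,instructions -- ./my_program

Counts events on cores 0‑3 only. With -C, perf counts per CPU rather than per task, so the totals can include other work that runs on those cores while the command executes.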

Long‑Term Monitoring

$ perf record -a -F 100 -g -- sleep 86400

Records system‑wide samples for a full day. Over that duration perf.data can grow very large, so check the available disk space first.
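Before starting a long recording it helps to estimate how many samples it can produce. A rough upper bound is sampling frequency times duration times CPU count, since idle CPUs generate fewer samples. The helper name max_samples and the CPU count of 8 are assumptions for illustration:

```shell
# Upper bound on the number of samples for a system-wide recording:
#   $1 = sampling frequency (Hz), $2 = duration (s), $3 = number of CPUs
max_samples() {
    echo $(( $1 * $2 * $3 ))
}

# 100 Hz for one day on an 8-CPU machine:
max_samples 100 86400 8    # prints 69120000
```

On a real machine the last argument could be "$(nproc)". Each sample recorded with -g also carries a call stack, so the file size per sample varies with stack depth.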

Performance in Virtualization and Containers

$ docker run --privileged -v /usr/bin/perf:/usr/bin/perf -it my_image /bin/bash

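Starts a privileged container with the host's perf binary mounted in, so that perf inside the container can open the host's performance counters. This assumes the image provides the shared libraries that the host's perf binary links against.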

Common Problems and Solutions

Installation Issues

On Debian‑based systems install with:

# Linux kernel tools
sudo apt-get install linux-tools-common linux-tools-$(uname -r)

# WSL (no linux-tools package matches the WSL kernel, so install the generic build)
sudo apt-get update
sudo apt-get install linux-tools-common linux-tools-generic

Data Collection Issues

Ensure the kernel supports the desired events; try different event sets or lower the sampling rate if data looks inaccurate.
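One common reason for inaccurate counts is multiplexing: when more events are requested than the CPU has counters, perf time‑shares the counters and scales the results. A minimal sketch for spotting this, assuming perf stat was run with -x, and that the fifth CSV field holds the percentage of time the counter was running; the helper name multiplexed_events is hypothetical:

```shell
# List events whose counter ran for less than 100% of the measurement.
# Reads `perf stat -x,` CSV on stdin; assumed layout:
#   value,unit,event,run-time,percent-running,...
multiplexed_events() {
    awk -F, '$5 != "" && $5 + 0 < 100 { print $3, $5 "%" }'
}

# Possible usage:
#   perf stat -x, -o stat.csv -e <many events> -- ./my_program
#   multiplexed_events < stat.csv
```

If events show up here, splitting them across several runs or requesting fewer events at once usually gives more trustworthy numbers.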

Report Interpretation Issues

Compile programs with debugging symbols (-g) and point perf to the correct symbol files. Learn the various output formats to customize reports.

Compatibility Issues

Check kernel and hardware support for perf. In virtualized or container environments, enable privileged access to performance counters. For non‑x86 architectures, consider building perf from source.
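Access for unprivileged users is governed by the kernel.perf_event_paranoid sysctl, readable at /proc/sys/kernel/perf_event_paranoid. The sketch below maps a level to the restrictions documented for the upstream kernel. The restrictions are cumulative, so each level also includes those of the lower ones. The helper name describe_paranoid is hypothetical, and some distributions patch in stricter levels above 2 that this sketch does not distinguish.

```shell
# Describe what the given perf_event_paranoid level restricts for
# unprivileged users, following the upstream kernel documentation.
describe_paranoid() {
    level=$1
    if [ "$level" -le -1 ]; then
        echo "no restrictions for unprivileged users"
    elif [ "$level" -eq 0 ]; then
        echo "raw tracepoint access restricted"
    elif [ "$level" -eq 1 ]; then
        echo "CPU-wide event access restricted"
    else
        echo "kernel profiling restricted; user-space only"
    fi
}

# Possible usage on a live system:
#   describe_paranoid "$(cat /proc/sys/kernel/perf_event_paranoid)"
```

The level can be lowered temporarily with sudo sysctl kernel.perf_event_paranoid=1, which is often enough for profiling one's own processes, including their kernel‑side activity.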

Summary and Outlook

perf is a versatile Linux performance analysis tool offering direct hardware counter access, support for many event types, comprehensive CPU/memory/I/O analysis, multi‑core and cloud‑native capabilities, and integration with other tools such as FlameGraph.

Future directions may include richer visualisation, broader architecture support (ARM, RISC‑V), AI‑driven automated optimisation suggestions, and deeper cloud‑native integration for micro‑services and serverless workloads.
