Introduction to Linux Performance Profiling Tools: Perf, gprof, and Valgrind

This article introduces three popular Linux performance profiling tools—Perf, gprof, and Valgrind—explaining their installation, basic command‑line usage, and how to visualize results with flame graphs, call graphs, or KCachegrind, while comparing their intrusiveness, startup methods, and output formats.

Tencent Cloud Developer

Mar 16, 2018

In software development, long startup times and high CPU usage are common performance problems. This article introduces three widely used Linux profiling tools—Perf, gprof, and Valgrind—along with basic usage instructions and graphical visualization methods.

Each tool is covered in three parts: a brief introduction, usage instructions, and methods for visualizing the results.

Sample Code for Demonstration

#include <unistd.h>

#define NUM 500000

// Fill the array with 0..NUM-1.
void init(int* int_array)
{
	for (int i = 0; i < NUM; i++) {
		int_array[i] = i;
	}
}

// Sum the array, sleeping briefly on every iteration.
void accu(int* int_array, long& sum)
{
	for (int i = 0; i < NUM; i++) {
		sum += int_array[i];
		usleep(3);
	}
}

int main()
{
	int int_array[NUM];
	init(int_array);
	long sum = 0;
	accu(int_array, sum);
}

The sample program runs for about 31 seconds on a typical PC, with a maximum CPU usage of 8.3%.
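The 31-second figure implies roughly 62 microseconds per loop iteration (31 s / 500,000), far more than the 3 microseconds requested from usleep. The requested value is only a lower bound on the sleep; the actual cost per call depends on the system's timers and scheduler. The following is a minimal sketch for measuring that cost on your own machine. The helper names average_usleep_micros and projected_sleep_seconds are hypothetical and not part of the original program.

```cpp
#include <cassert>
#include <chrono>
#include <unistd.h>

// Returns the average wall-clock cost, in microseconds, of one
// usleep(requested_us) call, measured over `calls` consecutive calls.
double average_usleep_micros(int calls, useconds_t requested_us)
{
    assert(calls > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < calls; ++i) {
        usleep(requested_us);
    }
    const std::chrono::duration<double, std::micro> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / calls;
}

// Projects the total time spent sleeping, in seconds, for a loop that
// makes `iterations` calls at `micros_per_call` microseconds each.
double projected_sleep_seconds(long iterations, double micros_per_call)
{
    return iterations * micros_per_call / 1e6;
}
```

Calling projected_sleep_seconds(500000, average_usleep_micros(1000, 3)) gives an estimate of how long accu() will spend sleeping on a given machine, which is why the sample is dominated by waiting rather than by CPU work.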

1. Perf

Perf is a profiling tool built into the Linux kernel source tree. It uses event sampling to locate performance bottlenecks and hot code paths.

Usage

Perf can be started in two ways:

Directly launch the target program with perf record -e cpu-clock -g ./run (root is normally not required when profiling your own program).

Attach to an already running process with perf record -e cpu-clock -g -p 4522, where 4522 is the process ID (this usually requires root).

Interrupt the profiling with Ctrl+C or let the program finish; a perf.data file is generated. To view the report, run:

perf report

The report can be visualized as a flame graph using Brendan Gregg's FlameGraph scripts:

1. perf script -i perf.data > perf.unfold
2. /data/stackcollapse-perf.pl perf.unfold > perf.folded
3. /data/flamegraph.pl perf.folded > perf.svg

The resulting SVG shows the hierarchical call stack and time spent in each function.

2. Gprof

Gprof records the execution time and call count of each function. After a normal program exit, it generates a gmon.out file that can be turned into a readable report.

To enable gprof, compile and link the program with the -pg flag. After the program finishes, run:

gprof -b run gmon.out > report.txt

The report can be visualized using gprof2dot.py and Graphviz:

python gprof2dot.py report.txt > report.dot

Open the resulting .dot file with Graphviz (e.g., gvedit.exe) to see a call graph.
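To make "execution time and call count of each function" concrete, here is a hand-written sketch that collects the same two quantities for one function. It is an illustration only and not how gprof is implemented: gprof relies on code the compiler inserts when -pg is used. The names FunctionStats, stats_table, ScopedProbe, and sum_first are all hypothetical.

```cpp
#include <chrono>
#include <map>
#include <string>

// Per-function totals: how often it was called and how long it ran.
struct FunctionStats {
    long calls = 0;
    std::chrono::nanoseconds total{0};
};

// Global table keyed by function name (not thread-safe; sketch only).
std::map<std::string, FunctionStats>& stats_table()
{
    static std::map<std::string, FunctionStats> table;
    return table;
}

// Records one call and its duration when the enclosing scope ends.
class ScopedProbe {
public:
    explicit ScopedProbe(const char* name)
        : name_(name), start_(std::chrono::steady_clock::now()) {}

    ~ScopedProbe()
    {
        FunctionStats& stats = stats_table()[name_];
        stats.calls += 1;
        stats.total += std::chrono::duration_cast<std::chrono::nanoseconds>(
            std::chrono::steady_clock::now() - start_);
    }

    ScopedProbe(const ScopedProbe&) = delete;
    ScopedProbe& operator=(const ScopedProbe&) = delete;

private:
    const char* name_;
    std::chrono::steady_clock::time_point start_;
};

// Example of an instrumented function: sums 0..n-1.
long sum_first(int n)
{
    ScopedProbe probe("sum_first");
    long sum = 0;
    for (int i = 0; i < n; ++i) {
        sum += i;
    }
    return sum;
}
```

After a few calls to sum_first, stats_table()["sum_first"] holds the call count and accumulated time, which is the kind of per-function data the gprof report summarizes for the whole program.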

3. Valgrind (Callgrind)

Unlike Perf, Valgrind does not come from the Linux kernel source tree and must be installed separately. It bundles several tools; this article focuses on Callgrind for performance analysis.

To install Valgrind from source, download and extract a release (e.g., http://valgrind.org/downloads/valgrind-3.12.0.tar.bz2), then build and install it from the extracted directory with ./configure && make && make install.

Run the target program under Callgrind:

valgrind --tool=callgrind --separate-threads=yes ./run

The --separate-threads=yes option creates a separate profiling file for each thread. After execution, a file like callgrind.out.4263-01 is produced, where 4263 is the process ID and 01 is the thread number. Visualize it with KCachegrind, for example kcachegrind.exe on Windows.

4. Tool Comparison

All three tools can locate the functions that consume the most execution time and CPU. However, they differ in several aspects:

4.1 Startup Method

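Perf and Valgrind can be launched without modifying the program, whereas gprof only works on a binary built with -pg. Perf can also attach to a running process, although this may require root. Valgrind slows the target program considerably, so a service running under it can handle fewer concurrent users during load testing.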

4.2 Intrusiveness

Perf and Valgrind are non‑intrusive; gprof requires recompilation with -pg and may need code changes to allow graceful termination for long‑running services.
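Because gmon.out is only written on a normal program exit, a service that is usually stopped by a signal needs a way to leave its main loop and return from main(). The following is a minimal sketch of one common approach, assuming the service has a loop it can break out of. The names install_stop_handlers, stop_requested, and run_until_stopped are hypothetical.

```cpp
#include <csignal>

// Written by the signal handler, read by the service loop.
// sig_atomic_t is the integer type that is safe to set from a handler.
static volatile std::sig_atomic_t g_stop_requested = 0;

// Handler does the minimum possible: it only sets the flag.
static void on_stop_signal(int)
{
    g_stop_requested = 1;
}

// Route Ctrl+C (SIGINT) and kill (SIGTERM) to the handler above.
void install_stop_handlers()
{
    std::signal(SIGINT, on_stop_signal);
    std::signal(SIGTERM, on_stop_signal);
}

bool stop_requested()
{
    return g_stop_requested != 0;
}

// Calls do_work repeatedly until a stop signal has been received, then
// returns the number of completed iterations. The caller is expected to
// return from main() afterwards so the process exits normally.
template <typename Work>
long run_until_stopped(Work do_work)
{
    long iterations = 0;
    while (!stop_requested()) {
        do_work();
        ++iterations;
    }
    return iterations;
}
```

In a real service, main() would call install_stop_handlers(), run its loop through something like run_until_stopped, and then return 0. The normal return is what gives a -pg build the chance to write gmon.out.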

4.3 Result Presentation

Gprof produces a hierarchical “inverse tree” of call times, Perf generates a pyramid‑style flame graph, and Valgrind (Callgrind) shows a single call path with time annotations.

4.4 Monitoring Principles

The underlying monitoring mechanisms differ (sampling vs. instrumentation), but a detailed discussion is beyond the scope of this article.
