Introduction to Linux Performance Profiling Tools: Perf, gprof, and Valgrind
This article introduces three popular Linux performance profiling tools—Perf, gprof, and Valgrind—explaining their installation, basic command‑line usage, and how to visualize results with flame graphs, call graphs, or KCachegrind, while comparing their intrusiveness, startup methods, and output formats.
Long startup times and high CPU usage are common performance problems in software development, and a profiler is usually the quickest way to find out where the time goes.
Each of the three tools is covered in its own section, which gives a brief introduction, usage instructions, and ways to visualize the results.
Sample Code for Demonstration
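#include <unistd.h>

#define NUM 500000

// Fill the array with the values 0 .. NUM-1.
void init(int* int_array) {
    for (int i = 0; i < NUM; i++) {
        int_array[i] = i;
    }
}

// Add every element to sum, sleeping briefly after each addition.
void accu(int* int_array, long& sum) {
    for (int i = 0; i < NUM; i++) {
        sum += int_array[i];
        usleep(3);
    }
}

int main() {
    int int_array[NUM];  // about 2 MB on the stack
    init(int_array);
    long sum = 0;
    accu(int_array, sum);
    return 0;
}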
The sample program runs for about 31 seconds on a typical PC, with a maximum CPU usage of 8.3%.
1. Perf
Perf is a profiling tool built into the Linux kernel source tree. It uses event sampling to locate performance bottlenecks and hot code paths.
Usage
Perf can be started in two ways:
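Directly launch the target program with perf record -e cpu-clock -g ./run (usually no root required, although a restrictive kernel.perf_event_paranoid setting can block unprivileged profiling).
Attach to an already running process with perf record -e cpu-clock -g -p 4522, where 4522 is the target process ID (typically requires root).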
Interrupt the profiling with Ctrl+C or let the program finish; a perf.data file is generated. To view the report, run:
perf report
The report can be visualized as a flame graph using Brendan Gregg’s FlameGraph scripts:
1. perf script -i perf.data > perf.unfold
2. /data/stackcollapse-perf.pl perf.unfold > perf.folded
3. /data/flamegraph.pl perf.folded > perf.svg
The resulting SVG shows the hierarchical call stack and time spent in each function.
2. Gprof
Gprof records the execution time and call count of each function. After a normal program exit, it generates a gmon.out file that can be turned into a readable report.
To enable gprof, compile the program with the -pg flag. After the program finishes, run:
gprof -b run gmon.out > report.txt
The report can be visualized using gprof2dot.py and Graphviz:
python gprof2dot.py report.txt > report.dot
Open the resulting .dot file with Graphviz (e.g., gvedit.exe) to see the call graph.
3. Valgrind (Callgrind)
Unlike Perf, Valgrind is not part of the Linux kernel source tree and must be installed separately. It bundles several tools; this article focuses on Callgrind, which is the one used for performance analysis.
Download the Valgrind source (e.g., from http://valgrind.org/downloads/valgrind-3.12.0.tar.bz2 ), unpack it, and build and install it with ./configure && make && make install .
Run the target program under Callgrind:
valgrind --tool=callgrind --separate-threads=yes ./run
The --separate-threads=yes option writes a separate profile file for each thread. After execution, a file such as callgrind.out.4263-01 is produced, where 4263 is the process ID and 01 is the thread number. Visualize it with KCachegrind (for example, kcachegrind.exe on Windows).
4. Tool Comparison
All three tools can locate the functions that consume the most execution time and CPU. However, they differ in several aspects:
4.1 Startup Method
Perf and Valgrind can be launched without modifying the program, but Perf may require root when attaching to a running process. Valgrind slows the program down considerably, which reduces the maximum number of concurrent users it can sustain during load testing.
4.2 Intrusiveness
Perf and Valgrind are non‑intrusive; gprof requires recompilation with -pg and may need code changes to allow graceful termination for long‑running services.
4.3 Result Presentation
Gprof produces a hierarchical “inverse tree” of call times, Perf generates a pyramid‑style flame graph, and Valgrind (Callgrind) shows a single call path with time annotations.
4.4 Monitoring Principles
The underlying monitoring mechanisms differ (sampling vs. instrumentation), but a detailed discussion is beyond the scope of this article.
Tencent Cloud Developer