Master Linux Performance: From 5W2H Methodology to Flame Graphs
This comprehensive guide explains how to diagnose Linux performance issues using a structured 5W2H approach, introduces essential monitoring tools for CPU, memory, disk I/O, and network, and demonstrates practical flame‑graph techniques—including on‑CPU, off‑CPU, memory, and differential analyses—to quickly locate and resolve bottlenecks.
1. Background
Sometimes we encounter difficult problems that monitoring plugins cannot immediately pinpoint. In such cases we need to log into the server for deeper analysis. Analyzing problems requires experience and broad knowledge, and good tools can greatly speed up locating issues.
2. Description
This article introduces various problem‑location tools and combines case studies for analysis.
3. Problem‑analysis methodology
Applying the 5W2H method yields several performance‑analysis questions:
What – what is the phenomenon?
When – when does it happen?
Why – why does it happen?
Where – where does it happen?
How much – how many resources are consumed?
How to do – how to solve it?
4. CPU
4.1 Description
For applications, the main concern is how the kernel CPU scheduler works and how well it performs.
Thread‑state analysis examines where thread time is spent. States include:
on‑CPU: running time, split into user‑time and system‑time.
off‑CPU: time spent waiting for the next CPU time slice, I/O, locks, paging, and so on, with sub‑states such as runnable, anonymous paging, sleeping, lock wait, and idle.
If most of the time is spent on‑CPU, CPU profiling can usually explain the cause quickly; if most of it is spent off‑CPU, locating the problem takes longer.
Key concepts include processor, core, hardware thread, CPU cache, clock frequency, CPI/IPC, instruction set, utilization, user/kernel time, scheduler, run queue, preemption, multi‑process, multi‑thread, and word length.
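As a rough illustration of the user-time/system-time split, the sketch below reads a process's accumulated CPU time from /proc/<pid>/stat, where fields 14 and 15 (utime and stime) are reported in clock ticks. The function names and the sample line in the usage note are made up for illustration; this is a minimal sketch that assumes a Linux /proc file system, not a replacement for the tools listed below.

```python
import os

def parse_cpu_ticks(stat_line):
    """Return (utime, stime) in clock ticks from one /proc/<pid>/stat line."""
    # The command name (field 2) is wrapped in parentheses and may contain
    # spaces, so split on the last ')' rather than on whitespace alone.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so field N lives at rest[N - 3].
    return int(rest[14 - 3]), int(rest[15 - 3])

def cpu_seconds(pid):
    """Read user and system CPU seconds consumed so far by a live process."""
    with open(f"/proc/{pid}/stat") as f:
        utime, stime = parse_cpu_ticks(f.read())
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    return utime / ticks_per_second, stime / ticks_per_second
```

Calling cpu_seconds(os.getpid()) twice, a few seconds apart, and subtracting the results gives the on-CPU time for that window, split into user and system parts.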
4.2 Analysis tools
uptime, vmstat, mpstat, top, pidstat – show CPU and load usage.
perf – tracks per‑function CPU time and can target kernel functions.
4.3 Usage
# View system CPU usage
top
# View per-core CPU usage
mpstat -P ALL 1
# View CPU usage and run-queue length
vmstat 1
# Per-process CPU statistics (replace pid with the target process ID)
pidstat -u 1 -p pid
# Sample per-function CPU usage of a process
perf top -p pid -e cpu-clock
5. Memory
5.1 Description
Memory is designed for efficiency, but memory problems can affect service availability. Important concepts include main memory, virtual memory, resident memory, address space, OOM, page cache, page fault, swapping, user allocators (libc, glibc, libmalloc, mtmalloc), and the kernel SLUB allocator.
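To make a few of these concepts concrete, the sketch below parses /proc/meminfo-style text and reports total memory, available memory, and page cache size. The function names are hypothetical, and it assumes a kernel that exposes the MemAvailable field; treat it as an illustration rather than a full accounting of memory.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integers (kB for most keys)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values

def memory_summary(info):
    """Summarize total, available, and page-cache memory in MiB."""
    return {
        "total_mib": info["MemTotal"] // 1024,
        "available_mib": info["MemAvailable"] // 1024,
        "page_cache_mib": info["Cached"] // 1024,
    }

def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live memory counters (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

On a Linux host, memory_summary(read_meminfo()) gives a quick view that can be compared with the output of free -m.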
5.2 Analysis tools
free, vmstat, top, pidstat, pmap – report memory usage.
valgrind – detects memory leaks.
dtrace – dynamic tracing of kernel functions via D scripts.
5.3 Usage
# View system memory usage (in MiB)
free -m
# Virtual memory statistics (1 s interval)
vmstat 1
# View system memory status
top
# Per-process memory statistics (1 s interval)
pidstat -p pid -r 1
# View process memory map
pmap -d pid
# Detect memory leaks and invalid accesses with memcheck
valgrind --tool=memcheck --leak-check=full --log-file=./log.txt ./program
6. Disk I/O
6.1 Description
Disk is usually the slowest subsystem and a common performance bottleneck, because it sits farthest from the CPU and, for spinning disks, access involves mechanical movement. Understanding basic concepts such as the file system, VFS, page cache, buffer cache, inodes, and the I/O scheduler is essential for monitoring I/O performance.
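One way to see how busy a disk is, sketched below, is to take two snapshots of /proc/diskstats and compare the "time spent doing I/Os" counter (the 13th field, in milliseconds). The function names are hypothetical, and the calculation is only an approximation of a utilization percentage under the assumption that both snapshots come from the same host.

```python
def parse_diskstats(text):
    """Map device name -> milliseconds spent doing I/O so far."""
    busy_ms = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 14:
            # Fields are: major, minor, name, then the counters. The 13th
            # field (index 12) is the time spent doing I/Os in milliseconds.
            busy_ms[fields[2]] = int(fields[12])
    return busy_ms

def utilization(before, after, interval_s):
    """Percent of the interval each device was busy, from two snapshots."""
    interval_ms = interval_s * 1000.0
    return {
        dev: 100.0 * (after[dev] - before[dev]) / interval_ms
        for dev in after
        if dev in before
    }
```

Reading /proc/diskstats, sleeping for one second, reading it again, and passing both parsed snapshots to utilization() with interval_s=1 yields one busy percentage per device.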
6.2 Analysis tools
iotop, iostat, pidstat – report disk I/O usage.
perf – records block I/O request events.
6.3 Usage
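# View per-process I/O in real time
iotop
# Extended per-device statistics in kB, 10 reports at 1 s intervals
iostat -d -x -k 1 10
# Per-process I/O statistics
pidstat -d 1 -p pid
# Record block I/O request issue events system-wide with call stacks (stop with Ctrl-C)
perf record -e block:block_rq_issue -ag
perf report
7. Network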
7.1 Description
The network is the most complex Linux subsystem to monitor: latency, blocking, collisions, and packet loss all come into play, and external devices such as routers and switches also affect overall performance.
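7.2 Analysis tools
netstat, ss, sar – report connection, socket, and interface statistics.
tcpdump, tcpflow – capture packets and flows.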
7.3 Usage
# Show per-protocol network statistics
netstat -s
# Show current UDP connections (numeric addresses)
netstat -nu
# Show UDP port usage with owning programs
netstat -apu
# Count TCP connections per state
netstat -a | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'
# Show all TCP sockets
ss -t -a
# Show socket summary
ss -s
# Show all UDP sockets
ss -u -a
# TCP statistics and TCP error statistics
sar -n TCP,ETCP 1
# Per-interface network I/O
sar -n DEV 1
# Capture packets by host and port
tcpdump -i eth1 host 192.168.1.1 and port 80
# Capture and print TCP flows for a host
tcpflow -cp host 192.168.1.1
8. System Load
8.1 Description
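Load measures how much work a computer is doing. On Linux, the load average is an exponentially damped average, over 1, 5, and 15 minutes, of the number of tasks that are running, runnable, or in uninterruptible sleep (typically waiting on disk I/O).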
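A load average is easier to interpret relative to the number of CPUs. The sketch below parses the three averages from /proc/loadavg and divides them by the logical CPU count; the function names are made up for illustration, and reading a per-CPU value well above 1.0 as a sign of saturation is a rule of thumb rather than a hard threshold.

```python
import os

def parse_loadavg(text):
    """Return the 1-, 5-, and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)

def load_per_cpu(loads, cpu_count):
    """Normalize load averages by the number of logical CPUs."""
    return tuple(round(load / cpu_count, 2) for load in loads)

def current_load_per_cpu():
    """Read the live load averages and normalize them (Linux only)."""
    with open("/proc/loadavg") as f:
        loads = parse_loadavg(f.read())
    return load_per_cpu(loads, os.cpu_count() or 1)
```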
8.2 Analysis tools
uptime, top, vmstat – show system load.
strace – traces system calls.
dmesg – shows kernel logs.
8.3 Usage
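# View load averages and run-queue length
uptime
top
vmstat
# Summarize system call counts, errors, and time for a process
strace -c -p pid
# Trace a specific syscall (e.g., epoll_wait) and show the time spent in each call
strace -T -e epoll_wait -p pid
# View kernel logs
dmesg
9. Flame Graphs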
9.1 Description
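Flame Graphs, created by Brendan Gregg, visualize sampled call stacks. The Y‑axis represents stack depth. The X‑axis spans the sample population, not the passage of time, and the left‑to‑right order of frames carries no meaning. The wider a frame, the more samples included that function on the stack; in a CPU profile, that means the function or its callees consumed more CPU.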
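The relationship between samples and frame width can be sketched in a few lines. The example below takes folded stacks in the "frame;frame;frame count" form used by the FlameGraph scripts and counts how many samples include each frame. The function names and sample data are hypothetical; this illustrates the idea and is not how flamegraph.pl is implemented.

```python
from collections import Counter

def frame_samples(folded_lines):
    """Count samples per frame, where a frame is identified by its full call path."""
    samples = Counter()
    for line in folded_lines:
        line = line.strip()
        if not line:
            continue
        stack, _, count = line.rpartition(" ")
        frames = stack.split(";")
        # A sample counts toward every frame on its stack (the leaf and all of
        # its ancestors), so a parent is always at least as wide as its children.
        for depth in range(1, len(frames) + 1):
            samples[tuple(frames[:depth])] += int(count)
    return samples

def frame_width(samples, path):
    """Fraction of the graph's total width taken by the frame at `path`."""
    total = sum(n for p, n in samples.items() if len(p) == 1)
    return samples[tuple(path)] / total
```

For instance, with the folded input "main;foo1;foo3 70", "main;foo1 20", and "main;foo2;foo3 10", the frame for main;foo1 covers 90 of the 100 samples.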
9.2 Installing dependencies
# Install systemtap
yum install systemtap systemtap-runtime
# Kernel debug packages to install; versions must match the running kernel
kernel-debuginfo-$(uname -r).rpm
kernel-devel-$(uname -r).rpm
kernel-debuginfo-common-$(uname -r).rpm
# Install kernel and glibc debug info
debuginfo-install --enablerepo=debuginfo search kernel
debuginfo-install --enablerepo=debuginfo search glibc
9.3 Installation
git clone https://github.com/lidaohang/quick_location.git
cd quick_location
9.4 CPU‑level flame graphs
9.4.1 On‑CPU
High CPU usage can be pinpointed to specific functions using flame graphs.
# on-CPU, user space
sh ngx_on_cpu_u.sh pid
cd ngx_on_cpu_u
# Serve the generated SVG (Python 2; on Python 3 use: python3 -m http.server 8088)
python -m SimpleHTTPServer 8088
# Open http://127.0.0.1:8088/pid.svg
Example C program used for demonstration:
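#include <stdio.h>
#include <stdlib.h>

/* Leaf function; it does no work and only shows up as a stack frame. */
void foo3(void) {}

/* Calls foo3 10 times per invocation. */
void foo2(void) {
    int i;
    for (i = 0; i < 10; i++) foo3();
}

/* Calls foo3 1000 times per invocation, so in an unoptimized build
   it should account for most of the samples. */
void foo1(void) {
    int i;
    for (i = 0; i < 1000; i++) foo3();
}

int main(void) {
    int i;
    for (i = 0; i < 1000000000; i++) {
        foo1();
        foo2();
    }
    return 0;
}
9.4.2 Off‑CPU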
Off‑CPU flame graphs show where threads spend time waiting.
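The quantity an off-CPU profile accounts for can be approximated for a single thread as wall-clock time minus CPU time. The sketch below measures both around a function call; the helper name is hypothetical, and it only reports how long the thread waited, not where, which is what the flame graph adds.

```python
import time

def measure(func):
    """Run func() and return (wall_seconds, on_cpu_seconds, off_cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.thread_time()  # CPU time of the calling thread only
    func()
    wall = time.perf_counter() - wall_start
    on_cpu = time.thread_time() - cpu_start
    # Whatever wall-clock time was not spent running was spent waiting.
    return wall, on_cpu, max(wall - on_cpu, 0.0)
```

Measuring a call that mostly sleeps or blocks on I/O, such as measure(lambda: time.sleep(0.2)), shows almost all of the elapsed time as off-CPU.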
# off-CPU, user space
sh ngx_off_cpu_u.sh pid
cd ngx_off_cpu_u
# Serve the generated SVG (Python 2; on Python 3 use: python3 -m http.server 8088)
python -m SimpleHTTPServer 8088
# Open http://127.0.0.1:8088/pid.svg
9.5 Memory‑level flame graphs
Memory‑level flame graphs help locate memory leaks or excessive allocations.
sh ngx_on_memory.sh pid
cd ngx_on_memory
# Serve the generated SVG (Python 2; on Python 3 use: python3 -m http.server 8088)
python -m SimpleHTTPServer 8088
# Open http://127.0.0.1:8088/pid.svg
9.6 Differential (red‑blue) flame graphs
Differential flame graphs compare two profiles: frames that gained samples in the second profile are drawn in red (potential regressions), and frames that lost samples are drawn in blue (improvements).
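The comparison underlying a differential flame graph can be sketched as a per-stack subtraction of sample counts between two folded profiles. The function names and sample stacks below are hypothetical, and the sketch only loosely mirrors what difffolded.pl does; it leaves out normalization and rendering.

```python
from collections import Counter

def load_folded(lines):
    """Parse folded-stack lines ("frame;frame;frame count") into a Counter."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line:
            stack, _, count = line.rpartition(" ")
            counts[stack] += int(count)
    return counts

def diff_folded(before, after):
    """Return {stack: (count_before, count_after, delta)} for every stack seen."""
    result = {}
    for stack in sorted(set(before) | set(after)):
        # Counter returns 0 for stacks that appear in only one profile.
        b, a = before[stack], after[stack]
        result[stack] = (b, a, a - b)
    return result
```

A positive delta corresponds to a frame that would be drawn in red, and a negative delta to one drawn in blue.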
cd quick_location
# Record the baseline profile (99 Hz, with call stacks, for 30 s)
perf record -F 99 -p pid -g -- sleep 30
perf script > out.stacks1
# Record the profile after the change
perf record -F 99 -p pid -g -- sleep 30
perf script > out.stacks2
# Fold both profiles and generate the differential flame graph
./FlameGraph/stackcollapse-perf.pl out.stacks1 > out.folded1
./FlameGraph/stackcollapse-perf.pl out.stacks2 > out.folded2
./FlameGraph/difffolded.pl out.folded1 out.folded2 | ./FlameGraph/flamegraph.pl > diff2.svg
10. Case Study: Nginx Cluster Anomaly
10.1 Symptom
At 19:00 on 2017‑09‑25, the Nginx cluster began returning many 499 and 5xx responses, and CPU usage rose.
10.2 Nginx metrics analysis
Request traffic did not spike; it actually decreased, indicating the issue is not traffic‑related.
Response time increased, possibly due to Nginx itself or upstream latency.
Upstream response time also grew, suggesting backend services may be slowing Nginx.
10.3 System CPU analysis
top shows high CPU usage by the Nginx worker processes.
perf top shows most of the overhead in free, malloc, and JSON parsing.
10.4 Flame‑graph CPU analysis
The user‑mode flame graph identifies JSON parsing as a hot spot.
10.5 Summary
Two root causes were identified: (a) upstream latency causing request delays, and (b) expensive JSON parsing and memory allocation inside Nginx. Disabling the high‑CPU module reduced CPU usage and normalized traffic.
11. References
http://www.brendangregg.com/index.html
http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
http://www.brendangregg.com/FlameGraphs/memoryflamegraphs.html
http://www.brendangregg.com/FlameGraphs/offcpuflamegraphs.html
http://www.brendangregg.com/blog/2014-11-09/differential-flame-graphs.html
https://github.com/openresty/openresty-systemtap-toolkit
https://github.com/brendangregg/FlameGraph
https://www.slideshare.net/brendangregg/blazing-performance-with-flame-graphs
21CTO