Mastering Linux Performance: From 5W2H Methodology to Flame Graphs
This article introduces a systematic approach to diagnosing Linux performance issues, covering the 5W2H analysis framework, essential CPU, memory, disk I/O and network monitoring tools, practical command examples, flame‑graph generation, and a real‑world nginx case study with actionable insights.
When encountering obscure problems that monitoring plugins cannot instantly pinpoint, deeper server‑side analysis is required; this article presents a comprehensive methodology and toolset for locating performance bottlenecks in Linux systems.
2. Explanation
This article introduces a range of problem‑location tools and illustrates their use with a case study.
3. Problem‑analysis methodology
Applying the 5W2H method yields the following questions:
What – what is the phenomenon?
When – when does it occur?
Why – why does it happen?
Where – where does it happen?
How much – how many resources are consumed?
How – how can the problem be solved?
4. CPU
4.1 Explanation
For applications, the focus is usually on the functionality and performance of the kernel CPU scheduler. Thread‑state analysis examines where a thread spends its time: on‑CPU (split into user and sys) or off‑CPU (runnable, anonymous paging, sleep, lock, idle, and so on). It also helps to understand the following concepts: processor, core, hardware thread, CPU caches, clock frequency, cycles per instruction (CPI) and instructions per cycle (IPC), utilization, scheduler, run queue, preemption, multi‑process and multi‑threaded execution, and word size.
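The user/sys split of on‑CPU time can be made concrete with the counters in /proc/stat. The sketch below is an illustration, not part of the original toolset: the helper name cpu_split and the sample counter values are hypothetical, and it assumes the standard field order of the aggregate cpu line (user, nice, system, idle, iowait, irq, softirq, steal).

```sh
#!/bin/sh
# Sketch: split CPU time into user, system, and idle shares.
# cpu_split is a hypothetical helper; it expects two aggregate "cpu" lines
# from /proc/stat on stdin, the earlier sample first.
cpu_split() {
    awk '
    $1 == "cpu" {
        n++
        user[n] = $2 + $3              # user + nice
        sys[n]  = $4 + $7 + $8         # system + irq + softirq
        idle[n] = $5 + $6              # idle + iowait
        total[n] = 0
        for (i = 2; i <= 9; i++)       # user through steal
            total[n] += $i
    }
    END {
        dt = total[2] - total[1]
        if (n < 2 || dt <= 0) {
            print "need two samples with increasing counters"
            exit 1
        }
        printf "user=%.1f%% sys=%.1f%% idle=%.1f%%\n",
            100 * (user[2] - user[1]) / dt,
            100 * (sys[2] - sys[1]) / dt,
            100 * (idle[2] - idle[1]) / dt
    }'
}

# Made-up sample counters, in clock ticks.
cpu_split <<'EOF'
cpu  1000 0 500 8000 500 0 0 0 0 0
cpu  1300 0 600 8500 600 0 0 0 0 0
EOF
# prints: user=30.0% sys=10.0% idle=60.0%
```

On a live system, two samples taken one second apart could be fed in with `{ grep '^cpu ' /proc/stat; sleep 1; grep '^cpu ' /proc/stat; } | cpu_split`. Steal time is included in the total but not reported separately, so the three shares may sum to slightly less than 100% on a virtual machine.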
4.2 Analysis tools
uptime, vmstat, mpstat, top, pidstat – show CPU and load usage.
perf – profiles time spent at function level inside a process and can also target specific kernel functions.
4.3 Usage
# View system-wide CPU usage
top
# View per-CPU statistics, refreshed every second
mpstat -P ALL 1
# View CPU usage and run-queue length
vmstat 1
# Process-level CPU statistics (replace pid with the target process ID)
pidstat -u 1 -p pid
# Sample function-level CPU usage in a process
perf top -p pid -e cpu-clock
5. Memory
5.1 Explanation
Memory issues affect not only performance but also service availability. Key concepts include main memory, virtual memory, resident memory, address space, OOM, page cache, page faults, swapping, and allocators such as libc, glibc, libmalloc, mtmalloc, and the kernel SLUB allocator.
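The difference between virtual and resident memory can be read directly from /proc/PID/status. The following is a minimal sketch under stated assumptions: mem_summary is a hypothetical helper name, the sample values are made up, and it relies only on the VmSize and VmRSS fields, which the kernel reports in kB.

```sh
#!/bin/sh
# Sketch: compare the virtual size and resident set size of one process.
# mem_summary is a hypothetical helper; it reads the text of
# /proc/PID/status on stdin.
mem_summary() {
    awk '
    $1 == "VmSize:" { vsz = $2 }       # total virtual address space, kB
    $1 == "VmRSS:"  { rss = $2 }       # pages currently resident in RAM, kB
    END {
        if (vsz + 0 == 0 || rss == "") {
            print "VmSize/VmRSS not found"
            exit 1
        }
        printf "virtual=%.1fMiB resident=%.1fMiB resident_share=%.1f%%\n",
            vsz / 1024, rss / 1024, 100 * rss / vsz
    }'
}

# Made-up excerpt of a status file.
mem_summary <<'EOF'
Name:   nginx
VmSize:   204800 kB
VmRSS:     51200 kB
EOF
# prints: virtual=200.0MiB resident=50.0MiB resident_share=25.0%
```

For a running process the input would be `mem_summary < /proc/PID/status`, with PID replaced by the real process ID. Kernel threads have no Vm fields, which is why the helper reports an error instead of dividing by zero.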
5.2 Analysis tools
free, vmstat, top, pidstat, pmap – show memory usage.
valgrind – detects memory leaks.
dtrace – dynamic tracing of kernel functions via D scripts.
5.3 Usage
# View system memory usage in MiB
free -m
# Virtual memory statistics, refreshed every second
vmstat 1
# View system memory status
top
# Process memory statistics (replace pid with the target process ID)
pidstat -p pid -r 1
# View process memory map
pmap -d pid
# Detect memory leaks
valgrind --tool=memcheck --leak-check=full --log-file=log.txt ./program
6. Disk I/O
6.1 Explanation
Disk is usually the slowest subsystem and a common performance bottleneck; monitoring it effectively requires an understanding of file systems, the VFS layer, caches, inodes, and I/O scheduling.
6.2 Analysis tools
iotop, iostat, pidstat – show disk I/O usage.
perf – traces block I/O requests.
6.3 Usage
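# View system I/O information
iotop
# Extended per-device I/O statistics in kB, every second, 10 reports
iostat -d -x -k 1 10
# Process-level I/O information (replace pid with the target process ID)
pidstat -d 1 -p pid
# Trace block I/O requests system-wide with call stacks (stop with Ctrl-C)
perf record -e block:block_rq_issue -ag
perf report
7. Network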
7.1 Explanation
Network monitoring is the most complex area because of the many factors involved: latency, blocking, collisions, packet loss, and interactions with routers, switches, and wireless signals. Modern NICs auto‑negotiate link speed and duplex mode with the devices they are connected to.
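7.2 Analysis tools
netstat, ss, sar – show socket, connection, and interface statistics.
tcpdump, tcpflow – capture and inspect packets.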
7.3 Usage
# Show per-protocol network statistics
netstat -s
# Show current UDP connections
netstat -nu
# Show UDP port usage
netstat -apu
# Count TCP connections per state
netstat -a | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'
# Show all TCP sockets
ss -t -a
# Show socket summary
ss -s
# Show all UDP sockets
ss -u -a
# Show TCP traffic and TCP error (ETCP) statistics
sar -n TCP,ETCP 1
# Show per-interface network I/O
sar -n DEV 1
# Capture packets by host and port
tcpdump -i eth1 host 192.168.1.1 and port 80
# Capture and display packet streams
tcpflow -cp host 192.168.1.1
8. System Load
8.1 Explanation
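Load measures the amount of work a computer is doing. The load average is that figure averaged over the last 1, 5, and 15 minutes; it reflects the number of tasks running or waiting in the run queue (on Linux, tasks in uninterruptible sleep, typically waiting on disk I/O, are counted as well).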
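A load average only becomes meaningful relative to the number of CPUs, so a common first step is to normalize it. The sketch below assumes the /proc/loadavg layout (1, 5, and 15 minute averages in the first three fields); load_per_cpu is a hypothetical helper name and the sample line is made up.

```sh
#!/bin/sh
# Sketch: divide the three load averages by the CPU count.
# load_per_cpu is a hypothetical helper; it takes the CPU count as its
# argument and one /proc/loadavg line on stdin.
load_per_cpu() {
    awk -v n="$1" '
    n > 0 {
        printf "1m=%.2f 5m=%.2f 15m=%.2f\n", $1 / n, $2 / n, $3 / n
    }'
}

# Made-up sample: a 4-CPU machine.
echo "3.20 1.00 0.40 2/345 6789" | load_per_cpu 4
# prints: 1m=0.80 5m=0.25 15m=0.10
```

On a live system this could be run as `load_per_cpu "$(nproc)" < /proc/loadavg`. A value that stays above 1.00 suggests more demand than the CPUs can serve, although on Linux part of that demand may be tasks blocked on I/O.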
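8.2 Analysis tools
uptime, top, vmstat – show load and run‑queue information.
strace – traces system calls and the time spent in them.
dmesg – shows kernel log messages.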
8.3 Usage
# View load information
uptime
top
vmstat
# Summarize system call counts, errors, and time for a process
strace -c -p pid
# Trace a specific syscall (e.g., epoll_wait) and show the time spent in each call
strace -T -e epoll_wait -p pid
# View kernel logs
dmesg
9. Flame Graphs
9.1 Explanation
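Flame Graphs, created by Brendan Gregg, visualize sampled call stacks. The y‑axis shows stack depth, and the x‑axis spans the collected samples (it is not a timeline; frames are sorted alphabetically so that identical frames merge). Each box is a function, and the wider it is, the more samples it appeared in, so on a CPU flame graph a wide box means more CPU time. Types include on‑CPU, off‑CPU, memory, hot/cold, and differential flame graphs.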
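The link between box width and sample count can be illustrated on folded stacks, the one-line-per-stack text format (frames joined by `;`, followed by a sample count) that flame graph tooling consumes. The sketch below is a simplification under stated assumptions: frame_widths is a hypothetical helper, the function names and counts are made up, and it sums samples per function name, whereas a real flame graph draws one box per distinct call path.

```sh
#!/bin/sh
# Sketch: share of samples in which each function appears.
# frame_widths is a hypothetical helper; it reads folded stacks on stdin.
frame_widths() {
    awk '
    {
        count = $NF
        stack = $0
        sub(/ [0-9]+$/, "", stack)          # strip the trailing sample count
        total += count
        split("", seen)                     # reset the per-stack set
        n = split(stack, frames, ";")
        for (i = 1; i <= n; i++) {
            if (!(frames[i] in seen)) {     # count a recursive frame once
                seen[frames[i]] = 1
                width[frames[i]] += count
            }
        }
    }
    END {
        for (f in width)
            printf "%d %.1f%% %s\n", width[f], 100 * width[f] / total, f
    }' | LC_ALL=C sort -k1,1nr -k3,3
}

# Made-up folded stacks.
frame_widths <<'EOF'
main;handle_request;json_parse 60
main;handle_request;malloc 25
main;event_loop 15
EOF
# prints:
#   100 100.0% main
#   85 85.0% handle_request
#   60 60.0% json_parse
#   25 25.0% malloc
#   15 15.0% event_loop
```

The output mirrors what the eye picks up on the graph: the root frame spans the full width, and the widest leaf (json_parse in this made-up data) is the first candidate for optimization.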
9.2 Installing dependencies
# Install systemtap (if not already installed)
yum install systemtap systemtap-runtime
# Install kernel debug packages matching the running kernel version
# Example for kernel 2.6.18-308.el5
kernel-debuginfo-2.6.18-308.el5.x86_64.rpm
kernel-devel-2.6.18-308.el5.x86_64.rpm
kernel-debuginfo-common-2.6.18-308.el5.x86_64.rpm
# Install kernel debug info via yum
debuginfo-install --enablerepo=debuginfo search kernel
debuginfo-install --enablerepo=debuginfo search glibc
9.3 Installation
git clone https://github.com/lidaohang/quick_location.git
cd quick_location
9.4 CPU‑level flame graphs
When CPU usage is too high, or CPU utilization will not rise under load, flame graphs quickly pinpoint the functions responsible.
9.4.1 on‑CPU
CPU time is split into user and system modes.
Usage:
# on-CPU, user mode
sh ngx_on_cpu_u.sh pid
cd ngx_on_cpu_u
# on-CPU, kernel mode
sh ngx_on_cpu_k.sh pid
cd ngx_on_cpu_k
# Serve the generated SVG (Python 2; on Python 3 use: python3 -m http.server 8088)
python -m SimpleHTTPServer 8088
# Then open http://127.0.0.1:8088/pid.svg
9.4.2 off‑CPU
Off‑CPU time includes waiting for CPU, I/O, locks, etc.
Usage:
# off-CPU, user mode
sh ngx_off_cpu_u.sh pid
cd ngx_off_cpu_u
# off-CPU, kernel mode
sh ngx_off_cpu_k.sh pid
cd ngx_off_cpu_k
# Serve the generated SVG (Python 2; on Python 3 use: python3 -m http.server 8088)
python -m SimpleHTTPServer 8088
# Open http://127.0.0.1:8088/pid.svg
9.5 Memory‑level flame graphs
Memory flame graphs help locate memory leaks or excessive allocations.
Usage:
sh ngx_on_memory.sh pid
cd ngx_on_memory
# Serve the generated SVG (Python 2; on Python 3 use: python3 -m http.server 8088)
python -m SimpleHTTPServer 8088
# Open http://127.0.0.1:8088/pid.svg
9.6 Performance regression – red/blue differential flame graphs
Capture one profile before a change and one after it, then generate a differential flame graph from the two: red areas mark code paths whose cost increased, and blue areas mark paths whose cost decreased.
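The comparison behind the red and blue colouring can be sketched as a per-stack subtraction of sample counts. The following is an illustration under stated assumptions, not the implementation used by the FlameGraph scripts: folded_delta is a hypothetical helper, the stacks and counts are made up, and it compares raw counts without normalizing for profiles of different sizes.

```sh
#!/bin/sh
# Sketch: per-stack sample delta between two folded profiles.
# folded_delta is a hypothetical helper; arguments are the "before" and
# "after" folded files, in that order.
folded_delta() {
    awk '
    {
        count = $NF
        stack = $0
        sub(/ [0-9]+$/, "", stack)          # strip the trailing sample count
        if (FILENAME == ARGV[1]) { before[stack] += count }
        else                     { after[stack]  += count }
        all[stack] = 1
    }
    END {
        for (s in all) {
            d = after[s] - before[s]        # a missing stack counts as 0
            color = (d > 0) ? "red" : ((d < 0) ? "blue" : "same")
            printf "%d %s %s\n", d, color, s
        }
    }' "$1" "$2" | LC_ALL=C sort -k1,1nr -k3,3
}

# Made-up folded stacks standing in for the two captured profiles.
tmp=$(mktemp -d)
cat > "$tmp/before.folded" <<'EOF'
main;handle_request;json_parse 40
main;handle_request;malloc 30
main;event_loop 30
EOF
cat > "$tmp/after.folded" <<'EOF'
main;handle_request;json_parse 75
main;handle_request;malloc 30
main;event_loop 20
EOF
folded_delta "$tmp/before.folded" "$tmp/after.folded"
rm -rf "$tmp"
# prints:
#   35 red main;handle_request;json_parse
#   0 same main;handle_request;malloc
#   -10 blue main;event_loop
```

Sorting by delta puts the largest regression first, which is the same reading order a differential flame graph encourages: look at the deepest red first.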
# Capture baseline profile (99 Hz, 30 seconds, with call stacks)
perf record -F 99 -p pid -g -- sleep 30
perf script > out.stacks1
# Capture post-change profile
perf record -F 99 -p pid -g -- sleep 30
perf script > out.stacks2
# Generate folded stacks
./FlameGraph/stackcollapse-perf.pl out.stacks1 > out.folded1
./FlameGraph/stackcollapse-perf.pl out.stacks2 > out.folded2
# Generate differential flame graph
./FlameGraph/difffolded.pl out.folded1 out.folded2 | ./FlameGraph/flamegraph.pl > diff2.svg
10. Case Study – Nginx Cluster Anomaly
10.1 Observation
On 2017‑09‑25 19:00, the Nginx cluster showed a surge of 499 and 5xx responses and increased CPU usage.
10.2 Nginx metrics analysis
a) Request traffic
Conclusion: Traffic did not spike; it actually decreased, so the issue is not traffic‑related.
b) Response time
Conclusion: Response time increased, possibly due to Nginx itself or upstream latency.
c) Upstream response time
Conclusion: Upstream response time increased, likely dragging Nginx performance.
10.3 System CPU analysis
a) Top output
Conclusion: Nginx worker CPU usage is high.
b) Perf top on Nginx process
Command: perf top -p pid
Conclusion: Main overhead comes from free, malloc, and JSON parsing.
10.4 Flame‑graph CPU analysis
Generated user‑mode CPU flame graph shows heavy JSON parsing and memory allocation.
10.5 Summary
a) Traffic analysis revealed upstream latency as the root cause of request anomalies.
b) CPU profiling identified costly JSON parsing and memory allocation inside Nginx modules.
Solution: After the high‑CPU module was disabled, CPU usage dropped and request traffic returned to normal. The upstream latency had risen because the upstream service was calling back into Nginx (a loopback call).
11. References
Brendan Gregg – Performance Analysis
CPU Flame Graphs
Memory Flame Graphs
Off‑CPU Flame Graphs
Differential Flame Graphs
OpenResty SystemTap Toolkit
FlameGraph Repository
Blazing Performance with Flame Graphs
Open Source Linux
Focused on sharing Linux/Unix content, covering fundamentals, system development, network programming, automation/operations, cloud computing, and related professional knowledge.
