Linux Performance Mastery: Tools, 5W2H Methodology & Flame Graph Case Study
This comprehensive guide explains how to diagnose Linux system performance issues using a structured 5W2H approach, covering CPU, memory, disk I/O, network, load, and flame‑graph analysis, with practical command examples and a real‑world nginx case study.
1. Background
When complex problems arise that monitoring plugins cannot instantly pinpoint, deeper server analysis is required. Effective analysis demands experience and broad knowledge, and having the right tools can dramatically speed up root‑cause identification.
2. Overview
This article introduces various problem‑location tools and demonstrates their use with case studies.
3. Problem‑analysis Methodology
Applying the 5W2H method helps formulate performance‑analysis questions:
What – what is the phenomenon?
When – when does it occur?
Why – why does it happen?
Where – where does it happen?
How much – how many resources are consumed?
How – how can it be solved?
4. CPU
4.1 Overview
For applications, the focus is on the kernel CPU scheduler's behavior and performance. Thread‑state analysis distinguishes on‑CPU time (user and sys) from off‑CPU time (waiting on I/O, locks, paging, etc.). Understanding concepts such as processor, core, hardware thread, cache, CPI/IPC, scheduler, run queue, preemption, multi‑process/multi‑thread, and word size is essential.
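As a concrete handle on the CPI/IPC concepts above, `perf stat -e cycles,instructions` reports both counters for a workload; the counter values below are made‑up stand‑ins used only to show the arithmetic:

```shell
# Sketch: IPC = instructions / cycles. On a live system (assuming perf with
# hardware counter access):  perf stat -e cycles,instructions -- <command>
# Made-up counter values for illustration only:
cycles=2000000
instructions=3000000
awk -v c="$cycles" -v i="$instructions" 'BEGIN { printf "IPC=%.2f\n", i/c }'
```

An IPC well below 1 usually points at memory stalls; values approaching the core's issue width suggest the workload is compute bound.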
4.2 Analysis Tools
uptime, vmstat, mpstat, top, pidstat – basic CPU/load metrics.
perf – detailed per‑function CPU usage, can target kernel functions.
4.3 Usage
<code>// View overall CPU usage
top
// Show per‑CPU statistics
mpstat -P ALL 1
// Show CPU usage and load average
vmstat 1
// Per‑process CPU stats
pidstat -u 1 -p <pid>
// Trace function‑level CPU usage for a process
perf top -p <pid> -e cpu-clock</code>
5. Memory
5.1 Overview
Memory issues affect not only performance but also service availability. Key concepts include main memory, virtual memory, resident set, address space, OOM, page cache, page faults, swapping, and allocators (libc, glibc, jemalloc, SLUB).
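To connect the resident‑set concept to something observable, a process's resident and virtual sizes can be read straight out of /proc (field names per proc(5)); this is a Linux‑only sketch using the current shell as the target:

```shell
# VmSize = virtual address space; VmRSS = pages actually resident in RAM.
grep -E 'VmRSS|VmSize' /proc/$$/status
```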
5.2 Analysis Tools
free, vmstat, top, pidstat, pmap – memory usage statistics.
valgrind – memory leak detection.
dtrace – dynamic tracing of kernel functions (requires D language scripts).
5.3 Usage
<code>// Show system memory usage
free -m
// Show virtual memory stats
vmstat 1
// Show memory usage via top
top
// Per‑process memory stats (1‑second interval)
pidstat -p <pid> -r 1
// Show process memory map
pmap -d <pid>
// Detect memory leaks with valgrind
valgrind --tool=memcheck --leak-check=full --log-file=./log.txt ./program</code>
6. Disk I/O
6.1 Overview
Disk is the slowest subsystem and a common performance bottleneck. Understanding filesystem, VFS, page cache, buffer cache, inode, and I/O schedulers (e.g., noop) is necessary before monitoring.
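The active I/O scheduler mentioned above is visible per device in sysfs; the bracketed name is the one in use (available names vary by kernel, e.g. noop/none, deadline, cfq, mq-deadline):

```shell
# List the active I/O scheduler for each block device; the guard skips
# systems (e.g. minimal containers) without such sysfs entries.
for q in /sys/block/*/queue/scheduler; do
    [ -e "$q" ] || continue
    printf '%s: ' "$q"
    cat "$q"
done
```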
6.2 Analysis Tools
iotop, iostat, pidstat – system‑wide and per‑process I/O statistics.
perf – trace block‑layer events to investigate abnormal I/O.
6.3 Usage
<code>// View I/O activity
iotop
// Detailed I/O stats
iostat -d -x -k 1 10
// Per‑process I/O stats
pidstat -d 1 -p <pid>
// Investigate abnormal I/O with perf
perf record -e block:block_rq_issue -ag
^C
perf report</code>
7. Network
7.1 Overview
Network monitoring is complex due to latency, loss, congestion, and external devices (routers, switches, wireless). Modern NICs are adaptive, adjusting to link conditions.
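Retransmissions are often the first thing worth checking; on a live box `netstat -s | grep -i retrans` surfaces them. The here‑doc below is a made‑up sample (exact wording varies by distro and kernel) used only to show the extraction:

```shell
# Extract the retransmitted-segment count from netstat -s style output.
cat <<'EOF' | awk '/segments retransmitted/ {print $1}'
    429 segments retransmitted
    12 bad segments received
EOF
```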
7.2 Analysis Tools
netstat, ss – socket and protocol statistics.
sar – TCP/ETCP counters and per‑interface I/O rates.
tcpdump, tcpflow – packet and flow capture.
7.3 Usage
<code>// Network statistics
netstat -s
// UDP connections
netstat -nu
// UDP port usage
netstat -apu
// Count connections per state
netstat -a | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'
// Show TCP sockets
ss -t -a
// Summary of sockets
ss -s
// Show UDP sockets
ss -u -a
// TCP/ETCP stats
sar -n TCP,ETCP 1
// Network I/O stats
sar -n DEV 1
// Capture packets (host & port filter)
tcpdump -i eth1 host 192.168.1.1 and port 80
// Capture flows
tcpflow -cp host 192.168.1.1</code>
8. System Load
8.1 Overview
Load measures the amount of work the system is doing. Load Average is the average number of runnable tasks (on Linux, also tasks in uninterruptible sleep) over 1, 5, and 15 minutes, which roughly corresponds to the length of the run queue.
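The three averages come directly from /proc/loadavg; comparing them against the core count is the quickest saturation check (load roughly equal to the number of cores means the run queue is full):

```shell
# Read the 1/5/15-minute load averages and the core count (Linux-only).
read l1 l5 l15 rest < /proc/loadavg
echo "1min=$l1 5min=$l5 15min=$l15 cores=$(nproc)"
```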
8.2 Analysis Tools
uptime, top, vmstat – load and run‑queue metrics.
strace – system‑call tracing and latency measurement.
dmesg – kernel log messages.
8.3 Usage
<code>// View load
uptime
top
vmstat
// Trace system call latency
strace -c -p <pid>
// Trace specific syscalls (e.g., epoll_wait)
strace -T -e epoll_wait -p <pid>
// Kernel logs
dmesg</code>
9. Flame Graphs
9.1 Overview
Flame Graphs, created by Brendan Gregg, visualize sampled call stacks. The Y‑axis shows stack depth; the X‑axis spans the collected samples, so a frame's width is proportional to how often it appeared on‑CPU (it is not a time axis). Wider blocks indicate functions that consume more CPU time.
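Under the hood, every flame‑graph generator works on "folded" stacks: one line per unique call stack plus a sample count, which flamegraph.pl turns into the SVG. A synthetic two‑stack example (the counts are invented):

```shell
# Folded-stack format: frames joined by ';', then a sample count.
cat > out.folded <<'EOF'
main;foo1;foo3 1000
main;foo2;foo3 10
EOF
# ./FlameGraph/flamegraph.pl out.folded > demo.svg   # needs the cloned repo
awk '{ total += $NF } END { print total " samples" }' out.folded
```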
9.2 Install Dependencies
<code>// Install systemtap (usually pre‑installed)
yum install systemtap systemtap-runtime
// Install matching kernel debuginfo packages
kernel-debuginfo-$(uname -r).rpm
kernel-devel-$(uname -r).rpm
kernel-debuginfo-common-$(uname -r).rpm
// Install kernel debuginfo via repo
debuginfo-install --enablerepo=debuginfo kernel
debuginfo-install --enablerepo=debuginfo glibc</code>
9.3 Clone Tools
<code>git clone https://github.com/lidaohang/quick_location.git
cd quick_location</code>
9.4 On‑CPU Flame Graph
High CPU usage can be pinpointed to specific functions using on‑CPU flame graphs.
9.4.1 on‑CPU
<code>// on‑CPU user mode
sh ngx_on_cpu_u.sh <pid>
cd ngx_on_cpu_u
python -m SimpleHTTPServer 8088
# Open 127.0.0.1:8088/pid.svg in a browser</code>
Demo C program used for generating the graph:
<code>#include <stdio.h>
#include <stdlib.h>

void foo3() {}

void foo2() {
    int i;
    for (i = 0; i < 10; i++)
        foo3();
}

void foo1() {
    int i;
    for (i = 0; i < 1000; i++)
        foo3();
}

int main(void) {
    int i;
    for (i = 0; i < 1000000000; i++) {
        foo1();
        foo2();
    }
    return 0;
}</code>
9.4.2 off‑CPU
Off‑CPU graphs reveal time spent waiting (I/O, locks, paging, etc.).
<code>// off‑CPU user mode
sh ngx_off_cpu_u.sh <pid>
cd ngx_off_cpu_u
python -m SimpleHTTPServer 8088
# Open 127.0.0.1:8088/pid.svg</code>
9.5 Memory Flame Graph
Useful for locating memory leaks.
<code>sh ngx_on_memory.sh <pid>
cd ngx_on_memory
python -m SimpleHTTPServer 8088
# Open 127.0.0.1:8088/pid.svg</code>
9.6 Differential (Red‑Blue) Flame Graphs
Compare two profiles to spot performance regressions; red indicates increase, blue indicates decrease.
<code>// Capture baseline profile
perf record -F 99 -p <pid> -g -- sleep 30
perf script > out.stacks1
// Capture new profile
perf record -F 99 -p <pid> -g -- sleep 30
perf script > out.stacks2
// Generate diff flame graph
./FlameGraph/stackcollapse-perf.pl out.stacks1 > out.folded1
./FlameGraph/stackcollapse-perf.pl out.stacks2 > out.folded2
./FlameGraph/difffolded.pl out.folded1 out.folded2 | ./FlameGraph/flamegraph.pl > diff2.svg
</code>
Demo C programs (original vs. modified) illustrate the diff graph.
10. Case Study: Nginx Cluster Anomaly
10.1 Symptoms
On 2017‑09‑25 at 19:00, monitoring showed a surge of 499 and 5xx responses from the Nginx cluster, accompanied by rising CPU usage.
10.2 Nginx Metrics Analysis
Traffic graphs indicated no spike; traffic actually decreased, ruling out a traffic surge.
Response‑time graphs showed increased latency, possibly due to Nginx itself or upstream services.
Upstream response‑time graphs confirmed that backend latency contributed to the issue.
10.3 System CPU Analysis
Running top revealed high CPU usage on the Nginx worker processes.
Running perf top -p <pid> showed that most of the overhead was in memory allocation, freeing, and JSON parsing.
10.4 Flame‑Graph CPU Analysis
On‑CPU user flame graph highlighted heavy JSON parsing as the dominant consumer.
10.5 Summary
Two root causes were identified: upstream latency affecting request flow, and inefficient JSON parsing within Nginx causing high CPU usage. The latter was mitigated by disabling the problematic module, which lowered CPU load and restored normal request rates.
11. References
http://www.brendangregg.com/index.html
http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
http://www.brendangregg.com/FlameGraphs/memoryflamegraphs.html
http://www.brendangregg.com/FlameGraphs/offcpuflamegraphs.html
http://www.brendangregg.com/blog/2014-11-09/differential-flame-graphs.html
https://github.com/openresty/openresty-systemtap-toolkit
https://github.com/brendangregg/FlameGraph
https://www.slideshare.net/brendangregg/blazing-performance-with-flame-graphs
Author: Lucien_168 – Source: Jianshu (https://www.jianshu.com/p/0bbac570fa4c)