Master Linux Performance Debugging with Flame Graphs and System Tools
This guide explains how to use Linux performance analysis tools—including top, vmstat, perf, flame graphs, and differential flame graphs—to diagnose CPU, memory, I/O, and network bottlenecks, with step‑by‑step commands, methodology, and real‑world case studies.
Problem Background
When routine monitoring cannot quickly reveal the root cause of a Linux system issue, deeper on‑server analysis is required. A systematic methodology reduces investigation time.
Methodology – 5W2H Framework
What – describe the observed symptom.
When – note the timing.
Why – hypothesize causes.
Where – locate the affected component.
How much – quantify resource consumption.
How to do – define remediation steps.
CPU Analysis
Key concepts: on‑CPU vs off‑CPU time, processor architecture, CPI/IPC, scheduler, run queue, context switching.
top, mpstat -P ALL 1, vmstat 1, pidstat -u 1 -p <pid> – overall and per‑process CPU usage.
perf top -p <pid> -e cpu-clock – per‑function profiling.
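The overall CPU percentages reported by top, mpstat, and vmstat are derived from the cumulative counters on the aggregate cpu line of /proc/stat. A minimal sketch of that calculation, assuming two samples of that line taken some interval apart. The function name cpu_busy_pct is hypothetical, and counting iowait as idle time is a choice made for this illustration:

```shell
# cpu_busy_pct PREV CUR
# PREV and CUR are two samples of the aggregate "cpu" line from /proc/stat.
# Prints the percentage of non-idle time between the two samples.
cpu_busy_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            # Fields 2..9: user nice system idle iowait irq softirq steal.
            # guest and guest_nice (fields 10, 11) are already part of user and nice.
            total = 0
            for (i = 2; i <= 9 && i <= NF; i++) total += $i
            idle = $5 + $6    # idle + iowait
            if (NR == 1) { t0 = total; i0 = idle; next }
            dt = total - t0
            di = idle - i0
            if (dt > 0) printf "%.1f\n", 100 * (dt - di) / dt
            else print "0.0"
        }'
}

# Usage on a live system:
#   prev=$(grep '^cpu ' /proc/stat); sleep 1; cur=$(grep '^cpu ' /proc/stat)
#   cpu_busy_pct "$prev" "$cur"
```

Working from deltas rather than raw values matters because the counters only ever increase from boot.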
Memory Analysis
Key concepts: physical memory, virtual memory, resident set, page cache, OOM, swapping, allocators (glibc, jemalloc, SLUB).
free -m, vmstat 1, top, pidstat -r 1 -p <pid>, pmap -d <pid> – memory overview.
valgrind --tool=memcheck --leak-check=full --log-file=log.txt ./program – leak detection.
dtrace scripts – kernel‑level dynamic tracing (requires kernel knowledge; on Linux, SystemTap typically fills this role).
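A leak often shows up first as steadily growing resident memory, well before a full valgrind run is practical on a production process. A small sketch for tracking that, assuming the Linux /proc/<pid>/status format; the helper name rss_kb is hypothetical:

```shell
# rss_kb: read the text of /proc/<pid>/status on stdin, print VmRSS in kB.
rss_kb() {
    awk '$1 == "VmRSS:" { print $2; exit }'
}

# Usage on a live system: sample every 10 seconds and watch for steady growth.
#   while sleep 10; do
#       printf '%s %s\n' "$(date +%T)" "$(rss_kb < /proc/<pid>/status)"
#   done
```

This only indicates a trend. Confirming an actual leak still needs a tool such as valgrind.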
Disk I/O Analysis
Core concepts: filesystem, VFS, page cache, buffer cache, inode cache, I/O scheduler.
iotop – real‑time per‑process I/O usage.
iostat -d -x -k 1 10 – extended per‑device statistics in kB, one report per second, 10 reports.
pidstat -d 1 -p <pid> – per‑process I/O.
perf record -e block:block_rq_issue -a, then perf report – kernel‑level tracing of block I/O requests.
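The %util column of iostat -x is the share of wall‑clock time during which the device had at least one request in flight. A rough sketch of where that number comes from, assuming the Linux /proc/diskstats layout in which field 13 is the cumulative number of milliseconds spent doing I/O. The function name disk_util_pct and the device name sda are illustrative assumptions:

```shell
# disk_util_pct PREV_MS CUR_MS INTERVAL_MS
# PREV_MS, CUR_MS: field 13 of /proc/diskstats for one device, at two points in time.
# INTERVAL_MS: wall-clock time between the two samples, in milliseconds.
disk_util_pct() {
    awk -v p="$1" -v c="$2" -v t="$3" 'BEGIN {
        if (t > 0) printf "%.1f\n", 100 * (c - p) / t
        else print "0.0"
    }'
}

# Usage on a live system:
#   p=$(awk '$3 == "sda" { print $13 }' /proc/diskstats); sleep 1
#   c=$(awk '$3 == "sda" { print $13 }' /proc/diskstats)
#   disk_util_pct "$p" "$c" 1000
```

On devices that serve many requests in parallel (SSDs, RAID arrays), a value near 100 does not necessarily mean the device is saturated, so read it together with the latency and queue columns.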
Network Analysis
Key metrics: latency, packet loss, socket states.
netstat -s, netstat -nu, netstat -apu – protocol statistics and UDP sockets.
ss -t -a, ss -s, ss -u -a – modern socket inspection (TCP sockets, summary, UDP sockets).
sar -n TCP,ETCP 1, sar -n DEV 1 – TCP activity and errors, per‑interface throughput.
tcpdump -i eth1 host 192.168.1.1 and port 80, tcpflow -cp host 192.168.1.1 – packet capture.
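Socket states are easier to reason about as counts than as a long listing. A minimal sketch, assuming the usual ss -tan layout with a header row followed by one socket per line and the state in the first column; the helper name count_states is hypothetical:

```shell
# count_states: read `ss -tan` output on stdin and print "STATE COUNT"
# for each TCP state, sorted by state name.
count_states() {
    awk 'NR > 1 { n[$1]++ } END { for (s in n) print s, n[s] }' | sort
}

# Usage on a live system:
#   ss -tan | count_states
```

A large and growing number of sockets in one state, for example TIME-WAIT or CLOSE-WAIT, is usually a more useful signal than any single connection.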
System Load
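Load average is an exponentially damped average, reported over 1, 5, and 15‑minute windows, of the number of runnable tasks. On Linux it also counts tasks in uninterruptible sleep (usually blocked on I/O), so a high load does not always mean CPU saturation.
uptime, top, vmstat – quick load view.
strace -c -p <pid> – per‑syscall count and time summary.
strace -T -e epoll_wait -p <pid> – time spent in each call of a specific syscall.
dmesg – kernel log for warnings.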
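A raw load figure only means something relative to the number of CPUs. A small sketch that normalizes it; the function name load_per_cpu is hypothetical, and the usage lines assume Linux /proc/loadavg and the coreutils nproc command:

```shell
# load_per_cpu LOAD NCPU: print LOAD divided by the number of CPUs.
load_per_cpu() {
    awk -v l="$1" -v n="$2" 'BEGIN {
        if (n > 0) printf "%.2f\n", l / n
        else print "n/a"
    }'
}

# Usage on a live system:
#   read -r load1 rest < /proc/loadavg
#   load_per_cpu "$load1" "$(nproc)"
```

A sustained value well above 1.00 suggests tasks are queuing for CPU or, on Linux, waiting in uninterruptible I/O. vmstat helps tell the two apart (r column versus b column).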
Flame Graphs
Flame graphs visualize sampled stack traces. The y‑axis shows call‑stack depth. The x‑axis spans the sample population, with frames sorted alphabetically rather than by time, so a wider frame means that function appeared in more samples. Types include on‑CPU, off‑CPU, memory, and differential graphs.
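The width of a frame can be checked numerically from the folded‑stack format that the FlameGraph stackcollapse scripts emit: one line per unique stack, frames separated by semicolons, followed by a space and the sample count. A minimal sketch under that assumption; the function name frame_pct and the frame names in the sample data are hypothetical:

```shell
# frame_pct FRAME: read folded stacks ("f1;f2;f3 COUNT" per line) on stdin and
# print the percentage of samples whose stack contains FRAME.
frame_pct() {
    awk -v f="$1" '
        {
            n = $NF                      # trailing sample count
            stack = $0
            sub(/ [0-9]+$/, "", stack)   # drop the count, keep the frames
            total += n
            m = split(stack, fr, ";")
            for (i = 1; i <= m; i++)
                if (fr[i] == f) { hit += n; break }
        }
        END { if (total > 0) printf "%.1f\n", 100 * hit / total }'
}

# Example with made-up data:
#   printf 'main;parse;json_decode 30\nmain;parse;malloc 10\nmain;send 60\n' | frame_pct parse
```

In this made‑up profile, parse appears in 40 of 100 samples, so its frame would span 40% of the graph width.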
Installation
# Install systemtap and runtime
yum install systemtap systemtap-runtime
# Install matching kernel debuginfo packages (example for RHEL 5)
kernel-debuginfo-2.6.18-308.el5.x86_64.rpm
kernel-devel-2.6.18-308.el5.x86_64.rpm
kernel-debuginfo-common-2.6.18-308.el5.x86_64.rpm
# Install debuginfo for the kernel and glibc from the debuginfo repo
debuginfo-install --enablerepo=debuginfo kernel
debuginfo-install --enablerepo=debuginfo glibc
Clone the helper repository:
git clone https://github.com/lidaohang/quick_location.git
cd quick_location
Generating On‑CPU Flame Graphs
# User‑space profile
sh ngx_on_cpu_u.sh <pid>
cd ngx_on_cpu_u
python3 -m http.server 8088   # on Python 2: python -m SimpleHTTPServer 8088
# Open http://127.0.0.1:8088/<pid>.svg
# Kernel‑space profile
sh ngx_on_cpu_k.sh <pid>
cd ngx_on_cpu_k
python3 -m http.server 8088   # on Python 2: python -m SimpleHTTPServer 8088
Generating Off‑CPU Flame Graphs
# User‑space off‑CPU profile
sh ngx_off_cpu_u.sh <pid>
cd ngx_off_cpu_u
python3 -m http.server 8088   # on Python 2: python -m SimpleHTTPServer 8088
# Kernel‑space off‑CPU profile
sh ngx_off_cpu_k.sh <pid>
cd ngx_off_cpu_k
python3 -m http.server 8088   # on Python 2: python -m SimpleHTTPServer 8088
Differential Flame Graphs
# Record two profiles (before and after change)
perf record -F 99 -p <pid> -g -- sleep 30
perf script > out.stacks1
perf record -F 99 -p <pid> -g -- sleep 30
perf script > out.stacks2
# Collapse and diff (assumes the FlameGraph repository listed in References is cloned into ./FlameGraph)
./FlameGraph/stackcollapse-perf.pl out.stacks1 > out.folded1
./FlameGraph/stackcollapse-perf.pl out.stacks2 > out.folded2
./FlameGraph/difffolded.pl out.folded1 out.folded2 | \
./FlameGraph/flamegraph.pl > diff2.svg
Red frames indicate increased cost and blue frames reduced cost in the second profile relative to the first. Frame widths reflect the second profile.
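To see the same regression as numbers rather than colors, the two folded files can be compared directly. The following is an illustrative sketch only, not a replacement for difffolded.pl. The function name folded_delta is hypothetical, and it assumes both inputs are folded‑stack files with a trailing sample count on each line:

```shell
# folded_delta BEFORE AFTER
# Print "DELTA STACK" for every stack found in either folded file,
# largest increase first. DELTA is the AFTER count minus the BEFORE count.
folded_delta() {
    awk '
        {
            n = $NF                  # trailing sample count
            s = $0
            sub(/ [0-9]+$/, "", s)   # stack without the count
            seen[s] = 1
        }
        FILENAME == ARGV[1] { before[s] += n; next }
        { after[s] += n }
        END { for (s in seen) printf "%d %s\n", after[s] - before[s], s }
    ' "$1" "$2" | sort -rn
}

# Usage with the files produced above:
#   folded_delta out.folded1 out.folded2 | head -20
```

Raw sample deltas are only comparable when both profiles were recorded for the same duration at the same frequency, as in the two perf record runs above.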
Real‑World Case Study: Nginx Cluster Anomaly
Checked request volume – traffic was actually decreasing.
Examined Nginx response times – latency had risen, suggesting upstream impact.
Inspected upstream response times – also increased, pointing to a backend slowdown.
Used top and perf top -p <pid> – identified heavy CPU consumption in JSON parsing and memory allocation.
Generated on‑CPU flame graphs – confirmed the JSON library dominated CPU usage.
Correlated findings – concluded a recent code change introduced inefficient JSON handling, amplifying upstream latency.
Remediation – disabled the problematic module, observed CPU drop and restored normal request flow.
Summary
Combining the 5W2H questioning framework with a curated set of Linux observability tools (top, vmstat, mpstat, pidstat, perf, iostat, iotop, netstat, ss, sar, strace, dmesg) enables rapid root‑cause analysis across CPU, memory, I/O, network, and load dimensions. Flame graphs, especially differential ones, provide visual insight into performance regressions and are valuable for continuous integration pipelines and production incident response.
References
Brendan Gregg – Flame Graphs: https://github.com/brendangregg/FlameGraph
OpenResty SystemTap Toolkit: https://github.com/openresty/openresty-systemtap-toolkit
Liangxu Linux
Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge—fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc. (Reply “Linux” to receive essential resources.)
