Master Linux Performance Debugging with Flame Graphs and System Tools

This guide explains how to use Linux performance analysis tools—including top, vmstat, perf, flame graphs, and differential flame graphs—to diagnose CPU, memory, I/O, and network bottlenecks, with step‑by‑step commands, methodology, and real‑world case studies.

Liangxu Linux

Feb 25, 2020

Problem Background

When routine monitoring cannot quickly reveal the root cause of a Linux system issue, deeper on‑server analysis is required. A systematic methodology reduces investigation time.

Methodology – 5W2H Framework

What – describe the observed symptom.

When – note the timing.

Why – hypothesize causes.

Where – locate the affected component.

How much – quantify resource consumption.

How to do – define remediation steps.

CPU Analysis

Key concepts: on‑CPU vs off‑CPU, processor architecture, CPI/IPC, scheduler, run‑queue, context switching.

top, mpstat -P ALL 1, vmstat 1, pidstat -u 1 -p <pid> – overall CPU usage.

perf top -p <pid> -e cpu-clock – per‑function profiling.
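To make the "overall CPU usage" figure concrete, here is a minimal sketch of the arithmetic behind it, assuming the standard field order of the aggregate cpu line in /proc/stat. The function name cpu_busy_pct is hypothetical and not part of any tool listed above.

```shell
# cpu_busy_pct: print the CPU busy percentage between two samples of the
# aggregate "cpu" line from /proc/stat (both lines are read from stdin).
# Assumed field layout: cpu user nice system idle iowait irq softirq steal ...
cpu_busy_pct() {
    awk '$1 == "cpu" {
        total = 0
        for (i = 2; i <= 9; i++) total += $i   # user..steal (guest is already counted in user)
        idle = $5 + $6                         # idle + iowait
        if (seen) {
            dt = total - prev_total
            di = idle - prev_idle
            if (dt > 0) printf "%.1f\n", 100 * (dt - di) / dt
        }
        prev_total = total; prev_idle = idle; seen = 1
    }'
}

# Usage (1-second interval chosen only for illustration):
#   { grep '^cpu ' /proc/stat; sleep 1; grep '^cpu ' /proc/stat; } | cpu_busy_pct
```

Counting iowait as idle time is a design choice in this sketch; some tools report it separately, so figures may differ slightly from top or mpstat.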

Memory Analysis

Key concepts: physical memory, virtual memory, resident set, page cache, OOM, swapping, allocators (glibc, jemalloc, SLUB).

free -m, vmstat 1, top, pidstat -r 1 -p <pid>, pmap -d <pid> – memory overview.

valgrind --tool=memcheck --leak-check=full --log-file=log.txt ./program – leak detection.

dtrace scripts – kernel‑level tracing (requires kernel knowledge).
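Because the page cache inflates the "used" column, a quick check of real memory pressure is often based on MemAvailable instead. The following is a rough sketch, assuming a kernel that exposes MemAvailable in /proc/meminfo (3.14 or later); mem_used_pct is a hypothetical helper name.

```shell
# mem_used_pct: read /proc/meminfo-style text on stdin and print the share of
# memory that is not available to new workloads. MemAvailable already accounts
# for reclaimable page cache, so this is closer to real pressure than "used".
mem_used_pct() {
    awk '$1 == "MemTotal:"     { total = $2 }
         $1 == "MemAvailable:" { avail = $2; have_avail = 1 }
         END {
             # Print nothing if the kernel does not report MemAvailable.
             if (total > 0 && have_avail)
                 printf "%.1f\n", 100 * (total - avail) / total
         }'
}

# Usage: mem_used_pct < /proc/meminfo
```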

Disk I/O Analysis

Core concepts: filesystem, VFS, page cache, buffer cache, inode cache, I/O scheduler.

iotop – real‑time I/O usage.

iostat -d -x -k 1 10 – detailed I/O statistics.

pidstat -d 1 -p <pid> – per‑process I/O.

perf record -e block:block_rq_issue -a → perf report – kernel‑level I/O tracing.
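When iostat prints many devices, it helps to filter for the saturated ones. This is a small sketch that assumes the extended report has a header row starting with "Device" and a column named %util; busy_disks is a hypothetical helper, and the column is looked up by name because its position differs between sysstat versions.

```shell
# busy_disks THRESHOLD: read `iostat -d -x` output on stdin and print every
# device whose %util is at or above THRESHOLD.
busy_disks() {
    awk -v limit="$1" '
        $1 ~ /^Device/ {                 # header row: find the %util column
            col = 0
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        col && NF >= col && $col + 0 >= limit + 0 { print $1, $col }
    '
}

# Usage: iostat -d -x -k 1 10 | busy_disks 80
```

Note that the first report iostat prints covers the time since boot, so sustained saturation shows up in the later intervals.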

Network Analysis

Key metrics: latency, packet loss, socket states.

netstat -s, netstat -nu, netstat -apu – socket statistics.

ss -t -a, ss -s, ss -u -a – modern socket inspection.

sar -n TCP,ETCP 1, sar -n DEV 1 – throughput and errors.

tcpdump -i eth1 host 192.168.1.1 and port 80, tcpflow -cp host 192.168.1.1 – packet capture.
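Socket states are easier to reason about as totals than as a long listing. As an illustrative sketch, assuming ss prints one header line followed by one row per socket with the state in the first column, a hypothetical tcp_state_counts helper could look like this:

```shell
# tcp_state_counts: read `ss -tan` output on stdin and print how many sockets
# are in each TCP state, one "STATE COUNT" pair per line, sorted by state name.
tcp_state_counts() {
    awk 'NR > 1 { count[$1]++ }                       # skip the header line
         END { for (s in count) print s, count[s] }' | sort
}

# Usage: ss -tan | tcp_state_counts
```

A large TIME-WAIT or CLOSE-WAIT total is usually the first thing worth investigating in this output.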

System Load

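Load average is an exponentially damped average of the number of tasks that are runnable or in uninterruptible sleep (usually waiting on disk I/O), reported over 1, 5, and 15‑minute windows. A high value can therefore point to I/O stalls as well as CPU saturation.

uptime, top, vmstat – quick load view.

strace -c -p <pid> – system‑call cost breakdown.

strace -T -e epoll_wait -p <pid> – latency of specific syscalls.

dmesg – kernel log for warnings.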
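A load figure only means something relative to the number of CPUs on the machine. The sketch below normalizes it; load_per_cpu is a hypothetical helper, and the rule of thumb that values well above 1.0 per CPU indicate queuing is an assumption to validate against your own workload.

```shell
# load_per_cpu LOAD NCPU: print the load average divided by the CPU count.
load_per_cpu() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN {
        if (ncpu > 0) printf "%.2f\n", load / ncpu
    }'
}

# Usage (1-minute load average and online CPU count):
#   load_per_cpu "$(cut -d" " -f1 /proc/loadavg)" "$(nproc)"
```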

Flame Graphs

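Flame graphs visualize sampled stack traces. The y‑axis shows call‑stack depth. The x‑axis spans the whole sample population and is sorted alphabetically, not by time; the width of a frame is proportional to the number of samples in which it appeared, so wider frames mean more time. Types include on‑CPU, off‑CPU, memory, and differential graphs.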
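The input to a flame graph is a set of "folded" stacks: one line per unique stack, frames separated by semicolons, followed by a sample count. As a rough sketch of how frame width relates to samples, the hypothetical leaf_samples helper below totals samples by the topmost frame, which corresponds to the flat plateaus along the top edge of the graph.

```shell
# leaf_samples: read folded stacks ("frame1;frame2;...;leaf COUNT") on stdin
# and print "COUNT leaf" totals, largest first. The leaf is the function that
# was running when the sample was taken.
leaf_samples() {
    awk '{
        count = $NF
        stack = $0
        sub(/ [0-9]+$/, "", stack)        # strip the trailing sample count
        n = split(stack, frames, ";")
        total[frames[n]] += count
    }
    END { for (f in total) print total[f], f }' | sort -k1,1nr -k2
}

# Usage: leaf_samples < out.folded
```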

Installation

# Install SystemTap and its runtime
yum install systemtap systemtap-runtime
# SystemTap needs kernel debuginfo and devel packages that match the
# running kernel (check with uname -r). Example package set for RHEL 5:
#   kernel-debuginfo-2.6.18-308.el5.x86_64.rpm
#   kernel-devel-2.6.18-308.el5.x86_64.rpm
#   kernel-debuginfo-common-2.6.18-308.el5.x86_64.rpm
# Install debuginfo for the kernel and glibc from the debuginfo repo
# (the repo id "debuginfo" may differ on your distribution)
debuginfo-install --enablerepo=debuginfo kernel
debuginfo-install --enablerepo=debuginfo glibc

Clone the helper repository:

git clone https://github.com/lidaohang/quick_location.git
cd quick_location

Generating On‑CPU Flame Graphs

# User‑space profile
sh ngx_on_cpu_u.sh <pid>
cd ngx_on_cpu_u
# Serve the SVG over HTTP (Python 2; with Python 3 use: python3 -m http.server 8088)
python -m SimpleHTTPServer 8088
# Open http://127.0.0.1:8088/<pid>.svg

# Kernel‑space profile
sh ngx_on_cpu_k.sh <pid>
cd ngx_on_cpu_k
python -m SimpleHTTPServer 8088

Generating Off‑CPU Flame Graphs

# User‑space off‑CPU profile
sh ngx_off_cpu_u.sh <pid>
cd ngx_off_cpu_u
# Serve the SVG over HTTP (Python 2; with Python 3 use: python3 -m http.server 8088)
python -m SimpleHTTPServer 8088

# Kernel‑space off‑CPU profile
sh ngx_off_cpu_k.sh <pid>
cd ngx_off_cpu_k
python -m SimpleHTTPServer 8088

Differential Flame Graphs

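# Assumes https://github.com/brendangregg/FlameGraph is cloned into ./FlameGraph
# Record the first profile (before the change): 99 Hz, with call graphs, for 30 s
perf record -F 99 -p <pid> -g -- sleep 30
perf script > out.stacks1
# Record the second profile (after the change); perf.data is overwritten
perf record -F 99 -p <pid> -g -- sleep 30
perf script > out.stacks2
# Collapse each profile to one line per unique stack, then diff
./FlameGraph/stackcollapse-perf.pl out.stacks1 > out.folded1
./FlameGraph/stackcollapse-perf.pl out.stacks2 > out.folded2
./FlameGraph/difffolded.pl out.folded1 out.folded2 | \
  ./FlameGraph/flamegraph.pl > diff2.svg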

In the resulting graph, frame widths reflect the second profile and color shows the change relative to the first: red frames grew in cost, blue frames shrank.
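To see what the diff step works from, the following sketch compares two folded files directly and lists the stacks whose sample counts changed. It illustrates the idea only and is not how difffolded.pl is implemented; folded_delta is a hypothetical helper, and it does not normalize for profiles with different total sample counts.

```shell
# folded_delta BEFORE AFTER: print "DELTA stack" for every stack whose sample
# count differs between two folded files, largest increase first.
folded_delta() {
    awk '{
        count = $NF
        stack = $0
        sub(/ [0-9]+$/, "", stack)        # strip the trailing sample count
        if (FILENAME == ARGV[1]) before[stack] += count
        else                     after[stack]  += count
        seen[stack] = 1
    }
    END {
        for (s in seen) {
            d = after[s] - before[s]
            if (d != 0) print d, s
        }
    }' "$1" "$2" | sort -k1,1nr
}

# Usage: folded_delta out.folded1 out.folded2 | head
```

Positive deltas correspond to frames that would be drawn red, negative deltas to frames that would be drawn blue.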

Real‑World Case Study: Nginx Cluster Anomaly

Checked request volume – traffic was actually decreasing.

Examined Nginx response times – latency had risen, suggesting upstream impact.

Inspected upstream response times – also increased, pointing to a backend slowdown.

Used top and perf top -p <pid> – identified heavy CPU consumption in JSON parsing and memory allocation.

Generated on‑CPU flame graphs – confirmed the JSON library dominated CPU usage.

Correlated findings – concluded a recent code change introduced inefficient JSON handling, amplifying upstream latency.

Remediation – disabled the problematic module, observed CPU drop and restored normal request flow.

Summary

Combining the 5W2H questioning framework with a curated set of Linux observability tools (top, vmstat, mpstat, pidstat, perf, iostat, iotop, netstat, ss, sar, strace, dmesg) enables rapid root‑cause analysis across CPU, memory, I/O, network, and load dimensions. Flame graphs, especially differential ones, provide visual insight into performance regressions and are valuable for continuous integration pipelines and production incident response.

References

Brendan Gregg – Flame Graphs: https://github.com/brendangregg/FlameGraph

OpenResty SystemTap Toolkit: https://github.com/openresty/openresty-systemtap-toolkit
