
How to Diagnose Linux Performance Issues with Flame Graphs and System Tools

This guide explains how to systematically analyze Linux performance problems—including CPU, memory, disk I/O, network, and load—using 5W2H methodology, built‑in monitoring commands, perf, flame‑graph visualizations, and a real‑world Nginx case study to pinpoint and resolve bottlenecks.

Liangxu Linux

Jul 19, 2020

Background

Complex production systems often encounter performance anomalies that are not immediately obvious from monitoring dashboards. Deep analysis on the server is required to locate the root cause, which demands a solid toolbox and systematic methodology.

Methodology

We adopt the 5W2H framework to structure performance investigations:

What – describe the observed symptom.

When – identify the time window when it occurs.

Why – hypothesize possible reasons.

Where – pinpoint the subsystem (CPU, memory, I/O, network, etc.).

How much – quantify resource consumption.

How to do – define concrete steps and tools for diagnosis.

CPU Analysis

Key Concepts

A thread's time is split into on-CPU time (executing in user or kernel mode) and off-CPU time (blocked on I/O, locks, or waiting to be scheduled). Interpreting CPU metrics also requires understanding thread states, CPI/IPC, scheduler run queues, and CPU cache behavior.

Analysis Tools

uptime, vmstat, mpstat, top, pidstat – provide overall CPU and load metrics.

perf – captures per‑function CPU usage and can target kernel functions.

Typical Commands

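# Show system-wide CPU usage
top
# Show statistics for every CPU core, refreshed every second
mpstat -P ALL 1
# Show CPU usage and run-queue length, refreshed every second
vmstat 1
# Show CPU statistics for one process
pidstat -u 1 -p pid
# Show function-level CPU usage inside a process
perf top -p pid -e cpu-clock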
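The percentages these tools report come from cumulative counters the kernel exposes in /proc/stat. As a rough sketch of that calculation, the hypothetical helper below (`cpu_busy_pct` is not part of any tool named here) takes two samples of the aggregate `cpu` line and prints the share of non-idle time between them. It assumes the standard field order and counts iowait as idle time.

```shell
# cpu_busy_pct SAMPLE1 SAMPLE2
# Each argument is one aggregate "cpu ..." line from /proc/stat, earlier sample first.
# Fields after the "cpu" label: user nice system idle iowait irq softirq steal ...
cpu_busy_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            for (i = 2; i <= 9 && i <= NF; i++) total += $i   # user through steal
            idle = $5 + $6                                     # idle + iowait
            if (NR == 1) { t0 = total; i0 = idle }
            else         { t1 = total; i1 = idle }
        }
        END {
            dt = t1 - t0
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (dt - (i1 - i0)) / dt
        }'
}

# Usage on a live system (not executed here):
#   a=$(grep '^cpu ' /proc/stat); sleep 1; b=$(grep '^cpu ' /proc/stat)
#   cpu_busy_pct "$a" "$b"
```

Because the counters only ever increase, a single reading says little; it is the difference between two readings that gives utilization over the interval.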

Memory Analysis

Key Concepts

Memory performance involves understanding physical RAM, virtual memory, page cache, OOM events, and allocator behavior (glibc, jemalloc, SLUB, etc.).

Analysis Tools

free, vmstat, top, pidstat, pmap – report overall and per‑process memory usage.

valgrind – detects memory leaks.

dtrace – dynamic tracing of kernel memory functions (requires custom scripts).

Typical Commands

# Show system memory usage in MiB
free -m
# Show virtual memory statistics every second
vmstat 1
# Show memory usage for the system and per process
top
# Show memory statistics for one process, sampled every second
pidstat -p pid -r 1
# Show the memory map of a process
pmap -d pid
# Check a program for memory errors and leaks
valgrind --tool=memcheck --leak-check=full --log-file=./log.txt ./program
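For scripted checks it can help to read /proc/meminfo directly. The sketch below is illustrative only: `mem_available_pct` is a made-up helper name, and it assumes a kernel that reports the `MemAvailable` field (3.14 or later). It prints available memory as a percentage of total.

```shell
# mem_available_pct: read /proc/meminfo-style text on stdin and print
# MemAvailable as a percentage of MemTotal (both are reported in kB).
mem_available_pct() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END {
            if (total <= 0) { print "n/a"; exit }
            printf "%.1f\n", 100 * avail / total
        }'
}

# Usage on a live system (not executed here):
#   mem_available_pct < /proc/meminfo
```

`MemAvailable` is usually a better signal than `MemFree`, because it includes page cache that the kernel can reclaim under pressure.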

Disk I/O Analysis

Key Concepts

Disk is usually the slowest subsystem on a server, so it is a common source of bottlenecks. Analyzing I/O problems requires understanding file systems, the VFS layer, the page cache, the buffer cache, and inode structures.

Analysis Tools

iostat – detailed I/O statistics.

iotop – real‑time per‑process I/O usage.

perf – can record block I/O events.

Typical Commands

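# Show per-process I/O activity
iotop
# Show extended per-device statistics in kB, every second, 10 reports
iostat -d -x -k 1 10
# Show I/O statistics for one process
pidstat -d 1 -p pid
# Record block request issue events system-wide with call graphs (stop with Ctrl-C)
perf record -e block:block_rq_issue -ag
# Inspect the recorded events
perf report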
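When scanning `iostat -x` output for saturated devices, the `%util` column is usually the first thing to check. The sketch below uses a hypothetical helper, `busy_devices`, that locates that column by its header name, since column order differs between sysstat versions, and prints the devices at or above a threshold. It assumes the header row starts with `Device`.

```shell
# busy_devices THRESHOLD
# Read `iostat -d -x` output on stdin and print "device %util" for every
# device whose %util is at or above THRESHOLD.
busy_devices() {
    awk -v limit="$1" '
        /^Device/ {
            # Find the %util column from the header row.
            col = 0
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        col && NF >= col && $col + 0 >= limit + 0 { print $1, $col }
    '
}

# Usage on a live system (not executed here):
#   iostat -d -x -k 1 2 | busy_devices 80
```

The first report from `iostat` covers the time since boot, so later intervals are the ones worth acting on.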

Network Analysis

Key Concepts

Network performance is affected by latency, packet loss, congestion, and devices outside the host (switches, routers, wireless links). NICs that auto-negotiate can also settle on a lower link speed than expected, depending on link conditions.

Analysis Tools

netstat, ss – display socket statistics.

sar – collect TCP/UDP metrics.

tcpdump, tcpflow – packet capture and flow analysis.

Typical Commands

# Show per-protocol network statistics
netstat -s
# Show current UDP sockets
netstat -nu
# Count TCP connections by state
netstat -a | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'
# Show all TCP sockets
ss -t -a
# Show a socket summary
ss -s
# Capture packets to or from 192.168.1.1 on port 80 via eth1
tcpdump -i eth1 host 192.168.1.1 and port 80
# Print the TCP streams for a host to the console
tcpflow -cp host 192.168.1.1
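The per-state connection count can also be built on `ss`, which is typically faster than `netstat` on hosts with many sockets. This is a sketch rather than a drop-in replacement: `tcp_state_counts` is a made-up name, and it assumes the `ss -tan` layout where the first line is a header and the first column is the state.

```shell
# tcp_state_counts: read `ss -tan` output on stdin and print "STATE count",
# one line per state, sorted by state name.
tcp_state_counts() {
    awk 'NR > 1 { ++count[$1] } END { for (s in count) print s, count[s] }' | sort
}

# Usage on a live system (not executed here):
#   ss -tan | tcp_state_counts
```

A large TIME-WAIT or CLOSE-WAIT count in the output is often a useful lead when connection handling is suspected.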

System Load

Key Concepts

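Load average is an exponentially damped average of the number of tasks demanding service over the last 1, 5, and 15 minutes. On Linux it counts tasks in uninterruptible sleep (usually waiting on disk I/O) as well as runnable tasks, so a high value signals system pressure but not necessarily CPU saturation.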
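A load figure is easier to judge relative to the number of CPUs. The sketch below uses a hypothetical helper, `load_per_cpu`, to divide one by the other. Treating a sustained value above 1.0 as a sign of queuing is a common rule of thumb, not a hard limit.

```shell
# load_per_cpu LOAD NCPU: print LOAD divided by NCPU with two decimals.
load_per_cpu() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN {
        if (ncpu <= 0) { print "n/a"; exit }
        printf "%.2f\n", load / ncpu
    }'
}

# Usage on a live system (not executed here):
#   load_per_cpu "$(cut -d ' ' -f1 /proc/loadavg)" "$(nproc)"
```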

Analysis Tools

uptime, top, vmstat – quick load checks.

strace – measures time spent in system calls.

dmesg – kernel log for hardware or scheduler warnings.

Typical Commands

# Check the load averages
uptime
top
vmstat
# Summarize time spent in each system call
strace -c -p pid
# Trace one system call and show the time spent in each call
strace -T -e epoll_wait -p pid
# View the kernel log
dmesg

Flame Graphs

What They Are

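Flame graphs visualize sampled call stacks. The y-axis is stack depth. The x-axis spans the sample population with stacks sorted alphabetically, so it does not show the passage of time. A wider box means the function appeared in more samples, that is, it was on-CPU (directly or through its callees) for a larger share of the profile.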
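The input to flamegraph.pl is the "folded" format: one line per unique stack, frames joined by semicolons, followed by a sample count. As a minimal sketch of how box widths arise, the hypothetical helper below merges identical sampled stacks and counts them. It assumes one stack per input line and no spaces in frame names.

```shell
# fold_stacks: read one semicolon-joined stack per sample on stdin and
# print "stack count" for each unique stack, sorted by stack.
fold_stacks() {
    awk '{ ++count[$0] } END { for (s in count) print s, count[s] }' | sort
}

# Example: three samples, two of which share the same stack.
#   printf 'main;parse;malloc\nmain;send\nmain;parse;malloc\n' | fold_stacks
```

In the rendered graph, a stack seen in two of three samples occupies two thirds of the width at each of its frames.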

Installation

# Install SystemTap
yum install -y systemtap systemtap-runtime
# Install the debug symbols matching the kernel
debuginfo-install --enablerepo=debuginfo kernel

Getting the Tools

git clone https://github.com/brendangregg/FlameGraph.git

Generating On‑CPU Flame Graphs

# On-CPU, user space
sh ngx_on_cpu_u.sh pid
# Enter the results directory
cd ngx_on_cpu_u
# Start a temporary HTTP server (Python 2; with Python 3 use: python3 -m http.server 8088)
python -m SimpleHTTPServer 8088
# Open 127.0.0.1:8088/pid.svg in a browser
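Where the wrapper script is not available, an on-CPU flame graph can also be produced with perf and the FlameGraph scripts directly. This is an alternative technique, not the SystemTap-based method above. The function name `oncpu_flamegraph`, the default 30-second window, and the `./FlameGraph` path are assumptions for illustration.

```shell
# oncpu_flamegraph PID [SECONDS] [FLAMEGRAPH_DIR]
# Sample the process at 99 Hz with call graphs, then fold the stacks and
# render PID.svg in the current directory.
oncpu_flamegraph() {
    pid=$1
    secs=${2:-30}
    fg=${3:-./FlameGraph}
    perf record -F 99 -p "$pid" -g -o "perf.$pid.data" -- sleep "$secs" || return 1
    perf script -i "perf.$pid.data" \
        | "$fg/stackcollapse-perf.pl" \
        | "$fg/flamegraph.pl" > "$pid.svg"
}

# Usage (requires perf and a FlameGraph checkout; not executed here):
#   oncpu_flamegraph 12345 30 ./FlameGraph
```

Sampling at 99 Hz rather than 100 Hz avoids running in lockstep with periodic timer activity.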

Generating Off‑CPU Flame Graphs

# Off-CPU, user space
sh ngx_off_cpu_u.sh pid
# Enter the results directory
cd ngx_off_cpu_u
# Start a temporary HTTP server (Python 2; with Python 3 use: python3 -m http.server 8088)
python -m SimpleHTTPServer 8088
# Open 127.0.0.1:8088/pid.svg in a browser

Differential (Red‑Blue) Flame Graphs

Capture two profiles (before and after a change), collapse stacks, and run difffolded.pl to highlight regressions (red) and improvements (blue).

cd quick_location
# Capture the baseline as profile 1
perf record -F 99 -p pid -g -- sleep 30 && perf script > out.stacks1
# Capture profile 2 after the change
perf record -F 99 -p pid -g -- sleep 30 && perf script > out.stacks2
# Fold both profiles and generate the differential flame graph
./FlameGraph/stackcollapse-perf.pl out.stacks1 > out.folded1
./FlameGraph/stackcollapse-perf.pl out.stacks2 > out.folded2
./FlameGraph/difffolded.pl out.folded1 out.folded2 | ./FlameGraph/flamegraph.pl > diff.svg

Case Study – Nginx Cluster Anomaly (Sept 2017)

Observed Symptoms

Spike of 499 and 5xx HTTP status codes.

Sustained high CPU usage on Nginx workers.
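A spike like this is usually confirmed from the access log first. The sketch below uses a hypothetical helper, `status_counts`, to count responses per status code. It assumes Nginx's default "combined" log format, in which the status is the ninth whitespace-separated field; a custom `log_format` would need a different field number.

```shell
# status_counts: read access-log lines on stdin and print "status count",
# sorted by status code. Assumes the default "combined" log format.
status_counts() {
    awk '{ ++n[$9] } END { for (s in n) print s, n[s] }' | sort
}

# Usage (the log path is an assumption; not executed here):
#   status_counts < /var/log/nginx/access.log
```

A 499 is logged by Nginx when the client closes the connection before a response is sent, so a rise in 499s alongside 5xx errors points toward slow responses rather than bad requests.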

Step‑by‑Step Diagnosis

Checked request volume – traffic actually decreased, so load surge was not the cause.

Analyzed Nginx response time – increased latency suggested either Nginx itself or upstream services.

Inspected upstream response time – also increased, indicating a possible backend bottleneck.

Used top to confirm Nginx worker CPU was high.

Ran perf top -p <pid> – identified heavy time spent in free, malloc, and JSON parsing.

Generated on‑CPU flame graph – highlighted the JSON library as the dominant consumer.

Findings

Backend upstream latency contributed to overall request slowdown.

Inside Nginx, a JSON‑parsing module consumed excessive CPU and memory allocations.

Resolution

Temporarily disabled the high‑cost JSON module, which immediately lowered CPU usage and restored normal request rates. The upstream latency remained, but the Nginx‑side bottleneck was eliminated.

References

http://www.brendangregg.com/

http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html

http://www.brendangregg.com/FlameGraphs/memoryflamegraphs.html

http://www.brendangregg.com/FlameGraphs/offcpuflamegraphs.html

http://www.brendangregg.com/blog/2014-11-09/differential-flame-graphs.html

https://github.com/openresty/openresty-systemtap-toolkit

https://github.com/brendangregg/FlameGraph

https://www.slideshare.net/brendangregg/blazing-performance-with-flame-graphs

