
Master Linux Performance: CPU, Memory, IO, and Flame Graphs for Nginx Troubleshooting

This guide explains how to diagnose Linux performance bottlenecks—CPU, memory, disk I/O, network, and system load—using tools such as top, vmstat, perf, and flame graphs, and demonstrates a real‑world Nginx case study to pinpoint high‑CPU JSON parsing and upstream latency issues.


1. Background

Complex issues sometimes arise that monitoring dashboards cannot immediately explain; these require deep analysis on the server itself, which in turn demands technical experience across many domains to locate the root cause.

Effective analysis tools can greatly accelerate problem identification, saving time for deeper work.

2. Overview

This article introduces various troubleshooting tools and demonstrates their use with case studies.

3. Problem‑analysis Methodology

Applying the 5W2H framework yields key performance questions:

What: What is the observed phenomenon?

When: When does it occur?

Why: Why does it happen?

Where: Where does it happen?

How much: How many resources does it consume?

How: How can it be resolved?

4. CPU

4.1 Overview

For applications, the kernel CPU scheduler’s functionality and performance are primary concerns. Thread‑state analysis distinguishes on‑CPU (user and system time) from off‑CPU (waiting for I/O, locks, paging, etc.).

Heavy on‑CPU time indicates a need for CPU profiling; extensive off‑CPU time suggests bottlenecks elsewhere.
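As a rough sketch of this distinction, a process's cumulative on-CPU time can be read from `/proc/<pid>/stat` (the standard `utime`/`stime` fields, in clock ticks); comparing two samples over a known wall-clock interval gives the on-CPU share, and everything else in the interval is off-CPU. The function names here are illustrative, not from any tool in this article:

```python
# Sketch: estimate a process's on-CPU share from two /proc/<pid>/stat samples.
# utime (field 14) and stime (field 15) are cumulative CPU ticks; the comm
# field may contain spaces, so we split after the closing ')'.

def cpu_ticks(stat_line: str):
    """Return (utime, stime) ticks from one /proc/<pid>/stat line."""
    rest = stat_line.rsplit(')', 1)[1].split()
    # rest[0] is field 3 (state), so field N maps to rest[N - 3].
    return int(rest[11]), int(rest[12])

def on_cpu_share(sample1: str, sample2: str, interval_s: float, hz: int = 100) -> float:
    """Fraction of the wall-clock interval spent on-CPU (user + system)."""
    u1, s1 = cpu_ticks(sample1)
    u2, s2 = cpu_ticks(sample2)
    return ((u2 - u1) + (s2 - s1)) / (hz * interval_s)
```

A share near 1.0 per core calls for CPU profiling (`perf top`); a low share with poor latency points at off-CPU waits.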

4.2 Tools

uptime, vmstat, mpstat, top, pidstat – basic CPU and load metrics.

perf – detailed per‑function CPU usage, can target kernel functions.

4.3 Usage

<code># View overall CPU usage
top
# Show per-CPU statistics
mpstat -P ALL 1
# Run-queue length and system-wide CPU statistics
vmstat 1
# Process-level CPU stats
pidstat -u 1 -p <pid>
# Profile a process with perf
perf top -p <pid> -e cpu-clock
</code>

5. Memory

5.1 Overview

Memory issues can affect performance, service availability, or cause crashes. Key concepts include:

Main memory

Virtual memory

Resident memory

Address space

OOM (Out‑of‑Memory)

Page cache

Page faults and swapping

Allocators (glibc malloc, jemalloc, etc.)

Linux kernel SLUB allocator
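To make the virtual-vs-resident distinction concrete, both figures are exposed per process in `/proc/<pid>/status`: `VmSize` is the address-space (virtual) size, `VmRSS` is what actually occupies main memory. A minimal parsing sketch (the function name is illustrative):

```python
# Sketch: extract the Vm* memory fields from a /proc/<pid>/status dump.
# VmSize = virtual address-space size; VmRSS = resident set in main memory.

def vm_stats(status_text: str) -> dict:
    """Return {field_name: kB} for every Vm* line of /proc/<pid>/status."""
    stats = {}
    for line in status_text.splitlines():
        if line.startswith('Vm'):
            name, value = line.split(':', 1)
            stats[name] = int(value.split()[0])  # values are reported in kB
    return stats
```

A large gap between `VmSize` and `VmRSS` is normal (mapped-but-untouched pages); a steadily growing `VmRSS` is what a leak hunt with valgrind should confirm.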

5.2 Tools

free, vmstat, top, pidstat, pmap – memory usage statistics.

valgrind – memory leak detection.

systemtap/dtrace – dynamic tracing of kernel functions (requires deep kernel knowledge).

5.3 Usage

<code># Show system memory usage
free -m
# Virtual memory stats
vmstat 1
# Process memory map
pmap -d <pid>
# Detect leaks with valgrind
valgrind --tool=memcheck --leak-check=full --log-file=./log.txt ./program
</code>

6. Disk I/O

6.1 Overview

Disk subsystems are often the slowest component, introducing performance bottlenecks due to mechanical latency. Understanding file systems, VFS, page cache, buffer cache, inode structures, and I/O schedulers is essential.
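The per-device counters behind `iostat` live in `/proc/diskstats`; for example, the average read latency over a window (roughly iostat's read `await`) can be derived from the delta of "milliseconds spent reading" over the delta of "reads completed". A sketch, with an illustrative function name:

```python
# Sketch: approximate iostat's read await from two /proc/diskstats samples
# of the same device. Field 4 = reads completed, field 7 = ms spent reading
# (both cumulative since boot).

def read_await_ms(sample1: str, sample2: str) -> float:
    """Average read latency (ms) for the interval between two samples."""
    f1, f2 = sample1.split(), sample2.split()
    reads = int(f2[3]) - int(f1[3])    # reads completed in the interval
    read_ms = int(f2[6]) - int(f1[6])  # ms spent reading in the interval
    return read_ms / reads if reads else 0.0
```

Rising await with flat IOPS usually means queueing at the device, which is where I/O-scheduler and page-cache behavior come into play.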

6.2 Tools

6.3 Usage

<code># Real-time I/O monitoring
iotop
# Detailed I/O stats
iostat -d -x -k 1 10
# Process-level I/O
pidstat -d 1 -p <pid>
# Block-level tracing with perf
perf record -e block:block_rq_issue -ag
perf report
</code>

7. Network

7.1 Overview

Network monitoring is complex due to latency, congestion, packet loss, and interactions with routers, switches, and wireless signals. Adaptive NICs adjust to varying link speeds and modes.
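One concrete signal of congestion or packet loss is the TCP retransmission rate, which `sar -n TCP,ETCP` and `netstat -s` both derive from the kernel counters in `/proc/net/snmp`. A sketch of that computation (the function name is illustrative):

```python
# Sketch: TCP retransmission rate from the Tcp: lines of /proc/net/snmp.
# The first Tcp: line is a header of counter names, the second holds values;
# RetransSegs / OutSegs gives the fraction of segments retransmitted.

def tcp_retrans_rate(snmp_text: str) -> float:
    lines = [l for l in snmp_text.splitlines() if l.startswith('Tcp:')]
    header = lines[0].split()[1:]
    values = lines[1].split()[1:]
    tcp = dict(zip(header, (int(v) for v in values)))
    return tcp['RetransSegs'] / tcp['OutSegs'] if tcp['OutSegs'] else 0.0
```

A rate persistently above a fraction of a percent is worth chasing with `tcpdump` at both endpoints.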

7.2 Tools

netstat, ss – socket and protocol statistics.

sar – TCP and network-interface statistics.

tcpdump, tcpflow – packet and flow capture.

7.3 Usage

<code># Network statistics
netstat -s
# UDP connections
netstat -nu
# UDP port usage
netstat -apu
# Count connections by state
netstat -a | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'
# TCP sockets
ss -t -a
# Socket summary
ss -s
# UDP sockets
ss -u -a
# TCP/ETCP stats
sar -n TCP,ETCP 1
# Network I/O
sar -n DEV 1
# Packet capture
tcpdump -i eth1 host 192.168.1.1 and port 80
# Flow capture
tcpflow -c -p host 192.168.1.1
</code>

8. System Load

8.1 Overview

Load measures how much work a system is doing, expressed as the average number of runnable processes over 1-, 5-, and 15-minute windows; on Linux the figure also counts tasks blocked in uninterruptible (usually I/O) sleep, so a high load is not always a CPU problem.
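Load averages are only meaningful relative to CPU count: a 1-minute load of 8 is idle headroom on 16 cores but saturation on 4. A small sketch normalizing the `/proc/loadavg` figures (the function name is illustrative):

```python
# Sketch: normalize the three load averages from /proc/loadavg by CPU count.
# A normalized value above 1.0 means tasks are queueing for the CPUs
# (or stuck in uninterruptible sleep, which Linux also counts).
import os

def load_per_cpu(loadavg_line: str, ncpu=None) -> list:
    """Return the 1/5/15-minute loads divided by the number of CPUs."""
    ncpu = ncpu or os.cpu_count() or 1
    return [float(x) / ncpu for x in loadavg_line.split()[:3]]
```

If the normalized load is high but CPU utilization is low, look for D-state processes with `top` and trace their waits with `strace`.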

8.2 Tools

uptime, top, vmstat – load averages and run-queue statistics.

strace – system-call tracing and latency.

dmesg – kernel ring-buffer messages.

8.3 Usage

<code># Load and uptime
uptime
top
vmstat
# Per-syscall counts and time summary
strace -c -p <pid>
# Trace a specific syscall with timings
strace -T -e epoll_wait -p <pid>
# Kernel messages
dmesg
</code>

9. Flame Graphs

9.1 Overview

Flame graphs visualize sampled call stacks: the y-axis shows stack depth, and the x-axis spans the sample population, so a block's width is proportional to the number of samples (and hence the time) attributed to that function. Variants include on-CPU, off-CPU, memory, and differential flame graphs.
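The input to `flamegraph.pl` is the "folded" stack format: one line per unique stack, frames joined by semicolons, followed by a sample count. A toy sketch of how those counts become widths (the function name is illustrative):

```python
# Sketch: compute each unique stack's width fraction from folded-stack
# lines of the form "frame1;frame2;... count", the format FlameGraph
# tools produce and consume.

def stack_widths(folded_lines: list) -> dict:
    """Map each unique stack to its share of total samples (its SVG width)."""
    counts = {}
    for line in folded_lines:
        stack, count = line.rsplit(' ', 1)
        counts[stack] = counts.get(stack, 0) + int(count)
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}
```

In the rendered SVG, a wide frame near the top of a tower is the hot leaf function; wide frames lower down show which call paths funnel time into it.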

9.2 Installing Dependencies

<code># Install systemtap
yum install systemtap systemtap-runtime
# Install matching kernel debug packages
yum install kernel-debuginfo-$(uname -r) kernel-devel-$(uname -r) kernel-debuginfo-common-$(uname -r)
# Install additional debug info
debuginfo-install --enablerepo=debuginfo glibc kernel
</code>

9.3 Getting the Toolkit

<code>git clone https://github.com/lidaohang/quick_location.git
cd quick_location
</code>

9.4 CPU‑level Flame Graphs

9.4.1 On‑CPU

Generate and view an on‑CPU flame graph for a process:

<code># Record user-space CPU usage
sh ngx_on_cpu_u.sh <pid>
cd ngx_on_cpu_u
# Serve the SVG
python -m SimpleHTTPServer 8088
# Open http://127.0.0.1:8088/<pid>.svg
</code>

9.4.2 Off‑CPU

Generate an off‑CPU flame graph to locate waiting time:

<code># Record off-CPU time
sh ngx_off_cpu_u.sh <pid>
cd ngx_off_cpu_u
python -m SimpleHTTPServer 8088
</code>

9.5 Memory‑level Flame Graphs

Use the provided script to capture memory‑related flame graphs:

<code>sh ngx_on_memory.sh <pid>
cd ngx_on_memory
python -m SimpleHTTPServer 8088
</code>

9.6 Differential (Red‑Blue) Flame Graphs

Compare two profiling runs to highlight performance regressions:

<code># Capture baseline (perf record writes perf.data; perf script dumps the stacks)
perf record -F 99 -p <pid> -g -- sleep 30
perf script > out1.stacks
# Capture after changes
perf record -F 99 -p <pid> -g -- sleep 30
perf script > out2.stacks
# Collapse and diff
./FlameGraph/stackcollapse-perf.pl out1.stacks > out1.folded
./FlameGraph/stackcollapse-perf.pl out2.stacks > out2.folded
./FlameGraph/difffolded.pl out1.folded out2.folded | ./FlameGraph/flamegraph.pl > diff.svg
</code>
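Conceptually, the diff step merges the two folded profiles and emits a per-stack sample delta, which the red/blue rendering colors by sign. A simplified sketch of that idea (not the actual `difffolded.pl` implementation):

```python
# Sketch: per-stack sample deltas between two folded-stack profiles,
# the quantity a red/blue differential flame graph colors by sign
# (positive = the stack grew after the change, negative = it shrank).

def diff_folded(before: dict, after: dict) -> dict:
    """Map each stack to (after - before) sample counts."""
    stacks = set(before) | set(after)
    return {s: after.get(s, 0) - before.get(s, 0) for s in stacks}
```

Stacks with large positive deltas are the regressions to investigate first; stacks present only in one profile show code paths added or removed by the change.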

10. Case Study: Nginx Cluster Anomaly

10.1 Symptoms

On 2017‑09‑25, monitoring showed a surge of 499 and 5xx responses from an Nginx cluster, accompanied by rising CPU usage.

10.2 Nginx Metrics Analysis

Request traffic had actually decreased, so the spike was not traffic‑related.

Response times increased, possibly due to Nginx itself or upstream latency.

Upstream response times grew, suggesting backend delays affecting Nginx.

10.3 System CPU Investigation

top revealed high CPU consumption by Nginx worker processes.

perf top identified hotspots in free, malloc, and JSON parsing.

10.4 Flame‑Graph Insight

On‑CPU flame graph confirmed intensive JSON parsing by a low‑performance library.

10.5 Summary

Upstream latency contributed to request anomalies.

Internal Nginx modules, especially JSON parsing and memory allocation, caused high CPU usage.

Disabling the costly module reduced CPU load and normalized traffic.

11. References

http://www.brendangregg.com/index.html

http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html

http://www.brendangregg.com/FlameGraphs/memoryflamegraphs.html

http://www.brendangregg.com/FlameGraphs/offcpuflamegraphs.html

http://www.brendangregg.com/blog/2014-11-09/differential-flame-graphs.html

https://github.com/openresty/openresty-systemtap-toolkit

https://github.com/brendangregg/FlameGraph

https://www.slideshare.net/brendangregg/blazing-performance-with-flame-graphs

Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
