Mastering Linux Performance: From CPU to Flame Graphs and Real‑World Case Studies

This guide explains how to diagnose Linux performance problems with the 5W2H method, covers the essential tools for CPU, memory, disk I/O, and network analysis, introduces flame graphs, and applies the approach to an Nginx case study to locate a bottleneck.

MaGe Linux Operations

Jun 27, 2023

Background

When monitoring plugins cannot immediately reveal the root cause of a problem, you need to log in to the server and analyze it further. That analysis takes technical experience and broad knowledge, and good tools can speed up troubleshooting considerably.

Purpose

This article introduces various problem‑location tools and illustrates their use with real‑world examples.

Problem‑analysis Methodology (5W2H)

What – what is the phenomenon?

When – when does it occur?

Why – why does it happen?

Where – where does it happen?

How much – how many resources are consumed?

How to do – how to solve it?

CPU

Explanation

For applications, we usually focus on the kernel CPU scheduler and its performance. Thread-state analysis splits a thread's time into on-CPU time (user and system) and off-CPU time (waiting for I/O, locks, paging, and so on).

Key Concepts

Processor

Core

Hardware thread

CPU cache

Clock frequency

CPI / IPC

Instructions

Utilization

User time / kernel time

Scheduler

Run queue

Preemption

Multi‑process / multi‑thread

Word size

Analysis Tools

uptime, vmstat, mpstat, top, and pidstat report CPU usage and load. perf profiles where time is spent at function level, including kernel functions.

Usage

# view system CPU usage
top

# view per-CPU statistics, refreshed every second
mpstat -P ALL 1

# view CPU usage and run-queue length, refreshed every second
vmstat 1

# process-level CPU statistics
pidstat -u 1 -p <pid>

# sample function-level CPU usage in a process
perf top -p <pid> -e cpu-clock
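The percentages these tools report come from cumulative counters in /proc/stat. The sketch below shows the arithmetic on two samples of the aggregate `cpu` line. The helper name `cpu_busy_pct` is hypothetical, and treating iowait as idle time is an assumption that different tools handle differently.

```shell
# cpu_busy_pct: hypothetical helper, not part of any tool named in this article.
# Arguments: two aggregate "cpu ..." lines from /proc/stat, taken some time apart.
# Prints the share of non-idle time between the two samples, in percent.
cpu_busy_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            # Fields 2-9: user nice system idle iowait irq softirq steal.
            # guest and guest_nice (fields 10-11) are already counted in user and nice.
            total = 0
            for (i = 2; i <= 9; i++) total += $i
            idle = $5 + $6          # assumption: iowait is counted as idle
            if (NR == 1) { t0 = total; i0 = idle }
            else         { t1 = total; i1 = idle }
        }
        END {
            dt = t1 - t0
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (dt - (i1 - i0)) / dt
        }'
}
```

A typical call would be `a=$(grep '^cpu ' /proc/stat); sleep 1; b=$(grep '^cpu ' /proc/stat); cpu_busy_pct "$a" "$b"`.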

Memory

Explanation

Memory issues affect not only performance but also service availability. Important concepts include main memory, virtual memory, resident set, address space, OOM, page cache, page faults, swapping, and the Linux SLUB allocator.

Analysis Tools

free, vmstat, top, pidstat, pmap show memory usage; valgrind detects leaks; dtrace can trace kernel functions.

Usage

# view system memory usage in MiB
free -m

# view virtual memory statistics, refreshed every second
vmstat 1

# view system memory details
top

# per-process memory statistics
pidstat -p <pid> -r 1

# view process memory map
pmap -d <pid>

# detect memory leaks
valgrind --tool=memcheck --leak-check=full --log-file=log.txt ./program
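Because the page cache can be reclaimed, free memory alone understates how much headroom a system has. The following sketch reads meminfo-formatted text on standard input and prints the available share. The function name `mem_available_pct` is hypothetical, and it assumes a kernel that exposes `MemAvailable` (Linux 3.14 or later).

```shell
# mem_available_pct: hypothetical helper that reads /proc/meminfo-style text on stdin.
# Prints MemAvailable as a percentage of MemTotal.
# Assumes the kernel exposes MemAvailable (Linux 3.14 or later).
mem_available_pct() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END {
            if (total <= 0) exit 1      # input did not look like meminfo
            printf "%.1f\n", 100 * avail / total
        }'
}
```

On a live system it would be called as `mem_available_pct < /proc/meminfo`.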

Disk I/O

Explanation

Disk is usually the slowest subsystem and a common performance bottleneck. Understanding file systems, the VFS, the page cache, the buffer cache, the inode cache, and I/O schedulers is essential.

Analysis Tools

iotop, iostat, pidstat, perf record (block events).

Usage

# monitor I/O in real time
iotop

# extended per-device statistics in kB, every second, 10 reports
iostat -d -x -k 1 10

# per-process I/O statistics
pidstat -d 1 -p <pid>

# trace block I/O request events system-wide (stop with Ctrl-C)
perf record -e block:block_rq_issue -a
perf report
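Device utilization can also be estimated directly from /proc/diskstats, whose 13th field is the cumulative number of milliseconds the device spent doing I/O. This is a minimal sketch of that calculation. The name `disk_util_pct` is hypothetical, and the caller is assumed to supply the elapsed time between the two samples.

```shell
# disk_util_pct: hypothetical helper, shown only to illustrate the arithmetic.
# $1, $2: the /proc/diskstats line of ONE device at two points in time
# $3:     elapsed milliseconds between the two samples (supplied by the caller)
disk_util_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk -v ms="$3" '
        NR == 1 { t0 = $13 }    # field 13: milliseconds spent doing I/O
        NR == 2 { t1 = $13 }
        END {
            if (ms <= 0) exit 1
            printf "%.1f\n", 100 * (t1 - t0) / ms
        }'
}
```

For example, sample `grep ' sda ' /proc/diskstats` twice one second apart and pass both lines with `1000` as the third argument; `sda` is only a placeholder device name.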

Network

Explanation

Network monitoring is complex due to latency, blocking, collisions, packet loss, and interactions with routers, switches, and wireless signals.

Analysis Tools

netstat, ss, sar, tcpdump, tcpflow.

Usage

# show per-protocol network statistics
netstat -s

# show current UDP connections
netstat -nu

# show UDP port usage and the owning processes
netstat -apu

# count TCP connections per state (-n skips slow name resolution)
netstat -an | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'

# show all TCP sockets
ss -t -a

# show socket summary
ss -s

# show all UDP sockets
ss -u -a

# show TCP and TCP error (ETCP) statistics every second
sar -n TCP,ETCP 1

# show per-interface network I/O statistics every second
sar -n DEV 1

# capture packets to or from a host on port 80
tcpdump -i eth1 host 192.168.1.1 and port 80

# capture traffic for a host and print the reassembled TCP flows to the console
tcpflow -cp host 192.168.1.1
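The per-state connection count shown above with netstat can be sketched for `ss` as well, which prints the state in the first column. The function name `count_tcp_states` is hypothetical, and the sketch assumes the input is the output of `ss -tan`, including its header line.

```shell
# count_tcp_states: hypothetical helper that reads `ss -tan` output on stdin.
# Skips the header line, counts sockets per state (column 1),
# and prints "count state" with the most frequent state first.
count_tcp_states() {
    awk 'NR > 1 { n[$1]++ } END { for (s in n) print n[s], s }' |
        LC_ALL=C sort -k1,1nr -k2,2
}
```

It would be used as `ss -tan | count_tcp_states`. A large number of TIME-WAIT or CLOSE-WAIT sockets is usually the first thing worth a closer look.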

System Load

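Load measures how much work the system is doing. On Linux, it counts tasks that are running, runnable, or in uninterruptible sleep (usually waiting on disk I/O), and the load average reports this as exponentially damped averages over the last 1, 5, and 15 minutes.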
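A load average only becomes meaningful relative to the number of CPUs. The sketch below normalizes it per CPU. The helper name `load_per_core` is hypothetical, and reading a result above 1.0 as "more demand than capacity" is a rule of thumb rather than a hard limit.

```shell
# load_per_core: hypothetical helper, shown only to illustrate the normalization.
# $1: a load average value (for example the 1-minute figure)
# $2: number of online CPUs
load_per_core() {
    awk -v load="$1" -v cpus="$2" 'BEGIN {
        if (cpus <= 0) exit 1
        printf "%.2f\n", load / cpus
    }'
}
```

On a live system: `load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"`. If the result is high while CPU utilization stays low, the load is probably coming from tasks blocked on I/O.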

Analysis Tools

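# view load averages
uptime

# interactive view
top

# view system statistics (run queue, memory, I/O, CPU)
vmstat

# summarize syscall counts, errors, and time for a process (printed when strace detaches)
strace -c -p <pid>

# show the time spent in each call of a specific syscall (e.g., epoll_wait)
strace -T -e epoll_wait -p <pid>

# view kernel logs
dmesg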

Flame Graphs

Explanation

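Flame graphs (created by Brendan Gregg) visualize sampled call stacks. The y-axis shows stack depth, with the top box of each stack being the function that was on CPU when the sample was taken. The x-axis spans the sample population, sorted alphabetically, and does not show the passage of time. The wider a box, the more samples include that function, so wide boxes at the top of a stack mark the functions that consume the most CPU.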
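Box widths can be checked without rendering anything, because the flame graph scripts work on "folded" stacks: one line per unique stack, frames separated by semicolons, followed by a sample count. The sketch below computes, for each function, the share of samples in which it appears anywhere on the stack. The function name `frame_widths` is hypothetical, and the result approximates the combined width of that function's boxes.

```shell
# frame_widths: hypothetical helper that reads folded stacks on stdin,
# one per line in the form "frame_a;frame_b;frame_c <count>".
# Prints the percentage of samples in which each function appears.
frame_widths() {
    awk '
        {
            count = $NF
            total += count
            stack = $0
            sub(/ [0-9]+$/, "", stack)      # drop the trailing sample count
            n = split(stack, frames, ";")
            split("", seen)                 # reset the per-line set
            for (i = 1; i <= n; i++) {
                # count a function once per stack, even if it recurses
                if (!(frames[i] in seen)) {
                    incl[frames[i]] += count
                    seen[frames[i]] = 1
                }
            }
        }
        END {
            for (f in incl) printf "%.1f%% %s\n", 100 * incl[f] / total, f
        }' | LC_ALL=C sort -rn
}
```

With folded output from `stackcollapse-perf.pl`, this would be run as `frame_widths < out.folded`.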

Installation

# install SystemTap (if not already present)
yum install systemtap systemtap-runtime
# SystemTap needs these packages, each matching the running kernel (uname -r):
#   kernel-debuginfo-<kernel-version>.rpm
#   kernel-devel-<kernel-version>.rpm
#   kernel-debuginfo-common-<kernel-version>.rpm
# install kernel and glibc debug symbols from the debuginfo repository
debuginfo-install --enablerepo=debuginfo kernel
debuginfo-install --enablerepo=debuginfo glibc

Usage

Clone the flame‑graph repository and generate graphs for a target process:

git clone https://github.com/lidaohang/quick_location.git
cd quick_location

CPU‑level Flame Graphs

On‑CPU flame graphs show where the CPU spends time in user or kernel mode. Off‑CPU graphs show where threads are waiting.

On‑CPU Example

# generate a user-mode on-CPU flame graph
sh ngx_on_cpu_u.sh <pid>
cd ngx_on_cpu_u
# serve the SVG over HTTP (Python 2 module; on Python 3 use: python3 -m http.server 8088)
python -m SimpleHTTPServer 8088
# open http://127.0.0.1:8088/<pid>.svg

Off‑CPU Example

# generate an off-CPU flame graph
sh ngx_off_cpu_u.sh <pid>
cd ngx_off_cpu_u
# serve the SVG over HTTP (Python 2 module; on Python 3 use: python3 -m http.server 8088)
python -m SimpleHTTPServer 8088
# open http://127.0.0.1:8088/<pid>.svg

Memory‑level Flame Graphs

# generate a memory flame graph
sh ngx_on_memory.sh <pid>
cd ngx_on_memory
# serve the SVG over HTTP (Python 2 module; on Python 3 use: python3 -m http.server 8088)
python -m SimpleHTTPServer 8088
# open http://127.0.0.1:8088/<pid>.svg

Diff Flame Graphs (Red‑Blue)

Capture two profiles (before and after a change) and generate a differential flame graph. Code paths that gained samples are drawn in red (possible regressions) and paths that lost samples in blue (improvements).

# profile before change
perf record -F 99 -p <pid> -g -- sleep 30
perf script > out.stacks1
# profile after change
perf record -F 99 -p <pid> -g -- sleep 30
perf script > out.stacks2
# collapse and diff
./FlameGraph/stackcollapse-perf.pl out.stacks1 > out.folded1
./FlameGraph/stackcollapse-perf.pl out.stacks2 > out.folded2
./FlameGraph/difffolded.pl out.folded1 out.folded2 | ./FlameGraph/flamegraph.pl > diff.svg
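The idea behind the differential graph can be sketched in a few lines: for every stack, subtract the sample count in the first profile from the count in the second. This is an illustration of the concept only, not how `difffolded.pl` is implemented. The function name `folded_delta` is hypothetical, and the sketch assumes both folded files are non-empty.

```shell
# folded_delta: hypothetical helper, an illustration of the per-stack comparison only.
# $1: folded stacks before the change, $2: folded stacks after the change.
# Prints "<delta> <stack>", largest increase first.
# Assumes both files are non-empty (the FNR == NR test relies on it).
folded_delta() {
    awk '
        {
            count = $NF
            stack = $0
            sub(/ [0-9]+$/, "", stack)      # drop the trailing sample count
            if (FNR == NR) before[stack] += count
            else           after[stack]  += count
            all[stack] = 1
        }
        END {
            for (s in all) printf "%d %s\n", after[s] - before[s], s
        }' "$1" "$2" | LC_ALL=C sort -k1,1nr
}
```

For the files produced above: `folded_delta out.folded1 out.folded2 | head`. Both profiles should cover the same duration and sampling rate, otherwise the raw deltas are not comparable.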

Case Study – Nginx Cluster Issue

Problem

On 2017‑09‑25 the Nginx cluster showed many 499 and 5xx responses, and CPU usage spiked.
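A symptom like this can be quantified from the access log before deeper analysis begins. The sketch below computes the share of 499 and 5xx responses. It assumes Nginx's default `combined` log format, in which the status code is the ninth whitespace-separated field. The function name `error_rate` is hypothetical, and a custom `log_format` would need a different field number.

```shell
# error_rate: hypothetical helper that reads access log lines on stdin.
# Assumes the default "combined" log format, where field 9 is the HTTP status.
# Prints the share of 499 and 5xx responses among all requests.
error_rate() {
    awk '
        { total++ }
        $9 == 499            { c499++ }
        $9 ~ /^5[0-9][0-9]$/ { c5xx++ }
        END {
            if (total == 0) exit 1
            printf "499: %.1f%%  5xx: %.1f%%\n", 100 * c499 / total, 100 * c5xx / total
        }'
}
```

For example: `tail -n 100000 access.log | error_rate`, where the log path is a placeholder. Status 499 is Nginx's code for a client closing the connection before the response was sent, so a rising 499 rate usually goes along with slow responses.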

Analysis Steps

Check request traffic – traffic was actually decreasing, so the spike was not due to load.

Analyze Nginx response time – response time increased, possibly due to Nginx itself or upstream latency.

Analyze upstream response time – upstream latency grew, suggesting backend delay.

Inspect system CPU – top showed high Nginx worker CPU usage.

Profile Nginx process – perf top -p <pid> showed most of the time being spent in JSON parsing and memory allocation.

Generate on‑CPU flame graph – identified the JSON library as the hotspot.

Conclusion

The root cause was an inefficient JSON parser consuming excessive CPU; the upstream delay was a symptom, not the cause. Disabling the problematic module reduced CPU usage and restored normal traffic.
