Master Linux Performance Analysis: CPU, Memory, Disk, Network & Flame Graphs
This comprehensive guide explains how to diagnose Linux performance problems—from CPU, memory, disk I/O, and network bottlenecks to system load—using tools like top, vmstat, perf, iostat, netstat, and flame graphs, and demonstrates the methodology with a real‑world Nginx case study.
Background
When monitoring plugins cannot immediately reveal the root cause of a problem, logging into the server for deeper analysis is required. Accumulated technical experience and a broad knowledge base are essential for effective troubleshooting.
Purpose
This article introduces various problem‑location tools and demonstrates their use with case studies.
Methodology
Applying the 5W2H method to performance analysis raises the following questions:
What – what is the phenomenon?
When – when does it occur?
Why – why does it happen?
Where – where does it happen?
How much – how many resources are consumed?
How to do – how to solve it?
CPU Analysis
CPU analysis focuses on how the kernel scheduler runs threads and on the states those threads are in. A thread is either on-CPU (accumulating user or sys time) or off-CPU (waiting on I/O, a lock, or a timer, or simply idle).
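The on-CPU and off-CPU split can be illustrated with the counters that tools such as top and mpstat read. The following is a minimal sketch, assuming the aggregate "cpu" line of /proc/stat carries at least the eight fields named below; the two snapshots are made-up sample data, not measurements.

```python
"""Sketch: share of CPU time per state from two /proc/stat snapshots."""

# Order of the first eight counters on the aggregate "cpu" line.
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal")


def parse_cpu_line(stat_text):
    """Return {field: jiffies} for the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":  # "cpu0", "cpu1", ... are skipped
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_shares(before, after):
    """Percentage of elapsed CPU time spent in each state."""
    delta = {k: after[k] - before[k] for k in FIELDS}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("snapshots are identical or out of order")
    return {k: 100.0 * v / total for k, v in delta.items()}


# Hypothetical snapshots taken a short interval apart (illustrative values).
SAMPLE_T0 = "cpu  1000 0 500 8000 500 0 0 0 0 0\ncpu0 1000 0 500 8000 500 0 0 0 0 0\n"
SAMPLE_T1 = "cpu  1060 0 520 8010 510 0 0 0 0 0\ncpu0 1060 0 520 8010 510 0 0 0 0 0\n"

shares = cpu_shares(parse_cpu_line(SAMPLE_T0), parse_cpu_line(SAMPLE_T1))
for state in ("user", "system", "iowait", "idle"):
    print(f"{state:7s} {shares[state]:5.1f}%")
```

On a live system the two snapshots would come from reading /proc/stat twice with a pause in between. High user or system shares point at on-CPU work, while high iowait or idle shares suggest the threads are mostly off-CPU.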
Tools
uptime, vmstat, mpstat, top, pidstat – basic CPU/load information.
perf – detailed per‑function CPU usage and kernel‑function statistics.
Usage
top
mpstat -P ALL 1
vmstat 1
pidstat -u 1 -p pid
perf top -p pid -e cpu-clock
Memory Analysis
Memory concepts include main memory, virtual memory, resident memory, address space, OOM, page cache, page fault, swapping, and allocators (libc, glibc, libmalloc, mtmalloc, SLUB).
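Several of these concepts (main memory, page cache, swapping) map onto counters in /proc/meminfo, which is where free gets its numbers. The sketch below assumes a kernel that exposes MemAvailable; the sample text uses invented values.

```python
"""Sketch: summarize memory pressure from /proc/meminfo-style text."""


def parse_meminfo(text):
    """Return {key: value} where values are the leading integers (kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def summarize(info):
    """Derive used share, page cache size and swap usage."""
    total = info["MemTotal"]
    available = info["MemAvailable"]  # assumption: field is present
    cache_kb = info.get("Cached", 0) + info.get("Buffers", 0)
    swap_used_kb = info.get("SwapTotal", 0) - info.get("SwapFree", 0)
    return {
        "used_pct": 100.0 * (total - available) / total,
        "page_cache_mb": cache_kb / 1024.0,
        "swap_used_mb": swap_used_kb / 1024.0,
    }


# Hypothetical sample (illustrative values only).
SAMPLE_MEMINFO = """\
MemTotal:        8000000 kB
MemFree:         1000000 kB
MemAvailable:    6000000 kB
Buffers:          200000 kB
Cached:          3896000 kB
SwapTotal:       2048000 kB
SwapFree:        2048000 kB
"""

summary = summarize(parse_meminfo(SAMPLE_MEMINFO))
print(f"used {summary['used_pct']:.1f}%  "
      f"page cache {summary['page_cache_mb']:.0f} MB  "
      f"swap used {summary['swap_used_mb']:.0f} MB")
```

A low MemFree value alone is not a problem when MemAvailable stays high, because page cache can be reclaimed. Growing swap usage is usually the more telling signal.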
Tools
free, vmstat, top, pidstat, pmap – memory usage.
valgrind – memory leak detection.
dtrace – kernel‑level tracing.
Usage
free -m
vmstat 1
top
pidstat -r 1 -p pid
pmap -d pid
valgrind --tool=memcheck --leak-check=full --log-file=log.txt ./program
Disk I/O Analysis
Disk is the slowest subsystem; understanding file system, VFS, caches, inode, and I/O scheduling is essential.
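Per-device I/O rates of the kind iostat reports can be derived from two snapshots of /proc/diskstats. The following is a rough sketch under that assumption; the device name and counter values are invented, and only a few of the available fields are used.

```python
"""Sketch: IOPS, average wait and utilization from /proc/diskstats deltas."""


def parse_diskstats(text):
    """Return {device: counters} for lines with at least 14 fields."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue
        stats[f[2]] = {
            "reads": int(f[3]),      # reads completed
            "read_ms": int(f[6]),    # time spent reading (ms)
            "writes": int(f[7]),     # writes completed
            "write_ms": int(f[10]),  # time spent writing (ms)
            "io_ms": int(f[12]),     # time the device was busy (ms)
        }
    return stats


def device_rates(before, after, interval_s):
    """Rates for one device between two snapshots."""
    ios = (after["reads"] - before["reads"]) + (after["writes"] - before["writes"])
    wait_ms = ((after["read_ms"] - before["read_ms"])
               + (after["write_ms"] - before["write_ms"]))
    busy_ms = after["io_ms"] - before["io_ms"]
    return {
        "iops": ios / interval_s,
        "await_ms": wait_ms / ios if ios else 0.0,
        "util_pct": 100.0 * busy_ms / (interval_s * 1000.0),
    }


# Hypothetical snapshots of one device, one second apart.
SAMPLE_T0 = "   8       0 sda 1000 0 8000 5000 2000 0 16000 8000 0 10000 13000\n"
SAMPLE_T1 = "   8       0 sda 1100 0 8800 5400 2100 0 16800 8600 0 10500 14000\n"

rates = device_rates(parse_diskstats(SAMPLE_T0)["sda"],
                     parse_diskstats(SAMPLE_T1)["sda"],
                     interval_s=1.0)
print(f"iops={rates['iops']:.0f} await={rates['await_ms']:.1f}ms "
      f"util={rates['util_pct']:.0f}%")
```

A high average wait combined with utilization near 100% suggests the device itself is saturated. A high wait at low utilization points more toward queuing elsewhere in the I/O path.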
Tools
iotop – real‑time I/O.
iostat -d -x -k 1 10 – detailed I/O statistics.
pidstat -d 1 -p pid – per‑process I/O.
perf record -e block:block_rq_issue -ag – kernel I/O tracing.
Usage
iotop
iostat -d -x -k 1 10
pidstat -d 1 -p pid
perf record -e block:block_rq_issue -ag
perf report
Network Analysis
Network monitoring is complex due to latency, blocking, collisions, packet loss, and external devices.
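Packet loss often shows up first as TCP retransmissions. As a minimal sketch, the retransmission ratio can be computed from the OutSegs and RetransSegs counters in /proc/net/snmp; the sample values below are invented for illustration.

```python
"""Sketch: TCP retransmission ratio from /proc/net/snmp snapshots."""


def parse_snmp(text, proto="Tcp"):
    """Pair the header row and value row for one protocol."""
    rows = [line.split()[1:] for line in text.splitlines()
            if line.startswith(proto + ":")]
    if len(rows) < 2:
        raise ValueError(f"no header/value pair found for {proto}")
    names, values = rows[0], rows[1]
    return dict(zip(names, (int(v) for v in values)))


def retrans_pct(before, after):
    """Retransmitted segments as a percentage of segments sent."""
    sent = after["OutSegs"] - before["OutSegs"]
    retrans = after["RetransSegs"] - before["RetransSegs"]
    return 100.0 * retrans / sent if sent > 0 else 0.0


HEADER = ("Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens "
          "AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs "
          "InErrs OutRsts InCsumErrors\n")

# Hypothetical counter values at two points in time.
SAMPLE_T0 = HEADER + "Tcp: 1 200 120000 -1 300 400 5 6 20 90000 100000 500 0 10 0\n"
SAMPLE_T1 = HEADER + "Tcp: 1 200 120000 -1 310 420 5 6 22 91500 102000 560 0 11 0\n"

pct = retrans_pct(parse_snmp(SAMPLE_T0), parse_snmp(SAMPLE_T1))
print(f"retransmitted {pct:.1f}% of segments sent in the interval")
```

What counts as too high depends on the network. A ratio that jumps well above its usual baseline is a reasonable cue to capture packets and look closer.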
Tools
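netstat -s (per-protocol statistics), netstat -nu (UDP sockets with numeric addresses), netstat -apu (all UDP sockets with the owning process).
ss -t -a (all TCP sockets), ss -s (socket summary), ss -u -a (all UDP sockets).
sar -n TCP,ETCP 1 – TCP traffic and TCP error statistics.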
sar -n DEV 1 – network I/O.
tcpdump, tcpflow – packet capture.
Usage
netstat -s
netstat -nu
netstat -apu
ss -t -a
ss -s
ss -u -a
sar -n TCP,ETCP 1
sar -n DEV 1
tcpdump -i eth1 host 192.168.1.1 and port 80
tcpflow -cp host 192.168.1.1
System Load
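Load measures the amount of work a system is doing. On Linux, the load average counts tasks that are running or runnable plus tasks in uninterruptible sleep (typically waiting on disk I/O), averaged over the last 1, 5, and 15 minutes. A load average that stays above the number of CPUs suggests that tasks are queuing.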
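To make the three averages easier to interpret, the sketch below parses /proc/loadavg-style text, normalizes the load by CPU count, and compares the 1-minute and 15-minute values to see the trend. The sample line and the CPU count of 4 are assumptions for illustration.

```python
"""Sketch: interpret load averages relative to the number of CPUs."""


def parse_loadavg(text):
    """Parse 'load1 load5 load15 runnable/total last_pid'."""
    f = text.split()
    runnable, total = f[3].split("/")
    return {
        "load1": float(f[0]),
        "load5": float(f[1]),
        "load15": float(f[2]),
        "runnable": int(runnable),
        "tasks": int(total),
    }


def load_per_cpu(load, cpus):
    """Load normalized by CPU count; above 1.0 suggests queuing."""
    return load / cpus


def trend(info):
    """Compare the 1-minute and 15-minute averages."""
    if info["load1"] > info["load15"]:
        return "rising"
    if info["load1"] < info["load15"]:
        return "falling"
    return "steady"


# Hypothetical sample line and CPU count.
SAMPLE_LOADAVG = "8.40 4.20 2.10 3/512 12345\n"
SAMPLE_CPUS = 4

info = parse_loadavg(SAMPLE_LOADAVG)
print(f"load1 per CPU: {load_per_cpu(info['load1'], SAMPLE_CPUS):.2f}, "
      f"trend: {trend(info)}")
```

On a live system, os.cpu_count() gives the CPU count and os.getloadavg() returns the same three averages without parsing.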
Tools
uptime, top, vmstat – load inspection.
strace -c -p pid – system‑call cost.
strace -T -e epoll_wait -p pid – specific syscall tracing.
dmesg – kernel logs.
Usage
uptime
top
vmstat
strace -c -p pid
strace -T -e epoll_wait -p pid
dmesg
Flame Graphs
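Flame graphs visualize sampled call stacks. The y-axis shows stack depth; the x-axis spans the collected samples, sorted alphabetically rather than by time, so it does not show the order of events. The width of a bar is proportional to how often that function appeared in the sampled stacks, so wider bars mark functions that consume more CPU time.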
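The relationship between samples and bar width can be sketched in a few lines. The code below folds a list of sampled stacks into the "frame;frame;frame count" form used by the FlameGraph tools and computes each frame's width as a share of all samples. The function names and sample counts are hypothetical.

```python
"""Sketch: fold sampled stacks and compute flame graph bar widths."""
from collections import Counter


def fold(stacks):
    """Collapse identical stacks into 'root;...;leaf count' lines."""
    counts = Counter(";".join(stack) for stack in stacks)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


def frame_widths(stacks):
    """Width of each frame, keyed by its path from the root.

    A frame is as wide as the share of samples whose stack starts
    with that path, which is why parents are at least as wide as
    their children.
    """
    total = len(stacks)
    widths = Counter()
    for stack in stacks:
        for depth in range(1, len(stack) + 1):
            widths[tuple(stack[:depth])] += 1
    return {path: 100.0 * n / total for path, n in widths.items()}


# Hypothetical samples, each listed root first.
SAMPLE_STACKS = (
    [("main", "worker_loop", "parse_json")] * 6
    + [("main", "worker_loop", "send_response")] * 2
    + [("main", "idle_wait")] * 2
)

for line in fold(SAMPLE_STACKS):
    print(line)
for path, pct in sorted(frame_widths(SAMPLE_STACKS).items()):
    print(f"{'  ' * (len(path) - 1)}{path[-1]}: {pct:.0f}%")
```

In this made-up data the parse_json frame covers 60% of the samples, so it would be drawn as the widest leaf and would be the first place to look.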
Installation
yum install systemtap systemtap-runtime
# install matching kernel‑debuginfo packages
debuginfo-install --enablerepo=debuginfo search kernel
debuginfo-install --enablerepo=debuginfo search glibc
Usage
git clone https://github.com/lidaohang/quick_location.git
cd quick_location
# on‑CPU user flame graph
sh ngx_on_cpu_u.sh pid
cd ngx_on_cpu_u
python -m SimpleHTTPServer 8088  # Python 2; on Python 3 use: python3 -m http.server 8088
# off‑CPU flame graph
sh ngx_off_cpu_u.sh pid
cd ngx_off_cpu_u
python -m SimpleHTTPServer 8088
# memory flame graph
sh ngx_on_memory.sh pid
cd ngx_on_memory
python -m SimpleHTTPServer 8088
Examples of on-CPU, off-CPU, memory, and differential flame graphs are shown.
Case Study – Nginx Cluster Issue
On 2017‑09‑25 the Nginx cluster showed many 499/5xx responses and high CPU usage. Analysis steps:
Request traffic did not spike; traffic actually decreased.
Response time increased, possibly due to Nginx or upstream.
Upstream response time grew, suggesting backend delay.
top showed high Nginx worker CPU.
perf top revealed most cost in free, malloc, and JSON parsing.
Flame graphs confirmed JSON parsing as the hotspot.
Resolution: disable the high‑cost module, observe CPU drop, and verify normal request flow. The upstream delay was caused by a loop back to Nginx.
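The first steps of the analysis (error share and upstream response time) can be checked from the access log. The sketch below assumes a hypothetical key=value log format built from the nginx variables $status, $request_time, and $upstream_response_time; the log lines are invented and a real log_format would need its own pattern.

```python
"""Sketch: share of 499/5xx responses and mean upstream time from a log."""
import re
from collections import Counter

# Assumed (hypothetical) log format:
#   status=<code> request_time=<s> upstream_response_time=<s or ->
LINE_RE = re.compile(
    r"status=(\d{3}) request_time=([\d.]+) upstream_response_time=([\d.]+|-)"
)


def summarize(lines):
    """Count statuses and average the upstream response time."""
    statuses = Counter()
    upstream = []
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # skip lines that do not match the assumed format
        statuses[int(m.group(1))] += 1
        if m.group(3) != "-":  # "-" means no upstream was contacted
            upstream.append(float(m.group(3)))
    total = sum(statuses.values())
    bad = sum(n for code, n in statuses.items() if code == 499 or code >= 500)
    return {
        "total": total,
        "error_pct": 100.0 * bad / total if total else 0.0,
        "avg_upstream_s": sum(upstream) / len(upstream) if upstream else 0.0,
    }


# Invented sample lines for illustration.
SAMPLE_LOG = [
    "status=200 request_time=0.060 upstream_response_time=0.050",
    "status=499 request_time=2.000 upstream_response_time=1.950",
    "status=502 request_time=0.010 upstream_response_time=-",
    "status=200 request_time=0.120 upstream_response_time=0.100",
]

result = summarize(SAMPLE_LOG)
print(f"{result['total']} requests, {result['error_pct']:.0f}% 499/5xx, "
      f"mean upstream time {result['avg_upstream_s']:.2f}s")
```

Comparing these numbers before and after disabling the suspect module is one way to confirm that request flow is back to normal.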
Open Source Linux
Focused on sharing Linux/Unix content, covering fundamentals, system development, network programming, automation/operations, cloud computing, and related professional knowledge.
