
Master Linux Performance Analysis: CPU, Memory, Disk, Network & Flame Graphs

This comprehensive guide explains how to diagnose Linux performance problems—from CPU, memory, disk I/O, and network bottlenecks to system load—using tools like top, vmstat, perf, iostat, netstat, and flame graphs, and demonstrates the methodology with a real‑world Nginx case study.

Open Source Linux

Sep 11, 2023


Background

When monitoring plugins cannot immediately reveal the root cause of a problem, logging into the server for deeper analysis is required. Accumulated technical experience and a broad knowledge base are essential for effective troubleshooting.

Purpose

This article introduces various problem‑location tools and demonstrates their use with case studies.

Methodology

Applying the 5W2H method to performance analysis raises the following questions:

What – what is the phenomenon?

When – when does it occur?

Why – why does it happen?

Where – where does it happen?

How much – how many resources are consumed?

How – how can it be solved?

CPU Analysis

CPU analysis focuses on the kernel's CPU scheduler and on thread states. A thread is either on-CPU (running in user or system time) or off-CPU (waiting on I/O, waiting on a lock, or idle).
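
The user/system split of on-CPU time can be read directly from the kernel's tick counters. The following is a minimal sketch, assuming a Linux host with /proc/stat; the helper name cpu_pct is hypothetical and is not part of any tool discussed here. It takes two samples of the aggregate "cpu" line and prints the share of ticks spent in each state between them.

```shell
# cpu_pct "<first cpu line>" "<second cpu line>"
# Fields of the /proc/stat "cpu" line: user nice system idle iowait irq softirq steal ...
cpu_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= 9; i++) a[i] = $i }
        NR == 2 {
            # delta of every counter between the two samples
            for (i = 2; i <= 9; i++) { d[i] = $i - a[i]; total += d[i] }
            if (total == 0) { print "no ticks elapsed"; exit 1 }
            # user includes nice; sys includes irq and softirq;
            # steal is counted in the total but not printed
            printf "user=%.1f sys=%.1f idle=%.1f iowait=%.1f\n",
                100 * (d[2] + d[3]) / total,
                100 * (d[4] + d[7] + d[8]) / total,
                100 * d[5] / total,
                100 * d[6] / total
        }'
}

# Live usage on a Linux host:
#   a=$(head -n 1 /proc/stat); sleep 1; b=$(head -n 1 /proc/stat); cpu_pct "$a" "$b"
```

This is roughly the calculation behind the us, sy, id, and wa columns that top and vmstat report, although those tools group the counters slightly differently.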

Tools

uptime, vmstat, mpstat, top, pidstat – basic CPU/load information.

perf – detailed per‑function CPU usage and kernel‑function statistics.

Usage

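top                            # overall CPU use and the busiest processes
mpstat -P ALL 1                # per-CPU utilization, one report per second
vmstat 1                       # run queue (r), context switches (cs), us/sy/id/wa split
pidstat -u 1 -p pid            # user/system CPU of one process (replace pid)
perf top -p pid -e cpu-clock   # live per-function CPU hotspots in that process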

Memory Analysis

Memory concepts include main memory, virtual memory, resident memory, address space, OOM, page cache, page fault, swapping, and allocators (libc, glibc, libmalloc, mtmalloc, SLUB).
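
The difference between virtual and resident memory is easy to see for a single process. A minimal sketch, assuming a Linux host where /proc/PID/status exposes the VmSize and VmRSS fields in kB; the helper name mem_summary is hypothetical.

```shell
# mem_summary < /proc/PID/status
# Prints the virtual size (address space) and the resident set size
# (pages currently held in main memory), both in kB.
mem_summary() {
    awk '
        $1 == "VmSize:" { vsz = $2 }
        $1 == "VmRSS:"  { rss = $2 }
        END { printf "vsz_kb=%d rss_kb=%d\n", vsz, rss }'
}

# Live usage on a Linux host (current shell):
#   mem_summary < /proc/$$/status
```

A virtual size far larger than the resident size is normal, because mapped but untouched pages cost no physical memory. A resident size that keeps growing under steady load is the more useful hint of a leak.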

Tools

free, vmstat, top, pidstat, pmap – memory usage.

valgrind – memory leak detection.

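dtrace – dynamic tracing of kernel and user-space events (on most Linux distributions, SystemTap or perf usually fills this role).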

Usage

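free -m                  # total, used, free, and cached memory in MiB
vmstat 1                 # si/so columns show swap-in and swap-out activity
top                      # press M to sort processes by resident memory
pidstat -r 1 -p pid      # page faults, VSZ, and RSS of one process
pmap -d pid              # per-mapping breakdown of the address space
valgrind --tool=memcheck --leak-check=full --log-file=log.txt ./program   # leak report written to log.txt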

Disk I/O Analysis

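Disk is usually the slowest subsystem, so I/O problems tend to dominate latency. Diagnosing them requires a working knowledge of the file system, VFS, the caches in front of the disk, inodes, and I/O scheduling.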

Tools

iotop – real‑time I/O.

iostat -d -x -k 1 10 – detailed I/O statistics.

pidstat -d 1 -p pid – per‑process I/O.

perf record -e block:block_rq_issue -ag – kernel I/O tracing.

Usage

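iotop                                     # per-process I/O, refreshed live (usually needs root)
iostat -d -x -k 1 10                      # extended device stats in kB, every second, 10 reports
pidstat -d 1 -p pid                       # read/write throughput of one process
perf record -e block:block_rq_issue -ag   # trace block request issue system-wide with call graphs; stop with Ctrl-C
perf report                               # summarize the recorded perf.data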
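
To make the iostat -x columns less of a black box, the sketch below derives a utilization percentage and an average wait time from two samples of a /proc/diskstats line. It assumes the standard Linux field layout (major, minor, name, then the per-device counters); the helper name disk_util is hypothetical, and the result only approximates what iostat prints.

```shell
# disk_util "<first diskstats line>" "<second diskstats line>" <interval in ms>
# Fields used: $4 reads completed, $7 ms spent reading,
#              $8 writes completed, $11 ms spent writing,
#              $13 ms during which the device had I/O in flight
disk_util() {
    printf '%s\n%s\n' "$1" "$2" | awk -v ms="$3" '
        NR == 1 { ios = $4 + $8; wait = $7 + $11; busy = $13 }
        NR == 2 {
            if (ms <= 0) { print "interval must be positive"; exit 1 }
            d_ios  = ($4 + $8)  - ios     # requests completed in the interval
            d_wait = ($7 + $11) - wait    # total ms those requests spent waiting
            await  = (d_ios > 0) ? d_wait / d_ios : 0
            printf "util=%.1f%% await=%.2fms\n", 100 * ($13 - busy) / ms, await
        }'
}

# Live usage on a Linux host (device name sda is an example):
#   a=$(grep -w sda /proc/diskstats); sleep 1; b=$(grep -w sda /proc/diskstats)
#   disk_util "$a" "$b" 1000
```

High utilization together with a rising await suggests the device is saturated; high await at low utilization points more toward slow individual requests.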

Network Analysis

Network analysis is harder than the other subsystems: latency, blocking, collisions, and packet loss all play a part, and some of them originate in external devices rather than on the host.

Tools

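netstat -s, -nu, -apu – per-protocol counters, UDP sockets with numeric addresses, and all UDP sockets with their owning process.

ss -t -a, -s, -u -a – all TCP sockets, a socket summary, and all UDP sockets.

sar -n TCP,ETCP 1 – TCP traffic and TCP error statistics, every second.

sar -n DEV 1 – per-interface traffic, every second.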

tcpdump, tcpflow – packet capture.

Usage

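netstat -s                                     # per-protocol counters (retransmits, resets, errors)
netstat -nu                                    # UDP sockets, numeric addresses
netstat -apu                                   # all UDP sockets with the owning process
ss -t -a                                       # all TCP sockets, including listeners
ss -s                                          # socket counts by type
ss -u -a                                       # all UDP sockets
sar -n TCP,ETCP 1                              # TCP connection rates and TCP errors, every second
sar -n DEV 1                                   # per-interface packets and bytes, every second
tcpdump -i eth1 host 192.168.1.1 and port 80   # capture traffic to or from that host on port 80
tcpflow -cp host 192.168.1.1                   # print reassembled TCP streams to the console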
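
Raw socket listings become more useful once they are aggregated. A minimal sketch that counts TCP sockets per state, assuming input in the layout printed by ss -tan (one header line, then one socket per line with the state in the first column); the helper name tcp_states is hypothetical.

```shell
# ss -tan | tcp_states
# Prints one line per TCP state with the number of sockets in it.
tcp_states() {
    awk 'NR > 1 { n[$1]++ }                      # skip the header, count by state
         END    { for (s in n) print s, n[s] }' | sort
}
```

A large number of sockets in TIME-WAIT or CLOSE-WAIT, or a growing count of SYN-RECV, is often a quicker clue than scrolling through the full listing.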

System Load

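Load describes how much work is queued on the system. On Linux, the load average counts tasks that are running, runnable, or in uninterruptible sleep (usually waiting on disk I/O), reported as exponentially damped averages over the last 1, 5, and 15 minutes. A high load average therefore does not always mean the CPUs are busy.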
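
A load average only means something relative to the number of CPUs. The sketch below normalizes it; the helper name load_per_cpu and the threshold of 1.0 per CPU are assumptions for illustration, a common rule of thumb rather than a hard limit.

```shell
# load_per_cpu <load average> <number of CPUs>
# Prints the load per CPU and a rough verdict.
load_per_cpu() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN {
        if (ncpu <= 0) { print "invalid CPU count"; exit 1 }
        ratio = load / ncpu
        state = (ratio > 1.0) ? "high" : "ok"    # 1.0 per CPU is a rule of thumb
        printf "%.2f %s\n", ratio, state
    }'
}

# Live usage on a Linux host:
#   load_per_cpu "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```

When the verdict is "high" but the CPUs are mostly idle, the load is probably coming from tasks blocked on I/O, which is the cue to move on to the disk tools.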

Tools

uptime, top, vmstat – load inspection.

strace -c -p pid – system‑call cost.

strace -T -e epoll_wait -p pid – specific syscall tracing.

dmesg – kernel logs.

Usage

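uptime                          # load averages for the last 1, 5, and 15 minutes
top                             # load averages plus the processes behind them
vmstat                          # one report of averages since boot; use vmstat 1 for live data
strace -c -p pid                # per-syscall counts and time; the summary prints on Ctrl-C
strace -T -e epoll_wait -p pid  # time spent inside each epoll_wait call
dmesg                           # kernel messages such as OOM kills and hardware errors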

Flame Graphs

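A CPU flame graph visualizes sampled call stacks. The y-axis shows stack depth, with the function that was running on top and its callers beneath it. The x-axis spans the sample population, sorted alphabetically; it does not show the passage of time. The wider a frame, the more samples it appeared in, and so the more CPU time that function and its children consumed.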
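
The relation between frame width and sample count can be checked on the intermediate "folded" format that the FlameGraph scripts listed in the References consume. The sketch below is illustrative only: the helper name frame_share is hypothetical, and the commented perf pipeline is one common way to produce folded stacks, not the SystemTap-based scripts this article uses.

```shell
# Folded stacks look like:  main;parse_request;json_decode 42
# (frames from root to leaf separated by ";", then the sample count).
# One way to produce them, using perf and the FlameGraph repository:
#   perf record -F 99 -g -p PID -- sleep 30
#   perf script | ./stackcollapse-perf.pl > out.folded

# frame_share FUNC < out.folded
# Prints the percentage of samples whose stack contains FUNC, which is
# proportional to the combined width of FUNC's frames in the flame graph.
frame_share() {
    awk -v fn="$1" '
        {
            count = $NF                  # last field is the sample count
            stack = $0
            sub(/ [0-9]+$/, "", stack)   # strip the count, keep the stack
            total += count
            n = split(stack, frames, ";")
            for (i = 1; i <= n; i++)
                if (frames[i] == fn) { hits += count; break }
        }
        END {
            if (total == 0) { print "no samples"; exit 1 }
            printf "%.1f\n", 100 * hits / total
        }'
}
```

Running frame_share against a suspect function gives a number to compare before and after a fix, which is easier to track over time than eyeballing two SVG files.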

Installation

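yum install systemtap systemtap-runtime
# install debuginfo packages that match the running kernel (check the version with: uname -r)
# debuginfo-install is provided by the yum-utils package and takes package names as arguments
debuginfo-install --enablerepo=debuginfo kernel
debuginfo-install --enablerepo=debuginfo glibc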

Usage

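git clone https://github.com/lidaohang/quick_location.git
cd quick_location
# run each of the three blocks below from the quick_location directory; replace pid with the target process ID
# on-CPU user-space flame graph
sh ngx_on_cpu_u.sh pid
cd ngx_on_cpu_u
# serve the results on port 8088 (on Python 2: python -m SimpleHTTPServer 8088)
python3 -m http.server 8088
# off-CPU flame graph
sh ngx_off_cpu_u.sh pid
cd ngx_off_cpu_u
python3 -m http.server 8088
# memory flame graph
sh ngx_on_memory.sh pid
cd ngx_on_memory
python3 -m http.server 8088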

[Figures: example on-CPU, off-CPU, memory, and differential flame graphs.]

Case Study – Nginx Cluster Issue

On 2017‑09‑25 the Nginx cluster showed many 499/5xx responses and high CPU usage. Analysis steps:

Request traffic did not spike; traffic actually decreased.

Response time increased, possibly due to Nginx or upstream.

Upstream response time grew, suggesting backend delay.

top showed high Nginx worker CPU.

perf top revealed most cost in free, malloc, and JSON parsing.

Flame graphs confirmed JSON parsing as the hotspot.

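Resolution: disable the high-cost module, confirm that CPU usage drops, and verify that requests flow normally again. The upstream delay turned out to come from requests that looped back to Nginx itself, so the slow "upstream" was the overloaded Nginx cluster.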
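
The first symptom in this case, a rise in 499 and 5xx responses, can be quantified from the access log before any profiler is attached. A minimal sketch, assuming Nginx's default "combined" log format, in which the status code is the ninth whitespace-separated field; the helper name status_share is hypothetical and the log path in the usage comment is only an example.

```shell
# status_share < access.log
# Prints each HTTP status code with its request count and share of all requests.
status_share() {
    awk '
        { total++; n[$9]++ }             # $9 is the status in the combined format
        END {
            for (s in n)
                printf "%s %d %.1f%%\n", s, n[s], 100 * n[s] / total
        }' | sort
}

# Example usage:
#   tail -n 100000 /var/log/nginx/access.log | status_share
```

Comparing the output for a healthy period with the incident window shows whether the errors are concentrated in one status code, which helps decide whether to look at clients (499), Nginx itself, or the upstream (502/504).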

References

http://www.brendangregg.com/

http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html

http://www.brendangregg.com/FlameGraphs/memoryflamegraphs.html

http://www.brendangregg.com/FlameGraphs/offcpuflamegraphs.html

http://www.brendangregg.com/blog/2014-11-09/differential-flame-graphs.html

https://github.com/openresty/openresty-systemtap-toolkit

https://github.com/brendangregg/FlameGraph

https://www.slideshare.net/brendangregg/blazing-performance-with-flame-graphs
