Master Linux Performance: Top, iotop, pidstat, sar – Real‑World Diagnostic Guide
This guide covers Linux performance analysis tools—including top, htop, iotop, pidstat, iostat, sar, and vmstat—detailing installation, usage, key metrics, troubleshooting scenarios, monitoring integration with Prometheus, and best‑practice recommendations for effective system diagnostics and capacity planning.
Applicable Scenarios & Prerequisites
Applicable Scenarios : investigating high CPU load, diagnosing memory leaks, locating I/O bottlenecks, establishing performance baselines, capacity planning.
Prerequisites :
OS: RHEL/CentOS 7.x‑9.x, Ubuntu 18.04‑24.04
Tool package: sysstat (provides iostat/pidstat/sar), iotop
Permissions: some tools require root (e.g., iotop)
Kernel: CONFIG_TASK_DELAY_ACCT=y (required by iotop)
Environment and Version Matrix
Tool     Package    Purpose                        Applicable Scenarios
top      procps-ng  Real-time process monitoring   CPU/memory usage investigation
htop     htop       Enhanced top                   Interactive multi-core CPU monitoring
iotop    iotop      I/O monitoring                 Locating disk I/O bottlenecks
pidstat  sysstat    Per-process statistics         Single-process CPU/memory/I/O analysis
iostat   sysstat    Disk I/O statistics            Disk performance analysis
sar      sysstat    Historical performance data    Trend analysis, capacity planning
vmstat   procps-ng  Virtual memory statistics      Combined memory/swap/CPU monitoring
Quick Checklist
Install performance analysis toolset
Enable sysstat data collection (sar historical data)
Use top to diagnose high‑CPU processes
Use iotop to locate I/O‑intensive processes
Use pidstat to analyze single‑process performance
Use iostat to diagnose disk bottlenecks
Use sar to analyze historical performance trends
Establish performance baseline and alert thresholds
Combine tools to troubleshoot complex issues
Export performance data for long‑term analysis
Tool Details
1. top – Real‑time Process Monitoring
Installation : Comes with procps‑ng package.
Basic Usage :
# start top
top
# common hotkeys (press during run)
P # sort by CPU% (default)
M # sort by MEM%
T # sort by runtime
k # kill process (enter PID)
r # renice process
1 # show all CPU cores separately
c # show full command line
V # tree view of process hierarchy
f # select displayed fields
W # save configuration
q # quit
Output Explanation :
First line (system summary) : load average: 0.50, 0.55, 0.58 gives the 1/5/15-minute load averages. Interpret them relative to the number of CPU cores:
load < CPU cores: normal
load ≈ CPU cores: fully loaded
load > 1.5 × CPU cores: overloaded
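The rule of thumb above can be checked mechanically; a minimal sketch (the 1.5 × cores cutoff is this guide's guideline, not a kernel constant):

```shell
#!/bin/sh
# Classify the 1-minute load average against the CPU core count.
cores=$(nproc)
load1=$(cut -d' ' -f1 /proc/loadavg)
# awk does the floating-point comparison that plain sh cannot
state=$(awk -v l="$load1" -v c="$cores" 'BEGIN {
    if (l < c)             print "normal"
    else if (l <= 1.5 * c) print "full load"
    else                   print "overload"
}')
echo "load=$load1 cores=$cores -> $state"
```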
Third line (CPU statistics) :
us (user): user-space CPU %
sy (system): kernel-space CPU %
ni (nice): CPU % of priority-adjusted (niced) processes
id (idle): idle CPU %
wa (iowait): I/O wait CPU % ( >20 % indicates an I/O bottleneck )
hi/si: hardware/software interrupt CPU %
st (steal): CPU stolen by the hypervisor
Fourth/Fifth lines (memory statistics) :
total: total memory
free: completely unused memory
used: used memory
buff/cache: kernel buffers and cache (reclaimable)
avail Mem: memory actually available to applications (includes reclaimable cache)
Process list fields :
VIRT: virtual memory (total requested)
RES: resident memory (actual physical usage)
SHR: shared memory
S: process state (R=running, S=sleeping, D=uninterruptible, Z=zombie)
%CPU: CPU usage (may exceed 100 % for multithreaded processes)
%MEM: memory usage percentage
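top reads these per-process numbers from /proc; VIRT and RES correspond to VmSize and VmRSS in /proc/&lt;pid&gt;/status, which is a quick way to check a single process without starting top:

```shell
# VIRT (VmSize) and RES (VmRSS) for a process, straight from the kernel.
# /proc/self refers to the reading process itself; substitute a PID as needed.
grep -E '^Vm(Size|RSS):' /proc/self/status
```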
Advanced Usage :
# monitor specific user
top -u nginx
# batch mode (output to file)
top -b -n 1 > top-output.txt
# set refresh interval (2 s)
top -d 2
# show specific PIDs
top -p 1234,5678
# show threads
top -H
Fault‑diagnosis Scenarios :
Scenario 1 – CPU high load: launch top, sort by P, check %wa; if >20 % use iotop for I/O investigation.
Scenario 2 – Memory shortage: sort by M, examine RES, monitor swap usage.
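For the swap side of scenario 2, /proc/meminfo (the same source free and top read) answers without an interactive tool:

```shell
# Report swap consumption from /proc/meminfo (values are in kB)
awk '/^SwapTotal:/ {t=$2} /^SwapFree:/ {f=$2} END {printf "swap used: %d kB of %d kB\n", t-f, t}' /proc/meminfo
```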
2. htop – Enhanced top
Installation :
# RHEL/CentOS
sudo yum install -y htop
# Ubuntu
sudo apt install -y htop
Advantages :
Colorful output for better readability
Mouse support (click to select processes)
Separate display for each CPU core
Tree view of process hierarchy
Built‑in search and filter
Common Hotkeys :
F1 # help
F2 # setup
F3 # search process
F4 # filter process
F5 # tree view
F6 # sort field selection
F9 # kill process
F10 # quit
3. iotop – I/O Monitoring Tool
Installation :
# RHEL/CentOS
sudo yum install -y iotop
# Ubuntu
sudo apt install -y iotop
Basic Usage :
# start iotop (requires root)
sudo iotop
# show only processes with I/O activity
sudo iotop -o
# hide threads, show only processes
sudo iotop -P
# set refresh interval (3 s)
sudo iotop -d 3
# batch mode (output to file)
sudo iotop -b -n 3 > iotop-output.txt
Output Details :
Total DISK READ: 10.50 M/s | Total DISK WRITE: 25.00 M/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
1234 be/4 mysql 5.00 M/s 15.00 M/s 0.00 % 90.00 % mysqld
5678 be/4 www 2.00 M/s 5.00 M/s 0.00 % 50.00 % nginx worker
Key Fields :
DISK READ/WRITE: per-second read/write throughput
SWAPIN: percentage of time the process spent swapping in
IO>: percentage of time the process spent waiting on I/O (similar to top's wa)
PRIO: I/O priority class (be=best effort, rt=real-time, idle)
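A saved batch capture (sudo iotop -b -n 3 > iotop-output.txt, as above) can be post-processed with standard tools; a sketch that assumes rates are reported uniformly in M/s, as in the sample output:

```shell
# Find the TID with the highest DISK WRITE rate in an iotop batch capture.
# Columns: TID PRIO USER READ unit WRITE unit SWAPIN % IO> % COMMAND,
# so the numeric write rate is field 6 (assumes a uniform M/s unit).
if [ -r iotop-output.txt ]; then
    awk 'NR > 2 && $6+0 > max { max = $6+0; tid = $1 }
         END { printf "top writer: TID %s at %.2f M/s\n", tid, max }' iotop-output.txt
fi
```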
Common Hotkeys :
o # toggle display of only I/O‑active processes
p # toggle process/thread view
a # accumulated mode (total I/O instead of rate)
q # quit
Fault‑diagnosis Scenario – Disk I/O high :
# start iotop
sudo iotop -o
# examine DISK READ/WRITE columns; sustained high values may indicate
# • slow database queries
# • log explosion
# • backup jobs
# then correlate with iostat for device‑level I/O.
4. pidstat – Process Performance Statistics
Installation :
# RHEL/CentOS
sudo yum install -y sysstat
# Ubuntu
sudo apt install -y sysstat
Basic Usage :
# all processes CPU stats, refresh every 2 s
pidstat 2
# specific PID
pidstat -p 1234 2
# memory statistics
pidstat -r 2
# I/O statistics
pidstat -d 2
# thread statistics
pidstat -t 2
# context‑switch statistics
pidstat -w 2
# combined CPU+memory+I/O
pidstat -urd 2
CPU Statistics Output :
14:30:00 UID PID %usr %system %guest %wait %CPU CPU Command
14:30:02 0 1234 25.00 5.00 0.00 2.00 30.00 2 mysqld
14:30:02 1000 5678 10.00 2.00 0.00 0.00 12.00 1 nginx
Key Fields :
%usr: user-space CPU %
%system: kernel-space CPU %
%wait: percentage of time spent waiting for a CPU (high → CPU contention)
CPU: core the process last ran on
Memory Statistics (-r) :
14:30:00 UID PID minflt/s majflt/s VSZ RSS %MEM Command
14:30:02 0 1234 100.00 0.00 2500000 1200000 7.50 mysqld
Key fields: minflt/s (minor page faults/s, served from memory), majflt/s (major page faults/s, requiring disk reads), VSZ (virtual memory, KB), RSS (resident memory, KB), %MEM (memory usage %).
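The fault counters pidstat turns into per-second rates are exposed per process in /proc/&lt;pid&gt;/stat (fields 10 and 12); a quick way to read the raw cumulative values:

```shell
# Cumulative minor/major page-fault counters for the reading process.
# Caveat: field positions assume the comm value (field 2) contains no spaces.
awk '{ print "minflt=" $10, "majflt=" $12 }' /proc/self/stat
```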
I/O Statistics (-d) :
14:30:00 UID PID kB_rd/s kB_wr/s kB_ccwr/s iodelay Command
14:30:02 0 1234 5000.00 15000.00 0.00 50 mysqld
Key fields: kB_rd/s (KB read/s), kB_wr/s (KB written/s), kB_ccwr/s (KB of cancelled writes/s), iodelay (block I/O delay, in clock ticks).
Fault‑diagnosis Scenario – Single process high CPU :
# monitor the process
pidstat -p 1234 1
# if %wait high → CPU contention, consider throttling or scaling
# if %usr high → code optimisation
# drill down to threads
pidstat -t -p 1234 1
5. iostat – Disk I/O Statistics
Installation : Provided by the sysstat package.
Basic Usage :
# overall CPU and disk I/O
iostat
# refresh every 2 s
iostat 2
# extended statistics
iostat -x 2
# display in MB
iostat -xm 2
# specific device
iostat -x /dev/sda 2
# include all devices and partitions
iostat -xm -p ALL 2
Output Details :
avg-cpu: %user %nice %system %iowait %steal %idle
5.20 0.00 2.10 0.20 0.00 92.50
Device r/s w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm await r_await w_await svctm %util
sda 50.00 150.00 2.50 10.00 5.00 20.00 9.09 11.76 8.50 5.00 10.00 4.00 80.00
sdb 10.00 20.00 0.50 1.00 1.00 2.00 9.09 9.09 3.00 2.00 4.00 1.50 10.00
Key Metrics :
r/s, w/s: read/write requests per second
rMB/s, wMB/s: MB read/written per second
await: average I/O latency (ms) – <10 ms excellent, 10–50 ms normal, >100 ms severe
%util: device utilization – >80 % indicates an I/O bottleneck (less reliable on SSD/NVMe, which serve requests in parallel)
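These thresholds can be applied automatically to a saved capture; a sketch against `iostat -x` output redirected to a file (the device-name pattern is an assumption; adjust for your disk naming):

```shell
# Flag devices whose %util (last column of iostat -x) exceeds 80%.
# Assumes device names start with sd/vd/nvme/xvd; adjust as needed.
if [ -r iostat-output.txt ]; then
    awk '/^(sd|vd|nvme|xvd)/ && $NF+0 > 80 { print $1 ": util=" $NF "%" }' iostat-output.txt
fi
```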
Fault‑diagnosis Scenario – Slow disk I/O :
# check utilization and await
iostat -xm 2
# if %util >80% and await >100 ms → consider SSD upgrade, RAID cache, or I/O scheduler tuning.
# then use iotop to locate the offending process.
6. sar – System Activity Reporter
Installation & Enable :
# install sysstat
sudo yum install -y sysstat # RHEL/CentOS
sudo apt install -y sysstat # Ubuntu
# enable data collection
sudo systemctl enable sysstat
sudo systemctl start sysstat
# on Ubuntu edit /etc/default/sysstat to set ENABLED="true"
Data collection interval defaults to 10 minutes (configurable in /etc/cron.d/sysstat).
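The 10-minute interval comes from the sysstat cron entries; a representative /etc/cron.d/sysstat (the sa1/sa2 paths shown are RHEL-style – Debian/Ubuntu uses /usr/lib/sysstat/ – and newer releases drive collection with systemd timers instead):

```shell
# /etc/cron.d/sysstat (RHEL-style paths shown)
# collect a sample every 10 minutes
*/10 * * * * root /usr/lib64/sa/sa1 1 1
# generate the daily summary just before midnight
53 23 * * * root /usr/lib64/sa/sa2 -A
```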
Real‑time Commands :
# CPU usage (2 s interval, 10 samples)
sar -u 2 10
# Memory usage
sar -r 2 10
# Disk I/O
sar -d 2 10
# Network traffic
sar -n DEV 2 10
# Swap usage
sar -S 2 10
# Load and context switches
sar -q 2 10
Historical Data :
# today’s CPU data
sar -u
# yesterday's data (Debian/Ubuntu path; RHEL/CentOS stores files under /var/log/sa/)
sar -u -f /var/log/sysstat/sa$(date -d yesterday +%d)
# specific time range
sar -u -s 10:00:00 -e 12:00:00
Key Metrics – CPU (-u) :
# example output
14:30:00 CPU %user %nice %system %iowait %steal %idle
14:30:02 all 5.20 0.00 2.10 0.20 0.00 92.50
Memory (-r) :
# kbmemfree kbavail kbmemused %memused kbbuffers kbcached kbcommit %commit
14:30:02 2000000 7500000 8000000 50.00 500000 6000000 10000000 62.50
Disk I/O (-d) :
# DEV tps rkB/s wkB/s areq‑sz aqu‑sz await svctm %util
dev8-0 200.00 2560.00 10240.00 64.00 1.50 7.50 4.00 80.00
Network (-n DEV) :
# IFACE rxpck/s txpck/s rxkB/s txkB/s
eth0 1000.00 500.00 500.00 200.00
Fault‑diagnosis Scenario – Post‑incident analysis :
# assume issue between 02:00‑03:00
sar -u -s 02:00:00 -e 03:00:00
sar -r -s 02:00:00 -e 03:00:00
sar -d -s 02:00:00 -e 03:00:00
sar -n DEV -s 02:00:00 -e 03:00:00
# cross‑compare to pinpoint root cause (e.g., high iowait + %util)
7. vmstat – Virtual Memory Statistics
Basic Usage :
# refresh every 2 s
vmstat 2
# 5 iterations then exit
vmstat 2 5
# detailed memory statistics
vmstat -s
# disk statistics
vmstat -d
# active/inactive memory
vmstat -a 2
Output Overview (first line shows processes, memory, swap, I/O, system, CPU):
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 0 200000 50000 600000 0 0 5 10 100 150 5 2 92 1 0
Key Fields :
r: runnable processes (sustained > CPU cores → CPU bottleneck)
b: processes in uninterruptible sleep (> 2 → I/O bottleneck)
swpd: swap in use (KB)
free, buff, cache: memory breakdown
si/so: swap-in/swap-out rate (> 0 indicates memory pressure)
bi/bo: blocks read from / written to block devices
in/cs: interrupts and context switches per second
us, sy, id, wa, st: CPU usage breakdown
Alert Thresholds :
r continuously > CPU cores → CPU bottleneck
b > 2 → I/O bottleneck
si/so > 0 → memory shortage
wa > 20 % → severe I/O wait
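These thresholds can be wired into a one-liner; a sketch that evaluates the final sample of a short vmstat run (column positions match the header shown above):

```shell
# Evaluate the final vmstat sample against the alert thresholds.
# Fields: 1=r 2=b 7=si 8=so 16=wa (per the header above).
vmstat 2 2 | tail -1 | awk -v cores="$(nproc)" '{
    warn = 0
    if ($1 > cores)  { print "WARN: run queue > cores (r=" $1 ")"; warn = 1 }
    if ($7 + $8 > 0) { print "WARN: swapping (si=" $7 " so=" $8 ")"; warn = 1 }
    if ($16 > 20)    { print "WARN: heavy I/O wait (wa=" $16 "%)"; warn = 1 }
    if (!warn) print "OK"
}'
```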
Tool Combination for Diagnosis
Scenario 1 – System slowdown, high load
Step 1: top for overview.
Step 2: Decide bottleneck based on %wa, %us+%sy, swap usage.
Step 3: If I/O bottleneck, use iostat then iotop for process‑level view.
Step 4: If CPU bottleneck, use top → pidstat for per‑process/thread stats.
Step 5: If memory shortage, use free, top (M sort), vmstat (si/so).
Scenario 2 – Database slow query
1. Verify CPU/I/O with pidstat -urd -p $(pgrep -d, mysqld) 1 (pgrep -d, joins multiple PIDs with commas, as pidstat -p expects).
2. Inspect process I/O via iotop -P -p $(pgrep mysqld) (assumes a single mysqld PID).
3. Review MySQL slow‑query log.
Monitoring & Alerting
Prometheus + node_exporter
Key PromQL :
# CPU usage
100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
# Memory usage
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100
# Disk I/O usage
rate(node_disk_io_time_seconds_total[5m]) * 100
# Swap usage
(1 - node_memory_SwapFree_bytes / node_memory_SwapTotal_bytes) * 100
Best Practices
Establish performance baseline (collect sar for 7 days).
Layered troubleshooting: system → disk → process.
Historical analysis: retain sysstat data for 30 days.
Automated alerts: Prometheus + Alertmanager.
Performance testing: compare before/after changes.
Documentation: record common issues and steps.
Tool combination: single tool rarely isolates root cause.
Regular cleanup: archive sar data to avoid disk exhaustion.
Permission control: configure sudo for iotop and other privileged tools.
Learn kernel internals: CPU scheduling, memory management, I/O stack.