
Master Linux Performance: Top, iotop, pidstat, sar – Real‑World Diagnostic Guide

This guide covers Linux performance analysis tools—including top, htop, iotop, pidstat, iostat, sar, and vmstat—detailing installation, usage, key metrics, troubleshooting scenarios, monitoring integration with Prometheus, and best‑practice recommendations for effective system diagnostics and capacity planning.

Ops Community


Applicable Scenarios & Prerequisites

Applicable Scenarios: high-CPU-load investigation, memory-leak diagnosis, I/O bottleneck location, performance-baseline establishment, capacity planning.

Prerequisites:

OS: RHEL/CentOS 7.x‑9.x, Ubuntu 18.04‑24.04

Tool package: sysstat (provides iostat/pidstat/sar), iotop

Permissions: some tools require root (e.g., iotop)

Kernel: CONFIG_TASK_DELAY_ACCT=y (required by iotop)
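To verify the kernel prerequisite before reaching for iotop, you can check the build-time option or, on kernels 5.15 and later, the runtime sysctl. This is a sketch; config file locations vary by distro:

```shell
# build-time option, if the kernel config is shipped under /boot
grep CONFIG_TASK_DELAY_ACCT "/boot/config-$(uname -r)" 2>/dev/null || true
# runtime knob on kernels >= 5.15 (1 = enabled); absent on older kernels
cat /proc/sys/kernel/task_delayacct 2>/dev/null || echo "task_delayacct sysctl not present"
```

On kernels that ship the runtime knob, `sysctl kernel.task_delayacct=1` enables delay accounting without a reboot.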

Environment and Version Matrix

| Tool    | Package   | Purpose                      | Applicable Scenarios                     |
|---------|-----------|------------------------------|------------------------------------------|
| top     | procps-ng | Real-time process monitoring | CPU/memory usage investigation           |
| htop    | htop      | Enhanced top                 | Interactive multi-core CPU monitoring    |
| iotop   | iotop     | I/O monitoring               | Disk I/O bottleneck location             |
| pidstat | sysstat   | Process statistics           | Single-process CPU/memory/I/O analysis   |
| iostat  | sysstat   | Disk I/O statistics          | Disk performance analysis                |
| sar     | sysstat   | Historical performance data  | Trend analysis, capacity planning        |
| vmstat  | procps-ng | Virtual memory statistics    | Memory/swap/CPU comprehensive monitoring |

Quick Checklist

Install performance analysis toolset

Enable sysstat data collection (sar historical data)

Use top to diagnose high‑CPU processes

Use iotop to locate I/O‑intensive processes

Use pidstat to analyze single‑process performance

Use iostat to diagnose disk bottlenecks

Use sar to analyze historical performance trends

Establish performance baseline and alert thresholds

Combine tools to troubleshoot complex issues

Export performance data for long‑term analysis

Tool Details

1. top – Real‑time Process Monitoring

Installation: ships with the procps-ng package.

Basic Usage:

# start top
top

# common hotkeys (press during run)
P  # sort by CPU% (default)
M  # sort by MEM%
T  # sort by runtime
k  # kill process (enter PID)
r  # renice process
1  # show all CPU cores separately
c  # show full command line
V  # tree view of process hierarchy
f  # select displayed fields
W  # save configuration
q  # quit

Output Explanation:

First line (system summary): load average: 0.50, 0.55, 0.58 shows the 1/5/15-minute load averages. Judge them relative to the CPU core count:

- below core count: normal
- equal to core count: fully loaded
- above 1.5 x core count: overloaded

Third line (CPU statistics):

- us (user): user-space CPU %
- sy (system): kernel-space CPU %
- ni (nice): CPU % of processes with adjusted priority
- id (idle): idle CPU %
- wa (iowait): I/O-wait CPU % (>20 % indicates an I/O bottleneck)
- hi/si: hardware/software interrupt CPU %
- st (steal): CPU stolen by the hypervisor

Fourth/fifth lines (memory statistics):

- total: total memory
- free: completely unused memory
- used: used memory
- buff/cache: kernel buffers and page cache (reclaimable)
- avail Mem: memory actually available to applications (including reclaimable cache)

Process list fields:

- VIRT: virtual memory (total requested)
- RES: resident memory (actual physical usage)
- SHR: shared memory
- S: process state (R=running, S=sleeping, D=uninterruptible, Z=zombie)
- %CPU: CPU usage (may exceed 100 % for multithreaded processes)
- %MEM: memory usage percentage
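The load-average rule of thumb above is easy to script; a minimal sketch that compares the 1-minute load against the core count (the 1.5x band is the heuristic from this guide, not a kernel constant):

```shell
# classify the 1-minute load average relative to CPU core count
cores=$(nproc)
load=$(cut -d' ' -f1 /proc/loadavg)
state=$(awk -v l="$load" -v c="$cores" 'BEGIN {
  if (l > 1.5 * c) print "overload"
  else if (l >= c) print "full load"
  else             print "normal"
}')
echo "load=$load cores=$cores state=$state"
```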

Advanced Usage:

# monitor a specific user
top -u nginx

# batch mode (output to file)
top -b -n 1 > top-output.txt

# set refresh interval (2 s)
top -d 2

# show specific PIDs
top -p 1234,5678

# show threads
top -H

Fault‑diagnosis Scenarios:

Scenario 1 – CPU high load: launch top, press P to sort by CPU, and check %wa; if it exceeds 20 %, switch to iotop for I/O investigation.

Scenario 2 – Memory shortage: press M to sort by memory, examine RES, and monitor swap usage.
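For scripting the same triage without top's interactive UI, ps (also part of procps-ng) can rank processes by the same columns; a sketch, not a full top replacement:

```shell
# top 5 CPU consumers (mirrors top's P sort)
ps -eo pid,comm,%cpu --sort=-%cpu | head -6
# top 5 resident-memory consumers (mirrors top's M sort; RSS in KB)
ps -eo pid,comm,rss --sort=-rss | head -6
```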

2. htop – Enhanced top

Installation:

# RHEL/CentOS
sudo yum install -y htop

# Ubuntu
sudo apt install -y htop

Advantages:

Colorful output for better readability

Mouse support (click to select processes)

Separate display for each CPU core

Tree view of process hierarchy

Built‑in search and filter

Common Hotkeys:

F1  # help
F2  # setup
F3  # search process
F4  # filter process
F5  # tree view
F6  # sort field selection
F9  # kill process
F10 # quit

3. iotop – I/O Monitoring Tool

Installation :

# RHEL/CentOS
sudo yum install -y iotop

# Ubuntu
sudo apt install -y iotop

Basic Usage:

# start iotop (requires root)
sudo iotop

# show only processes with I/O activity
sudo iotop -o

# hide threads, show only processes
sudo iotop -P

# set refresh interval (3 s)
sudo iotop -d 3

# batch mode (output to file)
sudo iotop -b -n 3 > iotop-output.txt

Output Details:

Total DISK READ: 10.50 M/s | Total DISK WRITE: 25.00 M/s
 TID  PRIO USER   DISK READ  DISK WRITE  SWAPIN   IO>   COMMAND
1234 be/4 mysql   5.00 M/s   15.00 M/s   0.00 %  90.00 % mysqld
5678 be/4 www     2.00 M/s    5.00 M/s   0.00 %  50.00 % nginx worker

Key Fields:

- DISK READ/WRITE: per-second read/write throughput
- SWAPIN: percentage of time the process spent swapping in
- IO>: percentage of time the process spent waiting on I/O (similar to top's wa)
- PRIO: I/O priority class (be=best effort, rt=real-time, idle)

Common Hotkeys:

o  # toggle display of only I/O‑active processes
p  # toggle process/thread view
a  # accumulated mode (total I/O instead of rate)
q  # quit

Fault‑diagnosis Scenario – Disk I/O high:

# start iotop
sudo iotop -o

# examine DISK READ/WRITE columns; sustained high values may indicate
#   • slow database queries
#   • log explosion
#   • backup jobs
# then correlate with iostat for device‑level I/O.
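If iotop is unavailable (or the kernel lacks delay accounting), similar per-process counters can be read directly from /proc; a sketch using the current shell's own PID (reading other users' entries requires root):

```shell
# cumulative bytes this process has caused to hit the block layer
pid=$$
grep -E '^(read_bytes|write_bytes)' "/proc/$pid/io"
```

Sampling these counters twice and subtracting gives a per-interval rate, which is essentially what pidstat -d reports.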

4. pidstat – Process Performance Statistics

Installation :

# RHEL/CentOS
sudo yum install -y sysstat

# Ubuntu
sudo apt install -y sysstat

Basic Usage:

# all processes CPU stats, refresh every 2 s
pidstat 2

# specific PID
pidstat -p 1234 2

# memory statistics
pidstat -r 2

# I/O statistics
pidstat -d 2

# thread statistics
pidstat -t 2

# context‑switch statistics
pidstat -w 2

# combined CPU+memory+I/O
pidstat -urd 2

CPU Statistics Output:

14:30:00   UID   PID   %usr %system %guest %wait %CPU CPU Command
14:30:02    0   1234   25.00   5.00   0.00   2.00   30.00   2 mysqld
14:30:02 1000   5678   10.00   2.00   0.00   0.00   12.00   1 nginx

Key Fields:

- %usr: user-space CPU %
- %system: kernel-space CPU %
- %wait: time spent waiting for a CPU (high values → contention)
- CPU: CPU core on which the process runs

Memory Statistics (-r):

14:30:00   UID   PID  minflt/s majflt/s   VSZ   RSS   %MEM Command
14:30:02    0   1234   100.00   0.00 2500000 1200000 7.50 mysqld

Key fields: minflt/s (minor page faults), majflt/s (major page faults), VSZ (virtual memory), RSS (resident memory), %MEM (memory %).
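The same fault counters come from fields 10 (minflt) and 12 (majflt) of /proc/PID/stat; a sketch that assumes the process name contains no spaces (spaces in the comm field would shift awk's columns):

```shell
# cumulative minor/major page faults for a PID (here: the current shell)
pid=$$
minflt=$(awk '{print $10}' "/proc/$pid/stat")
majflt=$(awk '{print $12}' "/proc/$pid/stat")
echo "pid=$pid minflt=$minflt majflt=$majflt"
```

A steadily climbing majflt means the process keeps going to disk for its pages, which usually points at memory pressure or a cold page cache.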

I/O Statistics (-d):

14:30:00   UID   PID  kB_rd/s kB_wr/s kB_ccwr/s iodelay Command
14:30:02    0   1234   5000.00 15000.00   0.00   50   mysqld

Key fields: kB_rd/s (read throughput), kB_wr/s (write throughput), iodelay (clock ticks the task spent waiting on I/O).

Fault‑diagnosis Scenario – Single process high CPU:

# monitor the process
pidstat -p 1234 1

# if %wait high → CPU contention, consider throttling or scaling
# if %usr high → code optimisation
# drill down to threads
pidstat -t -p 1234 1

5. iostat – Disk I/O Statistics

Installation : Provided by the sysstat package.

Basic Usage :

# overall CPU and disk I/O
iostat

# refresh every 2 s
iostat 2

# extended statistics
iostat -x 2

# display in MB
iostat -xm 2

# specific device
iostat -x /dev/sda 2

# include per-partition statistics for all devices
iostat -xm -p ALL 2

Output Details:

avg-cpu: %user %nice %system %iowait %steal %idle
          5.20   0.00   2.10   0.20   0.00  92.50

Device   r/s   w/s   rMB/s   wMB/s   rrqm/s   wrqm/s  %rrqm %wrqm  await  r_await  w_await  svctm %util
sda     50.00 150.00   2.50   10.00    5.00    20.00   9.09  11.76   8.50    5.00    10.00   4.00  80.00
sdb     10.00  20.00   0.50    1.00    1.00     2.00   9.09   9.09   3.00    2.00     4.00   1.50  10.00

Key Metrics:

- r/s, w/s: read/write requests per second
- rMB/s, wMB/s: MB read/written per second
- await: average I/O latency in ms (<10 ms excellent, 10-50 ms normal, >100 ms severe)
- %util: device utilization (>80 % indicates an I/O bottleneck)
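The await bands above can be wrapped in a tiny helper for use in scripts; the "slow" label for the unstated 50-100 ms gap is an assumption added here:

```shell
# classify an await value (ms) against the bands in this section
classify_await() {
  awk -v a="$1" 'BEGIN {
    if (a < 10)        print "excellent"
    else if (a <= 50)  print "normal"
    else if (a <= 100) print "slow"
    else               print "severe"
  }'
}
classify_await 8.5   # -> excellent
classify_await 120   # -> severe
```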

Fault‑diagnosis Scenario – Slow disk I/O:

# check utilization and await
iostat -xm 2

# if %util >80% and await >100 ms → consider SSD upgrade, RAID cache, or I/O scheduler tuning.
# then use iotop to locate the offending process.

6. sar – System Activity Reporter

Installation & Enable:

# install sysstat
sudo yum install -y sysstat   # RHEL/CentOS
sudo apt install -y sysstat   # Ubuntu

# enable data collection
sudo systemctl enable sysstat
sudo systemctl start sysstat
# on Ubuntu edit /etc/default/sysstat to set ENABLED="true"

Data collection interval defaults to 10 minutes (configurable in /etc/cron.d/sysstat).

Real‑time Commands:

# CPU usage (2 s interval, 10 samples)
sar -u 2 10

# Memory usage
sar -r 2 10

# Disk I/O
sar -d 2 10

# Network traffic
sar -n DEV 2 10

# Swap usage
sar -S 2 10

# Load and context switches
sar -q 2 10

Historical Data:

# today’s CPU data
sar -u

# yesterday’s data (files live in /var/log/sysstat/ on Debian/Ubuntu, /var/log/sa/ on RHEL)
sar -u -f /var/log/sysstat/sa$(date -d yesterday +%d)

# specific time range
sar -u -s 10:00:00 -e 12:00:00

Key Metrics – CPU (-u):

# example output
14:30:00     CPU   %user %nice %system %iowait %steal %idle
14:30:02     all    5.20   0.00   2.10   0.20   0.00  92.50

Memory (-r):

# kbmemfree kbavail kbmemused %memused kbbuffers kbcached kbcommit %commit
14:30:02   2000000 7500000 8000000 50.00 500000 6000000 10000000 62.50

Disk I/O (-d):

# DEV   tps   rkB/s   wkB/s   areq-sz   aqu-sz   await   svctm   %util
dev8-0 200.00 2560.00 10240.00 64.00 1.50 7.50 4.00 80.00

Network (-n DEV):

# IFACE   rxpck/s   txpck/s   rxkB/s   txkB/s
eth0      1000.00   500.00   500.00   200.00

Fault‑diagnosis Scenario – Post‑incident analysis:

# assume issue between 02:00‑03:00
sar -u -s 02:00:00 -e 03:00:00
sar -r -s 02:00:00 -e 03:00:00
sar -d -s 02:00:00 -e 03:00:00
sar -n DEV -s 02:00:00 -e 03:00:00
# cross‑compare to pinpoint root cause (e.g., high iowait + %util)

7. vmstat – Virtual Memory Statistics

Basic Usage:

# refresh every 2 s
vmstat 2

# 5 iterations then exit
vmstat 2 5

# detailed memory statistics
vmstat -s

# disk statistics
vmstat -d

# active/inactive memory
vmstat -a 2

Output Overview (first line shows processes, memory, swap, I/O, system, CPU):

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so   bi   bo   in   cs us sy id wa st
 1  0      0 200000  50000 600000   0    0    5   10 100 150  5  2 92  1  0

Key Fields:

- r: runnable processes (continuously above CPU core count → CPU bottleneck)
- b: blocked processes (>2 → I/O bottleneck)
- swpd: used swap (KB)
- free, buff, cache: memory breakdown
- si / so: pages swapped in/out per second (>0 indicates memory pressure)
- bi / bo: blocks read from / written to block devices per second
- in / cs: interrupts and context switches per second
- us, sy, id, wa, st: CPU usage breakdown

Alert Thresholds:

- r continuously > CPU cores → CPU bottleneck
- b > 2 → I/O bottleneck
- si/so > 0 → memory shortage
- wa > 20 % → severe I/O wait
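These thresholds are mechanical enough to automate; a sketch that applies them to the last sample of a short vmstat run (column numbers follow the header order r b swpd free buff cache si so bi bo in cs us sy id wa st):

```shell
# flag vmstat threshold breaches from the final sample line
vmstat 1 2 | tail -1 | awk -v cores="$(nproc)" '{
  if ($1 > cores)  print "CPU bottleneck suspected: r=" $1
  if ($2 > 2)      print "I/O bottleneck suspected: b=" $2
  if ($7 + $8 > 0) print "memory pressure: si=" $7 " so=" $8
  if ($16 > 20)    print "severe I/O wait: wa=" $16 "%"
}'
```

No output means none of the thresholds tripped in that sample.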

Tool Combination for Diagnosis

Scenario 1 – System slowdown, high load

Step 1: top for overview.

Step 2: Identify the bottleneck type: high %wa points to I/O, high %us+%sy to CPU, heavy swap usage to memory.

Step 3: If I/O bottleneck, use iostat then iotop for process‑level view.

Step 4: If CPU bottleneck, use top -H and pidstat -t for per-process/thread stats.

Step 5: If memory shortage, use free, top (M sort), vmstat (si/so).

Scenario 2 – Database slow query

1. Verify CPU/I/O with pidstat -urd -p $(pgrep mysqld) 1.

2. Inspect process I/O via iotop -P -p $(pgrep mysqld).

3. Review MySQL slow‑query log.

Monitoring & Alerting

Prometheus + node_exporter

Key PromQL:

# CPU usage
100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory usage
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100

# Disk I/O usage
rate(node_disk_io_time_seconds_total[5m]) * 100

# Swap usage
(1 - node_memory_SwapFree_bytes / node_memory_SwapTotal_bytes) * 100

Best Practices

Establish performance baseline (collect sar for 7 days).

Layered troubleshooting: system → disk → process.

Historical analysis: retain sysstat data for 30 days.

Automated alerts: Prometheus + Alertmanager.

Performance testing: compare before/after changes.

Documentation: record common issues and steps.

Tool combination: single tool rarely isolates root cause.

Regular cleanup: archive sar data to avoid disk exhaustion.

Permission control: configure sudo for iotop and other privileged tools.

Learn kernel internals: CPU scheduling, memory management, I/O stack.
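The cleanup point above can be handled with a one-line find. SA_DIR here is an assumption to adapt (/var/log/sa on RHEL, /var/log/sysstat on Debian/Ubuntu), and the command only lists candidates so nothing is removed by accident:

```shell
# list binary sar data files older than 30 days; append -delete to prune
SA_DIR=${SA_DIR:-/var/log/sa}
find "$SA_DIR" -name 'sa[0-9][0-9]' -mtime +30 -print 2>/dev/null || true
```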

Written by

Ops Community

A leading IT operations community where professionals share and grow together.
