
Master Linux System Monitoring: Top, vmstat, pidstat, iostat & More

This guide explains essential Linux monitoring tools—top, vmstat, pidstat, iostat, netstat, sar, and tcpdump—detailing the metrics they expose, how to interpret CPU, memory, disk, and network statistics, and practical command examples for effective server performance troubleshooting.

MaGe Linux Operations

Jul 19, 2023


Linux servers expose a wealth of runtime parameters that are crucial for operations staff, system administrators, and developers when diagnosing abnormal program behavior.

1. CPU and Memory

1.1 top

➜ ~ top

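The first line shows the 1‑, 5‑, and 15‑minute load averages. Sustained values above the number of CPU cores suggest CPU saturation. Linux also counts tasks in uninterruptible sleep (usually waiting on disk I/O) in the load average, so a high load does not always mean a CPU problem.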
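The same three load averages are exposed in /proc/loadavg. The following is a minimal sketch, assuming a Linux /proc; the helper names are hypothetical. Dividing each average by the core count makes the "more than the number of cores" check explicit.

```python
from __future__ import annotations

import os


def parse_loadavg(text: str) -> tuple[float, float, float]:
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg content."""
    one, five, fifteen = (float(field) for field in text.split()[:3])
    return one, five, fifteen


def load_per_core(text: str, cores: int) -> tuple[float, ...]:
    """Divide each load average by the core count; sustained values above 1.0 hint at saturation."""
    return tuple(value / cores for value in parse_loadavg(text))


if __name__ == "__main__":
    with open("/proc/loadavg") as f:
        per_core = load_per_core(f.read(), os.cpu_count() or 1)
    print("load per core (1m, 5m, 15m):", [round(v, 2) for v in per_core])
```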

The second line counts tasks by state: running, sleeping (interruptible or uninterruptible), stopped, and zombie.

The third line breaks down CPU usage types:

us (user): time spent running un‑niced (nice ≤ 0) processes in user space.

sy (system): time spent in kernel space; high values often accompany heavy system‑call or I/O activity.

ni (nice): time spent running niced (nice > 0) processes in user space.

id (idle): idle time.

wa (iowait): time the CPU sat idle while I/O requests were outstanding.

hi (irq): time handling hardware interrupts.

si (softirq): time handling soft interrupts.

st (steal): time a virtual CPU was ready to run but the hypervisor was serving other guests; useful for detecting oversold VPS resources.
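These percentages come from cumulative tick counters in /proc/stat: a tool samples them twice and divides each delta by the total delta. The following is a minimal sketch of that calculation, assuming a Linux /proc; the helper names are hypothetical. The same parsing works on the per‑core cpu0, cpu1, … lines, which are the raw data behind per‑CPU views such as mpstat -P ALL.

```python
from __future__ import annotations

import time

# Column order of the aggregate "cpu" line (and the per-core "cpuN" lines) in /proc/stat.
FIELDS = ("us", "ni", "sy", "id", "wa", "hi", "si", "st")


def parse_cpu_line(line: str) -> dict[str, int]:
    """Map a /proc/stat cpu line to top's labels; values are clock ticks since boot.

    Only the first eight counters are used: guest and guest_nice are already
    included in user and nice, so adding them would double-count.
    """
    values = [int(v) for v in line.split()[1:9]]
    return dict(zip(FIELDS, values))


def cpu_percentages(before: str, after: str) -> dict[str, float]:
    """Share of elapsed CPU time spent in each state between two samples."""
    b, a = parse_cpu_line(before), parse_cpu_line(after)
    deltas = {k: a[k] - b[k] for k in FIELDS}
    total = sum(deltas.values()) or 1  # identical samples: avoid dividing by zero
    return {k: 100.0 * d / total for k, d in deltas.items()}


def read_cpu_line() -> str:
    with open("/proc/stat") as f:
        return f.readline()  # the first line is the aggregate "cpu" row


if __name__ == "__main__":
    first = read_cpu_line()
    time.sleep(1)
    print({k: round(v, 1) for k, v in cpu_percentages(first, read_cpu_line()).items()})
```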

Each busy state points to a different investigation path. High us points to CPU‑bound processes. High sy suggests heavy system‑call or I/O activity. High ni reflects deliberately reniced jobs. High wa signals an I/O bottleneck. High hi or si indicates interrupt load, which may reveal a hardware or driver problem; softirq time also rises with heavy network traffic. High st can expose an over‑committed hypervisor.

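The fourth line reports physical memory (Mem) and the fifth reports swap (Swap). For physical memory, total = free + used + buff/cache. Buffers hold raw block‑device data such as file system metadata, while Cached holds file contents (the page cache). Avail Mem is the kernel's estimate of how much memory new applications can use without swapping. It is usually somewhat lower than free + buff/cache, because not every cached page can be reclaimed. Frequent swap‑in/swap‑out activity signals memory pressure; a nonzero swap total on its own does not.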
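The Mem and Swap figures are derived from /proc/meminfo. The following is a minimal sketch that rebuilds them so that total = free + used + buff/cache holds, assuming a Linux /proc. The helper names are hypothetical. The buff/cache formula is an assumption, because procps versions differ on whether reclaimable slab (SReclaimable) is included.

```python
from __future__ import annotations


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo content into {field: value}; most values are in kB."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            info[key] = int(rest.split()[0])
    return info


def memory_summary(info: dict[str, int]) -> dict[str, int]:
    """Rebuild top's Mem/Swap figures (kB) so that total = free + used + buff/cache."""
    total, free = info["MemTotal"], info["MemFree"]
    # Assumption: buff/cache = Buffers + Cached; some procps versions also add SReclaimable.
    buff_cache = info["Buffers"] + info["Cached"]
    return {
        "total": total,
        "free": free,
        "buff/cache": buff_cache,
        "used": total - free - buff_cache,
        # Kernel's own estimate (Linux 3.14+); the fallback is only a rough approximation.
        "avail": info.get("MemAvailable", free + buff_cache),
        "swap_used": info["SwapTotal"] - info["SwapFree"],
    }


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        print(memory_summary(parse_meminfo(f.read())))
```

In the test data below, avail (500 kB) is lower than free + buff/cache (550 kB). Real systems often show the same gap because some cached pages cannot be dropped.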

Note that top itself consumes CPU while it reads /proc, so it often appears near the top of its own process list.

1.2 vmstat

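vmstat 1 prints a compact summary of system activity every second. Its columns include:

- runnable processes (r)
- processes in uninterruptible sleep (b)
- swap in use (swpd)
- free memory (free), buffers (buff), and page cache (cache)
- memory swapped in and out per second (si/so)
- blocks read from and written to devices (bi/bo)
- interrupts (in) and context switches (cs)
- CPU time percentages (us, sy, id, wa, st)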
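For scripting, vmstat's text output can be turned into one record per sample by keying each row on the column‑name line. The following is a minimal sketch; the helper names and the warning rules in pressure_flags are illustrative rules of thumb, not vmstat features. The first data row vmstat prints averages everything since boot, so live usage skips it.

```python
from __future__ import annotations

import os
import shutil
import subprocess


def parse_vmstat(output: str) -> list[dict[str, int]]:
    """Turn vmstat text output into one dict per sample row, keyed by column name."""
    rows, header = [], None
    for line in output.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "r":  # column-name line: r b swpd free buff cache si so ...
            header = tokens
        elif header and tokens[0].isdigit():
            rows.append(dict(zip(header, map(int, tokens))))
    return rows


def pressure_flags(row: dict[str, int], cores: int) -> list[str]:
    """Flag common warning signs in a single vmstat row (heuristics, not hard limits)."""
    flags = []
    if row["r"] > cores:
        flags.append("run queue longer than the core count")
    if row["b"] > 0:
        flags.append("tasks blocked in uninterruptible sleep")
    if row.get("si", 0) or row.get("so", 0):
        flags.append("swap activity")
    return flags


if __name__ == "__main__" and shutil.which("vmstat"):
    out = subprocess.run(["vmstat", "1", "3"], capture_output=True, text=True).stdout
    for row in parse_vmstat(out)[1:]:  # skip the since-boot averages row
        print(row["r"], row["cs"], pressure_flags(row, os.cpu_count() or 1))
```

Keying on the header line, rather than on fixed positions, keeps the parser working if a vmstat version adds or reorders columns.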

For example, when a large project is built with make -j, raising the job count increases context switches only after a certain threshold is passed. Aggressive parallelism therefore does not always degrade performance.
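To repeat that kind of comparison without vmstat, you can sample the system‑wide ctxt counter in /proc/stat directly. As we understand it, this cumulative counter is what the cs column is derived from. The following is a minimal sketch, assuming a Linux /proc; the helper names and the build experiment in the comment are hypothetical.

```python
import time


def read_ctxt(stat_text: str) -> int:
    """Return the 'ctxt' counter (context switches since boot) from /proc/stat content."""
    for line in stat_text.splitlines():
        if line.startswith("ctxt "):
            return int(line.split()[1])
    raise ValueError("no ctxt line in /proc/stat content")


def cs_rate(before: str, after: str, interval_s: float) -> float:
    """System-wide context switches per second between two /proc/stat snapshots."""
    return (read_ctxt(after) - read_ctxt(before)) / interval_s


def snapshot() -> str:
    with open("/proc/stat") as f:
        return f.read()


if __name__ == "__main__":
    # Hypothetical experiment: run this while a build with a given -j value is in progress,
    # then repeat with a larger -j and compare the rates.
    first = snapshot()
    time.sleep(1.0)
    print(f"context switches/s: {cs_rate(first, snapshot(), 1.0):.0f}")
```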

1.3 pidstat

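pidstat offers per‑process statistics. Useful options include:

-t: also report individual threads.

-r: page faults and memory usage (VSZ, RSS, %MEM). minflt/s counts minor faults, which are resolved without disk access; majflt/s counts major faults, which require loading a page from disk.

-s: stack usage (StkSize, StkRef).

-u: CPU usage.

-w: context switches per task, split into voluntary (cswch/s, the task blocked, for example on I/O) and non‑voluntary (nvcswch/s, the task was preempted).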
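The per‑process counters behind the -r and -w columns are exposed in /proc/&lt;pid&gt;/stat and /proc/&lt;pid&gt;/status. The following is a minimal sketch that reads them, assuming a Linux /proc; the helper names are hypothetical. The kernel values are cumulative since process start, whereas pidstat reports per‑second rates.

```python
from __future__ import annotations


def parse_stat_faults(stat_text: str) -> tuple[int, int]:
    """Return (minflt, majflt) from /proc/<pid>/stat content.

    Field 2 (the command name) is wrapped in parentheses and may contain spaces,
    so split only what follows the last ')'.
    """
    rest = stat_text.rpartition(")")[2].split()
    # rest[0] is field 3 (state), so field n sits at rest[n - 3]: minflt = 10, majflt = 12.
    return int(rest[7]), int(rest[9])


def parse_ctx_switches(status_text: str) -> tuple[int, int]:
    """Return (voluntary, non-voluntary) context switches from /proc/<pid>/status content."""
    counts = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches"):
            counts[key] = int(value)
    return counts["voluntary_ctxt_switches"], counts["nonvoluntary_ctxt_switches"]
```

To approximate pidstat's rates, read both files twice, one second apart, and divide the differences by the interval.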

Filtering by command name with -C and showing full arguments with -l simplifies monitoring specific programs.

1.4 Other Tools

For per‑CPU analysis, mpstat -P ALL 1 shows how load is distributed across cores. To isolate processes of interest, filter top by user (top -u username) or choose ps output fields explicitly (for example, ps -eo pid,user,%cpu,%mem,cmd). The ps axjf command displays the process tree along with each process's parent PID, process group, and session.
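A forest view like ps axjf can be built from each process's parent PID (the PPID column): group PIDs by parent and walk down from the root. The following is a minimal sketch, assuming a Linux /proc; the helper names are hypothetical. It uses 0 as the root because PID 1 and kernel threads such as kthreadd report a parent PID of 0.

```python
from __future__ import annotations

import os
from collections import defaultdict
from collections.abc import Iterable, Iterator


def read_procs() -> Iterator[tuple[int, int, str]]:
    """Yield (pid, ppid, command name) for every process visible in /proc."""
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                text = f.read()
        except OSError:  # the process exited between listdir() and open()
            continue
        name = text[text.index("(") + 1 : text.rindex(")")]
        ppid = int(text.rpartition(")")[2].split()[1])  # field 4 follows the state field
        yield int(entry), ppid, name


def render_tree(procs: Iterable[tuple[int, int, str]], root: int = 0) -> list[str]:
    """Indent each process under its parent, in the spirit of the ps axjf forest view."""
    children: dict[int, list[int]] = defaultdict(list)
    names = {}
    for pid, ppid, name in procs:
        children[ppid].append(pid)
        names[pid] = name
    lines = []

    def walk(pid: int, depth: int) -> None:
        for child in sorted(children.get(pid, [])):
            lines.append("  " * depth + f"{child} {names[child]}")
            walk(child, depth + 1)

    walk(root, 0)
    return lines


if __name__ == "__main__":
    print("\n".join(render_tree(read_procs())[:20]))
```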

2. Disk I/O

iotop shows per‑process disk read and write rates. lsof lists which processes hold files or devices open, which helps when a partition cannot be unmounted because it is still in use.
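One way lsof‑style tools find the processes holding a file open is to inspect the /proc/&lt;pid&gt;/fd symlinks. The following is a minimal sketch of that approach, assuming a Linux /proc; pids_holding is a hypothetical helper. It covers far less than lsof itself.

```python
from __future__ import annotations

import os


def pids_holding(path: str, proc_root: str = "/proc") -> list[int]:
    """Return PIDs that have an open file descriptor pointing at `path`.

    Only /proc/<pid>/fd is checked; lsof also reports working directories and
    memory-mapped files, which this sketch ignores. Without root, processes of
    other users are silently skipped.
    """
    target = os.path.realpath(path)
    holders = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:  # permission denied, or the process already exited
            continue
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) == target:
                    holders.append(int(entry))
                    break
            except OSError:  # descriptor closed while we were scanning
                continue
    return holders


if __name__ == "__main__":
    import sys

    for p in sys.argv[1:]:
        print(p, pids_holding(p))
```

A process whose working directory is inside the mount also blocks unmounting. Checking /proc/&lt;pid&gt;/cwd the same way would extend the sketch to cover that case.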

2.1 iostat

Running iostat -xz 1 (-x for extended statistics, -z to omit idle devices), or sar -d 1, reports the key metrics:

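avgqu-sz (aqu-sz in newer sysstat releases): average request queue length. On a single disk, values consistently above 1 suggest saturation. Devices that serve requests in parallel, such as SSDs and RAID arrays, can sustain higher values.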

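await (r_await, w_await): average time in milliseconds to complete an I/O request, including time spent in the queue. It is reported overall and split into reads and writes.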

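svctm: average service time. When it is close to await, little time is spent queueing. The iostat man page warns that this field should no longer be trusted, and newer sysstat releases no longer report it.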

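%util: percentage of time the device had I/O in flight. As a rule of thumb, values above 60% start to hurt performance on a single disk, and values near 100% mean saturation. For devices that serve requests in parallel (SSDs, RAID arrays), 100% does not necessarily mean the device has reached its limit.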

Comparable per‑mount statistics are available for network file systems such as NFS, for example via nfsiostat.
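The extended iostat columns can be derived from two samples of /proc/diskstats taken a known interval apart. The following is a minimal sketch of that arithmetic, assuming a Linux /proc; the helper names are hypothetical. The formulas follow the documented meaning of the counters and may differ in detail from a given sysstat release.

```python
from __future__ import annotations

import os
import time


def parse_diskstats_line(line: str) -> dict[str, int]:
    """Pick the counters iostat -x needs from one /proc/diskstats line."""
    f = line.split()
    # f[0:3] = major, minor, device name; the kernel documents the counter order after that.
    return {
        "reads": int(f[3]),         # reads completed
        "read_ms": int(f[6]),       # time spent reading (ms)
        "writes": int(f[7]),        # writes completed
        "write_ms": int(f[10]),     # time spent writing (ms)
        "io_ms": int(f[12]),        # time the device had I/O in flight (ms)
        "weighted_ms": int(f[13]),  # in-flight time summed over all requests (ms)
    }


def iostat_metrics(before: str, after: str, interval_s: float) -> dict[str, float]:
    """Approximate iostat -x columns for one device from two samples interval_s apart."""
    b, a = parse_diskstats_line(before), parse_diskstats_line(after)
    d = {k: a[k] - b[k] for k in b}
    ios = d["reads"] + d["writes"]
    interval_ms = interval_s * 1000.0
    return {
        "r_await": d["read_ms"] / d["reads"] if d["reads"] else 0.0,
        "w_await": d["write_ms"] / d["writes"] if d["writes"] else 0.0,
        "await": (d["read_ms"] + d["write_ms"]) / ios if ios else 0.0,
        "aqu-sz": d["weighted_ms"] / interval_ms,
        "%util": 100.0 * d["io_ms"] / interval_ms,
    }


def read_diskstats() -> dict[str, str]:
    with open("/proc/diskstats") as f:
        return {line.split()[2]: line for line in f if line.strip()}


if __name__ == "__main__" and os.path.exists("/proc/diskstats"):
    first = read_diskstats()
    time.sleep(1.0)
    for dev, line in read_diskstats().items():
        if dev in first:
            print(dev, iostat_metrics(first[dev], line, 1.0))
```

In the test data, 200 requests over one second with 600 ms of busy time give 60% utilization. That value sits right at the rule‑of‑thumb threshold for a single disk.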

3. Network

Network performance is just as critical. iptraf and sar -n DEV 1 report per‑interface throughput, while netstat shows connections and protocol counters.
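The kernel keeps cumulative per‑interface byte counters in /proc/net/dev. Sampling them twice gives throughput figures comparable to the rxkB/s and txkB/s columns of sar -n DEV. The following is a minimal sketch, assuming a Linux /proc and 1 kB = 1024 bytes; the helper names are hypothetical.

```python
from __future__ import annotations

import os
import time


def parse_net_dev(text: str) -> dict[str, tuple[int, int]]:
    """Map interface name -> (rx_bytes, tx_bytes) from /proc/net/dev content."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        name, sep, data = line.partition(":")
        if not sep:
            continue
        fields = data.split()
        # 8 receive counters come first (bytes, packets, errs, ...), then transmit bytes.
        stats[name.strip()] = (int(fields[0]), int(fields[8]))
    return stats


def throughput_kbps(before: str, after: str, interval_s: float) -> dict[str, tuple[float, float]]:
    """Per-interface (rx kB/s, tx kB/s), assuming 1 kB = 1024 bytes."""
    b, a = parse_net_dev(before), parse_net_dev(after)
    return {
        iface: ((a[iface][0] - b[iface][0]) / 1024 / interval_s,
                (a[iface][1] - b[iface][1]) / 1024 / interval_s)
        for iface in a
        if iface in b
    }


def read_net_dev() -> str:
    with open("/proc/net/dev") as f:
        return f.read()


if __name__ == "__main__" and os.path.exists("/proc/net/dev"):
    first = read_net_dev()
    time.sleep(1.0)
    for iface, (rx, tx) in throughput_kbps(first, read_net_dev(), 1.0).items():
        print(f"{iface:>10}  rx {rx:8.1f} kB/s  tx {tx:8.1f} kB/s")
```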

3.1 netstat

netstat -s shows cumulative protocol statistics since boot. For a live view of sockets, use one of two option sets. netstat -antp lists all TCP connections with numeric addresses and the owning program. netstat -nltp restricts the list to listening TCP sockets.
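netstat builds its TCP tables from /proc/net/tcp (IPv4) and /proc/net/tcp6, which list each socket's addresses in hex and its state as a hex code. The following is a minimal IPv4‑only sketch that decodes those rows, assuming a Linux /proc; the helper names are hypothetical. The program column that -p adds would additionally require mapping socket inodes to /proc/&lt;pid&gt;/fd entries, which this sketch omits.

```python
from __future__ import annotations

import os
import socket
import struct
from collections import Counter
from collections.abc import Iterator

# Hex state codes used in /proc/net/tcp (see include/net/tcp_states.h in the kernel).
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV", "04": "FIN_WAIT1",
    "05": "FIN_WAIT2", "06": "TIME_WAIT", "07": "CLOSE", "08": "CLOSE_WAIT",
    "09": "LAST_ACK", "0A": "LISTEN", "0B": "CLOSING",
}


def decode_addr(hex_addr: str) -> str:
    """Convert an IPv4 'ADDR:PORT' pair such as '0100007F:1F90' into '127.0.0.1:8080'."""
    ip_hex, port_hex = hex_addr.split(":")
    # The kernel prints the raw 32-bit address as a native-endian integer,
    # so packing it back in native order ("=I") restores network byte order.
    ip = socket.inet_ntoa(struct.pack("=I", int(ip_hex, 16)))
    return f"{ip}:{int(port_hex, 16)}"


def tcp_sockets(text: str) -> Iterator[tuple[str, str, str]]:
    """Yield (local, remote, state) for each row of /proc/net/tcp content."""
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 4:
            yield decode_addr(fields[1]), decode_addr(fields[2]), TCP_STATES.get(fields[3], fields[3])


if __name__ == "__main__" and os.path.exists("/proc/net/tcp"):
    with open("/proc/net/tcp") as f:
        sockets = list(tcp_sockets(f.read()))
    print(Counter(state for _, _, state in sockets))                          # states, like -antp
    print(sorted(local for local, _, state in sockets if state == "LISTEN"))  # listeners, like -nltp
```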

3.2 sar

sar -n TCP,ETCP 1 reports these TCP metrics every second:

active/s: connections initiated locally.

passive/s: connections accepted from remote hosts.

retrans/s: segments retransmitted.

isegerr/s: segments received with errors.

For UDP, sar -n UDP 1 reports:

noport/s: datagrams that arrived for a port with no listening application.

idgmerr/s: datagrams that could not be delivered for other reasons.
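As we understand sysstat, these columns are per‑second deltas of counters in /proc/net/snmp. ActiveOpens, PassiveOpens, RetransSegs, and InErrs cover TCP; NoPorts and InErrors cover UDP. The following is a minimal sketch of that mapping, assuming a Linux /proc. The helper names are hypothetical, and the retrans_ratio value is an extra derived metric that sar does not report.

```python
from __future__ import annotations

import os
import time


def parse_snmp(text: str) -> dict[str, dict[str, int]]:
    """Parse /proc/net/snmp, where each protocol has a header line followed by a value line."""
    lines = text.splitlines()
    result = {}
    for header, values in zip(lines[0::2], lines[1::2]):
        proto, _, names = header.partition(":")
        result[proto] = dict(zip(names.split(), map(int, values.partition(":")[2].split())))
    return result


def tcp_udp_rates(before: str, after: str, interval_s: float) -> dict[str, float]:
    """Per-second rates corresponding to the sar -n TCP,ETCP and -n UDP columns."""
    b, a = parse_snmp(before), parse_snmp(after)

    def rate(proto: str, key: str) -> float:
        return (a[proto][key] - b[proto][key]) / interval_s

    out_segs = a["Tcp"]["OutSegs"] - b["Tcp"]["OutSegs"]
    retrans = a["Tcp"]["RetransSegs"] - b["Tcp"]["RetransSegs"]
    return {
        "active/s": rate("Tcp", "ActiveOpens"),
        "passive/s": rate("Tcp", "PassiveOpens"),
        "retrans/s": rate("Tcp", "RetransSegs"),
        "isegerr/s": rate("Tcp", "InErrs"),
        "noport/s": rate("Udp", "NoPorts"),
        "idgmerr/s": rate("Udp", "InErrors"),
        # Share of sent segments that were retransmissions (not a sar column).
        "retrans_ratio": retrans / out_segs if out_segs else 0.0,
    }


def read_snmp() -> str:
    with open("/proc/net/snmp") as f:
        return f.read()


if __name__ == "__main__" and os.path.exists("/proc/net/snmp"):
    first = read_snmp()
    time.sleep(1.0)
    print(tcp_udp_rates(first, read_snmp(), 1.0))
```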

3.3 tcpdump

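tcpdump captures packets. With -w, it saves them to a file for offline analysis in Wireshark.

To bound disk usage, -C starts a new file once the current one exceeds the given size (in millions of bytes), and -W caps the number of files kept. Together they make the capture rotate as a ring buffer.

Select the interface with -i, and restrict the capture to specific hosts, ports, or protocols with a filter expression. Each captured packet is timestamped.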
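A rotating capture can be assembled and reviewed before it runs. The following is a minimal sketch that builds the argument list and estimates the disk budget; the helper names, interface, output path, and filter values are all hypothetical. It only prints the command, because running tcpdump requires root or CAP_NET_RAW.

```python
from __future__ import annotations

import shlex


def tcpdump_ring_buffer(interface: str, out_file: str, file_mb: int,
                        file_count: int, bpf_filter: str) -> list[str]:
    """Build argv for a size-capped capture written to rotating files."""
    return [
        "tcpdump",
        "-i", interface,        # capture interface
        "-n",                   # do not resolve addresses or ports to names
        "-w", out_file,         # write raw packets for Wireshark instead of printing them
        "-C", str(file_mb),     # start a new file after about file_mb million bytes
        "-W", str(file_count),  # keep at most file_count files (ring buffer)
        bpf_filter,             # e.g. "host ... and tcp port ..."
    ]


def worst_case_bytes(file_mb: int, file_count: int) -> int:
    """Approximate disk budget: -C counts in 1,000,000-byte units (files may overshoot by one packet)."""
    return file_mb * 1_000_000 * file_count


if __name__ == "__main__":
    # Hypothetical values: interface eth0, host 10.0.0.5, HTTPS traffic.
    argv = tcpdump_ring_buffer("eth0", "/tmp/capture.pcap", 100, 5, "host 10.0.0.5 and tcp port 443")
    print(shlex.join(argv))  # review, then run as root, e.g. subprocess.run(argv, check=True)
    print(worst_case_bytes(100, 5), "bytes at most (approximately)")
```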

By combining these tools, administrators can quickly locate performance bottlenecks across CPU, memory, disk, and network layers, enabling effective troubleshooting of Linux servers.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


