Master Linux System Monitoring: Top, vmstat, pidstat, iostat & More
This guide explains essential Linux monitoring tools—top, vmstat, pidstat, iostat, netstat, sar, and tcpdump—detailing the metrics they expose, how to interpret CPU, memory, disk, and network statistics, and practical command examples for effective server performance troubleshooting.
Linux servers expose a wealth of runtime parameters that are crucial for operations staff, system administrators, and developers when diagnosing abnormal program behavior.
1. CPU and Memory
1.1 top
➜ ~ top
The first line shows the 1, 5, and 15-minute load averages; sustained values above the number of CPU cores suggest saturation. On Linux the load average also counts tasks in uninterruptible sleep, so heavy I/O can raise it as well.
The second line lists task states: running, sleeping (interruptible/uninterruptible), stopped, and zombie processes.
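As a rough illustration of comparing load averages with the core count, the sketch below divides the three values by the number of cores. It assumes text in the format of /proc/loadavg; the helper name load_per_core and the sample numbers are made up for this example.

```python
import os


def load_per_core(loadavg_text, cores=None):
    """Return the 1, 5 and 15-minute load averages divided by the core count."""
    if cores is None:
        cores = os.cpu_count() or 1
    one, five, fifteen = (float(x) for x in loadavg_text.split()[:3])
    return (one / cores, five / cores, fifteen / cores)


# Sample text in the format of /proc/loadavg (values are invented).
SAMPLE = "6.00 3.00 1.50 2/345 12345"

ratios = load_per_core(SAMPLE, cores=4)
# A ratio above 1.0 means that, on average, more tasks were runnable
# (or in uninterruptible sleep) than there were cores to run them.
print(ratios)
```

On a live system the same function could be fed the contents of /proc/loadavg, leaving cores unset so that os.cpu_count() is used.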
The third line breaks down CPU usage types:
us (user): time spent in user space by processes that have not been niced (nice value 0 or below).
sy (system): time spent in kernel space, often high on I/O‑intensive workloads.
ni (nice): time spent in user space by niced processes (positive nice value, that is, lowered priority).
id (idle): idle time.
wa (iowait): time waiting for I/O completion.
hi (irq): time handling hardware interrupts.
si (softirq): time handling soft interrupts.
st (steal): time a virtual CPU is waiting for the hypervisor, useful for detecting oversold VPS resources.
High CPU usage suggests specific investigation paths: high us points to CPU‑bound processes; high sy may indicate heavy I/O or system call activity; high ni reflects intentional nice adjustments; high wa signals I/O bottlenecks; high hi / si may point to heavy interrupt load from devices or to hardware issues; high st can expose hypervisor over‑commit.
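The percentages in that third line can be approximated from two samples of the aggregate cpu line in /proc/stat, whose first eight counters are user, nice, system, idle, iowait, irq, softirq, and steal ticks. The following is a minimal sketch under that assumption; the function names and the two sample lines are invented for illustration, and it is not how top itself is implemented.

```python
FIELDS = ("us", "ni", "sy", "id", "wa", "hi", "si", "st")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of tick counts."""
    parts = line.split()
    if parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return dict(zip(FIELDS, (int(v) for v in parts[1:1 + len(FIELDS)])))


def cpu_percentages(before, after):
    """Share of time per category between two samples, in percent."""
    deltas = {k: after[k] - before[k] for k in FIELDS}
    total = sum(deltas.values())
    if total == 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * d / total for k, d in deltas.items()}


# Two invented samples, as if read one interval apart.
T0 = "cpu  1000 50 300 8000 100 10 20 5 0 0"
T1 = "cpu  1600 50 500 8100 180 10 40 5 0 0"

pct = cpu_percentages(parse_cpu_line(T0), parse_cpu_line(T1))
for name in FIELDS:
    print(f"{name}: {pct[name]:.1f}%")
```

In this made-up interval the time is dominated by us, which would point toward a CPU-bound process rather than an I/O bottleneck.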
The fourth and fifth lines report physical memory and swap: total = free + used + buff/cache. Buffers hold raw block-device data such as file system metadata; cached memory holds file contents (the page cache). Avail Mem estimates how much memory new applications can obtain without swapping; it is usually somewhat less than free + buffers + cached, because part of the cache cannot be reclaimed. Frequent swap activity signals memory pressure.
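To make the relationship between these memory figures concrete, here is a small sketch that parses text in the format of /proc/meminfo and compares MemAvailable with the simple sum of free, buffers, and cached memory. The sample values are invented, and parse_meminfo is a hypothetical helper name.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB for most keys)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


# Invented sample in the format of /proc/meminfo.
SAMPLE = """\
MemTotal:       16000000 kB
MemFree:         2000000 kB
MemAvailable:    9000000 kB
Buffers:          500000 kB
Cached:          7000000 kB
SwapTotal:       4000000 kB
SwapFree:        4000000 kB
"""

mem = parse_meminfo(SAMPLE)
naive_available = mem["MemFree"] + mem["Buffers"] + mem["Cached"]
swap_used = mem["SwapTotal"] - mem["SwapFree"]

print("MemAvailable (kernel estimate):", mem["MemAvailable"])
print("free + buffers + cached:       ", naive_available)
print("swap in use:                   ", swap_used)
```

In the sample, the kernel estimate is lower than the simple sum, which is the usual direction of the difference.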
Note that top itself consumes CPU while it reads /proc, so on an otherwise idle system it can appear near the top of its own output.
1.2 vmstat
vmstat provides a compact view of system activity. Its output columns include runnable processes (r), processes in uninterruptible sleep (b), virtual memory in use (swpd), buffers (buff), cached memory (cache), blocks read from and written to block devices (bi/bo), interrupts per second (in), and context switches per second (cs).
When compiling large projects with a parallel build flag such as make -j, context switches (cs) rise only once the job count passes a certain threshold; below that point, aggressive parallelism does not necessarily degrade performance.
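One way to compare vmstat samples, for example from builds run with different job counts, is to parse the output into dictionaries keyed by column name. The sketch below assumes the default vmstat layout with a header row that starts with r; the sample numbers are invented and do not come from a real build.

```python
def parse_vmstat(text):
    """Turn vmstat output into a list of dicts keyed by column name."""
    rows, header = [], None
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "r":
            header = tokens  # column names: r b swpd free buff cache ...
        elif header and tokens[0].isdigit():
            rows.append(dict(zip(header, (int(t) for t in tokens))))
    return rows


# Invented output in the shape of `vmstat 1 2`.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 4  0      0 812345  10240 204800    0    0     5    12  150  300 90  8  2  0  0
40  1      0 512345  10240 204800    0    0    20    80  900 9000 80 19  0  1  0
"""

rows = parse_vmstat(SAMPLE)
for row in rows:
    print(f"runnable={row['r']:>3}  cs={row['cs']:>5}  sy={row['sy']:>2}%")
```

Zipping against the header row rather than fixed positions keeps the parser working if a vmstat version adds or reorders columns.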
1.3 pidstat
pidstat offers per‑process statistics. Useful options include:
-t: show threads.
-r: page faults and memory usage (minor faults minflt/s vs. major faults majflt/s).
-s: stack usage (StkSize, StkRef).
-u: CPU usage.
-w: task context switches (voluntary cswch/s, involuntary nvcswch/s).
Filtering by command name with -C and showing full arguments with -l simplifies monitoring specific programs.
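The context-switch rates that pidstat -w reports can be approximated from the voluntary_ctxt_switches and nonvoluntary_ctxt_switches counters in /proc/<pid>/status. The following sketch takes two snapshots of that text and computes per-second rates; the process name worker, the counter values, and the function names are assumptions made up for the example.

```python
def context_switches(status_text):
    """Return (voluntary, involuntary) counters from /proc/<pid>/status text."""
    counters = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches"):
            counters[key] = int(value)
    return (counters["voluntary_ctxt_switches"],
            counters["nonvoluntary_ctxt_switches"])


def per_second(before, after, seconds):
    """Per-second rate for each counter between two snapshots."""
    return tuple((a - b) / seconds for b, a in zip(before, after))


# Invented snapshots taken 5 seconds apart for a hypothetical process.
T0 = "Name:\tworker\nvoluntary_ctxt_switches:\t1000\nnonvoluntary_ctxt_switches:\t40\n"
T1 = "Name:\tworker\nvoluntary_ctxt_switches:\t1500\nnonvoluntary_ctxt_switches:\t60\n"

cswch, nvcswch = per_second(context_switches(T0), context_switches(T1), 5)
print(f"cswch/s={cswch:.1f}  nvcswch/s={nvcswch:.1f}")
```

A high voluntary rate usually means the process blocks often, for example on I/O, while a high involuntary rate suggests it is being preempted while competing for CPU.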
1.4 Other Tools
For per‑CPU analysis, mpstat -P ALL 1 shows load distribution across cores. Filtering top by user (top -u username) or using ps with custom fields can isolate processes of interest. The ps axjf command displays a detailed process tree.
2. Disk I/O
iotop shows per‑process disk read/write rates; lsof reveals which processes hold files or devices open, which helps when a partition cannot be unmounted.
2.1 iostat
Running iostat -xz 1 (or sar -d 1) highlights key metrics:
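avgqu-sz (aqu-sz in newer sysstat releases): average request queue length; values >1 suggest device saturation.
await (r_await, w_await): average time in ms for an I/O request to complete, including time spent in the queue.
svctm: average service time; a value close to await indicates little queueing. The sysstat documentation marks this field as unreliable, and recent releases no longer report it.
%util: device utilization; >60% tends to degrade performance, and values approaching 100% mean saturation for devices that serve requests one at a time. Devices that serve requests in parallel, such as RAID arrays and SSDs, can have headroom left even at high %util.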
These metrics also apply to network file systems.
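The iostat figures above are derived from the counters in /proc/diskstats. As a hedged sketch of the arithmetic, the code below takes two snapshots of one device line and computes await, average queue length, and utilization. It assumes the standard field order of /proc/diskstats; the device name sda and all numbers are invented.

```python
from collections import namedtuple

DiskSample = namedtuple("DiskSample", "ios io_ms busy_ms weighted_ms")


def parse_diskstats(text, device):
    """Extract the counters needed for await, queue length and %util."""
    for line in text.splitlines():
        f = line.split()
        if len(f) >= 14 and f[2] == device:
            reads, read_ms = int(f[3]), int(f[6])
            writes, write_ms = int(f[7]), int(f[10])
            return DiskSample(reads + writes, read_ms + write_ms,
                              int(f[12]), int(f[13]))
    raise KeyError(device)


def disk_metrics(before, after, interval_ms):
    """Derive iostat-style metrics from two samples taken interval_ms apart."""
    ios = after.ios - before.ios
    return {
        "await_ms": (after.io_ms - before.io_ms) / ios if ios else 0.0,
        "avgqu_sz": (after.weighted_ms - before.weighted_ms) / interval_ms,
        "util_pct": 100.0 * (after.busy_ms - before.busy_ms) / interval_ms,
    }


# Invented snapshots of one /proc/diskstats line, taken 1000 ms apart.
T0 = "   8       0 sda 1000 10 80000 2000 500 20 40000 3000 0 4000 5000"
T1 = "   8       0 sda 1100 10 88000 2400 600 20 48000 3600 0 4800 6000"

metrics = disk_metrics(parse_diskstats(T0, "sda"), parse_diskstats(T1, "sda"), 1000)
print(metrics)
```

With these sample counters the device completed 200 requests in one second, each taking 5 ms on average, and was busy for 80% of the interval.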
3. Network
Network performance is critical; tools like iptraf, sar -n DEV, and netstat provide throughput and connection details.
3.1 netstat
netstat -s shows cumulative protocol statistics since boot. For active monitoring, use options such as -antp (all TCP sockets, with numeric addresses and the owning process) and -nltp (listening TCP sockets only).
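A common follow-up to netstat -ant is counting connections per TCP state, for example to spot a buildup of TIME_WAIT or CLOSE_WAIT sockets. The sketch below does that on captured output; the addresses and the helper name tcp_state_counts are made up for illustration.

```python
from collections import Counter


def tcp_state_counts(netstat_text):
    """Count TCP sockets per state in `netstat -ant` style output."""
    counts = Counter()
    for line in netstat_text.splitlines():
        cols = line.split()
        # Data rows start with tcp or tcp6; the state is the sixth column.
        if len(cols) >= 6 and cols[0].startswith("tcp"):
            counts[cols[5]] += 1
    return counts


# Invented output in the shape of `netstat -ant`.
SAMPLE = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.5:22             10.0.0.9:51234          ESTABLISHED
tcp        0      0 10.0.0.5:443            10.0.0.7:40000          ESTABLISHED
tcp6       0      0 :::80                   :::*                    LISTEN
tcp        0      0 10.0.0.5:443            10.0.0.8:40100          TIME_WAIT
"""

states = tcp_state_counts(SAMPLE)
for state, count in states.most_common():
    print(f"{state:<12} {count}")
```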
3.2 sar
sar -n TCP,ETCP 1 reports TCP metrics: active/s (connections opened locally), passive/s (connections accepted from remote peers), retrans/s (retransmitted segments), isegerr/s (segments received in error). For UDP, sar -n UDP 1 shows noport/s (datagrams received for a port with no listener) and idgmerr/s (datagrams that could not be delivered for other reasons).
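These TCP rates correspond to counters that the kernel exposes in /proc/net/snmp (ActiveOpens, PassiveOpens, RetransSegs, InErrs). As a minimal sketch, the code below diffs two snapshots of the Tcp lines and prints per-second rates; the counter values and function names are invented for the example.

```python
def parse_snmp(text, proto="Tcp"):
    """Parse the header/value line pair for one protocol in /proc/net/snmp."""
    lines = [l.split() for l in text.splitlines() if l.startswith(proto + ":")]
    header, values = lines[0], lines[1]
    return dict(zip(header[1:], (int(v) for v in values[1:])))


def tcp_rates(before, after, seconds):
    """Per-second rates named after the matching sar columns."""
    mapping = {
        "active/s": "ActiveOpens",
        "passive/s": "PassiveOpens",
        "retrans/s": "RetransSegs",
        "isegerr/s": "InErrs",
    }
    return {name: (after[key] - before[key]) / seconds
            for name, key in mapping.items()}


HEADER = ("Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens "
          "AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs "
          "InErrs OutRsts InCsumErrors")
# Invented snapshots taken 10 seconds apart.
T0 = HEADER + "\nTcp: 1 200 120000 -1 100 200 5 3 10 50000 40000 80 2 7 0\n"
T1 = HEADER + "\nTcp: 1 200 120000 -1 120 260 5 3 12 56000 45000 90 2 7 0\n"

before, after = parse_snmp(T0), parse_snmp(T1)
rates = tcp_rates(before, after, 10)
retrans_ratio = (after["RetransSegs"] - before["RetransSegs"]) / (
    after["OutSegs"] - before["OutSegs"])
print(rates)
print(f"retransmitted share of sent segments: {retrans_ratio:.2%}")
```

The retransmission share is often more telling than the raw rate, since it relates lost segments to the traffic volume.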
3.3 tcpdump
tcpdump captures packets for offline analysis with Wireshark. Write the capture to a file with -w, cap its storage footprint by rotating files with -C (size per file) and -W (number of files), select the interface with -i, and use a filter expression to restrict the capture to specific hosts, ports, or protocols. Captured packets include timestamps.
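For scripted captures, the options above can be assembled into an argument list before handing it to a process launcher. This is only a sketch: the interface eth0, the file name capture.pcap, the address 10.0.0.5, and the helper tcpdump_command are hypothetical, and the command is printed rather than executed because running tcpdump normally requires elevated privileges.

```python
import shlex


def tcpdump_command(interface, out_file, filter_expr,
                    file_size_mb=100, file_count=5):
    """Build a tcpdump argument list with size-based file rotation."""
    return [
        "tcpdump",
        "-i", interface,
        "-C", str(file_size_mb),  # start a new file after about this many million bytes
        "-W", str(file_count),    # keep at most this many files, overwriting the oldest
        "-w", out_file,
        filter_expr,              # the filter expression may be passed as one argument
    ]


cmd = tcpdump_command("eth0", "capture.pcap",
                      "host 10.0.0.5 and tcp port 443",
                      file_size_mb=50, file_count=4)
print(shlex.join(cmd))
```

The resulting list could be passed to subprocess.run on a host where the caller has permission to capture on that interface.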
By combining these tools, administrators can quickly locate performance bottlenecks across CPU, memory, disk, and network layers, enabling effective troubleshooting of Linux servers.
MaGe Linux Operations
