Master Linux Server Monitoring: Essential Tools & Metrics Explained

This guide walks you through essential Linux server monitoring tools—top, vmstat, pidstat, iostat, netstat, sar, and tcpdump—explaining each metric, how to interpret CPU, memory, disk, and network statistics, and offering practical tips for troubleshooting performance bottlenecks.

Efficient Ops

Sep 16, 2019

A running Linux server exposes a wealth of metrics that are crucial for both operations staff and developers, especially when programs misbehave.

Below are simple tools for viewing system parameters; many of them analyze data from /proc and /sys, while more advanced monitoring may require tools like perf or systemtap.

1. CPU and Memory

1.1 top

top

The first line shows the 1‑, 5‑, and 15‑minute average load; values exceeding the number of CPU cores indicate saturation.
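The comparison between load average and core count can be sketched as a small shell helper. This is only an illustration: load_status is a hypothetical name, and the simple "load greater than cores" rule is the rule of thumb stated above rather than a precise saturation test.

```shell
# load_status LOAD CORES
# Hypothetical helper: prints "saturated" when the load average exceeds the
# number of CPU cores, and "ok" otherwise.
load_status() {
    awk -v load="$1" -v cores="$2" 'BEGIN {
        if (load + 0 > cores + 0) print "saturated"
        else print "ok"
    }'
}

# Usage on a live system (1-minute load from /proc/loadavg, cores from nproc):
# load_status "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```

Checking the 5- and 15-minute values the same way helps distinguish a brief spike from sustained pressure.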

The second line lists task counts by state: running, sleeping, stopped, and zombie. A zombie count that stays above zero means some parent process is not reaping its exited children.

The third line breaks down CPU usage:

us (user): time spent in user mode by processes that have not been niced.

sy (system): time spent in kernel mode, often higher during heavy I/O.

ni (nice): time spent in user mode with positive nice values.

id (idle): idle time.

wa (iowait): time waiting for I/O completion.

hi (irq): time handling hardware interrupts.

si (softirq): time handling software interrupts.

st (steal): time lost to the hypervisor in virtualized environments.
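These percentages can be approximated from the cumulative counters on the aggregate cpu line of /proc/stat. The sketch below assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal) and takes two samples of that line; cpu_breakdown is a hypothetical helper name, and its rounding may differ slightly from what top displays.

```shell
# cpu_breakdown PREV CURR
# Each argument is one aggregate "cpu ..." line taken from /proc/stat.
# Prints the share of time spent in each state between the two samples.
cpu_breakdown() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= 9; i++) prev[i] = $i }
        NR == 2 {
            # Labels in the same order as fields 2..9 of the cpu line.
            split("us ni sy id wa hi si st", name, " ")
            for (i = 2; i <= 9; i++) { d[i] = $i - prev[i]; total += d[i] }
            if (total <= 0) exit 1
            for (i = 2; i <= 9; i++)
                printf "%s=%.1f%s", name[i - 1], 100 * d[i] / total, (i < 9 ? " " : "\n")
        }'
}

# Usage on a live system:
# prev=$(head -n 1 /proc/stat); sleep 1; curr=$(head -n 1 /proc/stat)
# cpu_breakdown "$prev" "$curr"
```

The guest fields at the end of the line are left out because guest time is already counted in user time.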

High CPU usage hints at specific issues:

If us is high, a user process is consuming CPU; locate it with top or perf.

If sy is high, heavy I/O or kernel problems may be the cause.

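If ni is high, CPU time is going to processes that were deliberately niced to a lower priority.

If wa is high, CPUs are sitting idle while waiting on I/O; check disk activity with iostat or iotop.

If hi or si is high, interrupt handling is consuming CPU; the cause may be misbehaving hardware or a very high interrupt rate, for example from network traffic.

If st is high, the hypervisor is giving CPU time to other guests, which usually means the host is oversubscribed.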

The fourth and fifth lines show physical memory and swap usage. For physical memory, total = free + used + buff/cache. Buffers hold raw block-device data such as filesystem metadata, while Cached holds file contents (the page cache). avail Mem estimates how much memory can be given to new workloads without swapping.
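The available-memory figure can also be read directly from /proc/meminfo. The following is a minimal sketch, assuming a kernel that exposes the MemAvailable field; mem_available_pct is a hypothetical helper name.

```shell
# mem_available_pct
# Reads /proc/meminfo-formatted text on stdin and prints MemAvailable as a
# percentage of MemTotal.
mem_available_pct() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END {
            if (total <= 0) exit 1
            printf "%.1f\n", 100 * avail / total
        }'
}

# Usage on a live system:
# mem_available_pct < /proc/meminfo
```

A low free value on its own is not alarming, because cache can be reclaimed; a low available percentage is the more meaningful warning sign.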

Note that top itself consumes resources and is best for real‑time, short‑term monitoring.

1.2 vmstat

vmstat provides another view of system load.

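Key fields: r (processes running or waiting for a CPU), b (processes in uninterruptible sleep, usually waiting on I/O), swpd (swap in use), bi/bo (blocks read from and written to block devices per second), in (interrupts per second), cs (context switches per second).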

For example, when compiling with a parallel job count (-j), cs only rises noticeably once the -j value is set high, which suggests the job count need not be overly aggressive.
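The r and b columns lend themselves to a simple filter over vmstat output. This sketch assumes the default vmstat layout, in which r and b are the first two columns; vmstat_flags is a hypothetical helper name, and the thresholds are illustrative.

```shell
# vmstat_flags CORES
# Reads vmstat output on stdin and prints a note for each sample where r
# exceeds the core count or b is non-zero.
vmstat_flags() {
    awk -v cores="$1" '
        $1 !~ /^[0-9]+$/ { next }   # skip the two header lines
        {
            n++
            if ($1 + 0 > cores + 0) print "sample " n ": r=" $1 " exceeds " cores " cores"
            if ($2 + 0 > 0)         print "sample " n ": b=" $2 " tasks in uninterruptible sleep"
        }'
}

# Usage on a live system (five one-second samples):
# vmstat 1 5 | vmstat_flags "$(nproc)"
```

The first sample vmstat prints reports averages since boot, so it is usually worth judging only the later samples.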

1.3 pidstat

pidstat -t -C "ailaw" -l

Provides per‑process statistics, including:

-r: page faults (minor, minflt/s, and major, majflt/s).

-s: stack usage (StkSize and StkRef).

-u: CPU usage.

-w: task context switches (voluntary, cswch/s, and involuntary, nvcswch/s).

Because -C filters by command name and -t breaks the output down by thread, pidstat is often more convenient than ps for inspecting multithreaded processes.
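The context-switch rates that pidstat -w reports correspond to cumulative counters the kernel exposes in /proc/<pid>/status. The sketch below reads those counters for one process; ctxt_switches is a hypothetical helper name, and turning the totals into per-second rates would require sampling twice and dividing by the interval.

```shell
# ctxt_switches
# Reads /proc/<pid>/status-formatted text on stdin and prints the cumulative
# voluntary and involuntary context-switch counts.
ctxt_switches() {
    awk '
        $1 == "voluntary_ctxt_switches:"    { v = $2 }
        $1 == "nonvoluntary_ctxt_switches:" { nv = $2 }
        END { printf "voluntary=%d nonvoluntary=%d\n", v, nv }'
}

# Usage on a live system (counters for the current shell):
# ctxt_switches < /proc/self/status
```

A high involuntary count suggests the process is being preempted while it still wants CPU, whereas voluntary switches typically come from blocking on I/O or locks.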

1.4 Other CPU tools

For per‑CPU monitoring, mpstat -P ALL 1 shows load distribution across cores.

To filter processes by user: top -u taozj or custom ps formats, e.g.:

while :; do ps -eo user,pid,ni,pri,pcpu,psr,comm | grep 'ailawd'; sleep 1; done

The process tree can be displayed with ps axjf.

2. Disk I/O

iotop visualizes per-process disk read/write rates; lsof shows which processes have files open, which is useful for diagnosing why a filesystem cannot be unmounted.

2.1 iostat

iostat -xz 1

Key metrics:

avgqu-sz (aqu-sz in newer sysstat releases): average request queue length; a value above 1 indicates saturation for a single disk.

await (r_await, w_await): average I/O request latency.

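svctm: average service time; a value close to await means requests spend little time queued. The sysstat documentation warns that this field is unreliable, and newer releases no longer report it.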

%util : device utilization; >60% degrades performance, near 100% means saturation.

These metrics also apply to network file systems.
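The %util threshold can be applied automatically with a filter over iostat -x output. This is a sketch under stated assumptions: busy_devices is a hypothetical helper name, and the column is located by its header because the layout differs between sysstat versions.

```shell
# busy_devices THRESHOLD
# Reads iostat -x output on stdin and prints each device whose %util exceeds
# THRESHOLD, together with the measured value.
busy_devices() {
    awk -v limit="$1" '
        $1 ~ /^Device/ {
            # Find the %util column in this report header.
            col = 0
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        NF == 0 { col = 0; next }   # a blank line ends the device table
        col && $col + 0 > limit + 0 { print $1, $col }'
}

# Usage on a live system (LC_ALL=C keeps the decimal separator a dot):
# LC_ALL=C iostat -xz 1 | busy_devices 60
```

On devices that serve requests in parallel, such as SSDs or RAID arrays, a high %util does not necessarily mean the device is at its limit, so treat the output as a prompt to look at await and queue length as well.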

3. Network

Network performance is critical; tools like iptraf, sar -n DEV 1, and netstat help monitor throughput, packet loss, and retransmissions.

3.1 netstat

netstat -s shows per-protocol statistics accumulated since boot. netstat is also useful for checking ports and connections, with options such as -antp (all TCP sockets) and -nltp (listening TCP sockets).
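The since-boot TCP counters are also available in /proc/net/snmp, which makes a rough retransmission ratio easy to compute. The sketch below assumes that file's usual layout, a line of field names followed by a line of values for each protocol; tcp_retrans_pct is a hypothetical helper name.

```shell
# tcp_retrans_pct
# Reads /proc/net/snmp-formatted text on stdin and prints RetransSegs as a
# percentage of OutSegs (both cumulative since boot).
tcp_retrans_pct() {
    awk '
        $1 == "Tcp:" && !seen {          # first Tcp: line holds the field names
            for (i = 2; i <= NF; i++) idx[$i] = i
            seen = 1
            next
        }
        $1 == "Tcp:" {                   # second Tcp: line holds the values
            if (!("OutSegs" in idx) || !("RetransSegs" in idx)) exit 1
            sent = $(idx["OutSegs"]); retrans = $(idx["RetransSegs"])
            if (sent <= 0) exit 1
            printf "%.2f\n", 100 * retrans / sent
            exit
        }'
}

# Usage on a live system:
# tcp_retrans_pct < /proc/net/snmp
```

Because the counters are cumulative, the result is a long-term average; comparing two readings taken some time apart gives a better picture of current behavior.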

3.2 sar

sar -n TCP,ETCP 1 reports TCP activity, including active/s, passive/s, retrans/s, and isegerr/s. For UDP, sar -n UDP 1 shows noport/s and idgmerr/s, which count datagrams that arrived for a port with no listener and datagrams that could not be delivered for other reasons.

3.3 tcpdump

tcpdump captures packets; with -w it writes them to a file for offline analysis in Wireshark. Use -C and -W to limit file size and rotate captures.
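A rotating capture might be wrapped as follows. This is only a sketch: rotating_capture is a hypothetical helper, the interface name, sizes, and output path in the usage line are illustrative, and capturing normally requires root privileges.

```shell
# rotating_capture IFACE SIZE_MB FILES OUTFILE
# Hypothetical wrapper around tcpdump for a bounded, rotating capture.
rotating_capture() {
    # -i selects the interface to capture on
    # -w writes raw packets to a file instead of printing them
    # -C starts a new file once the current one exceeds SIZE_MB million bytes
    # -W caps the number of files, so the oldest capture is overwritten
    tcpdump -i "$1" -w "$4" -C "$2" -W "$3"
}

# Usage (illustrative values):
# rotating_capture eth0 100 5 /tmp/capture.pcap
```

With both -C and -W set, total disk usage stays bounded at roughly SIZE_MB times FILES, which makes it safer to leave a capture running while waiting for an intermittent problem.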


Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
