Essential Linux Server Monitoring Tools and How to Interpret Their Metrics
This article introduces key Linux monitoring utilities—top, vmstat, pidstat, iostat, netstat, sar, and tcpdump—explains the meaning of their output fields, and shows how to use them to diagnose CPU, memory, disk, and network performance issues on production servers.
Linux servers constantly expose a wealth of performance parameters that are crucial for both operations engineers and developers when troubleshooting abnormal program behavior.
Simple command-line tools read data from /proc and /sys to present these metrics; more advanced analysis may require specialized utilities such as perf or systemtap.
CPU and Memory Monitoring
top
top displays load averages, task states, and a breakdown of CPU time. The first line shows the 1-, 5-, and 15-minute load averages; values that stay above the number of CPU cores suggest saturation (on Linux the load average also counts tasks in uninterruptible sleep, so heavy I/O can raise it too). The second line lists task counts (running, sleeping, stopped, zombie). The third line breaks CPU time down into user (us), system (sy), nice (ni), idle (id), iowait (wa), hardware interrupts (hi), software interrupts (si), and steal (st); pressing 1 switches to a per-CPU view. A high value in a given column points to a specific line of investigation: high us calls for locating CPU-intensive processes, high sy or wa for checking kernel and I/O-bound activity, and high st for checking hypervisor over-commit.
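The CPU percentages described above can be reproduced from the cumulative counters in /proc/stat. The following is a minimal sketch, not how top itself is implemented: it assumes the first eight counters on the aggregate "cpu" line are user, nice, system, idle, iowait, irq, softirq, and steal, and it uses made-up sample snapshots instead of reading the live file. The names `parse_cpu_line` and `cpu_percentages` are illustrative.

```python
# Assumed order of the first eight counters on the aggregate "cpu" line of /proc/stat,
# labelled with the abbreviations top uses.
FIELDS = ("us", "ni", "sy", "id", "wa", "hi", "si", "st")


def parse_cpu_line(stat_text):
    """Return the first eight counters of the aggregate 'cpu' line as a dict."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            return dict(zip(FIELDS, (int(v) for v in parts[1:9])))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_percentages(before, after):
    """Share of elapsed CPU time spent in each state between two snapshots."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("snapshots are identical or out of order")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


# Made-up snapshots; on a real host, read /proc/stat twice about a second apart.
SAMPLE_T0 = "cpu  1000 0 500 8000 400 0 100 0 0 0\n"
SAMPLE_T1 = "cpu  1060 0 520 8100 410 0 110 0 0 0\n"

pct = cpu_percentages(parse_cpu_line(SAMPLE_T0), parse_cpu_line(SAMPLE_T1))
print({name: round(value, 1) for name, value in pct.items()})
```

Because the counters are cumulative since boot, only the difference between two snapshots is meaningful; a single reading gives the long-run average.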
The fourth and fifth lines report physical memory and swap, where total = free + used + buff/cache. Buffers caches raw disk metadata, while Cached stores file data. avail Mem estimates the memory that can be used for new work without swapping. Frequent swap activity signals memory pressure.
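The memory identity above can be checked against /proc/meminfo. This is a hedged sketch: it assumes buff/cache corresponds to Buffers + Cached + SReclaimable and derives "used" from the identity total = free + used + buff/cache; some procps versions compute "used" differently. The sample values are invented, and `parse_meminfo` and `summarize` are illustrative names.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def summarize(mem):
    """Derive top-style memory figures (assumption: buff/cache includes SReclaimable)."""
    buff_cache = mem["Buffers"] + mem["Cached"] + mem.get("SReclaimable", 0)
    return {
        "total": mem["MemTotal"],
        "free": mem["MemFree"],
        "buff_cache": buff_cache,
        "used": mem["MemTotal"] - mem["MemFree"] - buff_cache,
        "available": mem.get("MemAvailable"),
        "swap_used": mem["SwapTotal"] - mem["SwapFree"],
    }


# Invented sample; on a real host use open("/proc/meminfo").read().
SAMPLE = """MemTotal:       16000000 kB
MemFree:         2000000 kB
MemAvailable:    9000000 kB
Buffers:          500000 kB
Cached:          6000000 kB
SwapTotal:       4000000 kB
SwapFree:        3900000 kB
SReclaimable:     500000 kB
"""

summary = summarize(parse_meminfo(SAMPLE))
print(summary)
```

In this sample only 2 GB is free, yet about 9 GB is available, because most of the buffer and page cache can be reclaimed on demand.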
Note that top itself consumes resources and is best for short-term, interactive monitoring.
vmstat
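vmstat provides a snapshot of processes, memory, paging, block I/O, traps, and CPU activity. Columns include runnable processes (r), processes in uninterruptible sleep (b), swapped-out memory (swpd), buffers (buff), page cache (cache), blocks read from and written to block devices per second (bi/bo), interrupts per second (in), and context switches per second (cs). The first report shows averages since boot, so judge current behavior from the later samples.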
Experiments with different -j values when compiling show that the context-switch rate stays stable until the parallelism level is pushed high enough, at which point it rises noticeably.
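To compare runs like the ones above, captured vmstat output can be turned into rows keyed by column name. The sketch below assumes the default column layout of `vmstat 1` and uses invented sample output; `parse_vmstat` and `busy_samples` are hypothetical helper names, and the CPU count of 4 is an assumption for the example.

```python
def parse_vmstat(text):
    """Turn default-layout vmstat output into a list of dicts keyed by column name."""
    names = None
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "r":
            names = tokens  # the column-name header line
        elif names and len(tokens) == len(names) and all(t.isdigit() for t in tokens):
            rows.append(dict(zip(names, (int(t) for t in tokens))))
    return rows


def busy_samples(rows, ncpu):
    """Samples whose run queue (r) exceeds the number of CPUs."""
    return [row for row in rows if row["r"] > ncpu]


# Invented sample output; on a real host, capture the output of `vmstat 1 5`.
SAMPLE = """procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 811234  90212 523112    0    0     5    12  310  620  5  2 92  1  0
 9  1      0 809870  90212 523200    0    0     0    48 2900 15400 70 20  5  5  0
"""

rows = parse_vmstat(SAMPLE)
for row in busy_samples(rows, ncpu=4):
    print("run queue", row["r"], "context switches/s", row["cs"])
```

Plotting the cs column of each run against its -j value makes the point where context switching starts to climb easy to spot.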
pidstat
pidstat offers per-process statistics, including page faults (minflt/s for minor faults, majflt/s for major faults), stack usage, CPU usage, and thread-level context switches. Options such as -t (thread view), -r (memory), -s (stack), -u (CPU), and -w (context switches) make it ideal for deep analysis of individual or multithreaded programs.
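The per-process context-switch rates that pidstat -w reports can be approximated from the voluntary_ctxt_switches and nonvoluntary_ctxt_switches counters in /proc/<pid>/status. This is a rough sketch under that assumption, using invented snapshots taken two seconds apart; it is not pidstat's own implementation, and the function names are illustrative.

```python
KEYS = ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")


def ctxt_switches(status_text):
    """Extract the two cumulative context-switch counters from /proc/<pid>/status text."""
    counts = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in KEYS:
            counts[key] = int(value)
    return counts


def switch_rates(before, after, interval_s):
    """Per-second rates between two snapshots of the same process."""
    return {key: (after[key] - before[key]) / interval_s for key in KEYS}


# Invented snapshots; on a real host, read /proc/<pid>/status twice.
SAMPLE_T0 = "Name:\tworker\nvoluntary_ctxt_switches:\t1500\nnonvoluntary_ctxt_switches:\t300\n"
SAMPLE_T1 = "Name:\tworker\nvoluntary_ctxt_switches:\t1700\nnonvoluntary_ctxt_switches:\t900\n"

rates = switch_rates(ctxt_switches(SAMPLE_T0), ctxt_switches(SAMPLE_T1), interval_s=2.0)
print(rates)
```

A high nonvoluntary rate means the scheduler is preempting the process, which usually points to CPU contention; a high voluntary rate means it is frequently blocking on I/O or locks.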
Other CPU Tools
For per-CPU inspection on SMP systems, mpstat -P ALL 1 shows load distribution across cores. Filtering top by user (top -u username) or using ps with custom columns can isolate specific processes, and ps axjf visualizes process trees.
Disk I/O Monitoring
iotop visualizes real-time disk read/write rates per process.
lsof reveals which processes hold files or devices open, useful for diagnosing unmount failures.
iostat -xz 1 reports key disk metrics: average queue length (avgqu-sz), average request latency in milliseconds (await), service time (svctm), and utilization (%util). An avgqu-sz above 1 or a %util above 60% indicates potential saturation. Newer sysstat releases rename avgqu-sz to aqu-sz and no longer report svctm.
These metrics also apply to network file systems, though kernel I/O caching can mask some performance impacts.
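The iostat metrics above are derived from cumulative counters in /proc/diskstats. The sketch below shows one plausible way to compute utilization, average latency, and average queue length from two snapshots; the field positions follow the kernel's documented diskstats layout, the sample numbers are invented, and the result is an approximation of what iostat prints rather than its exact algorithm.

```python
def parse_diskstats(text, device):
    """Return the counters needed for iostat-style metrics for one device."""
    for line in text.splitlines():
        f = line.split()
        if len(f) >= 14 and f[2] == device:
            return {
                "reads": int(f[3]), "read_ms": int(f[6]),
                "writes": int(f[7]), "write_ms": int(f[10]),
                "io_ms": int(f[12]),           # time the device had I/O in flight
                "weighted_io_ms": int(f[13]),  # in-flight time weighted by queue depth
            }
    raise KeyError(device)


def disk_metrics(before, after, interval_ms):
    d = {key: after[key] - before[key] for key in before}
    ios = d["reads"] + d["writes"]
    return {
        "util_pct": 100.0 * d["io_ms"] / interval_ms,
        "await_ms": (d["read_ms"] + d["write_ms"]) / ios if ios else 0.0,
        "avg_queue": d["weighted_io_ms"] / interval_ms,
    }


# Invented snapshots one second apart; on a real host, read /proc/diskstats twice.
SAMPLE_T0 = "   8       0 sda 1000 0 80000 2000 500 0 40000 3000 0 4000 5000\n"
SAMPLE_T1 = "   8       0 sda 1100 0 88000 2400 600 0 48000 3800 2 4700 6500\n"

stats = disk_metrics(parse_diskstats(SAMPLE_T0, "sda"),
                     parse_diskstats(SAMPLE_T1, "sda"), interval_ms=1000)
print(stats)
```

With these numbers the device is 70% utilized with an average queue of 1.5, so both of the thresholds mentioned above are exceeded.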
Network Monitoring
Network health is critical for servers.
iptraf and sar -n DEV 1 show interface throughput and utilization.
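Interface throughput of the kind sar -n DEV reports can be estimated from the byte counters in /proc/net/dev. The following sketch assumes the usual layout, where the first value after the interface name is received bytes and the ninth is transmitted bytes; the snapshots are invented and the helper names are illustrative.

```python
def parse_net_dev(text):
    """Map interface name to cumulative rx/tx byte counters."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, _, rest = line.partition(":")
        fields = rest.split()
        counters[name.strip()] = {"rx_bytes": int(fields[0]), "tx_bytes": int(fields[8])}
    return counters


def throughput(before, after, interval_s):
    """Bytes per second per interface between two snapshots."""
    rates = {}
    for name, now in after.items():
        if name in before:
            rates[name] = {k: (now[k] - before[name][k]) / interval_s for k in now}
    return rates


HEADER = (
    "Inter-|   Receive                                                |  Transmit\n"
    " face |bytes    packets errs drop fifo frame compressed multicast|"
    "bytes    packets errs drop fifo colls carrier compressed\n"
)
# Invented snapshots two seconds apart; on a real host, read /proc/net/dev twice.
SAMPLE_T0 = HEADER + "  eth0: 5000000 4000 0 0 0 0 0 0 2000000 3000 0 0 0 0 0 0\n"
SAMPLE_T1 = HEADER + "  eth0: 7000000 5500 0 0 0 0 0 0 2500000 3400 0 0 0 0 0 0\n"

rates = throughput(parse_net_dev(SAMPLE_T0), parse_net_dev(SAMPLE_T1), interval_s=2.0)
print(rates)
```

Comparing these rates with the link speed gives a rough utilization figure for the interface.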
netstat
netstat -s displays cumulative protocol statistics since boot; netstat -antp lists all TCP sockets together with the owning process, while netstat -nltp shows only listening sockets.
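A quick per-state summary of the connection list can also be built directly from /proc/net/tcp, where the fourth column holds the socket state as a hexadecimal code. The sketch below is illustrative only: it covers IPv4 sockets (IPv6 sockets are listed in /proc/net/tcp6), the sample rows are invented and truncated to the leading columns, and `count_tcp_states` is a hypothetical helper.

```python
from collections import Counter

# Kernel TCP state codes as they appear in the "st" column.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV", "04": "FIN_WAIT1",
    "05": "FIN_WAIT2", "06": "TIME_WAIT", "07": "CLOSE", "08": "CLOSE_WAIT",
    "09": "LAST_ACK", "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(text):
    """Count sockets per TCP state from /proc/net/tcp-style text."""
    counts = Counter()
    for line in text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) > 3:
            counts[TCP_STATES.get(fields[3], fields[3])] += 1
    return counts


# Invented sample; on a real host use open("/proc/net/tcp").read().
SAMPLE = (
    "  sl  local_address rem_address   st tx_queue rx_queue\n"
    "   0: 00000000:0016 00000000:0000 0A 00000000:00000000\n"
    "   1: 0100007F:0016 0100007F:C350 01 00000000:00000000\n"
    "   2: 0100007F:0016 0100007F:C351 06 00000000:00000000\n"
    "   3: 0100007F:0016 0100007F:C352 01 00000000:00000000\n"
)

states = count_tcp_states(SAMPLE)
print(dict(states))
```

A large and growing TIME_WAIT or CLOSE_WAIT count is often the first hint of connection churn or of an application that is not closing its sockets.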
sar
sar -n TCP,ETCP 1 provides per-second TCP metrics such as active opens, passive opens, retransmissions, and input errors. For UDP, sar -n UDP 1 reports datagrams received for ports with no listener as well as input errors, helping assess reliability.
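The TCP counters behind these reports are also exposed in /proc/net/snmp as a header line followed by a value line for each protocol. As a hedged sketch, the code below computes the share of sent segments that were retransmissions between two snapshots; the sample lines are invented and shortened to the Tcp section, and `parse_snmp` and `retrans_pct` are illustrative names.

```python
def parse_snmp(text, proto="Tcp"):
    """Pair the header and value lines for one protocol in /proc/net/snmp text."""
    prefix = proto + ":"
    lines = [line.split()[1:] for line in text.splitlines() if line.startswith(prefix)]
    if len(lines) != 2:
        raise ValueError("expected one header and one value line for " + proto)
    names, values = lines
    return dict(zip(names, (int(v) for v in values)))


def retrans_pct(before, after):
    """Retransmitted segments as a percentage of segments sent between snapshots."""
    sent = after["OutSegs"] - before["OutSegs"]
    retrans = after["RetransSegs"] - before["RetransSegs"]
    return 100.0 * retrans / sent if sent else 0.0


NAMES = ("Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens AttemptFails "
         "EstabResets CurrEstab InSegs OutSegs RetransSegs InErrs OutRsts InCsumErrors\n")
# Invented snapshots; on a real host, read /proc/net/snmp twice.
SAMPLE_T0 = NAMES + "Tcp: 1 200 120000 -1 100 200 5 3 10 50000 40000 400 2 20 0\n"
SAMPLE_T1 = NAMES + "Tcp: 1 200 120000 -1 104 209 5 3 12 52500 42000 460 2 21 0\n"

pct = retrans_pct(parse_snmp(SAMPLE_T0), parse_snmp(SAMPLE_T1))
print(f"retransmitted {pct:.1f}% of segments sent")
```

The same parser works for the Udp section by passing proto="Udp", which exposes the NoPorts and InErrors counters.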
tcpdump
tcpdump captures raw packets for offline analysis with Wireshark. It supports size-based rotation (-C/-W) and extensive filtering (interface, host, port, protocol). Captured packets include timestamps, enabling precise reconstruction of connection sequences, though the tool adds overhead that must be considered in production.
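As an illustration of the rotation and filtering options above, the sketch below assembles a tcpdump command line without running it. The interface name, output path, and filter are placeholders chosen for the example, and `tcpdump_command` is a hypothetical helper; -C takes a file size in millions of bytes and -W limits the number of files kept.

```python
import shlex


def tcpdump_command(interface, out_file, capture_filter, size_millions=100, file_count=10):
    """Build a tcpdump argument list that rotates capture files by size."""
    return [
        "tcpdump",
        "-n",                        # do not resolve addresses to names
        "-i", interface,             # capture on this interface
        "-w", out_file,              # write raw packets for later analysis
        "-C", str(size_millions),    # start a new file after this many million bytes
        "-W", str(file_count),       # keep at most this many files, then overwrite
        capture_filter,              # pcap filter expression
    ]


# Placeholder values for illustration only.
cmd = tcpdump_command("eth0", "/tmp/capture.pcap", "host 10.0.0.5 and tcp port 443")
print(shlex.join(cmd))
```

Passing the list to subprocess.run, with sufficient privileges, would start the capture; keeping the filter narrow and the rotation small limits the overhead on a production host.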
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.