Essential Linux Server Monitoring Tools and How to Interpret Their Metrics
This article introduces key Linux monitoring utilities—top, vmstat, pidstat, iostat, netstat, sar, and tcpdump—explains the meaning of their output fields, and shows how to use them to diagnose CPU, memory, disk, and network performance issues on production servers.
Linux servers constantly expose a wealth of performance parameters that are crucial for both operations engineers and developers when troubleshooting abnormal program behavior.
Simple command‑line tools read data from /proc and /sys to present these metrics; more advanced analysis may require specialized utilities such as perf or systemtap.
CPU and Memory Monitoring
top
top displays load averages, task states, and CPU usage. The first line shows the 1-, 5-, and 15-minute load averages; values that stay above the number of CPU cores suggest saturation. On Linux the load figure also counts tasks in uninterruptible sleep, so a high load can come from I/O waits as well as CPU demand. The second line lists task counts (running, sleeping, stopped, zombie). The third line breaks CPU time down into user (us), system (sy), nice (ni), idle (id), iowait (wa), hardware interrupts (hi), software interrupts (si), and steal (st); pressing 1 switches to per-CPU rows. A high value in a given column points to a specific line of investigation: high us calls for locating CPU-intensive processes, high sy or wa for checking kernel and I/O-bound activity, and high st for suspecting hypervisor over-commit.
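The CPU columns above are derived from cumulative counters in /proc/stat. The following is a minimal sketch of that calculation, assuming two samples of the aggregate cpu line taken some interval apart; the sample values and the helper names parse_cpu_line and cpu_percentages are made up for illustration and are not part of top.

```python
# Order of the first eight counters on the aggregate "cpu" line of /proc/stat,
# labelled with the abbreviations top uses.
FIELDS = ("us", "ni", "sy", "id", "wa", "hi", "si", "st")


def parse_cpu_line(line):
    """Return the first eight counters of an aggregate 'cpu' line as a dict."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return dict(zip(FIELDS, (int(v) for v in parts[1:9])))


def cpu_percentages(before, after):
    """Convert two counter snapshots into percentages of elapsed CPU time."""
    deltas = {k: after[k] - before[k] for k in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {k: 0.0 for k in FIELDS}
    return {k: round(100.0 * d / total, 1) for k, d in deltas.items()}


# Hypothetical samples; on a real host, read the first line of /proc/stat twice.
SAMPLE_T0 = "cpu  1000 0 500 8000 400 0 100 0 0 0"
SAMPLE_T1 = "cpu  1600 0 700 8100 450 0 150 0 0 0"

pct = cpu_percentages(parse_cpu_line(SAMPLE_T0), parse_cpu_line(SAMPLE_T1))
print(pct)
```

With these made-up samples, user time dominates the interval, which would be the cue to look for CPU-intensive processes.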
The fourth and fifth lines report physical memory and swap. For physical memory, total = free + used + buff/cache. Buffers cache raw block-device data such as file system metadata, while Cached holds file contents. avail Mem estimates how much memory can be handed to new applications without swapping. Frequent swap activity signals memory pressure.
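The memory figures that top reports come from /proc/meminfo. As a rough sketch, assuming a kernel that exposes MemAvailable, the share of available memory and the amount of swap in use can be computed as below. The sample text and the function names are illustrative assumptions.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {key: integer value}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields:
            # Most values are in kB; a few (e.g. HugePages_Total) are plain counts.
            values[key.strip()] = int(fields[0])
    return values


def memory_summary(info):
    """Summarize available memory and swap usage from parsed meminfo values."""
    swap_used = info["SwapTotal"] - info["SwapFree"]
    return {
        "available_pct": round(100.0 * info["MemAvailable"] / info["MemTotal"], 1),
        "swap_used_kb": swap_used,
    }


# Hypothetical sample; on a real host, read the text from /proc/meminfo.
SAMPLE_MEMINFO = """MemTotal:       16000000 kB
MemFree:         2000000 kB
MemAvailable:    8000000 kB
Buffers:          500000 kB
Cached:          5000000 kB
SwapTotal:       4000000 kB
SwapFree:        3000000 kB
"""

print(memory_summary(parse_meminfo(SAMPLE_MEMINFO)))
```

A single snapshot only shows how much swap is occupied; to judge memory pressure, compare successive samples or watch the si/so columns of vmstat.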
Note that top itself consumes resources and is best for short‑term, interactive monitoring.
vmstat
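vmstat reports processes, memory, paging, block I/O, interrupts, and CPU activity. Columns include runnable processes (r), processes in uninterruptible sleep (b), swapped-out memory (swpd), buffers (buff), page cache (cache), swap-in and swap-out (si/so), blocks read and written (bi/bo), interrupts (in), and context switches (cs). When run with an interval, for example vmstat 1, the first line reports averages since boot and each later line covers one interval.
In experiments that vary the -j value (for example with make) during compilation, the context-switch rate stays roughly stable at moderate parallelism and rises noticeably only once the number of parallel jobs is pushed high enough.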
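The in and cs columns are rates derived from cumulative counters. As a minimal sketch, assuming two snapshots of /proc/stat taken a known number of seconds apart, the per-second rates can be computed from the ctxt and intr lines. The sample text and helper names are assumptions for illustration.

```python
def read_counter(stat_text, name):
    """Return the first numeric value on the /proc/stat line that starts with name."""
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == name:
            return int(fields[1])
    raise KeyError(name)


def per_second(before_text, after_text, name, interval_s):
    """Rate of change of a cumulative counter between two snapshots."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = read_counter(after_text, name) - read_counter(before_text, name)
    return delta / interval_s


# Hypothetical snapshots taken 2 seconds apart; real data comes from /proc/stat.
SNAP_T0 = "cpu  1000 0 500 8000 400 0 100 0 0 0\nintr 500000 12 0 0\nctxt 1000000\n"
SNAP_T1 = "cpu  1600 0 700 8100 450 0 150 0 0 0\nintr 503000 15 0 0\nctxt 1012000\n"

print("cs/s:", per_second(SNAP_T0, SNAP_T1, "ctxt", 2.0))
print("in/s:", per_second(SNAP_T0, SNAP_T1, "intr", 2.0))
```

Sampling this way while a build runs at different parallelism levels is one way to reproduce the kind of comparison described above.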
pidstat
pidstat offers per-process statistics, including page faults (minflt/s for minor faults, majflt/s for major faults), stack usage, CPU usage, and context switches, down to the thread level. Options such as -t (thread view), -r (memory), -s (stack), -u (CPU), and -w (context switches) make it well suited to deep analysis of individual or multithreaded programs.
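Page-fault rates like those shown by pidstat -r can be approximated from the cumulative minflt and majflt counters in /proc/[pid]/stat. The sketch below assumes two snapshots of that file for the same process; the sample lines and function names are hypothetical.

```python
def parse_pid_stat(line):
    """Extract fault and CPU-tick counters from a /proc/[pid]/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split around the last ')'.
    open_idx = line.index("(")
    close_idx = line.rindex(")")
    rest = line[close_idx + 1:].split()
    # rest[0] is the state (field 3), so field N of the file is rest[N - 3].
    return {
        "pid": int(line[:open_idx]),
        "comm": line[open_idx + 1:close_idx],
        "minflt": int(rest[7]),   # field 10
        "majflt": int(rest[9]),   # field 12
        "utime": int(rest[11]),   # field 14, in clock ticks
        "stime": int(rest[12]),   # field 15, in clock ticks
    }


def fault_rates(before, after, interval_s):
    """Minor and major page faults per second between two snapshots."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return {
        "minflt/s": (after["minflt"] - before["minflt"]) / interval_s,
        "majflt/s": (after["majflt"] - before["majflt"]) / interval_s,
    }


# Hypothetical snapshots of one process, taken 3 seconds apart.
STAT_T0 = "1234 (my worker) S 1 1234 1234 0 -1 4194560 5000 0 12 0 250 50 0 0 20 0 4 0 100 0 0"
STAT_T1 = "1234 (my worker) S 1 1234 1234 0 -1 4194560 5600 0 15 0 280 55 0 0 20 0 4 0 100 0 0"

print(fault_rates(parse_pid_stat(STAT_T0), parse_pid_stat(STAT_T1), 3.0))
```

A sustained majflt/s above zero means the process is waiting on disk to bring pages in, which is usually the more expensive case to investigate.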
Other CPU Tools
For per-CPU inspection on SMP systems, mpstat -P ALL 1 shows load distribution across cores. Filtering top by user (top -u username) or using ps with custom columns can isolate specific processes, and ps axjf visualizes process trees.
Disk I/O Monitoring
iotop visualizes real-time disk read/write rates per process. lsof reveals which processes hold files or devices open, which is useful for diagnosing unmount failures. iostat -xz 1 reports key disk metrics: average queue length (avgqu-sz), average request latency (await), service time (svctm), and utilization (%util). Values >1 for avgqu-sz or >60% for %util indicate potential saturation. Newer sysstat releases label the queue length aqu-sz and deprecate svctm, so the exact columns depend on the installed version.
These metrics also apply to network file systems, though kernel I/O caching can mask some performance impacts.
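The iostat figures discussed above can be approximated from two samples of a device line in /proc/diskstats. The sketch below is an approximation under stated assumptions, not iostat's own implementation: the sample lines, the one-second interval, and the function names are made up, and the thresholds are the ones quoted in this article.

```python
def parse_diskstats_line(line):
    """Pick the counters needed for await, queue length and %util."""
    f = line.split()
    return {
        "name": f[2],
        "rd_ios": int(f[3]),           # reads completed
        "rd_ticks": int(f[6]),         # ms spent reading
        "wr_ios": int(f[7]),           # writes completed
        "wr_ticks": int(f[10]),        # ms spent writing
        "io_ticks": int(f[12]),        # ms the device had I/O in flight
        "weighted_ticks": int(f[13]),  # weighted ms spent doing I/O
    }


def disk_metrics(before, after, interval_ms):
    """Approximate await (ms), average queue length and %util for one interval."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    ios = (after["rd_ios"] - before["rd_ios"]) + (after["wr_ios"] - before["wr_ios"])
    ticks = (after["rd_ticks"] - before["rd_ticks"]) + (after["wr_ticks"] - before["wr_ticks"])
    return {
        "await_ms": ticks / ios if ios else 0.0,
        "avg_queue": (after["weighted_ticks"] - before["weighted_ticks"]) / interval_ms,
        "util_pct": 100.0 * (after["io_ticks"] - before["io_ticks"]) / interval_ms,
    }


def looks_saturated(metrics, queue_limit=1.0, util_limit=60.0):
    """Apply the rule-of-thumb thresholds quoted in the text."""
    return metrics["avg_queue"] > queue_limit or metrics["util_pct"] > util_limit


# Hypothetical samples for one device, taken 1000 ms apart.
DISK_T0 = "   8       0 sda 1000 0 80000 2000 500 0 40000 3000 0 4000 5000"
DISK_T1 = "   8       0 sda 1300 0 104000 2900 700 0 56000 4100 2 4800 7000"

m = disk_metrics(parse_diskstats_line(DISK_T0), parse_diskstats_line(DISK_T1), 1000)
print(m, looks_saturated(m))
```

The thresholds are rules of thumb; devices that serve many requests in parallel can show a high %util well before they are actually saturated.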
Network Monitoring
Network health is critical for servers. iptraf and sar -n DEV 1 show interface throughput and utilization.
netstat
netstat -s displays cumulative protocol statistics since boot; netstat -antp lists active TCP connections, while netstat -nltp shows listening sockets.
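The connection list that netstat -antp prints is backed by /proc/net/tcp for IPv4, where the state is a hex code and ports are hex numbers. As a small sketch, assuming text in that format, the states can be tallied and the listening ports extracted. The sample rows are shortened and hypothetical, and the helper names are illustrative.

```python
from collections import Counter

# TCP state codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV", "04": "FIN_WAIT1",
    "05": "FIN_WAIT2", "06": "TIME_WAIT", "07": "CLOSE", "08": "CLOSE_WAIT",
    "09": "LAST_ACK", "0A": "LISTEN", "0B": "CLOSING",
}


def socket_rows(text):
    """Yield (local_address, remote_address, state_code) for each socket row."""
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with an index such as "0:"; the header row does not.
        if len(fields) >= 4 and fields[0].endswith(":"):
            yield fields[1], fields[2], fields[3].upper()


def count_states(text):
    """Count sockets per TCP state name."""
    return Counter(TCP_STATES.get(st, st) for _, _, st in socket_rows(text))


def listening_ports(text):
    """Return the sorted, de-duplicated local ports in LISTEN state."""
    ports = {int(local.split(":")[1], 16) for local, _, st in socket_rows(text) if st == "0A"}
    return sorted(ports)


# Hypothetical, shortened sample; real rows carry more columns after "st".
SAMPLE_TCP = """  sl  local_address rem_address   st
   0: 00000000:0016 00000000:0000 0A
   1: 0100007F:0050 00000000:0000 0A
   2: 0A00000A:0016 0A000014:C350 01
   3: 0A00000A:0050 0A000015:C351 06
   4: 0A00000A:0050 0A000016:C352 06
"""

print(count_states(SAMPLE_TCP))
print(listening_ports(SAMPLE_TCP))
```

A large and growing TIME_WAIT or CLOSE_WAIT count in such a tally is often the first hint of connection churn or of an application that is not closing sockets.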
sar
sar -n TCP,ETCP 1 provides per-second TCP metrics such as active opens (active/s), passive opens (passive/s), retransmissions (retrans/s), and input errors (isegerr/s). For UDP, sar -n UDP 1 reports datagrams received on closed ports (noport/s) and input errors (idgmerr/s), which helps assess reliability.
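The TCP rates above can be approximated from the cumulative counters in /proc/net/snmp, which stores a header line and a value line per protocol. The sketch below assumes two snapshots taken a known interval apart; the sample values and function names are illustrative, and the retransmission percentage is an extra ratio computed here rather than a sar column.

```python
def parse_snmp(text, proto):
    """Pair the header and value lines for one protocol in /proc/net/snmp text."""
    prefix = proto + ":"
    rows = [line.split()[1:] for line in text.splitlines() if line.startswith(prefix)]
    if len(rows) < 2:
        raise ValueError("no header/value pair found for " + proto)
    names, values = rows[0], rows[1]
    return dict(zip(names, (int(v) for v in values)))


def tcp_rates(before, after, interval_s):
    """Per-second TCP rates plus the share of sent segments that were retransmits."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")

    def delta(key):
        return after[key] - before[key]

    out_segs = delta("OutSegs")
    return {
        "active/s": delta("ActiveOpens") / interval_s,
        "passive/s": delta("PassiveOpens") / interval_s,
        "retrans/s": delta("RetransSegs") / interval_s,
        "isegerr/s": delta("InErrs") / interval_s,
        "retrans_pct": 100.0 * delta("RetransSegs") / out_segs if out_segs else 0.0,
    }


HEADER = ("Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens "
          "AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs "
          "InErrs OutRsts InCsumErrors\n")
# Hypothetical snapshots taken 2 seconds apart.
SNMP_T0 = HEADER + "Tcp: 1 200 120000 -1 100 200 5 3 10 50000 40000 400 0 20 0\n"
SNMP_T1 = HEADER + "Tcp: 1 200 120000 -1 110 230 5 3 12 52000 42000 420 2 20 0\n"

print(tcp_rates(parse_snmp(SNMP_T0, "Tcp"), parse_snmp(SNMP_T1, "Tcp"), 2.0))
```

Tracking the retransmission share over time gives a rough reliability signal that is independent of traffic volume.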
tcpdump
tcpdump captures raw packets for offline analysis with Wireshark. It supports size-based rotation (-C / -W) and extensive filtering (interface, host, port, protocol). Captured packets include timestamps, enabling precise reconstruction of connection sequences, though the tool adds overhead that must be considered in production.
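To keep a capture bounded on a production host, the rotation and filter options can be combined into a single command. The sketch below only assembles the argument list and prints it; it does not start a capture. The interface name, output path, host, and port are hypothetical values, and build_tcpdump_argv is an illustrative helper rather than part of any library.

```python
import shlex


def build_tcpdump_argv(interface, out_file, file_size_mb, file_count, host=None, port=None):
    """Assemble a tcpdump command with size-based rotation and an optional filter."""
    if file_size_mb <= 0 or file_count <= 0:
        raise ValueError("file_size_mb and file_count must be positive")
    argv = [
        "tcpdump",
        "-n",                     # do not resolve addresses to names
        "-i", interface,          # capture interface
        "-C", str(file_size_mb),  # start a new file after roughly this many MB
        "-W", str(file_count),    # keep at most this many files, reusing the oldest
        "-w", out_file,           # write raw packets to file for later analysis
    ]
    filters = []
    if host:
        filters.append(f"host {host}")
    if port is not None:
        filters.append(f"tcp port {port}")
    if filters:
        argv.append(" and ".join(filters))
    return argv


# Hypothetical values for illustration only.
argv = build_tcpdump_argv("eth0", "/tmp/capture.pcap", 100, 5, host="10.0.0.20", port=443)
print(shlex.join(argv))
```

Narrowing the filter and capping the total size this way limits both the disk footprint and the capture overhead mentioned above.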
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
