Master Linux Server Monitoring: Essential Tools and How to Use Them
This guide explains how to use common Linux monitoring utilities (top, vmstat, pidstat, mpstat, iostat, sar, netstat, and tcpdump) to observe CPU, memory, disk I/O, and network performance, interpret their output, and troubleshoot system bottlenecks.
CPU and Memory Monitoring
1.1 top
The top command shows real-time system load, task states, and CPU usage (press 1 inside top for a per-CPU breakdown). The first line displays the 1-, 5-, and 15-minute load averages; values that stay above the number of CPU cores suggest saturation. The second line lists task counts (running, sleeping, stopped, zombie). The CPU line breaks time down into us (user), sy (system), ni (nice), id (idle), wa (iowait), hi (hardware interrupt), si (software interrupt), and st (steal) percentages, each pointing to a different class of performance issue.
The memory section shows total, free, used, and buff/cache values; avail Mem estimates how much memory is available to new work without swapping.
High us often means a specific process is CPU-bound, while high sy suggests heavy I/O or other kernel activity. Elevated wa hints at a storage bottleneck, elevated hi / si at interrupt-heavy devices or hardware problems, and elevated st at virtual-machine over-commitment on the host.
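The load-average rule of thumb above can be sketched as a small shell helper. The function name load_status is hypothetical, and the check assumes a Linux host where /proc/loadavg and nproc are available; treat it as an illustration rather than a complete health check.

```shell
# load_status LOAD CORES
# Prints "over" when the load average exceeds the core count, else "ok".
# (Hypothetical helper; the strict "load > cores" rule is a rough heuristic.)
load_status() {
    awk -v load="$1" -v cores="$2" 'BEGIN {
        if (load + 0 > cores + 0) print "over"
        else print "ok"
    }'
}

# On a live host, assuming /proc/loadavg and nproc exist:
#   load_status "$(cut -d" " -f1 /proc/loadavg)" "$(nproc)"
```

For a one-off, non-interactive snapshot of the same header lines, top -b -n 1 | head -n 5 prints them in batch mode.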
1.2 vmstat
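vmstat provides a snapshot of processes, memory, paging, block I/O, and CPU activity. Run it with an interval (for example, vmstat 1), because the first line reports averages since boot. Columns include r (runnable processes), b (blocked in uninterruptible sleep), swpd (used swap), free, buff, cache, bi / bo (blocks read from and written to block devices per second), in (interrupts per second), and cs (context switches per second). The tool is useful for correlating a parallel build's -j setting with system load: r persistently above the core count or a sharp rise in cs indicates that too many jobs are competing for the CPUs.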
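One way to watch for the context-switch spikes described above is to filter vmstat output with awk. This is a minimal sketch: the function name cs_spikes is hypothetical, the threshold is whatever you pass in, and the cs column is located by its header name rather than a fixed position.

```shell
# cs_spikes THRESHOLD
# Reads vmstat output on stdin and prints samples whose context-switch
# rate (the cs column) exceeds THRESHOLD, together with the run queue r.
cs_spikes() {
    awk -v limit="$1" '
        $1 == "r" {                      # column-name header line
            for (i = 1; i <= NF; i++)
                if ($i == "cs") col = i
            next
        }
        col && $1 ~ /^[0-9]+$/ && $col + 0 > limit {
            print "cs=" $col " r=" $1
        }
    '
}
```

Usage, with an arbitrary example threshold: vmstat 1 | cs_spikes 50000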
1.3 pidstat
pidstat offers per-process and per-thread statistics. Useful flags include:
-r: reports page faults (minor, minflt/s, and major, majflt/s) and memory usage.
-s: shows the reserved stack size (StkSize) and the amount actually used (StkRef).
-u: CPU usage breakdown similar to top.
-w: context-switch rates, split into voluntary (cswch/s) and involuntary (nvcswch/s); add -t to report them per thread.
-C "pattern" and -l: filter by command name and display the full command line.
Running pidstat -w -t -C "ailaw" -l yields detailed thread‑level metrics, making it superior to ps for multi‑threaded debugging.
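To pick out the threads that are being preempted most often, the pidstat -w -t output can be filtered on its nvcswch/s column. The helper below is a sketch under stated assumptions: the name involuntary_switchers is hypothetical, the column is found by header name (so it tolerates 12-hour or 24-hour timestamps), and only the first word of the command is printed.

```shell
# involuntary_switchers THRESHOLD
# Reads pidstat -w output on stdin and prints the involuntary
# context-switch rate and command of every row above THRESHOLD.
involuntary_switchers() {
    awk -v limit="$1" '
        /nvcswch\/s/ {                   # header line: find the column
            for (i = 1; i <= NF; i++)
                if ($i == "nvcswch/s") col = i
            next
        }
        col && NF > col && $col + 0 > limit {
            print $col, $(col + 1)
        }
    '
}
```

Usage, reusing the process name from the example above and an arbitrary threshold: pidstat -w -t -C "ailaw" 1 5 | involuntary_switchers 100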
1.4 Other CPU Tools
For per‑CPU analysis on SMP systems, mpstat -P ALL 1 shows load distribution across cores. Filtering top by user ( top -u username) or using a custom ps loop can isolate specific processes.
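Uneven load across cores is easier to spot when only the busy CPUs are listed. The following sketch filters mpstat -P ALL output; the function name busy_cores is hypothetical, and "busy" is computed here simply as 100 minus %idle, which is an assumption of this example rather than an mpstat field.

```shell
# busy_cores THRESHOLD
# Reads mpstat -P ALL output on stdin and prints each individual CPU
# whose non-idle share (100 - %idle) exceeds THRESHOLD percent.
busy_cores() {
    awk -v limit="$1" '
        /%idle/ {                        # header line: locate columns
            for (i = 1; i <= NF; i++) {
                if ($i == "CPU")   c = i
                if ($i == "%idle") d = i
            }
            next
        }
        c && NF >= d && $c ~ /^[0-9]+$/ && 100 - $d > limit {
            printf "cpu%s busy=%.1f%%\n", $c, 100 - $d
        }
    '
}
```

Usage: LC_ALL=C mpstat -P ALL 1 5 | busy_cores 90 (LC_ALL=C keeps the decimal separator as a dot so awk can parse the numbers).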
Disk I/O Monitoring
2.1 iostat
iostat -xz 1 reports extended disk statistics. Key fields are:
avgqu-sz (aqu-sz in newer sysstat releases): average queue length; values > 1 suggest device saturation.
await / r_await / w_await: average time (ms) an I/O request takes, including time spent queued.
svctm: average service time; a value close to await means little queuing. This field is deprecated in sysstat and absent from recent versions.
%util: percentage of time the device is busy; > 60 % tends to degrade performance, and values approaching 100 % indicate saturation.
Even if I/O appears slow, kernel caching and asynchronous I/O may mask impact on applications; the same metrics apply to network file systems.
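The %util threshold discussed above can be applied automatically. This is a minimal sketch: hot_disks is a hypothetical helper name, the %util column is located by header name because its position differs between sysstat versions, and the 60 % figure is simply the rule of thumb quoted in this section.

```shell
# hot_disks THRESHOLD
# Reads iostat -x output on stdin and prints every device line whose
# %util value exceeds THRESHOLD.
hot_disks() {
    awk -v limit="$1" '
        $1 ~ /^Device/ {                 # per-device header line
            for (i = 1; i <= NF; i++)
                if ($i == "%util") col = i
            next
        }
        col && NF >= col && $col + 0 > limit {
            print $1, $col
        }
    '
}
```

Usage: LC_ALL=C iostat -xz 1 | hot_disks 60. Note that the first report iostat prints covers the time since boot, so a device may show up there even if it is idle now.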
Network Monitoring
3.1 netstat
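netstat -s displays cumulative protocol statistics since boot. Use netstat -antp for all TCP sockets (listening and connected) with their owning processes, and netstat -nltp for listening sockets only. The -n flag skips reverse DNS and service-name lookups for faster output; adding --timers shows per-socket timer information.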
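A common follow-up to listing connections is to summarize them by TCP state, which quickly reveals piles of TIME_WAIT or CLOSE_WAIT sockets. The sketch below assumes the usual netstat -ant layout, where the state is the sixth field; the function name tcp_states is hypothetical.

```shell
# tcp_states
# Reads netstat -ant output on stdin and prints a count per TCP state,
# most frequent first.
tcp_states() {
    awk '$1 ~ /^tcp/ { n[$6]++ }
         END { for (s in n) print n[s], s }' | sort -rn
}
```

Usage: netstat -ant | tcp_states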
3.2 sar
The sar utility can monitor network activity with -n TCP,ETCP 1 for TCP and -n UDP 1 for UDP. Important counters include:
active/s: connections opened locally (outgoing) per second.
passive/s: connections accepted from remote hosts (incoming) per second.
retrans/s (or tcpRetransSegs): TCP segments retransmitted per second.
isegerr/s (or tcpInErrs): TCP segments received in error per second.
noport/s (or udpNoPorts): UDP datagrams received for ports with no listening application.
idgmerr/s (or udpInErrors): UDP datagrams that could not be delivered for other reasons.
These figures help assess network reliability when correlated with application requirements.
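Retransmissions are usually the first of these counters worth alerting on. The following sketch filters sar -n TCP,ETCP output for samples above a chosen rate. The helper name retrans_alerts is hypothetical, and the threshold is an assumption you should tune against your own traffic.

```shell
# retrans_alerts THRESHOLD
# Reads sar -n TCP,ETCP output on stdin and prints the timestamp and
# retrans/s value of every sample above THRESHOLD.
retrans_alerts() {
    awk -v limit="$1" '
        /retrans\/s/ {                   # ETCP header: find the column
            for (i = 1; i <= NF; i++)
                if ($i == "retrans/s") col = i
            next
        }
        /active\/s/ { col = 0; next }    # TCP header: ignore its rows
        col && NF >= col && $col + 0 > limit {
            print $1, "retrans/s=" $col
        }
    '
}
```

Usage: LC_ALL=C sar -n TCP,ETCP 1 | retrans_alerts 5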
3.3 tcpdump
tcpdump captures live packet traces. Use filters (e.g., dst port 80) to limit capture size; when writing to a file with -w, the -C and -W options rotate output files automatically. Captured pcap files can be analyzed offline with Wireshark. Example: capturing the TCP three-way handshake of a Chrome connection shows how to isolate connection-setup packets while minimizing the performance impact on the host.
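A narrow filter plus file rotation keeps a handshake capture small. The function below is a sketch, not a definitive recipe: capture_handshakes is a hypothetical name, the 10-unit file size and 5-file limit are arbitrary choices, and the filter matches only packets with the SYN flag set (the SYN and SYN-ACK), so the final ACK of the handshake is not captured.

```shell
# capture_handshakes PORT OUTFILE
# Captures TCP packets with the SYN flag set on PORT and writes them to
# OUTFILE, rotating through at most 5 files of about 10 MB each.
# Packet capture normally requires root privileges.
capture_handshakes() {
    tcpdump -i any -nn -C 10 -W 5 -w "$2" \
        "tcp port $1 and tcp[tcpflags] & tcp-syn != 0"
}
```

Usage from a root shell: capture_handshakes 80 handshake.pcap, then open the resulting files in Wireshark. The -nn flag skips name resolution, and -i any (Linux only) listens on all interfaces.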
Liangxu Linux
Liangxu, a self-taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge: fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc.
