Master Linux Server Health: Essential Monitoring Commands Explained
Learn how to monitor Linux server performance with essential tools (top, vmstat, pidstat, iostat, netstat, sar, and tcpdump): what their CPU, memory, disk I/O, and network metrics mean, how to read their output, and how to use that insight to diagnose and troubleshoot system issues.
A running Linux server exposes many metrics that matter to both operations staff and developers. When a program behaves unexpectedly, these clues help locate the problem quickly.
The tools below are simple ways to view these metrics; many of them read their data from /proc and /sys. Deeper performance monitoring and tuning may require more advanced tools such as perf or SystemTap.
1. CPU and Memory
1.1 top
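Run top. The first line shows the 1-, 5-, and 15-minute load averages; a load persistently higher than the number of CPU cores means tasks are queuing for the CPU. On Linux the load average also counts tasks in uninterruptible sleep (typically waiting for disk I/O), so a high load is not always a CPU problem.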
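The same load averages are exposed in /proc/loadavg, so the comparison against the core count is easy to script. A minimal sketch, assuming /proc/loadavg-format input on stdin and a core count supplied by the caller; load_check is a hypothetical helper name:

```shell
# load_check (hypothetical helper): divide the 1-, 5- and 15-minute load
# averages by the CPU count given as $1. Values above 1.00 mean that, on
# average, more tasks were runnable (or in uninterruptible sleep) than cores.
# Input: /proc/loadavg format on stdin.
load_check() {
    awk -v cpus="$1" '{
        printf "cpus=%d load/cpu 1m=%.2f 5m=%.2f 15m=%.2f\n",
            cpus, $1 / cpus, $2 / cpus, $3 / cpus
    }'
}
```

Usage: load_check "$(nproc)" < /proc/loadavg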
The second line counts tasks by state: running (executing or ready to run), sleeping (waiting for an event such as I/O), stopped (paused, e.g., via SIGSTOP), and zombie (exited but not yet reaped by the parent process).
The third line breaks down CPU usage:
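us – time running user processes that have not been niced
sy – time spent in the kernel (rises with heavy system-call or I/O activity)
ni – time running niced (lowered-priority) user processes
id – idle time
wa – iowait: time the CPU sat idle waiting for I/O to complete
hi – time servicing hardware interrupts
si – time servicing software interrupts
st – steal time: time the hypervisor took this VM's CPU away for other guests (relevant only in virtualized environments)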
A high CPU figure points to different causes depending on the category:
If us is high, a user process is consuming the CPU; use top to identify it, then a profiler such as perf for deeper analysis.
If sy is high, heavy I/O or other kernel activity may be the culprit.
If ni is high, the CPU is busy with processes that were deliberately given a lower priority, which is usually expected.
If wa is high, processes are frequently stalled on I/O; the storage is likely slow or saturated.
If hi or si is high, check /proc/interrupts (and /proc/softirqs) for hardware issues.
If st is high, the VM's host is probably oversubscribed (oversold).
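These percentages are derived from the cumulative tick counters on the cpu line of /proc/stat by differencing two samples. A minimal sketch of that calculation, assuming the first eight counters on the line are user, nice, system, idle, iowait, irq, softirq and steal (the Linux order); cpu_pct is a hypothetical helper, and guest time, which the kernel already folds into user, is ignored:

```shell
# cpu_pct (hypothetical helper): approximate top's us/sy/ni/id/wa/hi/si/st
# percentages from two "cpu" lines of /proc/stat given on stdin.
cpu_pct() {
    awk '
    $1 == "cpu" {
        n++
        for (i = 2; i <= 9; i++) v[n, i] = $i   # user nice system idle iowait irq softirq steal
    }
    END {
        total = 0
        for (i = 2; i <= 9; i++) { d[i] = v[2, i] - v[1, i]; total += d[i] }
        if (total == 0) total = 1                # avoid division by zero
        printf "us=%.1f sy=%.1f ni=%.1f id=%.1f wa=%.1f hi=%.1f si=%.1f st=%.1f\n",
            100 * d[2] / total, 100 * d[4] / total, 100 * d[3] / total,
            100 * d[5] / total, 100 * d[6] / total, 100 * d[7] / total,
            100 * d[8] / total, 100 * d[9] / total
    }'
}
```

Usage: { grep '^cpu ' /proc/stat; sleep 1; grep '^cpu ' /proc/stat; } | cpu_pct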
The fourth and fifth lines show physical memory and swap. total = free + used + buff/cache. Buffers hold raw block-device data such as filesystem metadata; Cached holds file contents (the page cache). avail Mem estimates how much memory can be given to new applications without swapping; it is roughly free + buff/cache, but usually somewhat lower because not all cache can be reclaimed. Swap usage by itself is not a problem, but frequent swap-in/swap-out activity signals memory pressure.
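These figures come from /proc/meminfo, which reports values in kB. A rough sketch, assuming input in that format; mem_summary is a hypothetical helper, and its used figure (total - free - buffers - cached) is only an approximation, since recent versions of top and free also count reclaimable slab (SReclaimable) as cache, which this sketch ignores:

```shell
# mem_summary (hypothetical helper): summarise /proc/meminfo-format input
# (stdin or a file argument) in MiB, approximating top's memory line.
mem_summary() {
    awk '
    { kb[$1] = $2 }                              # keys keep their colon, e.g. "MemTotal:"
    END {
        total = kb["MemTotal:"]; free = kb["MemFree:"]
        cache = kb["Buffers:"] + kb["Cached:"]   # ignores SReclaimable
        used = total - free - cache
        printf "total=%d used=%d free=%d buff/cache=%d avail=%d MiB\n",
            total / 1024, used / 1024, free / 1024, cache / 1024, kb["MemAvailable:"] / 1024
    }' "$@"
}
```

Usage: mem_summary /proc/meminfo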
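At the bottom, top lists each process's resource consumption. By default the %CPU column is relative to a single core, so a multithreaded process using several cores can exceed 100%. Note that top itself consumes some CPU, especially with short refresh intervals.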
1.2 vmstat
vmstat provides another view of system load. In an example captured while compiling Boost with -j4, the key columns are:
r – runnable processes (running or waiting for a CPU)
b – processes in uninterruptible sleep (usually blocked on I/O)
swpd – swap space in use (the same quantity as used swap in top)
bi / bo – blocks received from / sent to block devices per second
in – interrupts per second
cs – context switches per second
When compiling kernels with different -j values, vmstat shows that the context-switch rate rises significantly only when -j is much larger, which suggests that the -j value itself is not a major factor in context switching.
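The in and cs columns are per-second rates of two cumulative counters in /proc/stat: the first number on the intr line (total interrupts since boot) and the value on the ctxt line (total context switches). A minimal sketch, assuming two concatenated /proc/stat snapshots on stdin taken a known number of seconds apart; stat_rates is a hypothetical helper:

```shell
# stat_rates (hypothetical helper): per-second interrupts (in) and context
# switches (cs) from two /proc/stat snapshots taken $1 seconds apart.
stat_rates() {
    awk -v secs="${1:-1}" '
    $1 == "intr" { intr[++ni] = $2 }   # first value after "intr" is the total
    $1 == "ctxt" { ctxt[++nc] = $2 }
    END {
        printf "in=%d cs=%d\n", (intr[2] - intr[1]) / secs, (ctxt[2] - ctxt[1]) / secs
    }'
}
```

Usage: { cat /proc/stat; sleep 5; cat /proc/stat; } | stat_rates 5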
1.3 pidstat
pidstat offers detailed per-process statistics, including stack usage, page faults, and thread-level activity. Useful options:
-r – minor (minflt/s) and major (majflt/s) page faults.
-s – stack size (StkSize) and the portion actually referenced (StkRef).
-u – CPU usage (similar to top).
-w – context-switch counts, split into voluntary (cswch/s) and involuntary (nvcswch/s).
-t – also report statistics for each thread of the selected tasks.
-C – filter by command name; -l – show the full command line.
Example: pidstat -w -t -C "ailaw" -l
This reports per-thread context switches for tasks whose command name contains "ailaw". For single-process or multithreaded debugging, pidstat is often more convenient than ps.
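Per-thread context-switch totals are also available directly as voluntary_ctxt_switches and nonvoluntary_ctxt_switches in /proc/PID/task/TID/status. They are cumulative since the thread started, not per-second rates, so sample twice to get a rate. A minimal sketch; thread_ctxt is a hypothetical helper:

```shell
# thread_ctxt (hypothetical helper): print cumulative voluntary (cswch) and
# involuntary (nvcswch) context switches from one or more status files.
# In /proc/PID/task/TID/status the "Pid:" field holds the thread ID.
thread_ctxt() {
    awk '
    $1 == "Pid:" { tid = $2 }
    $1 == "voluntary_ctxt_switches:" { v = $2 }
    $1 == "nonvoluntary_ctxt_switches:" { printf "tid=%s cswch=%s nvcswch=%s\n", tid, v, $2 }
    ' "$@"
}
```

Usage: thread_ctxt /proc/"$(pidof -s ailawd)"/task/*/status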
1.4 Other CPU Tools
To monitor individual CPUs, mpstat -P ALL 1 shows the load on each core, which reveals whether work is balanced across them. top -u username limits top to one user's processes, and ps axjf prints a detailed process tree.
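Per-core figures like mpstat's come from the cpu0, cpu1, ... lines of /proc/stat. A minimal sketch of a per-core busy percentage (all ticks except idle and iowait), assuming two concatenated /proc/stat snapshots on stdin; percpu_busy is a hypothetical helper:

```shell
# percpu_busy (hypothetical helper): busy % per core from two /proc/stat
# snapshots on stdin; busy = all ticks except idle and iowait.
percpu_busy() {
    awk '
    $1 ~ /^cpu[0-9]+$/ {
        tot = 0
        for (i = 2; i <= 9; i++) tot += $i      # user through steal
        idle = $5 + $6                          # idle + iowait
        if (!($1 in t0)) { t0[$1] = tot; i0[$1] = idle; order[++n] = $1 }
        else             { t1[$1] = tot; i1[$1] = idle }
    }
    END {
        for (k = 1; k <= n; k++) {
            c = order[k]; dt = t1[c] - t0[c]
            busy = (dt > 0) ? 100 * (dt - (i1[c] - i0[c])) / dt : 0
            printf "%s busy=%.1f%%\n", c, busy
        }
    }'
}
```

Usage: { cat /proc/stat; sleep 1; cat /proc/stat; } | percpu_busy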
2. Disk I/O
Tools like iotop show real-time disk read/write rates per process, while lsof reveals which processes hold files or devices open (useful when a filesystem cannot be unmounted because it is busy).
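On Linux, open files can be found by inspecting the file-descriptor symlinks under /proc/PID/fd, which is the kind of information lsof and fuser report. A minimal sketch of that idea, assuming a Linux /proc and an absolute target path; who_has_open is a hypothetical helper, and without root it only sees processes it is allowed to inspect:

```shell
# who_has_open (hypothetical helper): print "PID path" for every open file
# descriptor that points at PATH ($1) or at a file underneath it.
who_has_open() {
    target=$1
    for fd in /proc/[0-9]*/fd/*; do
        link=$(readlink "$fd" 2>/dev/null) || continue   # skip vanished or unreadable fds
        case $link in
            "$target" | "$target"/*)
                pid=${fd#/proc/}                          # "1234/fd/5" -> keep "1234"
                echo "${pid%%/*} $link" ;;
        esac
    done | sort -u
}
```

Usage, before unmounting: who_has_open /mnt/data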
2.1 iostat
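iostat -xz 1 (or sar -d 1) reports the key disk metrics:
avgqu-sz – average request queue length (aqu-sz in newer sysstat versions); a value persistently above 1 suggests a saturated device.
await (r_await, w_await) – average time per I/O request in milliseconds, including both queueing and service time.
svctm – average service time; when it is close to await, requests spend little time queued, so the device is not the bottleneck. Recent sysstat versions deprecate this field as unreliable.
%util – percentage of time the device was busy; above roughly 60% performance starts to degrade, and near 100% the device is saturated. For SSDs and RAID arrays, which serve requests in parallel, 100% does not necessarily mean saturation.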
Even when the disk metrics look poor, applications may not notice, because the kernel's caching and asynchronous I/O can hide disk latency.
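iostat computes these columns from the counters in /proc/diskstats. A minimal sketch for one device, assuming two concatenated snapshots on stdin taken SECS seconds apart and the standard field layout (after major, minor and name: reads, merged reads, sectors read, ms reading, writes, merged writes, sectors written, ms writing, I/Os in progress, ms doing I/O, weighted ms doing I/O); disk_util is a hypothetical helper:

```shell
# disk_util (hypothetical helper): iostat-style %util, await and avgqu-sz
# for device $1 from two /proc/diskstats snapshots taken $2 seconds apart.
disk_util() {
    awk -v dev="$1" -v secs="$2" '
    $3 == dev {
        n++
        ios[n] = $4 + $8          # reads + writes completed
        ticks[n] = $7 + $11       # ms spent on reads + writes
        busy[n] = $13             # ms the device had I/O in flight
        wtime[n] = $14            # weighted ms, the basis of avgqu-sz
    }
    END {
        ms = secs * 1000
        dios = ios[2] - ios[1]
        await = (dios > 0) ? (ticks[2] - ticks[1]) / dios : 0
        printf "%s util=%.1f%% await=%.2fms avgqu-sz=%.2f\n", dev,
            100 * (busy[2] - busy[1]) / ms, await, (wtime[2] - wtime[1]) / ms
    }'
}
```

Usage: { cat /proc/diskstats; sleep 1; cat /proc/diskstats; } | disk_util sda 1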
3. Network
Network health is critical for servers. iptraf and sar -n DEV 1 report per-interface throughput; comparing it with the NIC's rated speed (e.g., 1 Gbps) shows how close the link is to its limit.
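Utilization is bytes moved per second divided by what the link can carry: a 1 Gbps link moves at most 10^9 / 8 = 125,000,000 bytes per second in each direction. A minimal sketch using the cumulative byte counters in /proc/net/dev, assuming two snapshots on stdin taken SECS seconds apart and a link speed given in Mbit/s; nic_util is a hypothetical helper:

```shell
# nic_util (hypothetical helper): receive/transmit utilization of interface
# $1 from two /proc/net/dev snapshots taken $2 seconds apart, for a link of
# $3 Mbit/s (full duplex, so each direction is compared with the full speed).
nic_util() {
    awk -v ifc="$1" -v secs="$2" -v mbps="$3" '
    {
        sub(/:/, " ")                 # "eth0:123" -> "eth0 123" so fields split cleanly
        if ($1 != ifc) next
        n++; rx[n] = $2; tx[n] = $10  # cumulative rx and tx byte counters
    }
    END {
        link = mbps * 1000000 / 8 * secs   # bytes the link can carry in the interval
        printf "%s rx=%.1f%% tx=%.1f%%\n", ifc,
            100 * (rx[2] - rx[1]) / link, 100 * (tx[2] - tx[1]) / link
    }'
}
```

Usage: { cat /proc/net/dev; sleep 1; cat /proc/net/dev; } | nic_util eth0 1 1000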
3.1 netstat
netstat -s shows cumulative per-protocol statistics since boot. netstat is also useful for checking ports and connections. Common options:
netstat --all --numeric --tcp --udp --timers --listening --program
Use netstat -antp for all TCP connections and netstat -nltp for listening sockets.
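On Linux, netstat builds its TCP table from /proc/net/tcp (and /proc/net/tcp6), where addresses and ports are hexadecimal and state 0A means LISTEN. A minimal sketch that lists listening port numbers from that format, roughly the port column of netstat -nlt; listen_ports is a hypothetical helper, and it decodes only the port, not the address:

```shell
# listen_ports (hypothetical helper): print listening TCP port numbers from
# /proc/net/tcp-format input (stdin or file arguments).
listen_ports() {
    awk '
    function hex(s,    i, n) {                # hex string -> decimal
        n = 0
        for (i = 1; i <= length(s); i++)
            n = n * 16 + index("0123456789ABCDEF", toupper(substr(s, i, 1))) - 1
        return n
    }
    FNR > 1 && $4 == "0A" {                   # skip header; st 0A = LISTEN
        split($2, a, ":")                     # local_address is IP:PORT in hex
        print hex(a[2])
    }' "$@" | sort -n -u
}
```

Usage: listen_ports /proc/net/tcp /proc/net/tcp6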
3.2 sar
sar -n TCP,ETCP 1 reports TCP activity every second:
active/s – connections initiated locally per second (active opens).
passive/s – connections initiated by remote hosts per second (passive opens).
retrans/s – segments retransmitted per second; a rising value indicates packet loss or overload.
isegerr/s – segments received with errors (e.g., checksum failures) per second.
For UDP, sar -n UDP 1 shows noport/s (datagrams received for a port with no listening application) and idgmerr/s (datagrams that could not be delivered for other reasons).
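The underlying TCP counters are in /proc/net/snmp, where a Tcp: header line names the fields and the following Tcp: line holds cumulative values since boot. A minimal sketch that looks the fields up by name and computes the share of sent segments that were retransmitted; tcp_counters is a hypothetical helper:

```shell
# tcp_counters (hypothetical helper): cumulative TCP opens, retransmitted
# segments and retransmission ratio from /proc/net/snmp-format input.
tcp_counters() {
    awk '
    $1 == "Tcp:" && !hdr { for (i = 2; i <= NF; i++) name[i] = $i; hdr = 1; next }
    $1 == "Tcp:" && hdr  { for (i = 2; i <= NF; i++) v[name[i]] = $i }
    END {
        ratio = (v["OutSegs"] > 0) ? 100 * v["RetransSegs"] / v["OutSegs"] : 0
        printf "active=%d passive=%d retrans=%d retrans_ratio=%.2f%%\n",
            v["ActiveOpens"], v["PassiveOpens"], v["RetransSegs"], ratio
    }' "$@"
}
```

Usage: tcp_counters /proc/net/snmp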
3.3 tcpdump
tcpdump is a powerful command-line packet capture tool. Narrow the capture with filters (e.g., interface, host, port), write it to a file with -w, and limit its size with -C (file size) and -W (number of files). Then analyze the capture offline in Wireshark.
In an example capture of Chrome establishing three connections (filtered by destination port), the SYN, SYN-ACK, ACK handshake of each connection is clearly visible.
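To follow a single process over time, poll ps once per second. This loop prints the user, PID, nice value (ni), priority (pri), CPU usage (pcpu), assigned processor (psr), and command name for ailawd:
while :; do ps -eo user,pid,ni,pri,pcpu,psr,comm | grep 'ailawd'; sleep 1; done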
This article was originally published on the twt Enterprise IT Community.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
MaGe Linux Operations
