Master Linux Server Health: Essential Monitoring Commands Explained

Learn how to monitor Linux server performance using essential tools—top, vmstat, pidstat, iostat, netstat, sar, and tcpdump—understanding CPU, memory, disk I/O, and network metrics, interpreting their outputs, and applying insights to diagnose and troubleshoot system issues effectively.

MaGe Linux Operations

Mar 24, 2021

A Linux server continuously reports a variety of parameters that are crucial for both operations staff and developers, especially when a program behaves unexpectedly; these clues help locate problems quickly.

Below are simple tools for viewing system parameters; many tools analyze data from /proc and /sys. More detailed performance monitoring and tuning may require advanced utilities such as perf or systemtap.

1. CPU and Memory

1.1 top

top

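The first line shows the 1-, 5-, and 15-minute load averages; a value that stays higher than the number of CPU cores suggests saturation. On Linux the load average also counts tasks in uninterruptible sleep (usually waiting on disk I/O), so a high value does not always mean the CPUs themselves are busy.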
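As a rough illustration of the load-versus-cores comparison, the sketch below divides a load average by a core count. The function name load_per_core is made up for this example, and the usage lines assume a Linux host where /proc/loadavg and nproc (GNU coreutils) are available.

```shell
# Ratio of load average to core count; a result above 1.00 suggests more
# runnable (or uninterruptible) tasks than there are cores.
load_per_core() {
    # $1 = load average, $2 = number of CPU cores
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Usage on a live system:
#   read -r one five fifteen rest < /proc/loadavg
#   load_per_core "$one" "$(nproc)"
```

A single reading says little; comparing the 1-minute and 15-minute values shows whether the load is rising or falling.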

The second line lists task states: running (executing or ready to run), sleeping (waiting for an event such as I/O completion or a timer), stopped (paused, e.g., via SIGSTOP), and zombie (exited processes whose parent has not yet reaped them).

The third line breaks down CPU usage:

us – user time (user processes that have not been niced)

sy – system time (kernel mode, higher when I/O intensive)

ni – nice time (user processes running with a positive nice value, i.e., lowered priority)

id – idle time

wa – iowait

hi – hardware interrupts

si – software interrupts

st – steal time (relevant for virtualized environments)

High CPU usage suggests different causes:

If us is high, one or more user processes are consuming CPU; use top to identify them, then tools like perf for deeper analysis.

If sy is high, heavy I/O or kernel activity may be the culprit.

If ni is high, the CPU is going to processes whose priority was intentionally lowered.

If wa is high, the CPUs are sitting idle while I/O is outstanding; look at the disks next.

If hi / si are high, check /proc/interrupts for hardware issues.

If st is high, the hypervisor host may be oversubscribed.
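The same breakdown can be derived by hand from /proc/stat, which is where top gets these counters. The sketch below is illustrative only: cpu_breakdown is a hypothetical helper name, and it assumes the usual field order on the aggregate "cpu" line (user, nice, system, idle, iowait, irq, softirq, steal).

```shell
# Print us/sy/ni/id/wa/hi/si/st percentages for the interval between two
# aggregate "cpu" lines sampled from /proc/stat.
cpu_breakdown() {
    # $1 = earlier "cpu ..." line, $2 = later "cpu ..." line
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= 9; i++) prev[i] = $i; next }
        {
            total = 0
            for (i = 2; i <= 9; i++) { d[i] = $i - prev[i]; total += d[i] }
            if (total == 0) { print "no change between samples"; exit }
            # Field 2 = user, 3 = nice, 4 = system, 5 = idle,
            # 6 = iowait, 7 = irq, 8 = softirq, 9 = steal
            printf "us=%.1f sy=%.1f ni=%.1f id=%.1f wa=%.1f hi=%.1f si=%.1f st=%.1f\n",
                100*d[2]/total, 100*d[4]/total, 100*d[3]/total, 100*d[5]/total,
                100*d[6]/total, 100*d[7]/total, 100*d[8]/total, 100*d[9]/total
        }'
}

# Usage on a live system:
#   a=$(grep '^cpu ' /proc/stat); sleep 1; b=$(grep '^cpu ' /proc/stat)
#   cpu_breakdown "$a" "$b"
```

Working from deltas matters because the counters are cumulative since boot; a single sample only gives the lifetime average.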

The fourth and fifth lines display physical memory and swap information. total = free + used + buff/cache. Buffers cache raw disk block metadata; Cached caches file data. Avail Mem estimates how much memory new workloads can use without swapping, roughly equal to free + buffers + cached. Swap usage itself isn't critical, but frequent swap-in/out signals memory pressure.
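For scripting, the same figures can be read from /proc/meminfo. This is a minimal sketch: mem_summary is a hypothetical name, and it assumes a kernel that reports MemAvailable (3.14 or later).

```shell
# Summarize MemAvailable as a percentage of MemTotal, plus swap in use (kB).
mem_summary() {
    # $1 = path to a meminfo-format file (defaults to /proc/meminfo)
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        $1 == "SwapTotal:"    { stotal = $2 }
        $1 == "SwapFree:"     { sfree = $2 }
        END {
            if (total > 0)
                printf "available=%.1f%% swap_used_kB=%d\n",
                    100 * avail / total, stotal - sfree
        }' "${1:-/proc/meminfo}"
}

# Usage: mem_summary
```

To see swap traffic rather than swap occupancy, the si and so columns of vmstat are the more direct signal.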

At the bottom, top lists each process's resource consumption; the %CPU column is summed across cores, so a multithreaded process can show more than 100%. Note that running top itself consumes noticeable CPU.

1.2 vmstat

vmstat

provides another view of system load. Example output (compiled boost with -j4) shows:

r – runnable processes

b – uninterruptible sleep processes

swpd – used virtual memory (same meaning as Swap used in top)

bi / bo – blocks received/sent per second

in – interrupts per second

cs – context switches per second

When compiling kernels with different -j values, vmstat shows that the context-switch count rises significantly only at much larger -j values, so moderate changes to -j are not a major factor.
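One way to watch r and cs during such a build is to filter vmstat output for samples where the run queue exceeds the core count. The sketch below assumes the default vmstat column layout (r first, b second, cs twelfth); flag_busy_samples is a name invented for this example.

```shell
# Read `vmstat <interval>` output on stdin and print the samples whose run
# queue (r) exceeds the given core count, with b and the context-switch rate.
flag_busy_samples() {
    # $1 = number of CPU cores
    awk -v cores="$1" '
        # Header lines start with "procs" or "r", so only numeric rows match.
        $1 ~ /^[0-9]+$/ && $1 + 0 > cores + 0 {
            printf "r=%d b=%d cs=%d\n", $1, $2, $12
        }'
}

# Usage: vmstat 1 10 | flag_busy_samples "$(nproc)"
```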

1.3 pidstat

pidstat

offers detailed per-process statistics, including stack usage, page faults, and thread-level activity. Useful options:

-r – shows minor (minflt/s) and major (majflt/s) page faults.

-s – displays stack size (StkSize) and actual usage (StkRef).

-u – CPU usage (similar to top).

-w – thread context-switch counts, split into voluntary (cswch/s) and involuntary (nvcswch/s).

-C – filter by command name.

-l – show full command line.

Example:

pidstat -w -t -C "ailaw" -l

For single-process or multithreaded debugging, pidstat is often more convenient than ps.
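When pidstat is not installed, the kernel exposes cumulative context-switch counters for each process in /proc/<pid>/status. The sketch below reads them; ctxt_switches is a hypothetical helper, and unlike the per-second rates from pidstat -w these are totals since the process started.

```shell
# Print voluntary and involuntary context-switch totals from a status file.
ctxt_switches() {
    # $1 = path such as /proc/1234/status
    awk '
        $1 == "voluntary_ctxt_switches:"    { vol = $2 }
        $1 == "nonvoluntary_ctxt_switches:" { invol = $2 }
        END { printf "voluntary=%d involuntary=%d\n", vol, invol }' "$1"
}

# Usage (current shell): ctxt_switches "/proc/$$/status"
```

A high involuntary count usually means the process is being preempted, while a high voluntary count points to blocking on I/O or locks.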

1.4 Other CPU Tools

To monitor individual CPUs, mpstat -P ALL 1 shows per-core load balance. top -u username restricts top to one user's processes, and ps axjf displays a detailed process tree.

2. Disk I/O

Tools like iotop show real‑time disk read/write rates per process, while lsof reveals which processes hold files or devices open (useful for unmount issues).

2.1 iostat

iostat -xz 1

(or sar -d 1) reports key disk metrics:

avgqu-sz – average queue length (named aqu-sz in newer sysstat releases); >1 indicates a saturated device.

await ( r_await, w_await) – average I/O wait time (queue + service).

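svctm – average service time; if close to await, requests spend little time queued and the device is not I/O-bound. The sysstat documentation warns that this field is unreliable, and recent releases no longer report it.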

%util – device utilization; >60 % degrades performance, near 100 % means saturation.

Even if disk performance appears poor, kernel asynchronous I/O and caching may mask impact on applications.
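To apply the %util threshold automatically, iostat output can be piped through a small filter. This is a sketch under assumptions: busy_devices is a made-up name, and the %util column is located by its header because the column order differs between sysstat versions.

```shell
# Read `iostat -x` output on stdin; print devices whose %util exceeds a
# threshold, one "device value" pair per line.
busy_devices() {
    # $1 = utilization threshold in percent (default 60)
    awk -v limit="${1:-60}" '
        $1 ~ /^Device/ {
            # Remember which column holds %util in this report.
            col = 0
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        col && NF >= col && $col + 0 > limit + 0 { printf "%s %.1f\n", $1, $col }'
}

# Usage: iostat -xz 1 5 | busy_devices 60
```

The first report from iostat covers the time since boot, so it is usually the later samples that are worth acting on.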

3. Network

Network health is critical for servers. iptraf and sar -n DEV 1 provide throughput, while NIC specifications (e.g., 1 Gbps) help gauge utilization.

3.1 netstat

netstat -s

shows cumulative per-protocol statistics since boot. Without -s, netstat is useful for checking ports and connections. Common options:

netstat --all --numeric --tcp --udp --timers --listening --program

Use netstat -antp for all TCP connections and netstat -nltp for listening sockets.
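A quick way to summarize the connection list is to count sockets per TCP state. The sketch below assumes the standard netstat -ant layout, where the state is the sixth column; tcp_state_counts is a name chosen for this example.

```shell
# Count TCP connections per state from `netstat -ant` output on stdin.
tcp_state_counts() {
    # Rows for tcp and tcp6 start with "tcp"; header rows are skipped.
    awk '$1 ~ /^tcp/ { n[$6]++ } END { for (s in n) print s, n[s] }' | sort
}

# Usage: netstat -ant | tcp_state_counts
```

A large TIME_WAIT or CLOSE_WAIT count in this summary is often the first hint of connection-handling problems.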

3.2 sar

sar -n TCP,ETCP 1

reports TCP activity and errors:

active/s – locally initiated connections.

passive/s – remotely initiated connections.

retrans/s – TCP retransmissions (indicates loss or overload).

isegerr/s – packets with errors (e.g., checksum failures).

For UDP, sar -n UDP 1 shows noport/s (packets with no listening port) and idgmerr/s (undeliverable packets).
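Where sar is unavailable, the counters behind these rates can be read from /proc/net/snmp. The sketch below computes the share of TCP segments retransmitted since boot; tcp_retrans_pct is a hypothetical helper, and it looks fields up by name rather than position.

```shell
# Percentage of TCP segments retransmitted since boot, from an snmp-format file.
tcp_retrans_pct() {
    # $1 = path (defaults to /proc/net/snmp)
    awk '
        $1 == "Tcp:" && !seen {        # first Tcp: line holds the field names
            for (i = 2; i <= NF; i++) idx[$i] = i
            seen = 1
            next
        }
        $1 == "Tcp:" {                 # second Tcp: line holds the values
            if ($(idx["OutSegs"]) > 0)
                printf "%.2f\n", 100 * $(idx["RetransSegs"]) / $(idx["OutSegs"])
        }' "${1:-/proc/net/snmp}"
}

# Usage: tcp_retrans_pct
```

Because this is a since-boot average, it smooths over short bursts; sampling twice and differencing gives a rate closer to what sar reports.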

3.3 tcpdump

tcpdump

is a powerful command-line packet capture tool. Capture with filters (e.g., interface, host, port), and keep captures bounded with -C (maximum size of each file) and -W (maximum number of files to rotate through). After capturing, analyze offline with Wireshark.
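As a sketch of how those options combine, the helper below only prints a tcpdump command line rather than running it, since capturing normally needs root. The function name capture_cmd, the output file capture.pcap, and the sizes (five files of roughly 100 MB each) are all illustrative choices, not values from the original article.

```shell
# Print (not run) a tcpdump command for a size-limited, rotating capture.
capture_cmd() {
    # $1 = interface, $2 = host, $3 = port (placeholders supplied by the caller)
    # -nn      : do not resolve host names or port names
    # -C 100   : start a new file after about 100 million bytes
    # -W 5     : keep at most 5 files, overwriting the oldest
    # -w FILE  : write raw packets to FILE for later analysis
    printf 'tcpdump -i %s -nn -C 100 -W 5 -w capture.pcap host %s and port %s\n' \
        "$1" "$2" "$3"
}

# Usage: capture_cmd eth0 192.0.2.10 443
```

Printing the command first makes it easy to review the filter expression before running it with elevated privileges.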

Example capture of Chrome establishing three connections (filtered by destination port) demonstrates clear SYN/ACK handshake.

while :; do ps -eo user,pid,ni,pri,pcpu,psr,comm | grep 'ailawd'; sleep 1; done

This article was originally published on the twt Enterprise IT Community.
