Master Linux Server Monitoring: Essential Tools & Metrics Explained
This guide walks through essential Linux server monitoring tools—top, vmstat, pidstat, iostat, netstat, sar, and tcpdump—explaining their output fields, what each metric reveals about CPU, memory, disk I/O, and network performance, and how to use them for effective troubleshooting and capacity planning.
Preface
A Linux server constantly produces a wealth of parameter data that is crucial for operations staff and system administrators, and also valuable for developers when troubleshooting abnormal program behavior.
Introduction
This article lists simple tools for viewing system parameters; many of them parse data from /proc and /sys, while more advanced performance monitoring and tuning may require specialized tools such as perf or systemtap.
1. CPU and Memory
1.1 top
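Command:
top
The first line shows the 1-, 5-, and 15-minute load averages. Sustained values above the number of CPU cores suggest CPU saturation, although on Linux the load average also counts tasks in uninterruptible sleep, so heavy I/O can raise it as well.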
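The comparison between load average and core count can be sketched in shell. This is a minimal sketch, not part of the original article: the helper name load_per_core is hypothetical, and it assumes its first argument is a line in /proc/loadavg format.

```shell
# load_per_core LOADAVG_LINE NCPU
# Prints the 1-minute load average divided by the CPU count (two decimals).
# LOADAVG_LINE is assumed to be in /proc/loadavg format, e.g. "3.00 2.50 1.75 2/345 6789".
load_per_core() {
    printf '%s\n' "$1" | awk -v n="$2" '{ printf "%.2f\n", $1 / n }'
}

# Usage on a live system (nproc is part of GNU coreutils):
#   load_per_core "$(cat /proc/loadavg)" "$(nproc)"
```

A result that stays above 1.00 means more runnable or I/O-blocked tasks than cores, which is the condition described above.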
The second line reports task states: running, sleeping (interruptible/uninterruptible), stopped, zombie, etc.
The third line breaks down CPU time into user (us), system (sy), nice (ni), idle (id), iowait (wa), hardware interrupt (hi), software interrupt (si), and steal (st) percentages.
A high value in one field suggests where to look next: high us points to a CPU-intensive process, high sy means much time is spent in the kernel (often system calls or heavy I/O), high wa signals an I/O bottleneck, and high st can reveal over-commitment on the VM host.
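The same breakdown can be derived from two samples of the aggregate cpu line in /proc/stat. The sketch below is an illustration under stated assumptions: the function name cpu_breakdown is hypothetical, and it assumes the standard field order user, nice, system, idle, iowait, irq, softirq, steal.

```shell
# cpu_breakdown BEFORE AFTER
# BEFORE and AFTER are two samples of the aggregate "cpu" line from /proc/stat.
# Prints each state's share of the elapsed CPU time between the samples.
cpu_breakdown() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= 9; i++) prev[i] = $i; next }
        {
            # Fields 2..9: user nice system idle iowait irq softirq steal
            total = 0
            for (i = 2; i <= 9; i++) { d[i] = $i - prev[i]; total += d[i] }
            if (total == 0) total = 1   # avoid division by zero
            printf "us=%.1f sy=%.1f ni=%.1f id=%.1f wa=%.1f hi=%.1f si=%.1f st=%.1f\n",
                100*d[2]/total, 100*d[4]/total, 100*d[3]/total, 100*d[5]/total,
                100*d[6]/total, 100*d[7]/total, 100*d[8]/total, 100*d[9]/total
        }'
}

# Usage on a live system:
#   a=$(grep '^cpu ' /proc/stat); sleep 1; b=$(grep '^cpu ' /proc/stat)
#   cpu_breakdown "$a" "$b"
```

Working from deltas rather than absolute counters matters because the counters in /proc/stat accumulate since boot.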
The fourth and fifth lines display physical memory and virtual memory (swap). For physical memory, total = free + used + buff/cache. Buffers hold raw block-device data such as file system metadata, while Cached holds file contents. avail Mem estimates how much memory can be given to new workloads without swapping.
Swap usage is not inherently bad, but frequent swap‑in/out suggests memory pressure.
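For scripted checks, the available-memory figure can be read from /proc/meminfo. The following is a minimal sketch, assuming a kernel that reports MemAvailable; the function name mem_available_pct is hypothetical.

```shell
# mem_available_pct: reads /proc/meminfo-format text on stdin and prints
# MemAvailable as a percentage of MemTotal (one decimal).
mem_available_pct() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END { if (total > 0) printf "%.1f\n", 100 * avail / total }'
}

# Usage on a live system:
#   mem_available_pct < /proc/meminfo
```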
Finally, the process list shows per‑process resource consumption; note that running top itself consumes CPU.
1.2 vmstat
vmstat provides another view of system load. Columns include r (runnable processes), b (processes in uninterruptible sleep), swpd (swap in use), buff and cache (buffer and page-cache memory), bi/bo (blocks read from and written to block devices per second), in (interrupts per second), and cs (context switches per second).
In the original test, raising the -j compile parallelism did not significantly change the context-switch count (cs) until a high value was used.
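To watch for run-queue pressure without reading every line by eye, vmstat output can be piped through a small filter. This is a sketch under assumptions: the function name vmstat_flag_runq is hypothetical, and it relies on the default vmstat column order, in which r, b, and cs are columns 1, 2, and 12.

```shell
# vmstat_flag_runq NCPU
# Reads vmstat output on stdin and prints the samples whose run queue (r)
# exceeds NCPU. Header lines are skipped because their first field is not a number.
vmstat_flag_runq() {
    awk -v n="$1" '
        $1 ~ /^[0-9]+$/ && $1 + 0 > n + 0 { printf "r=%d b=%d cs=%d\n", $1, $2, $12 }'
}

# Usage on a live system:
#   vmstat 1 10 | vmstat_flag_runq "$(nproc)"
```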
1.3 pidstat
pidstat offers detailed per-process statistics, including page faults, stack usage, CPU usage, and thread-level context switches. Useful options:
-r: page faults (minor minflt/s, major majflt/s)
-s: stack size (StkSize) and stack usage (StkRef)
-u: CPU usage
-w: context switches (voluntary cswch/s, involuntary nvcswch/s); add -t for per-thread figures
-C pattern -l: filter by command name and show the full command line
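The context-switch counters are also visible per process in /proc/<pid>/status. The sketch below reads them directly; note that these are cumulative totals, whereas pidstat reports per-second rates. The function name ctxt_switches is hypothetical.

```shell
# ctxt_switches: reads /proc/<pid>/status-format text on stdin and prints the
# cumulative voluntary and involuntary context-switch counts.
ctxt_switches() {
    awk '
        $1 == "voluntary_ctxt_switches:"    { v = $2 }
        $1 == "nonvoluntary_ctxt_switches:" { nv = $2 }
        END { printf "voluntary=%d nonvoluntary=%d\n", v, nv }'
}

# Usage on a live system (current shell as the example process):
#   ctxt_switches < /proc/$$/status
```

Sampling the counters twice and subtracting gives a rate comparable to the pidstat columns.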
Example:
pidstat -w -t -C "ailaw" -l
1.4 Other CPU tools
For per‑CPU monitoring, mpstat -P ALL 1 shows load distribution across cores.
To filter processes by user, run top -u taozj, or use ps with a custom format, e.g.
while :; do ps -eo user,pid,ni,pri,pcpu,psr,comm | grep 'ailawd'; sleep 1; done
The process tree can be displayed with ps axjf.
2. Disk I/O
2.1 iostat
Command: iostat -xz 1. Key metrics:
avgqu-sz (shown as aqu-sz in newer sysstat versions): average request queue length; values above 1 may indicate saturation.
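await (r_await, w_await): average time, in milliseconds, for I/O requests to be served, including time spent waiting in the queue.
svctm: average service time; sysstat documents this field as unreliable, and recent versions no longer report it.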
%util: device utilization; >60% may degrade performance.
Even if I/O appears slow, kernel asynchronous I/O and caching can mask impact on applications.
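The %util threshold can be checked mechanically. Because iostat column layouts differ between sysstat versions, this sketch locates the %util column by its header name rather than by position. The function name iostat_busy is hypothetical, and the default limit of 60 simply mirrors the figure quoted above.

```shell
# iostat_busy [LIMIT]
# Reads `iostat -x` output on stdin and prints each device whose %util exceeds
# LIMIT (default 60). The %util column is found from the "Device" header line.
iostat_busy() {
    awk -v limit="${1:-60}" '
        $1 ~ /^Device/ { for (i = 1; i <= NF; i++) if ($i == "%util") col = i; next }
        col && NF >= col && $col + 0 > limit + 0 { printf "%s %s\n", $1, $col }'
}

# Usage on a live system:
#   iostat -xz 1 5 | iostat_busy 60
```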
3. Network
3.1 netstat
Show protocol statistics since boot: netstat -s. For active connections use:
netstat --all --numeric --tcp --udp --timers --listening --program
Common shortcuts: netstat -antp (all TCP), netstat -nltp (listening TCP).
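A common follow-up to listing connections is to summarize them by TCP state. This is a minimal sketch, assuming the usual netstat -ant layout in which the state is the sixth column; the function name tcp_state_counts is hypothetical.

```shell
# tcp_state_counts: reads `netstat -ant` output on stdin and prints one line
# per TCP state with the number of sockets in that state, sorted by state name.
tcp_state_counts() {
    awk '$1 ~ /^tcp/ { count[$6]++ } END { for (s in count) print s, count[s] }' | sort
}

# Usage on a live system:
#   netstat -ant | tcp_state_counts
```

A large number of sockets in states such as TIME_WAIT or CLOSE_WAIT is often the first hint worth investigating.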
3.2 sar
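sar -n TCP,ETCP 1 reports TCP activity: active/s (locally initiated connections), passive/s (remotely initiated connections), retrans/s (retransmitted segments), and isegerr/s (segments received in error). sar -n UDP 1 reports UDP metrics: noport/s (datagrams received for a port with no listener) and idgmerr/s (datagrams that could not be delivered for other reasons).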
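The cumulative counters behind these TCP rates are exposed on Linux in /proc/net/snmp, which makes a since-boot retransmission ratio easy to compute. The sketch below is an illustration, assuming the two-line Tcp: header and value layout of that file; the function name tcp_retrans_pct is hypothetical.

```shell
# tcp_retrans_pct: reads /proc/net/snmp-format text on stdin and prints
# RetransSegs as a percentage of OutSegs (two decimals).
tcp_retrans_pct() {
    awk '
        # The first "Tcp:" line names the columns, the second holds the values.
        $1 == "Tcp:" && !seen { for (i = 2; i <= NF; i++) col[$i] = i; seen = 1; next }
        $1 == "Tcp:" {
            out = $(col["OutSegs"]); re = $(col["RetransSegs"])
            if (out > 0) printf "%.2f\n", 100 * re / out
        }'
}

# Usage on a live system:
#   tcp_retrans_pct < /proc/net/snmp
```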
3.3 tcpdump
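tcpdump captures packets for offline analysis with Wireshark. Use a capture filter to restrict which packets are recorded, and bound disk usage with -C (start a new output file after the given size, in millions of bytes) together with -W (limit the number of files, so they rotate).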
When capturing, configure filters carefully to avoid excessive load on the production system.
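One way to keep capture settings explicit is to assemble the command from named values and review it before running it as root. This is only a sketch: the function name build_capture_cmd and the interface, path, and filter in the usage line are hypothetical, while -n, -i, -C, -W, and -w are standard tcpdump options.

```shell
# build_capture_cmd IFACE FILE SIZE_MB FILES FILTER
# Prints (does not run) a tcpdump command that writes to FILE, rotates after
# SIZE_MB million bytes, keeps at most FILES files, and applies FILTER.
build_capture_cmd() {
    printf 'tcpdump -n -i %s -C %s -W %s -w %s %s\n' "$1" "$3" "$4" "$2" "'$5'"
}

# Usage (values are examples only):
#   build_capture_cmd eth0 /tmp/cap.pcap 100 5 'tcp port 443'
```

Printing the command first gives a chance to confirm that the filter is narrow enough and the file limits are acceptable for the production host.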
Open Source Linux
Focused on sharing Linux/Unix content, covering fundamentals, system development, network programming, automation/operations, cloud computing, and related professional knowledge.
