Master Linux Server Monitoring: Essential Tools & Metrics Explained
This guide walks you through essential Linux server monitoring tools—top, vmstat, pidstat, iostat, netstat, sar, and tcpdump—explaining each metric, how to interpret CPU, memory, disk, and network statistics, and offering practical tips for troubleshooting performance bottlenecks.
A running Linux server exposes a wealth of system metrics that matter to both operations staff and developers, especially when a program misbehaves.
The tools below are simple ways to view those metrics; many of them read their data from /proc and /sys, while deeper analysis may require tools such as perf or SystemTap.
1. CPU and Memory
1.1 top
➜ ~ top
The first line shows the 1-, 5-, and 15-minute average load; values exceeding the number of CPU cores indicate saturation.
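The load-average rule can be sketched as a small shell check. This is a minimal sketch, not part of top itself: the function names load_per_core and load_status are hypothetical, and the threshold (load greater than core count) is simply the rule of thumb stated above.

```shell
# load_per_core LOAD CORES: print the load average divided by the core count.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# load_status LOAD CORES: print "saturated" when the load exceeds the core count.
load_status() {
    awk -v load="$1" -v cores="$2" 'BEGIN {
        if (load > cores) print "saturated"; else print "ok"
    }'
}

# Demo with made-up numbers: a 1-minute load of 6.2 on a 4-core machine.
load_per_core 6.2 4
load_status 6.2 4

# On a live Linux host the inputs could come from the kernel directly:
#   read one five fifteen rest < /proc/loadavg
#   load_status "$one" "$(nproc)"
```

A sustained value above the core count matters more than a momentary spike, so the 5- and 15-minute figures are usually the ones worth comparing.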
The second line counts tasks by state: running, sleeping, stopped, and zombie (exited but not yet reaped by the parent).
The third line breaks down CPU usage:
us (user): time spent in user mode running un-niced processes.
sy (system): time spent in kernel mode, often higher during heavy I/O.
ni (nice): time spent in user mode with positive nice values.
id (idle): idle time.
wa (iowait): time waiting for I/O completion.
hi (irq): time handling hardware interrupts.
si (softirq): time handling software interrupts.
st (steal): time lost to the hypervisor in virtualized environments.
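These fields are derived from the cumulative tick counters on the aggregate cpu line of /proc/stat. The sketch below shows one way to turn two samples of that line into percentages; the function name cpu_breakdown is hypothetical, the sample numbers are invented, and the exact arithmetic top uses may differ in detail.

```shell
# cpu_breakdown BEFORE AFTER: given two aggregate "cpu" lines from /proc/stat,
# print the share of each state over the interval, using top's field names.
cpu_breakdown() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= 9; i++) prev[i] = $i }
        NR == 2 {
            total = 0
            for (i = 2; i <= 9; i++) { d[i] = $i - prev[i]; total += d[i] }
            if (total == 0) { print "no ticks elapsed"; exit 1 }
            # /proc/stat column order: user nice system idle iowait irq softirq steal.
            # The trailing guest columns are ignored here.
            printf "us=%.1f sy=%.1f ni=%.1f id=%.1f wa=%.1f hi=%.1f si=%.1f st=%.1f\n",
                100*d[2]/total, 100*d[4]/total, 100*d[3]/total, 100*d[5]/total,
                100*d[6]/total, 100*d[7]/total, 100*d[8]/total, 100*d[9]/total
        }'
}

# Demo with invented counters taken one interval apart.
cpu_breakdown "cpu 100 0 50 800 40 5 5 0 0 0" "cpu 160 10 70 900 50 5 5 0 0 0"

# On a live host:
#   a=$(grep '^cpu ' /proc/stat); sleep 1; b=$(grep '^cpu ' /proc/stat)
#   cpu_breakdown "$a" "$b"
```

Because the counters are cumulative since boot, only the difference between two samples says anything about current load.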
High CPU usage hints at specific issues:
If us is high, a user process is consuming CPU; locate it with top or perf.
If sy is high, heavy I/O or kernel problems may be the cause.
If ni is high, the CPU time is going to processes whose priority was deliberately lowered with nice.
If wa is high, I/O efficiency is low.
If hi or si (irq/softirq) is high, a device or its driver may be misbehaving.
If st is high, the virtualization host is likely oversubscribed.
The fourth and fifth lines show physical and virtual memory details. total = free + used + buff/cache; Buffers cache raw disk metadata, while Cached caches file data. Avail Mem indicates memory available without swapping.
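The identity above can be checked against /proc/meminfo. This is a sketch under an assumption: buff/cache is taken as Buffers + Cached + SReclaimable, which matches how older procps releases sum it, while newer releases may compute used differently. The function name mem_summary is hypothetical.

```shell
# mem_summary: read /proc/meminfo-style text on stdin and print values in kB.
mem_summary() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemFree:"      { free = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        $1 == "Buffers:"      { buffers = $2 }
        $1 == "Cached:"       { cached = $2 }
        $1 == "SReclaimable:" { slab = $2 }
        END {
            cache = buffers + cached + slab     # assumed definition of buff/cache
            used = total - free - cache         # so total = free + used + buff/cache
            printf "total=%d free=%d used=%d buff/cache=%d avail=%d\n",
                total, free, used, cache, avail
        }'
}

# On a live host:
#   mem_summary < /proc/meminfo
```

A low free value alone is not a warning sign; avail is the figure that reflects how much memory can still be handed out without swapping.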
Note that top itself consumes resources and is best for real‑time, short‑term monitoring.
1.2 vmstat
vmstat provides another view of system load.
Key fields: r (runnable processes), b (processes in uninterruptible sleep), swpd (swap in use), bi/bo (blocks read from and written to block devices), in (interrupts per second), cs (context switches per second).
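As an illustration of reading these fields, the sketch below filters vmstat output for samples where the run queue exceeds the core count. The function name vmstat_busy is hypothetical, and columns are located by header name rather than position because layouts can vary between versions.

```shell
# vmstat_busy CORES: read vmstat output on stdin and print the samples
# whose run queue (r) is larger than the given core count.
vmstat_busy() {
    awk -v cores="$1" '
        # The header row names the columns; remember where each one sits.
        $1 == "r" && $2 == "b" { for (i = 1; i <= NF; i++) col[$i] = i; next }
        # Data rows start with a number; the "procs ---" banner does not.
        $1 ~ /^[0-9]+$/ && ("r" in col) && $(col["r"]) > cores {
            printf "r=%d b=%d cs=%d\n", $(col["r"]), $(col["b"]), $(col["cs"])
        }'
}

# On a live host, sampling once per second for ten samples:
#   vmstat 1 10 | vmstat_busy "$(nproc)"
```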
For example, when compiling with -j, context switches rise noticeably only once the -j value is high, which suggests there is little to gain from an overly aggressive setting.
1.3 pidstat
pidstat -t -C "ailaw" -l
pidstat provides per-process statistics, including:
-r: page faults (minor, minflt/s, and major, majflt/s).
-s: stack usage (StkSize and StkRef).
-u: CPU usage.
-w: context switches (voluntary, cswch/s, and involuntary, nvcswch/s).
-C filters by command name and -t adds per-thread rows, which makes pidstat more convenient than ps for multithreaded processes.
1.4 Other CPU tools
For per‑CPU monitoring, mpstat -P ALL 1 shows load distribution across cores.
To show only one user's processes, use top -u taozj; to watch specific processes, poll a custom ps format in a loop, e.g.:
while :; do ps -eo user,pid,ni,pri,pcpu,psr,comm | grep 'ailawd'; sleep 1; done
The process tree can be displayed with ps axjf.
2. Disk I/O
iotop visualizes per-process disk read/write rates; lsof shows which processes have files open, which is useful when a filesystem cannot be unmounted.
2.1 iostat
iostat -xz 1
Key metrics:
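avgqu-sz: average request queue length; a value above 1 indicates saturation for a single disk.
await (r_await, w_await): average I/O request latency, including time spent queued.
svctm: average service time; when it is close to await, requests spend little time queued. Recent sysstat releases no longer report this field.
%util: device utilization; above 60% performance typically degrades, and near 100% the device is saturated.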
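The %util threshold can be applied mechanically to iostat output. The sketch below is one way to do it, assuming the extended report prints a header row beginning with Device and containing a %util column; the function name iostat_hot and the default limit of 60 are illustrative choices taken from the rule of thumb above.

```shell
# iostat_hot [LIMIT]: read `iostat -x` output on stdin and print each device
# whose %util exceeds LIMIT (default 60), one "device value" pair per line.
iostat_hot() {
    awk -v limit="${1:-60}" '
        # A new CPU report starts: forget the column until the next Device header.
        $1 == "avg-cpu:" { u = 0; next }
        # Locate %util by name, since column positions differ between versions.
        $1 ~ /^Device/ { for (i = 1; i <= NF; i++) if ($i == "%util") u = i; next }
        u && NF >= u && $u + 0 > limit { printf "%s %.1f\n", $1, $u }'
}

# On a live host, five one-second samples:
#   iostat -xz 1 5 | iostat_hot 60
```

The first report iostat prints covers the time since boot, so a device that only appears in later samples is the one under current pressure.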
These metrics also apply to network file systems.
3. Network
Network performance is critical; tools like iptraf, sar -n DEV 1, and netstat help monitor throughput, packet loss, and retransmissions.
3.1 netstat
netstat -s shows per-protocol statistics accumulated since boot. netstat is also useful for checking ports and connections, with options such as -antp (all TCP sockets) and -nltp (listening TCP sockets).
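The counters behind these statistics are exposed in /proc/net/snmp as a header row followed by a value row for each protocol. As a sketch of how a retransmission ratio could be derived from them, the hypothetical function tcp_retrans_pct below divides RetransSegs by OutSegs; like the netstat figures, the result is an average since boot, not a current rate.

```shell
# tcp_retrans_pct: read /proc/net/snmp-style text on stdin and print
# retransmitted TCP segments as a percentage of all segments sent.
tcp_retrans_pct() {
    awk '
        # First "Tcp:" row holds the column names.
        $1 == "Tcp:" && !seen { for (i = 2; i <= NF; i++) col[$i] = i; seen = 1; next }
        # Second "Tcp:" row holds the matching values.
        $1 == "Tcp:" && seen {
            out = $(col["OutSegs"]) + 0
            re = $(col["RetransSegs"]) + 0
            if (out > 0) printf "%.2f\n", 100 * re / out; else print "0.00"
        }'
}

# On a live host:
#   tcp_retrans_pct < /proc/net/snmp
```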
3.2 sar
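sar -n TCP,ETCP 1 reports TCP activity, including active/s (connections opened locally), passive/s (connections accepted from remote hosts), retrans/s (segments retransmitted), and isegerr/s (segments received in error). For UDP, sar -n UDP 1 shows noport/s and idgmerr/s, which count datagrams sent to a port with no listener and datagrams undeliverable for other reasons.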
3.3 tcpdump
tcpdump captures packets for offline analysis with Wireshark; use -w to write them to a file, and -C together with -W to cap the file size and rotate captures.
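A rotating capture might be assembled as follows. This is a sketch only: the helper name capture_cmd, the interface eth0, and the output path are placeholders, and the function prints the command instead of running it, since tcpdump normally needs root. With -C the size is given in units of 1,000,000 bytes, and -W limits how many files are kept before the oldest is overwritten.

```shell
# capture_cmd IFACE SIZE_MB FILES OUTFILE: print a tcpdump command line that
# writes to OUTFILE, starts a new file every SIZE_MB million bytes, and keeps
# at most FILES files in rotation.
capture_cmd() {
    printf 'tcpdump -i %s -C %s -W %s -w %s\n' "$1" "$2" "$3" "$4"
}

# Dry run with placeholder values: five files of about 100 MB each.
capture_cmd eth0 100 5 /tmp/capture.pcap
```

The resulting files can be opened directly in Wireshark once they have been copied off the server.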
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
