Master Linux Server Monitoring: Essential Tools & Metrics Explained
This guide walks you through essential Linux server monitoring tools—top, vmstat, pidstat, iostat, netstat, sar, and tcpdump—explaining each metric, how to interpret CPU, memory, disk, and network statistics, and offering practical tips for troubleshooting performance bottlenecks.
A running Linux server exposes a wealth of system metrics that matter to both operations staff and developers, especially when a program misbehaves.
Below are simple tools for viewing these metrics; many of them read their data from /proc and /sys, while more advanced monitoring may require tools like perf or systemtap.
1. CPU and Memory
1.1 top
➜ ~ top
The first line shows the 1-, 5-, and 15-minute load averages; values that stay above the number of CPU cores indicate saturation.
The second line counts tasks by state: running, sleeping, stopped, and zombie.
The third line breaks down CPU usage:
us (user): time spent running user-mode processes that have not been niced.
sy (system): time spent in kernel mode, often higher during heavy I/O.
ni (nice): time spent in user mode with positive nice values.
id (idle): idle time.
wa (iowait): time waiting for I/O completion.
hi (irq): time handling hardware interrupts.
si (softirq): time handling software interrupts.
st (steal): time lost to the hypervisor in virtualized environments.
High CPU usage hints at specific issues:
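If us is high, a user process is consuming CPU; locate it with top or perf.
If sy is high, heavy I/O or kernel problems may be the cause.
If ni is high, the CPU time is going to processes whose priority was deliberately lowered with nice.
If wa is high, the CPU is frequently idle while waiting for I/O, so the storage path is the likely bottleneck.
If hi or si is high, interrupt handling is heavy; a device may be misbehaving or overloaded.
If st is high, the host is likely oversubscribed and the VM is not getting its share of physical CPU.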
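The same breakdown can be derived without top from two samples of the aggregate cpu line in /proc/stat. The following is a minimal sketch, not how top itself is implemented; the function name cpu_breakdown is made up for illustration, and it assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal).

```shell
# cpu_breakdown PREV CUR
# PREV and CUR are two aggregate "cpu ..." lines from /proc/stat, sampled some time apart.
# Prints the share of elapsed CPU time spent in each state, labelled as in top.
cpu_breakdown() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= 9; i++) prev[i] = $i; next }
        {
            total = 0
            for (i = 2; i <= 9; i++) { d[i] = $i - prev[i]; total += d[i] }
            if (total <= 0) { print "no CPU time elapsed between samples"; exit 1 }
            # Assumed field order: user nice system idle iowait irq softirq steal
            printf "us=%.1f ni=%.1f sy=%.1f id=%.1f wa=%.1f hi=%.1f si=%.1f st=%.1f\n",
                100 * d[2] / total, 100 * d[3] / total, 100 * d[4] / total,
                100 * d[5] / total, 100 * d[6] / total, 100 * d[7] / total,
                100 * d[8] / total, 100 * d[9] / total
        }'
}

# Example on a live host:
#   prev=$(grep '^cpu ' /proc/stat); sleep 1; cur=$(grep '^cpu ' /proc/stat)
#   cpu_breakdown "$prev" "$cur"
```

Working from deltas rather than raw counters matters because /proc/stat values are cumulative since boot; a single sample only gives the long-run average.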
The fourth and fifth lines show physical memory and swap details.
total = free + used + buff/cache.
Buffers holds raw block-device data and metadata, while Cached holds file data.
Avail Mem estimates how much memory is available for new applications without swapping.
Note that top itself consumes resources and is best for real-time, short-term monitoring.
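For a cheap, scriptable version of the load-average check from the first line of top, the load figures in /proc/loadavg can be divided by the core count. This is a minimal sketch under that assumption; load_per_core is a hypothetical helper name, and the "saturated" verdict simply applies the rule of thumb above to the 1-minute value.

```shell
# load_per_core LOADAVG NCPU
# LOADAVG is the content of /proc/loadavg; NCPU is the number of CPU cores.
load_per_core() {
    printf '%s\n' "$1" | awk -v n="$2" '{
        if (n + 0 <= 0) { print "invalid core count"; exit 1 }
        # Rule of thumb: a 1-minute load above the core count means saturation.
        verdict = ($1 + 0 > n + 0) ? "saturated" : "ok"
        printf "1m=%.2f 5m=%.2f 15m=%.2f %s\n", $1 / n, $2 / n, $3 / n, verdict
    }'
}

# Example on a live host:
#   load_per_core "$(cat /proc/loadavg)" "$(nproc)"
```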
1.2 vmstat
vmstat provides another view of system load.
Key fields: r (runnable processes), b (processes in uninterruptible sleep), swpd (used swap), bi/bo (blocks read from and written to block devices), in (interrupts per second), cs (context switches per second).
When compiling with -j, context switches increase noticeably only once the -j value is high, suggesting the parameter need not be set overly aggressively.
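To compare runs (for example, builds at different -j values), the r and cs columns can be summarized from captured vmstat output. The sketch below is illustrative only; vmstat_summary is a made-up name, and it locates columns from the header row instead of assuming fixed positions.

```shell
# vmstat_summary NCPU
# Reads `vmstat 1 N` output on stdin; NCPU is the number of CPU cores.
vmstat_summary() {
    awk -v n="$1" '
        $1 == "r" { for (i = 1; i <= NF; i++) col[$i] = i; next }   # header row with field names
        !("r" in col) || $1 !~ /^[0-9]+$/ { next }                  # skip banner and other non-data lines
        {
            samples++
            if ($(col["r"]) + 0 > n + 0) busy++                     # run queue longer than core count
            cs += $(col["cs"])
        }
        END {
            if (samples == 0) { print "no samples"; exit 1 }
            printf "samples=%d runq_over_cores=%d avg_cs=%d\n", samples, busy, cs / samples
        }'
}

# Example on a live host:
#   vmstat 1 10 | vmstat_summary "$(nproc)"
```

Keep in mind that the first data row vmstat prints is an average since boot, so it is usually worth ignoring or taking enough samples that it does not dominate.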
1.3 pidstat
pidstat -t -C "ailaw" -l
pidstat provides per-process statistics, including:
-r : page faults (minor minflt/s and major majflt/s).
-s : stack usage (StkSize and StkRef).
-u : CPU usage.
-w : task context switches (voluntary cswch/s and involuntary nvcswch/s).
-C filters by command name and -t adds per-thread rows, which makes pidstat more convenient than ps for multithreaded processes.
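When pidstat is not installed, the cumulative counters behind the -w columns can be read from /proc/PID/status. This is a rough sketch, assuming the kernel exposes the voluntary_ctxt_switches and nonvoluntary_ctxt_switches lines; ctxt_switches is a hypothetical helper, and it prints totals, not per-second rates.

```shell
# ctxt_switches
# Reads the text of /proc/PID/status on stdin and prints the two context-switch counters.
ctxt_switches() {
    awk '
        $1 == "voluntary_ctxt_switches:"    { v = $2 }    # task gave up the CPU (e.g. blocked on I/O)
        $1 == "nonvoluntary_ctxt_switches:" { nv = $2 }   # task was preempted by the scheduler
        END { printf "voluntary=%s nonvoluntary=%s\n", v, nv }'
}

# Example on a live host (counters for the current shell):
#   ctxt_switches < /proc/$$/status
```

Sampling the counters twice and dividing the difference by the interval gives figures comparable to cswch/s and nvcswch/s.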
1.4 Other CPU tools
For per-CPU monitoring, mpstat -P ALL 1 shows load distribution across cores.
To filter processes by user, use top -u taozj, or a custom ps format, e.g.:
while :; do ps -eo user,pid,ni,pri,pcpu,psr,comm | grep 'ailawd'; sleep 1; done
The process tree can be displayed with ps axjf.
2. Disk I/O
iotop shows per-process disk read/write rates; lsof shows which processes have files open, which is useful when a file system cannot be unmounted because it is busy.
2.1 iostat
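iostat -xz 1
Key metrics:
avgqu-sz (aqu-sz in newer sysstat releases) : average request queue length; a value above 1 indicates saturation for a single disk.
await (r_await, w_await) : average I/O request latency, including time spent queued.
svctm : average service time; a value close to await means requests spend little time queued. Newer sysstat releases mark this field as unreliable or drop it.
%util : device utilization; above 60% performance tends to degrade, and near 100% the device is saturated.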
These metrics also apply to network file systems.
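The %util thresholds above can be applied to captured iostat output with a small filter. The sketch below is an illustration, not part of sysstat; busy_disks is a made-up name, the 60% default comes from the rule of thumb above, and the %util column is located from the header row because column layout differs between sysstat versions.

```shell
# busy_disks [LIMIT]
# Reads `iostat -xz` output on stdin and prints devices whose %util exceeds LIMIT (default 60).
busy_disks() {
    awk -v limit="${1:-60}" '
        NF == 0 { util = 0; next }                      # blank line ends a device table
        $1 ~ /^Device/ {                                # header row: find the %util column
            util = 0
            for (i = 1; i <= NF; i++) if ($i == "%util") util = i
            next
        }
        util && NF >= util && $util + 0 > limit + 0 { printf "%s %.1f\n", $1, $util }'
}

# Example on a live host (LC_ALL=C keeps the decimal separator a dot):
#   LC_ALL=C iostat -xz 1 3 | busy_disks 60
```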
3. Network
Network performance is critical; tools like iptraf, sar -n DEV 1, and netstat help monitor throughput, packet loss, and retransmissions.
3.1 netstat
netstat -s shows protocol statistics accumulated since boot. netstat is also useful for checking ports and connections, with options such as -antp (all TCP sockets) and -nltp (listening TCP sockets).
3.2 sar
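sar -n TCP,ETCP 1 reports TCP activity, including active/s (connections opened locally), passive/s (connections accepted from peers), retrans/s (retransmitted segments), and isegerr/s (segments received in error). For UDP, sar -n UDP 1 shows noport/s and idgmerr/s, which count datagrams received for ports with no listener and datagrams that could not be delivered for other reasons.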
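A quick overall retransmission ratio can be computed from the cumulative TCP counters in /proc/net/snmp, which hold the same kind of since-boot totals that netstat -s reports. This is a minimal sketch; tcp_retrans_pct is a hypothetical helper, and it assumes the file contains a Tcp: header line followed by a Tcp: value line with OutSegs and RetransSegs columns.

```shell
# tcp_retrans_pct
# Reads /proc/net/snmp on stdin and prints retransmitted segments as a share of segments sent.
tcp_retrans_pct() {
    awk '
        $1 == "Tcp:" && !seen {                         # first Tcp: line carries the column names
            for (i = 2; i <= NF; i++) col[$i] = i
            seen = 1
            next
        }
        $1 == "Tcp:" {                                  # second Tcp: line carries the values
            out = $(col["OutSegs"]); re = $(col["RetransSegs"])
            if (out + 0 == 0) { print "no segments sent"; exit 1 }
            printf "OutSegs=%s RetransSegs=%s retrans=%.2f%%\n", out, re, 100 * re / out
        }'
}

# Example on a live host:
#   tcp_retrans_pct < /proc/net/snmp
```

Because these counters are cumulative, the result is a long-run average; retrans/s from sar is the better signal for a problem that is happening right now.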
3.3 tcpdump
tcpdump captures packets, which can be written to a file for offline analysis in Wireshark; use -C to cap the size of each capture file and -W to limit the number of files so that captures rotate.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.