
Master Linux Server Monitoring: Essential Tools & Metrics Explained

This guide walks you through essential Linux server monitoring tools—top, vmstat, pidstat, iostat, netstat, sar, and tcpdump—explaining each metric, how to interpret CPU, memory, disk, and network statistics, and offering practical tips for troubleshooting performance bottlenecks.

Efficient Ops

Running a Linux server generates a wealth of parameter data that is crucial for both operations staff and developers, especially when programs misbehave.

Below are simple tools for viewing system parameters; many of them analyze data from /proc and /sys, while more advanced monitoring may require tools like perf or systemtap.

1. CPU and Memory

1.1 top

➜ ~ top

The first line shows the 1‑, 5‑, and 15‑minute average load; values exceeding the number of CPU cores indicate saturation.
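As a rough illustration of the cores comparison, the sketch below divides a load average by a core count. The function name load_per_core is hypothetical, and treating a ratio above 1 as saturation is the same rule of thumb stated above, not a hard limit.

```shell
# load_per_core LOAD CORES
# Prints LOAD divided by CORES to two decimals; a result above 1.00
# means more runnable work than cores. (Hypothetical helper.)
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN {
        if (cores <= 0) exit 1          # refuse a nonsensical core count
        printf "%.2f\n", load / cores
    }'
}

# Possible live usage on a Linux host (1-minute figure from /proc/loadavg):
#   read -r one _rest < /proc/loadavg
#   load_per_core "$one" "$(nproc)"
```

A load of 8.00 on a 4-core machine gives 2.00, meaning roughly twice as much runnable work as the cores can serve at once.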

The second line counts tasks by state: running, sleeping, stopped, and zombie.

The third line breaks down CPU usage:

us (user): time spent running user-mode processes that have not been niced (nice value of 0 or lower).

sy (system): time spent in kernel mode, often higher during heavy I/O.

ni (nice): time spent running user-mode processes with a positive nice value (lowered priority).

id (idle): idle time.

wa (iowait): time waiting for I/O completion.

hi (irq): time handling hardware interrupts.

si (softirq): time handling software interrupts.

st (steal): time lost to the hypervisor in virtualized environments.
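These percentages can also be derived by hand from /proc/stat, whose aggregate cpu line lists cumulative ticks in the order user, nice, system, idle, iowait, irq, softirq, steal. The sketch below takes two samples of that line and prints the share of each field over the interval; the function name cpu_breakdown is hypothetical, and the trailing guest columns are ignored on the assumption that they are already counted inside user and nice.

```shell
# cpu_breakdown "CPU_LINE_T1" "CPU_LINE_T2"
# Each argument is the aggregate "cpu ..." line from /proc/stat.
# Prints the percentage of the interval spent in each state.
cpu_breakdown() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= 9; i++) prev[i] = $i; next }
        NR == 2 {
            # Same order as the /proc/stat columns 2..9
            split("us ni sy id wa hi si st", name, " ")
            total = 0
            for (i = 2; i <= 9; i++) { delta[i] = $i - prev[i]; total += delta[i] }
            if (total <= 0) exit 1      # samples identical or out of order
            for (i = 2; i <= 9; i++)
                printf "%s=%.1f%s", name[i - 1], 100 * delta[i] / total, (i < 9 ? " " : "\n")
        }'
}

# Possible live usage:
#   a=$(grep '^cpu ' /proc/stat); sleep 1; b=$(grep '^cpu ' /proc/stat)
#   cpu_breakdown "$a" "$b"
```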

High CPU usage hints at specific issues:

If us is high, a user process is consuming CPU; locate it with top or perf.

If sy is high, heavy I/O or kernel problems may be the cause.

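If ni is high, CPU time is going to processes whose priority was deliberately lowered with nice.

If wa is high, I/O is slow relative to demand and tasks are stalling while they wait for it.

If hi or si is high, interrupt handling is consuming CPU; a misbehaving device or driver may be the cause.

If st is high, the physical host is likely oversubscribed and the hypervisor is giving CPU time to other guests.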

The fourth and fifth lines show physical memory and swap details, where total = free + used + buff/cache. Buffers caches raw block-device data such as file system metadata, while Cached holds file contents (the page cache). Avail Mem estimates how much memory new workloads can use without swapping.
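The same figures can be read from /proc/meminfo. The sketch below is a hypothetical helper (mem_summary) that reproduces the total = free + used + buff/cache identity. It assumes buff/cache is Buffers + Cached + SReclaimable and that used is whatever remains, which mirrors how procps-style tools have commonly computed it but may differ between versions.

```shell
# mem_summary: reads /proc/meminfo-style text on stdin, prints MiB values.
mem_summary() {
    awk '
        { key = $1; sub(/:$/, "", key); kb[key] = $2 }   # "MemTotal:" -> MemTotal
        END {
            # Assumption: buff/cache = Buffers + Cached + SReclaimable
            cache = kb["Buffers"] + kb["Cached"] + kb["SReclaimable"]
            used  = kb["MemTotal"] - kb["MemFree"] - cache
            printf "total=%d free=%d used=%d buff/cache=%d avail=%d\n",
                kb["MemTotal"] / 1024, kb["MemFree"] / 1024, used / 1024,
                cache / 1024, kb["MemAvailable"] / 1024
        }'
}

# Possible live usage:
#   mem_summary < /proc/meminfo
```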

Note that top itself consumes resources and is best for real-time, short-term monitoring.

1.2 vmstat

vmstat provides another view of system load.

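Key fields: r (processes that are runnable, i.e. running or waiting for a CPU), b (processes in uninterruptible sleep, usually blocked on I/O), swpd (swap in use), bi/bo (blocks read from and written to block devices per second), in (interrupts per second), cs (context switches per second).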

When compiling with -j (parallel jobs), cs only rises noticeably once the -j value is high, which suggests there is little to gain from setting the parameter too aggressively.

1.3 pidstat

pidstat -t -C "ailaw" -l

pidstat provides per-process (and, with -t, per-thread) statistics, including:

-r : page faults, minor (minflt/s) and major (majflt/s).

-s : stack usage, reserved (StkSize) and actually used (StkRef).

-u : CPU usage.

-w : context switches, voluntary (cswch/s) and involuntary (nvcswch/s).

The -C option filters by command name, which makes pidstat more convenient than ps for multithreaded processes.
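The context-switch rates have cumulative counterparts in /proc/<pid>/status, in the voluntary_ctxt_switches and nonvoluntary_ctxt_switches fields. The sketch below (hypothetical helper ctxt_switches) extracts them from status-style text, which can be handy where sysstat is not installed; unlike pidstat it reports totals since the task started, not per-second rates.

```shell
# ctxt_switches: reads /proc/<pid>/status-style text on stdin and prints
# the cumulative voluntary and involuntary context-switch counts.
ctxt_switches() {
    awk '
        $1 == "voluntary_ctxt_switches:"    { v = $2 }
        $1 == "nonvoluntary_ctxt_switches:" { n = $2 }
        END { printf "voluntary=%d nonvoluntary=%d\n", v, n }'
}

# Possible live usage for the current shell:
#   ctxt_switches < /proc/$$/status
```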

1.4 Other CPU tools

For per-CPU monitoring, mpstat -P ALL 1 shows load distribution across cores.

To show only one user's processes:

top -u taozj

To follow specific processes with a custom ps format, filter the output by command name, e.g.:

while :; do ps -eo user,pid,ni,pri,pcpu,psr,comm | grep 'ailawd'; sleep 1; done

The process tree can be displayed with ps axjf.

2. Disk I/O

iotop shows per-process disk read/write rates; lsof shows which processes have files open, which helps when a file system refuses to unmount because it is still in use.

2.1 iostat

iostat -xz 1

Key metrics:

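avgqu-sz (aqu-sz in newer sysstat releases): average request queue length; a value above 1 indicates saturation for a single disk.

await (r_await, w_await): average time an I/O request takes to complete, in milliseconds, including time spent queued.

svctm : average service time; a value close to await means requests spend little time queued. Newer sysstat releases treat this field as unreliable and no longer report it.

%util : device utilization; above about 60% performance typically degrades, and near 100% the device is saturated.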

These metrics also apply to network file systems.
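The %util thresholds can be applied mechanically. The sketch below (hypothetical helper busy_devices) reads iostat -x output and prints devices whose %util exceeds a limit. Because column positions differ between sysstat versions, it locates the %util column from the Device header line instead of assuming a fixed position.

```shell
# busy_devices LIMIT
# Reads `iostat -x` output on stdin; prints "device %util" for every
# device line whose %util is above LIMIT.
busy_devices() {
    awk -v limit="$1" '
        $1 ~ /^Device/ {                          # header: find the %util column
            col = 0
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        col && NF >= col && $col + 0 > limit + 0 { print $1, $col }'
}

# Possible live usage (three one-second reports, 60% threshold):
#   iostat -xz 1 3 | busy_devices 60
```

Lines with fewer fields than the header, such as the avg-cpu block and blank separators, are skipped by the NF check.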

3. Network

Network performance is critical; tools like iptraf, sar -n DEV 1, and netstat help monitor throughput, packet loss, and retransmissions.

3.1 netstat

netstat -s shows per-protocol statistics accumulated since boot. netstat is also useful for checking ports and connections, with options such as -antp (all TCP sockets, with numeric addresses and the owning process) and -nltp (listening TCP sockets).
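The since-boot TCP counters are also exposed in /proc/net/snmp as a pair of Tcp: lines, one with field names and one with values. As a minimal sketch, the hypothetical helper tcp_retrans_pct below computes retransmitted segments as a percentage of segments sent; since it is a since-boot average, it says nothing about when the retransmissions happened.

```shell
# tcp_retrans_pct: reads /proc/net/snmp-style text on stdin and prints
# RetransSegs as a percentage of OutSegs (cumulative since boot).
tcp_retrans_pct() {
    awk '
        $1 == "Tcp:" && !seen {                   # first Tcp: line holds the names
            for (i = 2; i <= NF; i++) idx[$i] = i
            seen = 1
            next
        }
        $1 == "Tcp:" {                            # second Tcp: line holds the values
            sent = $(idx["OutSegs"]) + 0
            retr = $(idx["RetransSegs"]) + 0
            if (sent > 0) printf "%.2f\n", 100 * retr / sent
        }'
}

# Possible live usage:
#   tcp_retrans_pct < /proc/net/snmp
```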

3.2 sar

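sar -n TCP,ETCP 1 reports TCP activity, including active/s (connections opened locally), passive/s (connections accepted from remote peers), retrans/s (retransmitted segments), and isegerr/s (segments received in error). For UDP, sar -n UDP 1 shows noport/s and idgmerr/s, which count datagrams that arrived for a port with no listener and datagrams that could not be delivered for other reasons.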

3.3 tcpdump

tcpdump captures packets; write them to a file with -w for offline analysis in Wireshark, and use -C / -W to cap the size of each file and rotate through a fixed number of capture files.
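To show how -C and -W combine, the sketch below builds a rotating capture command and computes its worst-case disk footprint. Both helper names are hypothetical, the interface and output path in the usage comment are placeholders, and the command is only printed so it can be reviewed before being run with suitable privileges. tcpdump documents the -C unit as millions of bytes.

```shell
# rotating_capture_cmd IFACE OUTFILE SIZE_MB FILES
# Prints (does not run) a tcpdump command that rotates through FILES
# capture files of about SIZE_MB million bytes each.
rotating_capture_cmd() {
    printf 'tcpdump -i %s -w %s -C %s -W %s\n' "$1" "$2" "$3" "$4"
}

# capture_budget_mb SIZE_MB FILES
# Upper bound on disk usage, in millions of bytes, for that capture.
capture_budget_mb() {
    echo $(( $1 * $2 ))
}

# Possible usage (eth0 and /tmp/cap.pcap are placeholders):
#   rotating_capture_cmd eth0 /tmp/cap.pcap 100 10
#   capture_budget_mb 100 10
```

With 10 files of 100 million bytes each, the capture stays within about 1 GB, and the oldest file is overwritten once the limit is reached.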

Written by Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
