
Essential Linux Server Monitoring Tools and How to Interpret Their Metrics

This article introduces key Linux monitoring utilities—top, vmstat, pidstat, iostat, netstat, sar, and tcpdump—explains the meaning of their output fields, and shows how to use them to diagnose CPU, memory, disk, and network performance issues on production servers.

Efficient Ops

Linux servers constantly expose a wealth of performance parameters that are crucial for both operations engineers and developers when troubleshooting abnormal program behavior.

Simple command-line tools read data from /proc and /sys to present these metrics; more advanced analysis may require specialized utilities such as perf or systemtap.

CPU and Memory Monitoring

top

top displays load averages, task states, and CPU usage (aggregate by default; press 1 to toggle the per-CPU view). The first line shows 1-, 5-, and 15-minute load averages; on Linux these count tasks that are runnable or in uninterruptible sleep, so sustained values above the number of CPU cores suggest saturation. The second line lists task counts (running, sleeping, stopped, zombie). The third line breaks down CPU time into user (us), system (sy), nice (ni), idle (id), iowait (wa), hardware interrupts (hi), software interrupts (si), and steal (st). A high value in a given column points to a specific investigation path: high us means locating CPU-intensive processes, high sy or wa means checking kernel or I/O-bound activity, and high st suggests hypervisor over-commit.
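As an illustration of where the CPU columns come from, the sketch below derives the same percentage breakdown from two samples of the aggregate cpu line in /proc/stat. The function names are hypothetical, and the field order is taken from proc(5); this is a minimal sketch of the idea, not how top itself is implemented.

```python
# Field order of the aggregate "cpu" line in /proc/stat, per proc(5):
# user, nice, system, idle, iowait, irq, softirq, steal.
FIELDS = ("us", "ni", "sy", "id", "wa", "hi", "si", "st")


def parse_cpu_line(stat_text):
    """Return the cumulative tick counters from the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            return dict(zip(FIELDS, (int(v) for v in parts[1:9])))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_percentages(before, after):
    """Convert two cumulative samples into per-state percentages."""
    deltas = {k: after[k] - before[k] for k in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {k: 0.0 for k in FIELDS}
    return {k: round(100.0 * d / total, 1) for k, d in deltas.items()}
```

On a Linux host, reading /proc/stat twice about one second apart and passing both samples through these functions should give figures comparable to the us/sy/id/wa values top prints.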

The fourth and fifth lines report physical memory and swap. The counters satisfy total = free + used + buff/cache. Buffers hold raw block-device data such as filesystem metadata, while Cached holds the page cache for file contents. Avail Mem estimates how much memory can be given to new workloads without swapping. Frequent swap activity signals memory pressure.
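The memory identity above can be checked against /proc/meminfo directly. The sketch below is one way to do so; it assumes buff/cache is Buffers + Cached + SReclaimable and derives used from the identity, which may differ slightly from what a given procps version prints. The function names are illustrative.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo key to its numeric value (kB for most keys)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def memory_summary(info):
    """Group the counters so that total = free + used + buff_cache."""
    # Assumption: buff/cache is the sum of these three counters.
    buff_cache = info["Buffers"] + info["Cached"] + info.get("SReclaimable", 0)
    used = info["MemTotal"] - info["MemFree"] - buff_cache
    return {
        "total": info["MemTotal"],
        "free": info["MemFree"],
        "buff_cache": buff_cache,
        "used": used,
        "available": info.get("MemAvailable"),
        "swap_used": info.get("SwapTotal", 0) - info.get("SwapFree", 0),
    }
```

A low available figure combined with growing swap_used is the pattern that corresponds to the memory pressure described above.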

Note that top itself consumes resources and is best suited to short-term, interactive monitoring.

vmstat

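vmstat reports processes, memory, paging, block I/O, traps, and CPU activity. Run without arguments it prints averages since boot; with an interval (for example vmstat 1) each subsequent line covers that interval. Columns include runnable processes (r), processes in uninterruptible sleep (b), swap space in use (swpd), buffers (buff), page cache (cache), swap-in and swap-out (si/so), blocks read and written (bi/bo), interrupts (in), and context switches (cs).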
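For scripted checks, vmstat's text output can be parsed by column name. The sketch below assumes the default layout, in which the column-name line begins with r; the helper names and the simple thresholds in flag_pressure are illustrative assumptions, not established limits.

```python
def parse_vmstat(output):
    """Turn vmstat text output into a list of {column: int} rows."""
    header = None
    rows = []
    for line in output.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "r":                      # column-name line
            header = tokens
        elif header and tokens[0].isdigit():      # data line
            rows.append(dict(zip(header, (int(t) for t in tokens))))
    return rows


def flag_pressure(row, cores):
    """Return human-readable notes for one vmstat row."""
    notes = []
    if row["r"] > cores:
        notes.append("run queue exceeds core count")
    if row["b"] > 0:
        notes.append("tasks blocked in uninterruptible sleep")
    if row["si"] > 0 or row["so"] > 0:
        notes.append("swap activity")
    return notes
```

Because rows are keyed by the header, the parser keeps working if a vmstat version adds or reorders columns.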

Experiments with different -j (parallel job) values when compiling show that context-switch rates stay stable until parallelism is pushed high enough, at which point they rise noticeably.

pidstat

pidstat offers per-process statistics, including page faults (minflt/s for minor faults, majflt/s for major faults), stack usage, CPU usage, and context switches, down to the thread level. Options such as -t (thread view), -r (memory and page faults), -s (stack), -u (CPU), and -w (context switches) make it well suited to deep analysis of individual or multithreaded programs.
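The page-fault rates above are derived from cumulative per-process counters. As a rough sketch, the code below pulls minflt and majflt from the text of /proc/<pid>/stat and turns two samples into per-second rates; field positions follow proc(5), and the function names are hypothetical.

```python
def parse_pid_stat(text):
    """Extract a few counters from the contents of /proc/<pid>/stat."""
    # The command name may contain spaces or parentheses, so split on the
    # last ')' rather than on whitespace.
    head, _, tail = text.rpartition(")")
    pid_str, _, comm = head.partition("(")
    rest = tail.split()          # rest[0] is field 3 (state) in proc(5)
    return {
        "pid": int(pid_str),
        "comm": comm,
        "state": rest[0],
        "minflt": int(rest[7]),  # field 10: minor faults
        "majflt": int(rest[9]),  # field 12: major faults
        "utime": int(rest[11]),  # field 14: user-mode ticks
        "stime": int(rest[12]),  # field 15: kernel-mode ticks
    }


def fault_rates(before, after, interval_s):
    """Per-second minor and major fault rates between two samples."""
    return {
        "minflt/s": (after["minflt"] - before["minflt"]) / interval_s,
        "majflt/s": (after["majflt"] - before["majflt"]) / interval_s,
    }
```

A sustained nonzero majflt/s means the process is waiting on disk to bring pages in, which is usually the more important of the two figures.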

Other CPU Tools

For per-CPU inspection on SMP systems, mpstat -P ALL 1 shows load distribution across cores. Filtering top by user (top -u username) or using ps with custom columns can isolate specific processes, and ps axjf visualizes process trees.

Disk I/O Monitoring

iotop visualizes real-time disk read/write rates per process. lsof reveals which processes hold files or devices open, which is useful for diagnosing unmount failures.

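iostat -xz 1 reports key disk metrics: average queue length (avgqu-sz, named aqu-sz in newer sysstat releases), average request latency in milliseconds (await), service time (svctm, deprecated and dropped from newer releases), and utilization (%util). As rough rules of thumb, avgqu-sz above 1 or %util above 60% indicates potential saturation, although devices that serve requests in parallel, such as SSDs and RAID arrays, can sustain higher values.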

These metrics also apply to network file systems, though kernel I/O caching can mask some performance impacts.
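To make the disk metrics concrete, the sketch below computes approximations of await, avgqu-sz, and %util from two samples of /proc/diskstats. The field positions follow the kernel's documented layout; the function names are illustrative, and the results are not guaranteed to match a given sysstat version exactly.

```python
def parse_diskstats(text):
    """Map device name to the cumulative counters needed below."""
    devices = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue
        devices[f[2]] = {
            "reads": int(f[3]),         # reads completed
            "read_ms": int(f[6]),       # time spent reading (ms)
            "writes": int(f[7]),        # writes completed
            "write_ms": int(f[10]),     # time spent writing (ms)
            "io_ms": int(f[12]),        # time the device was busy (ms)
            "weighted_ms": int(f[13]),  # busy time weighted by queue depth
        }
    return devices


def disk_metrics(before, after, interval_ms):
    """Approximate iostat-style metrics for one device between two samples."""
    ios = (after["reads"] - before["reads"]) + (after["writes"] - before["writes"])
    wait_ms = (after["read_ms"] - before["read_ms"]) + (after["write_ms"] - before["write_ms"])
    busy_ms = after["io_ms"] - before["io_ms"]
    weighted = after["weighted_ms"] - before["weighted_ms"]
    return {
        "await": wait_ms / ios if ios else 0.0,
        "avgqu_sz": weighted / interval_ms,
        "util_pct": min(100.0, 100.0 * busy_ms / interval_ms),
    }
```

The formulas show why the two thresholds differ in meaning: util_pct only says how much of the interval the device had at least one request outstanding, while avgqu_sz reflects how many requests were outstanding on average.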

Network Monitoring

Network health is critical for servers. iptraf and sar -n DEV 1 show interface throughput and utilization.
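Interface throughput figures of this kind are derived from the byte counters in /proc/net/dev. A minimal sketch follows, assuming the standard 16-column layout (eight receive columns followed by eight transmit columns); the function names are hypothetical.

```python
def parse_net_dev(text):
    """Map interface name to cumulative rx/tx byte counters."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:          # skips the two header lines
            continue
        name, _, data = line.partition(":")
        f = data.split()
        if len(f) < 16:
            continue
        counters[name.strip()] = {"rx_bytes": int(f[0]), "tx_bytes": int(f[8])}
    return counters


def throughput_kb_s(before, after, interval_s):
    """Per-interface receive and transmit rates in kB/s (1 kB = 1024 bytes)."""
    rates = {}
    for iface, now in after.items():
        if iface not in before:
            continue
        prev = before[iface]
        rates[iface] = {
            "rxkB/s": (now["rx_bytes"] - prev["rx_bytes"]) / 1024 / interval_s,
            "txkB/s": (now["tx_bytes"] - prev["tx_bytes"]) / 1024 / interval_s,
        }
    return rates
```

Comparing these rates with the link speed gives a utilization estimate for each interface.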

netstat

netstat -s displays cumulative protocol statistics since boot; netstat -antp lists all TCP sockets with their owning processes, while netstat -nltp shows only listening sockets.
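A common follow-up to listing sockets is counting them by state, for example to spot a build-up of TIME_WAIT or CLOSE_WAIT. The sketch below does this from the text of /proc/net/tcp, which covers IPv4 only (IPv6 sockets are in /proc/net/tcp6). The state codes are the kernel's hexadecimal values; the function name is illustrative.

```python
from collections import Counter

# Hexadecimal state codes used in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per TCP state from the contents of /proc/net/tcp."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:   # skip the header line
        f = line.split()
        if len(f) < 4:
            continue
        code = f[3].upper()
        counts[TCP_STATES.get(code, code)] += 1       # unknown codes kept as-is
    return counts
```

Many sockets in CLOSE_WAIT typically point at an application that is not closing its connections, while large TIME_WAIT counts are common on hosts that open many short-lived outbound connections.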

sar

sar -n TCP,ETCP 1 provides per-second TCP metrics such as active opens, passive opens, retransmissions, and input errors. For UDP, sar -n UDP 1 reports datagrams received for ports with no listener and input errors, helping assess reliability.
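The TCP counters behind these per-second figures are exposed in /proc/net/snmp as a header line and a value line, both prefixed with Tcp:. The sketch below turns two samples into rates and adds a retransmission percentage; the output key names are my own rather than sar's column names.

```python
def parse_snmp_tcp(text):
    """Return the Tcp counters from /proc/net/snmp as {name: int}."""
    header = values = None
    for line in text.splitlines():
        if not line.startswith("Tcp:"):
            continue
        tokens = line.split()[1:]
        if header is None:
            header = tokens          # first Tcp: line holds the names
        else:
            values = tokens          # second Tcp: line holds the values
            break
    if header is None or values is None:
        raise ValueError("Tcp section not found")
    return dict(zip(header, (int(v) for v in values)))


def tcp_rates(before, after, interval_s):
    """Per-second TCP rates plus retransmitted share of sent segments."""
    keys = ("ActiveOpens", "PassiveOpens", "RetransSegs", "InErrs", "OutSegs")
    d = {k: after[k] - before[k] for k in keys}
    rates = {k + "/s": d[k] / interval_s for k in keys[:4]}
    rates["retrans_pct"] = 100.0 * d["RetransSegs"] / d["OutSegs"] if d["OutSegs"] else 0.0
    return rates
```

A retransmission percentage that climbs under load is often an earlier warning sign than the absolute retransmission rate, because it is normalized by traffic volume.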

tcpdump

tcpdump captures raw packets, which can be written to a file with -w for offline analysis in Wireshark. It supports size-based file rotation (-C with -W) and extensive filtering (interface, host, port, protocol). Captured packets include timestamps, enabling precise reconstruction of connection sequences, though the tool adds overhead that must be considered in production.
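As a sketch of how the rotation and filtering options combine, the helper below assembles a tcpdump argument list for a size-limited ring of capture files. The function name, default sizes, and example paths are assumptions for illustration; the flags themselves (-n, -i, -w, -C, -W) are standard tcpdump options, with -C measured in millions of bytes.

```python
import shlex


def build_tcpdump_cmd(interface, out_file, filter_expr,
                      file_size_mb=100, file_count=10):
    """Build an argv list for a rotating tcpdump capture."""
    return [
        "tcpdump",
        "-n",                      # do not resolve addresses to names
        "-i", interface,           # capture interface
        "-w", out_file,            # write raw packets to this file
        "-C", str(file_size_mb),   # start a new file after this many MB
        "-W", str(file_count),     # keep at most this many files (ring)
        filter_expr,               # capture filter, passed as one argument
    ]


# Hypothetical example values; adjust the interface, path, and filter.
cmd = build_tcpdump_cmd("eth0", "/tmp/capture.pcap",
                        "host 10.0.0.5 and tcp port 443", 50, 5)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run on a host where tcpdump is installed and the caller has capture privileges. Bounding the capture with -C and -W and a narrow filter is one way to limit the production overhead mentioned above.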

Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
