Diagnosing Linux Server Performance in the First 60 Seconds
This guide walks through ten Linux commands built on nine tools (uptime, dmesg, vmstat, mpstat, pidstat, iostat, free, sar, and top; sar is run twice), explaining what each metric means, how to interpret the output, and how to spot utilization, saturation, and error issues within the first minute of an investigation.
Why the first minute matters
When a Linux server shows performance degradation, the quickest way to narrow down the problem is to collect a set of core system metrics within the first 60 seconds. Netflix’s performance engineering team reads these metrics through the USE method: for each resource (CPU, memory, disk, and network), check Utilization, Saturation, and Errors.
Command list
uptime
dmesg | tail
vmstat 1
mpstat -P ALL 1
pidstat 1
iostat -xz 1
free -m
sar -n DEV 1
sar -n TCP,ETCP 1
top
1. uptime
Shows the system’s load averages for the past 1, 5, and 15 minutes. The three numbers are exponentially damped moving averages of the number of tasks that are runnable or in uninterruptible sleep (usually blocked on disk I/O), so on Linux a high load does not by itself prove CPU demand. A 1‑minute value well above the 15‑minute value indicates a recent load surge; the reverse suggests the load is already subsiding.
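The comparison described above can be sketched as a small shell function. This is a minimal illustration, assuming input in the /proc/loadavg format; the function name load_trend and the wording of its output are hypothetical, not part of any standard tool.

```shell
# load_trend NCPU: reads one line in /proc/loadavg format on stdin
# (first three fields: 1-, 5-, and 15-minute load averages).
# The function name and the output wording are illustrative only.
load_trend() {
    awk -v ncpu="$1" '{
        trend = ($1 > $3) ? "rising" : "steady-or-falling"
        level = ($1 > ncpu) ? "above" : "at-or-below"
        printf "load %s: 1min=%.2f 15min=%.2f, %s CPU count (%d)\n", trend, $1, $3, level, ncpu
    }'
}

# Usage on a live Linux system (nproc is part of GNU coreutils):
#   load_trend "$(nproc)" < /proc/loadavg
```

A load above the CPU count is only a hint: because uninterruptible tasks are counted too, confirm CPU saturation with vmstat or mpstat before drawing conclusions.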
2. dmesg | tail
Displays the most recent kernel messages. Look for OOM‑killer events, hardware errors, or network‑related warnings that could explain performance anomalies.
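When the kernel log is noisy, a simple filter can surface the kinds of messages mentioned above. The sketch below is only a starting point: the function name kernel_red_flags and the pattern list are assumptions chosen for illustration, and real kernel messages vary by version and driver.

```shell
# kernel_red_flags: filters kernel log text on stdin for a few common
# trouble signatures. The pattern list is an illustrative assumption,
# not an exhaustive catalogue of kernel messages.
kernel_red_flags() {
    grep -Ei 'out of memory|oom-killer|killed process|i/o error|hardware error|segfault|syn flooding'
}

# Usage on a live system (may need root if dmesg access is restricted):
#   dmesg | tail -n 50 | kernel_red_flags
```

No output means none of the listed patterns matched, not that the log is clean, so it is still worth reading the raw tail.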
3. vmstat 1
Provides per‑second snapshots of virtual memory, swap, I/O, and CPU statistics. Important fields:
r : runnable tasks (higher than the number of CPUs signals CPU saturation).
free : free memory in KB.
si/so : memory swapped in from and out to disk per second (non‑zero values indicate memory pressure).
us, sy, id, wa, st : CPU time spent in user, system, idle, I/O wait, and steal (virtualization) states.
Summing us and sy gives overall CPU utilization; a high wa value points to I/O bottlenecks.
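The field checks above can be applied mechanically to a stream of samples. The following is a rough sketch: the function name vmstat_flags is hypothetical, and the thresholds (r above the CPU count, any swapping, wa of 20 or more) are illustrative assumptions rather than fixed rules.

```shell
# vmstat_flags NCPU: reads `vmstat 1` output on stdin and prints one summary
# line per sample, with a note when a sample looks saturated. Columns are
# located by header name, so extra columns in newer vmstat versions are
# tolerated. Thresholds are illustrative assumptions.
vmstat_flags() {
    awk -v ncpu="$1" '
        $1 == "r" { for (i = 1; i <= NF; i++) col[$i] = i; next }  # header row
        !("r" in col) || $1 !~ /^[0-9]+$/ { next }                  # skip banner lines
        {
            busy = $(col["us"]) + $(col["sy"])
            note = ""
            if ($(col["r"]) > ncpu)                    note = note " cpu-saturated"
            if ($(col["si"]) > 0 || $(col["so"]) > 0)  note = note " swapping"
            if ($(col["wa"]) >= 20)                    note = note " io-wait"
            printf "cpu=%d%% r=%d%s\n", busy, $(col["r"]), note
        }'
}

# Usage on a live system:
#   vmstat 1 5 | vmstat_flags "$(nproc)"
```

Keep in mind that the first sample vmstat prints reports averages since boot, so judge the current state from the second sample onward.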
4. mpstat -P ALL 1
Shows per‑CPU utilization percentages. Uniform usage across CPUs suggests balanced workload, while a single CPU with a high usage percentage indicates a single‑threaded bottleneck.
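To spot a single hot CPU without scanning every row by eye, the per-CPU lines can be filtered on busy time. This is a sketch under assumptions: the function name hot_cpus is hypothetical and the default cutoff of 90 % is arbitrary.

```shell
# hot_cpus [THRESHOLD]: reads `mpstat -P ALL 1` output on stdin and prints
# each per-CPU sample whose busy time (100 - %idle) exceeds THRESHOLD
# (default 90, an arbitrary illustrative cutoff). Columns are located by
# header name, so 12-hour and 24-hour timestamp layouts both work.
hot_cpus() {
    awk -v limit="${1:-90}" '
        {
            c = 0; d = 0
            for (i = 1; i <= NF; i++) {
                if ($i == "CPU")   c = i
                if ($i == "%idle") d = i
            }
            if (c && d) { cpu = c; idle = d; next }   # header row
        }
        cpu && $cpu ~ /^[0-9]+$/ && 100 - $idle > limit {
            printf "cpu %s busy %.1f%%\n", $cpu, 100 - $idle
        }'
}

# Usage on a live system:
#   mpstat -P ALL 1 5 | hot_cpus 90
```

If only one CPU number keeps appearing while the "all" row stays low, that pattern matches the single-threaded bottleneck described above.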
5. pidstat 1
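Reports CPU usage per process at one‑second intervals. The %CPU column is summed across all CPUs, so values > 100 % mean the process is consuming more than one CPU core (for example, 250 % is roughly two and a half cores). Unlike top, the output is rolling, which helps identify runaway processes and patterns over time.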
6. iostat -xz 1
Measures block‑device performance. Key metrics:
r/s, w/s, rkB/s, wkB/s : read/write request rates and data throughput.
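await : average time per request in milliseconds, including queueing and service time (values well above the device's normal latency indicate saturation or device problems).
avgqu-sz : average queue length, shown as aqu-sz in newer sysstat releases (values > 1 suggest saturation, although devices that serve requests in parallel can tolerate more).
%util : percentage of time the device was busy; > 60 % often correlates with performance problems, and values near 100 % usually indicate saturation.
Remember that a logical device backed by several physical disks can report high %util while the underlying disks still have headroom, so interpret these figures with the storage layout in mind.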
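The %util rule of thumb can be turned into a quick filter. The sketch below assumes the extended iostat layout with a header row starting with "Device"; the function name busy_disks is hypothetical, and the default of 60 simply mirrors the figure quoted above.

```shell
# busy_disks [THRESHOLD]: reads `iostat -xz 1` output on stdin and prints
# devices whose %util exceeds THRESHOLD (default 60, the rule of thumb above).
# The %util column is located by header name because column order differs
# between sysstat versions.
busy_disks() {
    awk -v limit="${1:-60}" '
        $1 == "avg-cpu:" { util = 0; next }           # CPU block: not device data
        $1 ~ /^Device/ {
            util = 0
            for (i = 1; i <= NF; i++) if ($i == "%util") util = i
            next
        }
        util && NF >= util && $util > limit { printf "%s util=%s%%\n", $1, $util }'
}

# Usage on a live system:
#   iostat -xz 1 5 | busy_disks 60
```

A device that shows up here is a candidate for closer inspection, not proof of a bottleneck; check await and queue length for the same device before concluding.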
7. free -m
Shows memory usage in MiB: total, used, free, buffers, and cache. On older procps versions, the “-/+ buffers/cache” line gives a more accurate view of memory actually consumed by applications; newer versions drop that line and report an “available” column instead. Large cache values are normal; the kernel reclaims cache when applications need the memory.
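Since free takes its figures from /proc/meminfo, the same "how much can applications still get" question can be answered directly from that file. This is a minimal sketch: the function name mem_available_pct is hypothetical, and it assumes a kernel that reports the MemAvailable field (Linux 3.14 and later).

```shell
# mem_available_pct: reads /proc/meminfo-format text on stdin and prints the
# share of RAM the kernel estimates is available without swapping.
# Assumes the MemAvailable field exists (Linux 3.14 and later).
mem_available_pct() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END {
            if (total > 0 && avail != "")
                printf "available: %d MiB of %d MiB (%.0f%%)\n", avail / 1024, total / 1024, 100 * avail / total
            else
                print "MemAvailable not reported"
        }'
}

# Usage on a live system:
#   mem_available_pct < /proc/meminfo
```

A low percentage here is more meaningful than a low "free" figure, because it already accounts for cache the kernel can reclaim.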
8. sar -n DEV 1
Monitors network interface throughput (rxkB/s, txkB/s) and utilization (%ifutil). Compare observed rates against the interface's link speed to detect network saturation.
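If %ifutil is not populated on your system, the comparison against expected bandwidth can be done by hand. The sketch below estimates utilization from rxkB/s and txkB/s under stated assumptions: a full-duplex link, a link speed you supply in megabits per second, and 1 kB counted as 1024 bytes. The function name net_util is hypothetical.

```shell
# net_util SPEED_MBIT: reads `sar -n DEV 1` output on stdin and estimates
# per-interface utilization from rxkB/s and txkB/s, assuming a full-duplex
# link of SPEED_MBIT megabits per second and 1 kB = 1024 bytes.
net_util() {
    awk -v mbit="$1" '
        {
            f = 0; r = 0; t = 0
            for (i = 1; i <= NF; i++) {
                if ($i == "IFACE")  f = i
                if ($i == "rxkB/s") r = i
                if ($i == "txkB/s") t = i
            }
            if (f && r && t) { iface = f; rx = r; tx = t; next }   # header row
        }
        iface && NF >= tx && $rx ~ /^[0-9.]+$/ {
            # Full duplex: the busier direction determines utilization.
            peak = ($rx + 0 > $tx + 0) ? $rx : $tx
            printf "%s %.1f%%\n", $iface, peak * 1024 * 8 / (mbit * 1000000) * 100
        }'
}

# Usage on a live system with a 1 Gbit/s link:
#   sar -n DEV 1 5 | net_util 1000
```

For example, about 61,035 kB/s received on a 1000 Mbit/s link works out to roughly 500 Mbit/s, or 50 % utilization.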
9. sar -n TCP,ETCP 1
Provides TCP‑level statistics:
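active/s : locally initiated TCP connections per second (outbound, e.g. via connect()).
passive/s : remotely initiated TCP connections per second (inbound, e.g. via accept()).
retrans/s : TCP retransmitted segments per second (high values signal network problems or an overloaded server dropping packets).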
These counters help gauge connection load and identify possible packet loss.
10. top
Combines many of the above metrics in a dynamic, full‑screen view: per‑process CPU and memory usage, load averages, and system-wide CPU and memory summaries. Because top redraws the screen on every refresh, earlier samples are lost and patterns over time are hard to see; for rolling output you can scroll back through, use vmstat or pidstat instead.
Putting it together
Start with uptime and dmesg to get a high‑level view and check for kernel errors. Then drill down with vmstat and mpstat to assess CPU and memory pressure. Use pidstat to pinpoint offending processes, iostat for disk health, and sar for network diagnostics. Finally, verify the overall picture with top. By following this USE‑oriented workflow, you can quickly narrow the investigation scope and focus on the most likely resource‑saturation or error source.
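The workflow above can be wrapped in a small script so the whole pass runs unattended. This is a sketch, not a canonical tool: the function names are hypothetical, and the bounded sample counts (five one-second samples) and the batch-mode top invocation are illustrative choices added so that every command terminates on its own.

```shell
# first_60s_commands: prints the ten commands in the order described above,
# with bounded sample counts added so that each one terminates.
# The counts and the batch-mode top invocation are illustrative choices.
first_60s_commands() {
    cat <<'EOF'
uptime
dmesg | tail
vmstat 1 5
mpstat -P ALL 1 5
pidstat 1 5
iostat -xz 1 5
free -m
sar -n DEV 1 5
sar -n TCP,ETCP 1 5
top -b -n 1 | head -n 20
EOF
}

# run_first_60s: runs each command in turn and labels its output.
# stdin is redirected so a command cannot consume the remaining list.
run_first_60s() {
    first_60s_commands | while IFS= read -r cmd; do
        printf '\n=== %s ===\n' "$cmd"
        sh -c "$cmd" </dev/null 2>&1
    done
}

# Usage: run_first_60s | tee /tmp/first60s.txt
```

mpstat, pidstat, iostat, and sar come from the sysstat package, so the corresponding sections will only show an error message on hosts where it is not installed.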
Liangxu Linux
Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge: fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc.