How to Diagnose Linux Server Performance in the First 60 Seconds

A checklist from Netflix's performance engineering team uses ten standard command-line tools, run in the first minute after logging into a Linux server, to get a high-level view of system load, resource saturation, errors and likely bottlenecks before starting deeper root-cause analysis.

Liangxu Linux

Jun 7, 2020

Quick 60‑second Linux performance checklist

When a Linux server shows signs of trouble, the first minute after logging in can be used to run a small set of standard utilities that give an immediate view of system health. The commands below cover CPU load, kernel messages, process-level statistics, memory usage, block-device I/O, network activity and a dynamic process list. Four of them (mpstat, pidstat, iostat and sar) come from the sysstat package, which may need to be installed first; uptime, vmstat, free and top are part of procps, and dmesg ships with util-linux.

Command list

uptime
dmesg | tail
vmstat 1
mpstat -P ALL 1
pidstat 1
iostat -xz 1
free -m
sar -n DEV 1
sar -n TCP,ETCP 1
top
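
Most of the commands above run until interrupted. The sketch below shows one way to run the whole list unattended by giving each interval command a sample count and skipping tools that are missing. The names run_check and first_minute are hypothetical, and the count of 5 samples is an arbitrary choice.

```shell
#!/bin/sh
# run_check (hypothetical helper): run a command if its binary is on PATH,
# otherwise print a note instead of failing.
run_check() {
    if command -v "$1" >/dev/null 2>&1; then
        printf '### %s\n' "$*"
        "$@"
    else
        printf '### %s: not installed, skipped\n' "$1"
    fi
}

# first_minute (hypothetical): the ten commands, each bounded to 5 samples.
first_minute() {
    run_check uptime
    printf '### dmesg | tail\n'
    dmesg 2>&1 | tail            # may need root on some systems
    run_check vmstat 1 5
    run_check mpstat -P ALL 1 5
    run_check pidstat 1 5
    run_check iostat -xz 1 5
    run_check free -m
    run_check sar -n DEV 1 5
    run_check sar -n TCP,ETCP 1 5
    run_check top -b -n 1        # batch mode: print one screen and exit
}
```

Calling first_minute and redirecting its output to a file gives a snapshot that can be attached to an incident ticket.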

1. uptime

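Displays how long the system has been running and the 1-, 5-, and 15-minute load averages. On Linux the load average counts processes that are runnable as well as processes in uninterruptible sleep (usually waiting on disk I/O), so it indicates demand for resources in general, not CPU alone. Compare the 1-minute value with the longer intervals: a 1-minute value well above the 15-minute value means load is rising, while a lower one means the peak has already passed.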
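
The comparison described above can be sketched with two small helpers. The function names are hypothetical, and the 20% margin used to decide between rising, falling and steady is an arbitrary assumption, not a value from the original checklist.

```shell
# load_per_cpu LOAD NCPU (hypothetical): load average divided by CPU count.
load_per_cpu() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN { printf "%.2f\n", load / ncpu }'
}

# load_trend ONE_MIN FIFTEEN_MIN (hypothetical): compare the 1-minute and
# 15-minute averages using an assumed 20% margin.
load_trend() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        if (a > b * 1.2)      print "rising"
        else if (a < b * 0.8) print "falling"
        else                  print "steady"
    }'
}
```

On a live system, `read -r one five fifteen rest < /proc/loadavg` followed by `load_per_cpu "$one" "$(nproc)"` and `load_trend "$one" "$fifteen"` would feed the helpers real values.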

2. dmesg | tail

Shows the most recent kernel log entries. Look for OOM‑killer messages, device errors, or network‑related warnings that often precede performance degradation.
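
When the kernel log is long, a filter can surface the kinds of messages mentioned above. This is a minimal sketch: kernel_red_flags is a hypothetical name, and the pattern list is an assumption that covers only a few common messages, so an empty result does not prove the log is clean.

```shell
# kernel_red_flags (hypothetical): read kernel log text on stdin and keep
# lines that match a few common trouble patterns, ignoring case.
kernel_red_flags() {
    grep -Ei 'out of memory|oom-killer|killed process|i/o error|dropping request|blocked for more than'
}
```

Typical use would be `dmesg | kernel_red_flags | tail`.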

3. vmstat 1

Provides per‑second snapshots of key system counters. Important columns:

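r : number of processes running on a CPU or waiting for a turn (values greater than the number of CPUs suggest CPU saturation). Unlike the load average, this count does not include processes blocked on I/O.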

b : processes blocked for uninterruptible I/O.

free : free memory in kilobytes.

si/so : swap‑in and swap‑out rates; non‑zero values indicate memory pressure.

us, sy, id, wa, st : CPU time distribution (user, system, idle, I/O wait, stolen). A high wa value points to I/O bottlenecks.

The first line of vmstat shows averages since boot; ignore it when running with an interval.
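
The r and si/so checks can be automated with a small awk filter. This is a sketch under stated assumptions: vmstat_flags is a hypothetical name, and the field positions (r in column 1, si and so in columns 7 and 8) assume the default procps vmstat layout.

```shell
# vmstat_flags NCPU (hypothetical): read `vmstat 1` output on stdin and
# report samples where r exceeds the CPU count or swapping occurs.
vmstat_flags() {
    awk -v ncpu="$1" '
        $1 !~ /^[0-9]+$/ { next }                  # skip the two header lines
        !seen_boot       { seen_boot = 1; next }   # skip the since-boot line
                         { s++ }                   # count real samples
        $1 > ncpu        { printf "sample %d: r=%d exceeds %d CPUs\n", s, $1, ncpu }
        $7 > 0 || $8 > 0 { printf "sample %d: swapping (si=%d so=%d)\n", s, $7, $8 }
    '
}
```

For example, `vmstat 1 10 | vmstat_flags "$(nproc)"` checks nine samples after discarding the since-boot line.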

4. mpstat -P ALL 1

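Shows the CPU time breakdown for each CPU, which reveals imbalance that the system-wide totals hide. A single CPU running near 100% while the others are mostly idle is often evidence of a single-threaded application; a persistently uneven spread across several CPUs may point to CPU affinity or scheduling problems.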

5. pidstat 1

Continuously prints per‑process CPU usage without clearing the screen, making it easy to copy‑paste. Columns of interest:

%CPU : CPU usage of the process summed across all CPUs (values > 100% mean the process is using more than one CPU; 1600% is roughly 16 CPUs).

CPU : the number of the processor the process was running on when sampled.

Identify processes with unusually high %CPU (e.g., a Java process consuming many cores).
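
Because pidstat output scrolls, heavy consumers can be picked out with a filter. The sketch below is one possible approach: pidstat_hot is a hypothetical name, and it locates the PID, %CPU and Command columns from the header line because their positions differ between sysstat versions and time formats.

```shell
# pidstat_hot MIN_PCT (hypothetical): read pidstat output on stdin and print
# "PID %CPU Command" for rows whose %CPU is at least MIN_PCT.
pidstat_hot() {
    awk -v min="$1" '
        /%CPU/ {                                   # header line: find columns
            for (i = 1; i <= NF; i++) {
                if ($i == "%CPU")    c = i
                if ($i == "PID")     p = i
                if ($i == "Command") m = i
            }
            next
        }
        c && NF >= m && $c + 0 >= min + 0 { printf "%s %s %s\n", $p, $c, $m }
    '
}
```

For example, `pidstat 1 5 | pidstat_hot 100` lists processes using at least one full CPU.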

6. iostat -xz 1

Reports detailed block‑device statistics. Key fields:

r/s, w/s : reads and writes per second.

rkB/s, wkB/s : throughput in kilobytes per second.

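await : average time per I/O request in milliseconds, including both time spent queued and time being serviced. Values larger than expected for the device suggest saturation or device problems.

avgqu-sz : average number of requests issued to the device (shown as aqu-sz in newer sysstat versions); values > 1 may indicate saturation, although devices that service requests in parallel can tolerate deeper queues.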

%util : device utilization percentage; > 60 % often degrades performance, > 90 % signals saturation.

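High utilization alone does not prove a problem. If the device is a logical disk fronting many back-end disks, 100% utilization may only mean that some I/O was in flight all of the time, while the back-end disks are still far from saturated.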
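
The %util thresholds can be checked with a filter over the iostat output. This is a minimal sketch: iostat_busy is a hypothetical name, and it finds the %util column from the "Device" header line because the column layout changes between sysstat versions.

```shell
# iostat_busy MIN_PCT (hypothetical): read `iostat -xz` output on stdin and
# print "device %util" for devices at or above MIN_PCT utilization.
iostat_busy() {
    awk -v min="$1" '
        /^avg-cpu/     { u = 0; next }             # CPU block: not device rows
        $1 ~ /^Device/ {                           # device header: find %util
            u = 0
            for (i = 1; i <= NF; i++) if ($i == "%util") u = i
            next
        }
        u && NF >= u && $u + 0 >= min + 0 { printf "%s %s\n", $1, $u }
    '
}
```

For example, `LC_ALL=C iostat -xz 1 5 | iostat_busy 60` lists devices above the 60% mark; setting LC_ALL=C avoids locales that print decimal commas.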

7. free -m

Shows total, used and free memory in mebibytes, plus buffers and cache. Older versions of free print a "-/+ buffers/cache" line whose free value is the memory actually usable by applications; newer versions replace that line with an "available" column. Low free memory combined with a large cache is normal, because Linux uses spare memory for caching and reclaims it on demand; only a very low available value is concerning.
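
The same figure can be read straight from /proc/meminfo, which is convenient in scripts. The sketch below assumes a kernel that exports the MemAvailable field (3.14 or later); mem_available_pct is a hypothetical name.

```shell
# mem_available_pct (hypothetical): read /proc/meminfo-style text on stdin
# and print MemAvailable as a percentage of MemTotal.
mem_available_pct() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END { if (total > 0) printf "%.1f\n", 100 * avail / total }
    '
}
```

On a live system this would be called as `mem_available_pct < /proc/meminfo`.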

8. sar -n DEV 1

Monitors network interface statistics each second.

rxkB/s and txkB/s : inbound and outbound traffic rates.

%ifutil (if reported): interface utilization as a percentage of link speed. For full-duplex links it is based on the larger of rxkB/s and txkB/s; for half-duplex links it is based on their sum.

Compare the observed rates with the NIC’s rated bandwidth (e.g., 1 Gbps = 125 MB/s) to detect saturation.
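
When %ifutil is not reported, the comparison can be done by hand. The helper below is a sketch under one assumption: that sar's kB unit is 1024 bytes, so a 1 Gbps link carries about 122070 kB/s in sar's units. The name nic_util_pct is hypothetical.

```shell
# nic_util_pct RATE_KB LINK_MBIT (hypothetical): express a sar rxkB/s or
# txkB/s value as a percentage of the link speed given in Mbit/s.
nic_util_pct() {
    awk -v kb="$1" -v mbit="$2" 'BEGIN {
        cap_kb = mbit * 1000000 / 8 / 1024   # link capacity in kB/s (1 kB = 1024 B)
        printf "%.1f\n", 100 * kb / cap_kb
    }'
}
```

For example, `nic_util_pct 21999.10 1000` expresses 21999.10 kB/s as a share of a 1 Gbps link.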

9. sar -n TCP,ETCP 1

Provides TCP‑level metrics:

active/s : connections initiated locally per second.

passive/s : connections accepted per second.

retrans/s : TCP retransmissions per second, a sign of packet loss or server overload.

Spikes in retrans/s often indicate network issues; a steady low value is normal.

10. top

Displays a dynamic, full-screen view of processes, CPU and memory usage, which makes it a useful cross-check against the earlier commands. Use Ctrl-S to pause the screen if you need to capture a snapshot, and Ctrl-Q to resume. Because the display is constantly refreshed, patterns over time are harder to see than with the scrolling output of vmstat and pidstat.

Follow‑on analysis

The ten-command checklist applies parts of the USE (Utilization, Saturation, Errors) methodology: for each resource (CPU, memory, disk, network), check utilization, look for saturation indicators, and examine error messages. For deeper investigation you can explore additional tools such as perf, strace, ltrace and Brendan Gregg's Linux performance toolbox (covers > 40 commands for tracing, profiling and benchmarking).

Original article: https://netflixtechblog.com/linux-performance-analysis-in-60-000-milliseconds-accc10403c55


