
How to Diagnose Linux Server Performance Issues in the First 60 Seconds

This guide walks you through ten essential Linux command‑line tools—such as uptime, vmstat, iostat, and top—showing how Netflix’s performance engineers use them to quickly assess system load, resource saturation, and errors within the first minute of investigation.

IT Architects Alliance

Sep 8, 2020


Netflix’s performance engineering team explains how to use standard Linux command‑line utilities to identify system‑wide performance problems in the first 60 seconds after noticing an issue. The article presents ten commands, the key metrics each provides, and practical interpretation tips.

1. uptime

uptime

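uptime prints the load averages over the past 1, 5, and 15 minutes. On Linux these count tasks running or waiting for a CPU plus tasks blocked in uninterruptible I/O, so they show how much demand there is but not which resource is under pressure. A 1-minute value well above the 15-minute value suggests load is rising; a 1-minute value well below it suggests the problem may already have passed.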
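A minimal sketch of that reading, assuming you already have the three load averages and the CPU count (in Python these could come from os.getloadavg() and os.cpu_count()). The function name and the sample numbers are illustrative, not part of the original article.

```python
def load_trend(load1: float, load5: float, load15: float, cpus: int) -> dict:
    """Summarize load averages: demand per CPU and short-term direction."""
    if cpus < 1:
        raise ValueError("cpus must be >= 1")
    if load1 > load15:
        trend = "rising"      # recent load is higher than the 15-minute view
    elif load1 < load15:
        trend = "falling"     # the spike may already be over
    else:
        trend = "steady"
    # load5 is accepted for completeness; this sketch compares only the extremes.
    return {"per_cpu": round(load1 / cpus, 2), "trend": trend}


# Illustrative values for a hypothetical 32-CPU host.
print(load_trend(30.02, 26.43, 19.02, cpus=32))
```

A per-CPU value near or above 1.0 is only a hint to look further, since the Linux figure also includes tasks waiting on I/O.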

2. dmesg | tail

dmesg | tail

Shows the most recent kernel messages, helping you spot errors such as out‑of‑memory kills or TCP SYN floods that may be causing performance degradation.
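As a rough sketch, the same scan can be automated by filtering captured dmesg output for a few telltale phrases. The pattern list and the sample log lines below are assumptions for illustration; real kernel messages vary by version.

```python
import re
from typing import List

# Hypothetical pattern list; extend it for the messages your kernel emits.
PATTERNS = re.compile(
    r"out of memory|oom-killer|killed process|syn flooding", re.IGNORECASE
)


def suspicious_lines(dmesg_output: str) -> List[str]:
    """Return the kernel log lines that match any known-bad pattern."""
    return [line for line in dmesg_output.splitlines() if PATTERNS.search(line)]


# Illustrative sample text, not output from a real host.
sample = """\
[1880957.563150] perl invoked oom-killer: gfp_mask=0x280da, order=0
[1880957.563400] Out of memory: Kill process 18694 (perl) score 246 or sacrifice child
[1880957.563408] Killed process 18694 (perl) total-vm:1972392kB, anon-rss:1953348kB
[2320864.954447] TCP: Possible SYN flooding on port 7001. Dropping request.
[2320870.000000] eth0: link up
"""

for line in suspicious_lines(sample):
    print(line)
```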

3. vmstat 1

vmstat 1

Displays virtual memory, CPU, and I/O statistics every second. Important fields include:

r : runnable tasks (higher than CPU count → CPU saturation)

free : free memory in KB

si/so : pages swapped in/out (non‑zero indicates memory pressure)

us, sy, id, wa, st : user, system, idle, I/O wait, and steal time percentages

Combining us and sy shows overall CPU utilization; a high wa suggests I/O bottlenecks.
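The checks above can be sketched as a small parser over one vmstat sample. Columns are looked up by header name because the column set differs between procps versions; the function name and the sample row are illustrative assumptions.

```python
def check_vmstat(header: str, row: str, cpus: int) -> dict:
    """Interpret one vmstat data row using the column names from its header."""
    f = dict(zip(header.split(), map(int, row.split())))
    return {
        "cpu_busy_pct": f["us"] + f["sy"],               # user + system time
        "cpu_saturated": f["r"] > cpus,                  # more runnable tasks than CPUs
        "swapping": f["si"] > 0 or f["so"] > 0,          # any swap-in or swap-out
        "io_wait_pct": f["wa"],
    }


# Illustrative sample for a hypothetical 32-CPU host.
header = " r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st"
row = "34  0      0 200889792  73708 591828    0    0     0     5    6   10 96  1  3  0  0"
print(check_vmstat(header, row, cpus=32))
```

Note that the first row vmstat prints reports averages since boot, so it is usually better to feed this a later row.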

4. mpstat -P ALL 1

mpstat -P ALL 1

Prints per‑CPU usage every second, allowing you to see whether a single core is overloaded (e.g., a single‑threaded workload).
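A small sketch of spotting such an imbalance, assuming the %idle value for each CPU has already been extracted from mpstat output into a dictionary. The 90% threshold and the sample values are arbitrary choices for illustration.

```python
def hot_cpus(idle_by_cpu: dict, busy_threshold: float = 90.0) -> list:
    """Return the CPU ids whose busy time (100 - %idle) meets the threshold."""
    return sorted(
        cpu for cpu, idle in idle_by_cpu.items() if 100.0 - idle >= busy_threshold
    )


# Illustrative %idle values: CPU 0 is pegged while the others are nearly idle.
sample_idle = {0: 2.0, 1: 97.0, 2: 98.5, 3: 96.0}
print(hot_cpus(sample_idle))
```

One hot CPU next to several idle ones is the pattern a single-threaded workload tends to produce.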

5. pidstat 1

pidstat 1

Shows CPU usage per process at one‑second intervals. The %CPU column can exceed 100 % on multi‑core systems, indicating how many cores a process is consuming (e.g., 1591 % ≈ 16 cores).

6. iostat -xz 1

iostat -xz 1

Provides block‑device statistics. Key metrics:

r/s, w/s, rkB/s, wkB/s : read/write request rates and throughput

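await : average time per I/O request in milliseconds, including time spent queued and time being serviced (values higher than expected for the device signal saturation or a device problem)

avgqu-sz : average number of requests queued to the device, shown as aqu-sz in newer sysstat releases (values > 1 can indicate a backlog)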

%util : device utilization; > 60 % often correlates with performance issues

For a logical device that fronts several physical disks, high utilization only means that some I/O was in flight most of the time; the underlying disks may still be far from saturated.
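The thresholds above can be sketched as a check over one device row. Column names are read from the header because they differ between sysstat versions; the function names and the sample row are illustrative assumptions, not real measurements.

```python
def parse_row(header: str, row: str):
    """Split an iostat -x device row into (device_name, {column: value})."""
    names, values = header.split(), row.split()
    return values[0], {n: float(v) for n, v in zip(names[1:], values[1:])}


def check_device(stats: dict, util_limit: float = 60.0, queue_limit: float = 1.0) -> list:
    """Flag a device using the rule-of-thumb limits from the text."""
    # The queue column is named avgqu-sz or aqu-sz depending on the sysstat version.
    queue = stats.get("aqu-sz", stats.get("avgqu-sz", 0.0))
    findings = []
    if stats.get("%util", 0.0) > util_limit:
        findings.append("high utilization")
    if queue > queue_limit:
        findings.append("queue backlog")
    return findings


# Illustrative header and row for a hypothetical busy device.
header = "Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util"
row = "xvdb 0.00 0.00 120.00 350.00 4800.00 28000.00 139.57 3.20 6.81 5.10 7.40 1.80 84.60"
device, stats = parse_row(header, row)
print(device, check_device(stats))
```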

7. free -m

free -m

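Shows memory usage in megabytes. Linux uses otherwise idle memory for the buffer and page caches and releases it when applications need it, so the raw free column understates what is usable. Older versions of free print a “-/+ buffers/cache” line that excludes the caches and so reflects memory actually used by applications; newer versions drop that line and report an available column instead.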
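The arithmetic behind the “-/+ buffers/cache” line can be sketched as follows. This assumes the older free layout, in which the used column still includes buffers and cache; the function name and the sample figures are illustrative.

```python
def app_memory(used_mb: int, free_mb: int, buffers_mb: int, cached_mb: int) -> dict:
    """Recompute memory usage with reclaimable buffers/cache factored out."""
    reclaimable = buffers_mb + cached_mb
    return {
        "used_by_apps": used_mb - reclaimable,     # what applications really hold
        "free_plus_cache": free_mb + reclaimable,  # what could be handed to applications
    }


# Illustrative values in MB for a hypothetical host.
print(app_memory(used_mb=24545, free_mb=221453, buffers_mb=59, cached_mb=541))
```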

8. sar -n DEV 1

sar -n DEV 1

Reports per-interface throughput (rxkB/s, txkB/s) and, in sysstat versions that provide it, utilization (%ifutil). In Netflix's example, eth0 receives about 22 MB/s, or 176 Mbit/s, well below the capacity of a 1 Gbit/s link, so the network is not the bottleneck.
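The comparison against link capacity is simple arithmetic, sketched here with decimal units (1 MB = 8 Mbit, 1 Gbit/s = 1000 Mbit/s). The function name is hypothetical, and the link speed is something you supply yourself.

```python
def link_utilization_pct(mbytes_per_s: float, link_mbit_per_s: float) -> float:
    """Convert a throughput in MB/s to a percentage of the link's capacity."""
    if link_mbit_per_s <= 0:
        raise ValueError("link speed must be positive")
    mbit_per_s = mbytes_per_s * 8          # bytes to bits
    return round(mbit_per_s * 100 / link_mbit_per_s, 1)


# About 22 MB/s on a 1 Gbit/s link.
print(link_utilization_pct(22, 1000))
```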

9. sar -n TCP,ETCP 1

sar -n TCP,ETCP 1

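Reports key TCP metrics: active/s, the rate of locally initiated connections (for example via connect()); passive/s, the rate of remotely initiated connections (for example via accept()); and retrans/s, the rate of retransmitted segments. The connection rates give a rough measure of server load, while a high retransmission rate points to a network problem or an overloaded server that is dropping packets.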

10. top

top

Provides a continuously refreshed view of processes, CPU, memory, and load averages, covering many of the metrics from the previous commands. Because top redraws the screen on each refresh, patterns over time are harder to see than in the rolling output of vmstat or pidstat, and evidence of an intermittent problem can disappear before you read it.

These checks follow the USE Method: for each resource (CPU, memory, disk, network), look at utilization, saturation, and errors. Working through them in order lets engineers quickly narrow down where a performance problem lies, rule out resources that are healthy, and decide what to investigate next.
