How to Diagnose Linux Server Performance Issues in the First 60 Seconds
This guide walks through ten standard Linux command-line tools, including uptime, vmstat, iostat, and top, and shows how Netflix's performance engineers use them to assess system load, resource saturation, and errors within the first minute of an investigation.
Netflix’s performance engineering team explains how to use standard Linux command‑line utilities to identify system‑wide performance problems in the first 60 seconds after noticing an issue. The article presents ten commands, the key metrics each provides, and practical interpretation tips.
1. uptime
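The uptime command prints the system's load averages for the past 1, 5, and 15 minutes. On Linux these count tasks that want to run on a CPU as well as tasks blocked in uninterruptible I/O (usually disk I/O), so they describe overall demand rather than CPU usage alone. A 1-minute value well above the 15-minute value suggests load is rising now; a 1-minute value well below it suggests a spike that has already passed.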
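The comparison between the 1-minute and 15-minute values can be sketched in a few lines of Python. This is a minimal illustration, not part of the original article: the function name `load_trend`, the `ratio` threshold of 1.5, and the sample strings are assumptions. The input is expected in the layout of /proc/loadavg, where the first three fields are the three load averages.

```python
def load_trend(loadavg_text, ratio=1.5):
    """Classify the load trend from text laid out like /proc/loadavg.

    `ratio` is an arbitrary illustrative threshold, not a recommended value.
    """
    one_min, _five_min, fifteen_min = (
        float(value) for value in loadavg_text.split()[:3]
    )
    if one_min > fifteen_min * ratio:
        return "rising"    # recent load is well above the long-term average
    if one_min * ratio < fifteen_min:
        return "falling"   # the spike has probably already passed
    return "steady"


# Hypothetical sample values, for illustration only.
print(load_trend("30.02 26.43 19.02 2/345 1234"))
print(load_trend("1.00 5.00 9.00 1/100 42"))
```

On a Linux host the same function could be fed the contents of /proc/loadavg directly.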
2. dmesg | tail
This shows the last 10 kernel messages, if there are any. Look for errors that could explain a performance problem, such as an out-of-memory (oom-killer) kill or a warning that TCP requests are being dropped because of a possible SYN flood.
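A keyword scan over kernel log lines might look like the sketch below. It is an illustration under stated assumptions: the patterns, labels, and sample lines are hypothetical, and the exact wording of kernel messages varies between kernel versions, so the substrings would need adjusting for a real system.

```python
# (substring, label) pairs; the substrings are assumptions about typical
# kernel message wording and are matched case-insensitively.
KERNEL_PATTERNS = [
    ("oom-killer", "oom-kill"),
    ("out of memory", "oom-kill"),
    ("syn flooding", "syn-flood"),
]


def scan_kernel_log(lines):
    """Return (label, line) pairs for lines that match a known pattern."""
    hits = []
    for line in lines:
        lowered = line.lower()
        for needle, label in KERNEL_PATTERNS:
            if needle in lowered:
                hits.append((label, line.strip()))
                break  # report each line at most once
    return hits


# Hypothetical sample lines, for illustration only.
sample = [
    "Out of memory: Kill process 18694 (perl) score 246 or sacrifice child",
    "TCP: Possible SYN flooding on port 7001. Dropping request.",
    "eth0: link up",
]
for label, text in scan_kernel_log(sample):
    print(label, "|", text)
```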
3. vmstat 1
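vmstat 1 prints virtual memory, CPU, and I/O statistics once per second. The first line of output reports averages since boot for most columns, so read from the second line onward. Important fields include:
r : tasks running on a CPU or waiting for one; a value above the CPU count indicates CPU saturation
free : free memory in KiB
si/so : memory swapped in from and out to disk per second; non-zero values indicate memory pressure
us, sy, id, wa, st : user, system, idle, I/O wait, and steal time percentages
Adding us and sy gives the share of time the CPUs are busy; a consistently high wa points to an I/O bottleneck.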
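The vmstat checks above can be expressed as a small Python sketch. It assumes the header and one data row are passed in as strings and that every data field is an integer; the function name `check_vmstat`, the `wa_threshold` default, and the sample rows are illustrative assumptions, not values from the original article.

```python
def check_vmstat(header, row, cpu_count, wa_threshold=10):
    """Return (busy_pct, findings) for one vmstat data row.

    `wa_threshold` is an arbitrary illustrative cut-off for I/O wait.
    """
    fields = dict(zip(header.split(), (int(value) for value in row.split())))
    findings = []
    if fields["r"] > cpu_count:
        findings.append("CPU saturation: r exceeds CPU count")
    if fields["si"] > 0 or fields["so"] > 0:
        findings.append("memory pressure: swapping")
    if fields["wa"] >= wa_threshold:
        findings.append("possible I/O bottleneck: high wa")
    busy_pct = fields["us"] + fields["sy"]  # overall CPU utilization
    return busy_pct, findings


# Hypothetical header and row, for illustration only.
HEADER = "r b swpd free buff cache si so bi bo in cs us sy id wa st"
ROW = "34 0 0 200889792 73708 591828 0 0 0 5 6 10 96 1 3 0 0"
print(check_vmstat(HEADER, ROW, cpu_count=32))
```

Looking fields up by header name rather than by position keeps the sketch tolerant of vmstat versions that print extra columns.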
4. mpstat -P ALL 1
This prints a per-CPU breakdown of CPU time every second. A single CPU that is far busier than the rest points to an imbalance, which is often evidence of a single-threaded application.
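One way to spot such an imbalance from per-CPU idle percentages is sketched below. The function name `hot_cpus`, both thresholds, and the sample values are assumptions chosen for illustration; the input is a mapping from CPU number to its %idle value.

```python
def hot_cpus(idle_by_cpu, busy_threshold=90.0, gap=30.0):
    """Return CPUs that are very busy while the average CPU is not.

    Both thresholds are illustrative assumptions, not recommended values.
    """
    busy = {cpu: 100.0 - idle for cpu, idle in idle_by_cpu.items()}
    mean_busy = sum(busy.values()) / len(busy)
    return sorted(
        cpu
        for cpu, pct in busy.items()
        if pct >= busy_threshold and pct - mean_busy >= gap
    )


# Hypothetical %idle values: CPU 0 is saturated, the others are mostly idle.
print(hot_cpus({0: 2.0, 1: 95.0, 2: 97.0, 3: 96.0}))
```

When every CPU is busy, the gap to the mean is small and nothing is flagged, which matches the intent: the check looks for imbalance, not for overall load.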
5. pidstat 1
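pidstat 1 prints per-process CPU usage at one-second intervals as a rolling summary rather than a cleared screen, which makes patterns over time easier to follow. The %CPU column is a total across all CPUs, so it can exceed 100% on multi-core systems; a value of 1591%, for example, means the process is consuming almost 16 CPUs.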
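The conversion from %CPU to a CPU count is simple arithmetic, shown here as a short Python sketch. The function name `cpus_used` and the 32-CPU machine size are assumptions for illustration.

```python
def cpus_used(pct_cpu, cpu_count):
    """Convert a summed %CPU value into CPUs consumed and share of the machine."""
    cpus = pct_cpu / 100.0
    return cpus, cpus / cpu_count


# 1591% on a hypothetical 32-CPU machine.
cpus, share = cpus_used(1591.0, cpu_count=32)
print(f"{cpus:.2f} CPUs, {share:.1%} of the machine")
```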
6. iostat -xz 1
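iostat -xz 1 reports block-device statistics, covering both the workload applied and the resulting performance. Key metrics:
r/s, w/s, rkB/s, wkB/s : reads and writes per second, and kB read and written per second; use these to characterize the workload
await : average time per I/O request in milliseconds, including time spent queued and time being serviced; larger-than-expected values signal saturation or device problems
avgqu-sz : average number of requests issued to the device, named aqu-sz in newer sysstat releases; values above 1 can indicate a backlog
%util : percentage of time the device was busy; values above 60% often coincide with poor performance, and values near 100% usually mean saturation
A logical device that fronts several physical disks may show high utilization even when the underlying disks are not saturated, because the back end can serve requests in parallel.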
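The iostat rules of thumb above could be applied to one device's numbers as follows. This is a sketch under assumptions: the function name `check_device` and the `expected_await_ms` default are hypothetical, and a sensible latency expectation depends on the device type. The 1.0 queue-length and 60% utilization cut-offs are the ones quoted in the text.

```python
def check_device(await_ms, queue_len, util_pct, expected_await_ms=10.0):
    """Flag the iostat metrics that exceed the rule-of-thumb limits.

    `expected_await_ms` is an assumed per-device latency budget.
    """
    findings = []
    if await_ms > expected_await_ms:
        findings.append("high await")
    if queue_len > 1.0:
        findings.append("queue backlog")
    if util_pct > 60.0:
        findings.append("high utilization")
    return findings


# Hypothetical readings for a healthy and a struggling device.
print(check_device(await_ms=0.5, queue_len=0.01, util_pct=3.0))
print(check_device(await_ms=45.0, queue_len=4.2, util_pct=98.0))
```

For a logical device backed by several disks, findings from this kind of check are a prompt to look at the back-end devices, not proof of saturation.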
7. free -m
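free -m reports memory usage in mebibytes. Linux uses otherwise free memory for buffers and the page cache and hands it back when applications need it, so a small "free" value on its own is not a problem. On older versions of free, the "-/+ buffers/cache" line shows memory used by applications with buffers and cache excluded; newer versions drop that line and report an "available" column instead.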
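The adjustment behind the "-/+ buffers/cache" line is plain arithmetic, sketched here in Python. The function name `apps_view` and the sample figures are assumptions for illustration; real free output rounds each column separately, so a recomputed value can differ from the printed one by a unit.

```python
def apps_view(used, free, buffers, cached):
    """Return (used_by_apps, free_for_apps) in the units passed in.

    Buffers and cache are counted as reclaimable, so they move from
    "used" to "free" in the application-centric view.
    """
    reclaimable = buffers + cached
    return used - reclaimable, free + reclaimable


# Hypothetical values in MiB: 8000 used, 2000 free, 500 buffers, 3500 cached.
print(apps_view(used=8000, free=2000, buffers=500, cached=3500))
```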
8. sar -n DEV 1
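sar -n DEV 1 reports network interface throughput (rxkB/s and txkB/s) and, in some versions, interface utilization (%ifutil). In the source article's example, eth0 receives about 22 MB/s, which is 176 Mbit/s and well below the limit of a 1 Gbit/s link, so the network is not the bottleneck.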
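The comparison of throughput against link speed can be worked through as below. The function name `link_utilization` is hypothetical, and the sketch assumes decimal units (1 MB = 1,000,000 bytes, 1 Gbit/s = 1000 Mbit/s); sar's own kB columns may use a different base, so treat the result as approximate.

```python
def link_utilization(mbytes_per_s, link_mbit_per_s):
    """Return the fraction of link capacity used by one traffic direction."""
    mbit_per_s = mbytes_per_s * 8  # bytes to bits
    return mbit_per_s / link_mbit_per_s


# About 22 MB/s received on a 1 Gbit/s link.
print(f"{link_utilization(22.0, 1000.0):.1%}")
```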
9. sar -n TCP,ETCP 1
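sar -n TCP,ETCP 1 summarizes key TCP metrics: active/s is the number of locally initiated connections per second (for example, via connect()), passive/s is the number of remotely initiated connections per second (for example, via accept()), and retrans/s is the number of retransmitted segments per second. The active and passive rates give a rough measure of outbound and inbound load. A high retransmission rate points to a network or server problem, such as an unreliable network or an overloaded server that is dropping packets.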
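Because a raw retransmission count is hard to judge without knowing the traffic volume, one option is to relate it to the rate of segments sent. The sketch below assumes retrans/s from the ETCP report and oseg/s (segments sent per second) from the TCP report; the function name `retrans_ratio` and the sample numbers are illustrative assumptions, and no particular threshold is implied.

```python
def retrans_ratio(retrans_per_s, oseg_per_s):
    """Return retransmitted segments as a fraction of segments sent."""
    if oseg_per_s <= 0:
        return 0.0  # no outbound traffic in this interval
    return retrans_per_s / oseg_per_s


# Hypothetical interval: 5 retransmits/s against 1000 segments sent/s.
print(f"{retrans_ratio(5.0, 1000.0):.2%}")
```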
10. top
top shows a continuously refreshed view of processes, CPU, memory, and load averages, and it covers many of the metrics checked by the previous commands. Its drawback is that each refresh clears the screen, so patterns over time and short-lived spikes are harder to see than in the rolling output of tools like vmstat or pidstat.
These checks follow the USE Method: for each resource, such as CPU, memory, disk, and network, look at utilization, saturation, and errors. Working through them in order lets engineers narrow down the likely cause quickly, prioritize further investigation, and rule out resources that are healthy.
