How to Diagnose Linux Server Issues in the First 60 Seconds with 10 Essential Commands
This article explains how Netflix's performance team uses ten standard Linux command‑line tools to quickly assess system health within the first minute, focusing on error detection, resource saturation, and utilization across CPU, memory, disk, and network to pinpoint performance problems.
When you encounter performance problems on a Linux server, the first minute is critical; this guide shows which system metrics to check and why. Netflix monitors large EC2 fleets with Atlas and Vector, but still relies on standard Linux tools for rapid root‑cause analysis.
The performance engineering team presents ten command‑line utilities that can be run in the first 60 seconds to obtain a holistic view of the system, following the USE method (Utilization, Saturation, Errors) for CPU, memory, disk, and network resources.
1. uptime
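uptime displays the load averages for the past 1, 5 and 15 minutes, giving a quick sense of how many tasks want to run. On Linux these figures count tasks waiting for CPU as well as tasks blocked in uninterruptible I/O (usually disk I/O). Comparing the three values shows whether the load is rising or has already peaked: a 1-minute value well above the 15-minute value means the load is increasing, and the reverse means it is falling.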
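The comparison of the three load averages can be sketched in a few lines of Python. This is a minimal illustration, not part of the original article: the function names, the 10 % tolerance, and the sample values are assumptions, and the input is the text of /proc/loadavg, which carries the same three averages that uptime prints.

```python
def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    fields = text.split()
    return float(fields[0]), float(fields[1]), float(fields[2])


def load_trend(one, fifteen, tolerance=0.1):
    """Compare the 1-minute and 15-minute averages (tolerance is an assumed margin)."""
    if one > fifteen * (1 + tolerance):
        return "rising"
    if one < fifteen * (1 - tolerance):
        return "falling"
    return "steady"


# Illustrative values only, not real measurements.
sample = "30.02 26.43 19.02 3/1234 5678"
one, five, fifteen = parse_loadavg(sample)
print(load_trend(one, fifteen))
```

On a live system the same functions could be fed with `open("/proc/loadavg").read()`.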
2. dmesg | tail
dmesg | tail shows the most recent kernel messages, which is useful for spotting OOM‑killer events, driver errors, or network issues such as SYN floods.
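Scanning kernel messages for the events mentioned above can be automated with a small filter. The sketch below is an assumption-laden illustration: the watch list, the function name, and the sample lines are hypothetical, and exact kernel message wording varies between kernel versions.

```python
import re

# Hypothetical watch list; extend it for your own environment.
PATTERNS = {
    "oom_kill": re.compile(r"out of memory", re.IGNORECASE),
    "syn_flood": re.compile(r"possible syn flooding", re.IGNORECASE),
}


def flag_kernel_messages(lines):
    """Return (label, line) pairs for lines that match a watched pattern."""
    hits = []
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((label, line.strip()))
    return hits


# Illustrative lines modeled on typical kernel messages, not real logs.
sample_lines = [
    "[100.000001] eth0: link up",
    "[200.000002] Out of memory: Kill process 1234 (perl) score 246 or sacrifice child",
    "[300.000003] TCP: Possible SYN flooding on port 7001. Sending cookies.",
]
hits = flag_kernel_messages(sample_lines)
for label, line in hits:
    print(label, "->", line)
```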
3. vmstat 1
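vmstat 1 prints virtual memory, CPU and I/O statistics every second. The first line of output shows averages since boot rather than the previous second, so skip it. Important fields: r (tasks running on a CPU or waiting for a turn), free (free memory in kilobytes), si/so (swap‑ins and swap‑outs; non‑zero values mean the system is short of memory), and us/sy/id/wa/st (CPU time split into user, system, idle, wait‑I/O and stolen time). An r value greater than the CPU count indicates CPU saturation.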
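The saturation and swap checks on vmstat fields can be expressed as a short sketch. The parser below assumes the common 17-column vmstat layout with a header row starting with `r`; the function names and the sample output are illustrative assumptions.

```python
def parse_vmstat(output):
    """Map vmstat column names to the integer values of the last sample line."""
    lines = [ln for ln in output.splitlines() if ln.strip()]
    # The column header is the line whose first field is "r".
    header_idx = next(i for i, ln in enumerate(lines) if ln.split()[0] == "r")
    names = lines[header_idx].split()
    values = [int(v) for v in lines[-1].split()]
    return dict(zip(names, values))


def cpu_saturated(stats, cpu_count):
    """True when more tasks are runnable than there are CPUs."""
    return stats["r"] > cpu_count


def swapping(stats):
    """True when any swap-in or swap-out activity is reported."""
    return stats["si"] > 0 or stats["so"] > 0


# Illustrative output, not a real measurement.
sample_output = """
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
34  0      0 200889792  73708 591828    0    0     0     5    6   10 96  1  3  0  0
"""
stats = parse_vmstat(sample_output)
print(cpu_saturated(stats, cpu_count=32), swapping(stats))
```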
4. mpstat -P ALL 1
mpstat -P ALL 1 prints per‑CPU utilization. Uneven usage, such as one CPU that is fully busy while the others are mostly idle, may reveal a single‑threaded bottleneck.
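One way to spot the uneven pattern is to look for CPUs that are very busy while the typical CPU is quiet. The sketch below is only a heuristic: the function name and both thresholds are assumptions, and the input is a hypothetical mapping of CPU number to busy percentage (100 minus %idle).

```python
from statistics import median


def hot_cpus(busy_by_cpu, hot=90.0, quiet=20.0):
    """Return CPUs at or above `hot` percent busy while the median CPU is below `quiet`."""
    if median(busy_by_cpu.values()) >= quiet:
        return []  # the whole machine is busy, not just one CPU
    return sorted(cpu for cpu, busy in busy_by_cpu.items() if busy >= hot)


# Illustrative values, not real measurements.
uneven = {0: 4.0, 1: 98.5, 2: 3.0, 3: 5.5}
evenly_busy = {0: 95.0, 1: 96.0, 2: 94.0, 3: 97.0}
print(hot_cpus(uneven))
print(hot_cpus(evenly_busy))
```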
5. pidstat 1
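pidstat 1 shows CPU usage per process each second as a rolling summary, which makes patterns over time easier to follow than a display that clears the screen. The %CPU column is a total across all CPUs, so a Java process showing more than 1500 % CPU on a 32‑CPU box is consuming more than 15 CPUs.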
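The conversion from a summed %CPU figure to CPUs in use is simple arithmetic. A minimal sketch, with hypothetical helper names:

```python
def cpus_used(cpu_pct):
    """Convert a %CPU value summed across CPUs into an equivalent number of CPUs."""
    return cpu_pct / 100.0


def share_of_machine(cpu_pct, cpu_count):
    """Fraction of the machine's total CPU capacity that the process consumes."""
    return cpus_used(cpu_pct) / cpu_count


print(cpus_used(1500.0))             # 15.0 CPUs
print(share_of_machine(1500.0, 32))  # 0.46875 of a 32-CPU box
```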
6. iostat -xz 1
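iostat -xz 1 provides block‑device statistics. Key metrics: r/s and w/s (read and write operations per second), rkB/s and wkB/s (read and write throughput), await (average I/O response time in milliseconds, including time spent queued), avgqu‑sz (average queue length, shown as aqu‑sz in newer sysstat versions) and %util (device utilization). Utilization above 60 % typically leads to poorer performance, which should also show up as higher await, and values close to 100 % usually indicate saturation, although the thresholds depend on the device.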
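The rules of thumb for disk metrics can be collected into a small check. In this sketch only the 60 % utilization threshold comes from the text; the await and queue-length thresholds, like the function name, are assumptions that should be tuned per device.

```python
def disk_flags(util_pct, await_ms, queue_len,
               util_warn=60.0, await_warn_ms=10.0, queue_warn=1.0):
    """Return warning labels for one device sample (await/queue thresholds are assumed)."""
    flags = []
    if util_pct >= util_warn:
        flags.append("high utilization")
    if await_ms >= await_warn_ms:
        flags.append("slow responses")
    if queue_len > queue_warn:
        flags.append("requests queuing")
    return flags


# Illustrative samples, not real measurements.
busy = disk_flags(util_pct=85.0, await_ms=42.0, queue_len=3.2)
idle = disk_flags(util_pct=12.0, await_ms=1.5, queue_len=0.1)
print(busy)
print(idle)
```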
7. free -m
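free -m shows total, used and free memory in megabytes, as well as buffers and cache. On older versions of free, the “-/+ buffers/cache” line gives a more accurate view of the memory actually used by applications, because buffer and cache memory can be reclaimed when applications need it. Newer versions drop that line and report an “available” column instead.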
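The arithmetic behind the “-/+ buffers/cache” line can be reproduced by hand. The sketch below assumes the older free layout with separate buffers and cached columns; the function name and the numbers are illustrative.

```python
def buffers_cache_view(used, free, buffers, cached):
    """Return (used by applications, free including reclaimable buffers and cache)."""
    app_used = used - buffers - cached
    app_free = free + buffers + cached
    return app_used, app_free


# Illustrative values in megabytes: total 16000 = used 12000 + free 4000.
app_used, app_free = buffers_cache_view(used=12000, free=4000, buffers=500, cached=5500)
print(app_used, app_free)
```

Here the machine looks 75 % used at first glance, yet applications hold only 6000 MB once buffers and cache are subtracted.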
8. sar -n DEV 1
sar -n DEV 1 reports network interface throughput (rxkB/s and txkB/s) and, in versions that include the %ifutil column, interface utilization, helping you see whether a NIC is approaching its limit.
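When the utilization column is missing, a rough figure can be derived from the throughput columns and the link speed. This sketch rests on stated assumptions: a full-duplex link (so the busier direction is what counts), a link speed supplied by the caller, and a configurable number of bytes per kB. The function name is hypothetical.

```python
def nic_utilization_pct(rx_kb_s, tx_kb_s, link_mbit_s, bytes_per_kb=1024):
    """Estimate utilization of a full-duplex link from rxkB/s and txkB/s."""
    busiest = max(rx_kb_s, tx_kb_s)           # full duplex: directions do not share capacity
    bits_per_s = busiest * bytes_per_kb * 8
    return 100.0 * bits_per_s / (link_mbit_s * 1_000_000)


# Illustrative values: 12500 kB/s received on a 1 Gbit/s link, using 1000-byte kB.
pct = nic_utilization_pct(12500, 100, 1000, bytes_per_kb=1000)
print(round(pct, 1))
```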
9. sar -n TCP,ETCP 1
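sar -n TCP,ETCP 1 shows TCP statistics: active/s (locally initiated connections per second), passive/s (remotely initiated connections per second) and retrans/s (retransmitted segments per second). The connection rates give a rough measure of server load, and spikes in retransmissions may indicate a network problem or an overloaded server.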
10. top
top aggregates many of the above metrics in real time. It is useful for checking whether anything looks very different from what the earlier commands showed, which would indicate that the load is variable.
MaGe Linux Operations
