
How to Diagnose Linux Server Issues in the First 60 Seconds with 10 Essential Commands

This article explains how Netflix's performance team uses ten standard Linux command‑line tools to quickly assess system health within the first minute, focusing on error detection, resource saturation, and utilization across CPU, memory, disk, and network to pinpoint performance problems.

MaGe Linux Operations

Aug 7, 2020


When a Linux server has a performance problem, the first minute of investigation is critical; this guide shows which system metrics to check first and why. Netflix monitors its large Amazon EC2 fleet with Atlas (fleet-wide monitoring) and Vector (per-instance analysis), but its engineers still rely on standard Linux tools for rapid root-cause analysis on individual instances.

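Netflix's performance engineering team presents ten command-line utilities that can be run within the first 60 seconds to get a high-level view of system health. The checks follow the USE method: for each resource (CPU, memory, disk and network), look for Utilization, Saturation and Errors. Some of the commands (mpstat, pidstat, iostat and sar) require the sysstat package to be installed.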

1. uptime

uptime

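Displays the 1-, 5- and 15-minute load averages. On Linux these count tasks that are running or waiting for a CPU plus tasks blocked in uninterruptible I/O, so they give a quick sense of overall demand. Comparing the three values shows whether load is rising or falling: if the 1-minute value is much lower than the 15-minute value, the problem may already have passed.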
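On Linux the same three numbers can be read from /proc/loadavg. The sketch below parses that format and compares the 1- and 15-minute values against the CPU count. It assumes Python 3; the 1.5x `ratio` used to call a trend "rising" or "falling" is an arbitrary, hypothetical cutoff, and the sample line is illustrative rather than real output.

```python
import os


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from a /proc/loadavg line."""
    one, five, fifteen = (float(v) for v in text.split()[:3])
    return one, five, fifteen


def describe_load(one, fifteen, ncpu, ratio=1.5):
    """Classify the load trend; `ratio` is a hypothetical cutoff, not a standard."""
    if one > fifteen * ratio:
        trend = "rising"
    elif one * ratio < fifteen:
        trend = "falling (the problem may already have passed)"
    else:
        trend = "steady"
    # Load per CPU above 1.0 means more demand than CPUs
    # (on Linux this also includes tasks blocked in uninterruptible I/O).
    return {"trend": trend, "load_per_cpu": one / ncpu}


# Illustrative /proc/loadavg contents: 1/5/15-min loads, runnable/total tasks, last PID.
SAMPLE = "30.02 26.43 19.02 3/1120 24587"

if __name__ == "__main__":
    one, _five, fifteen = parse_loadavg(SAMPLE)
    print(describe_load(one, fifteen, ncpu=os.cpu_count() or 1))
```

On a live system, pass the contents of /proc/loadavg instead of `SAMPLE`.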

2. dmesg | tail

dmesg | tail

Shows the most recent kernel messages, useful for spotting OOM‑killer events, driver errors, or network issues such as SYN floods.
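Scanning the tail of the kernel log for a few known phrases can be sketched as below. The pattern list is hypothetical and deliberately small; exact kernel message wording varies between versions, and the sample lines are illustrative.

```python
import re

# Hypothetical patterns; kernel message wording differs between kernel versions.
PATTERNS = {
    "oom": re.compile(r"invoked oom-killer|out of memory", re.IGNORECASE),
    "syn_flood": re.compile(r"possible syn flooding", re.IGNORECASE),
    "io_error": re.compile(r"i/o error", re.IGNORECASE),
}


def classify(lines):
    """Group kernel log lines by the first pattern category they match."""
    hits = {name: [] for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line)
                break
    return hits


# Illustrative dmesg lines, not captured from a real system.
SAMPLE = [
    "[1880957.563150] perl invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0",
    "[1880957.563400] Out of memory: Kill process 18694 (perl) score 246 or sacrifice child",
    "[2320864.954447] TCP: Possible SYN flooding on port 7001. Dropping request.  Check SNMP counters.",
    "[2320870.000000] eth0: link up",
]

if __name__ == "__main__":
    for name, found in classify(SAMPLE).items():
        print(f"{name}: {len(found)} line(s)")
```

On a live system you could feed it `subprocess.run(["dmesg"], capture_output=True, text=True).stdout.splitlines()`; reading the kernel log may require root when the `kernel.dmesg_restrict` sysctl is enabled.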

3. vmstat 1

vmstat 1

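Prints virtual memory, CPU and I/O statistics every second. Important fields: r (processes running on or waiting for a CPU), free (free memory in kilobytes), si/so (swap-in and swap-out; non-zero values mean the system is swapping) and us/sy/id/wa/st (percentage of CPU time spent in user mode, in the kernel, idle, waiting for I/O, and stolen by the hypervisor). An r value greater than the number of CPUs indicates CPU saturation. Ignore the first row of numbers, which shows averages since boot rather than the last second.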
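The two checks described above (r versus CPU count, and any swap activity) can be sketched as a small parser over captured `vmstat 1` output. This assumes the standard procps column names and keys each row by the header so that extra columns in newer versions do not break it; the sample output is illustrative.

```python
def parse_vmstat(text):
    """Parse `vmstat 1` output into dicts keyed by column name (r, b, swpd, free, ...)."""
    header, rows = None, []
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "r":
            header = fields                          # the column-name line
        elif header and fields and all(f.isdigit() for f in fields):
            rows.append(dict(zip(header, map(int, fields))))
    return rows[1:]  # the first data row is the since-boot average; skip it


def check(row, ncpu):
    """Flag CPU saturation and swapping for one sample."""
    issues = []
    if row["r"] > ncpu:
        issues.append(f"cpu saturated: r={row['r']} > {ncpu} CPUs")
    if row["si"] or row["so"]:
        issues.append("swapping")
    return issues


# Illustrative output, not captured from a real machine.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
34  0      0 200889792  73708 591828    0    0     0     5    6   10 96  1  3  0  0
32  0      0 200889920  73708 591860    0    0     0   592 13284 4282 98  1  1  0  0
32  0      0 200890112  73708 591860    0    0     0     0  9501 2154 99  1  0  0  0
"""

if __name__ == "__main__":
    for row in parse_vmstat(SAMPLE):
        print(check(row, ncpu=16))
```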

4. mpstat -P ALL 1

mpstat -P ALL 1

Prints per‑CPU utilization; uneven usage may reveal a single‑threaded bottleneck.
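Spotting one hot CPU among idle ones can be sketched by turning each per-CPU row into a busy percentage (100 minus %idle). The parser below assumes the header and data rows share the same timestamp prefix, which holds for normal `mpstat -P ALL 1` output; the 90% and 50% thresholds are hypothetical, and the sample is illustrative.

```python
def parse_mpstat(text):
    """Map each CPU id to its busy % (100 - %idle) from `mpstat -P ALL` output."""
    header, busy = None, {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("Average"):
            continue                                 # skip blank lines and the summary block
        if "%idle" in fields:
            header = fields                          # the time prefix lines up with data rows
            continue
        if header is None or len(fields) != len(header):
            continue
        cpu = fields[header.index("CPU")]
        if cpu != "all":                             # skip the all-CPU aggregate row
            busy[cpu] = 100.0 - float(fields[header.index("%idle")])
    return busy


def looks_single_threaded(busy, hot=90.0, avg_limit=50.0):
    """True if exactly one CPU is hot while the average stays low (hypothetical thresholds)."""
    if not busy:
        return False
    hot_cpus = [cpu for cpu, pct in busy.items() if pct >= hot]
    average = sum(busy.values()) / len(busy)
    return len(hot_cpus) == 1 and average < avg_limit


# Illustrative 4-CPU output, not captured from a real machine.
SAMPLE = """\
Linux 5.15.0 (host)  01/01/2024  _x86_64_  (4 CPU)

10:00:01 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
10:00:02 AM  all   26.00    0.00    1.00    0.00    0.00    0.00    0.00    0.00    0.00   73.00
10:00:02 AM    0    2.00    0.00    1.00    0.00    0.00    0.00    0.00    0.00    0.00   97.00
10:00:02 AM    1   99.00    0.00    1.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
10:00:02 AM    2    1.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   99.00
10:00:02 AM    3    2.00    0.00    1.00    0.00    0.00    0.00    0.00    0.00    0.00   97.00
"""

if __name__ == "__main__":
    busy = parse_mpstat(SAMPLE)
    print(busy, looks_single_threaded(busy))
```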

5. pidstat 1

pidstat 1

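Shows CPU usage per process every second, making it easy to spot processes that consume many cores. %CPU is summed across all CPUs, so a Java process showing more than 1500% on a 32-CPU machine is using roughly 15 CPUs. Unlike top, pidstat prints a rolling log instead of redrawing the screen, which makes patterns over time easier to see.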
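Converting %CPU into "CPUs in use" is a division by 100. The sketch below ranks processes from captured `pidstat 1` output and reports that figure. It keys rows by the header line, so it tolerates extra columns such as %wait in newer sysstat releases, but it skips rows whose command contains spaces (as with `pidstat -l`). The sample is illustrative.

```python
def top_by_cpu(text, n=3):
    """Return the n busiest (command, pid, %CPU, approx. CPUs) rows from `pidstat 1` output."""
    header, rows = None, []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("Average"):
            continue
        if "%CPU" in fields and "Command" in fields:
            header = fields                          # the time prefix lines up with data rows
            continue
        if header is None or len(fields) != len(header):
            continue                                 # e.g. commands containing spaces
        row = dict(zip(header, fields))
        pct = float(row["%CPU"])
        rows.append((row["Command"], int(row["PID"]), pct, pct / 100.0))
    rows.sort(key=lambda r: r[2], reverse=True)
    return rows[:n]


# Illustrative output from a 32-CPU machine, not captured from a real system.
SAMPLE = """\
Linux 5.15.0 (host)  01/01/2024  _x86_64_  (32 CPU)

07:41:02 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
07:41:03 PM     0         9    0.00    0.94    0.00    0.94     1  rcuos/0
07:41:03 PM     0      4214    5.66    5.66    0.00   11.32    15  mesos-slave
07:41:03 PM     0     12521 1357.55    7.55    0.00 1365.09    10  java
07:41:03 PM     0     12614 1597.17    4.72    0.00 1601.89    10  java
"""

if __name__ == "__main__":
    for command, pid, pct, cpus in top_by_cpu(SAMPLE):
        print(f"{command} (pid {pid}): {pct:.0f}% = about {cpus:.1f} CPUs")
```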

6. iostat -xz 1

iostat -xz 1

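Prints extended (-x) block-device statistics, omitting devices with no activity (-z). Key metrics: r/s, w/s, rkB/s and wkB/s (reads, writes, and kilobytes read and written per second), await (average I/O time in milliseconds, including time spent queued; newer sysstat releases split it into r_await and w_await), avgqu-sz (average queue length; called aqu-sz in newer releases) and %util (percentage of time the device was busy). Utilization above about 60% or a high await usually indicates disk saturation. For logical devices backed by several disks, such as RAID volumes, 100% utilization may only mean that some I/O was always in flight, not that the underlying disks are saturated. As with vmstat, the first report shows averages since boot.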
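Applying the 60% rule of thumb to captured `iostat -x` output can be sketched as follows. The parser assumes C-locale decimal points and keys columns by the "Device" header line, so it copes with both the older (avgqu-sz) and newer (aqu-sz) column names. The sample output is illustrative.

```python
def parse_iostat(text):
    """Return one {column: value} dict per device row in `iostat -x` output (C locale)."""
    header, devices = None, []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            header = None                            # a blank line ends the device table
        elif fields[0].startswith("Device"):
            header = fields[1:]                      # column names after "Device:" / "Device"
        elif header is not None and len(fields) == len(header) + 1:
            stats = dict(zip(header, map(float, fields[1:])))
            stats["device"] = fields[0]
            devices.append(stats)
    return devices


def queue_length(stats):
    # Older sysstat releases call this column avgqu-sz; newer ones call it aqu-sz.
    return stats.get("avgqu-sz", stats.get("aqu-sz"))


def busy_devices(devices, util_limit=60.0):
    """Names of devices whose %util exceeds util_limit (the 60% rule of thumb)."""
    return [d["device"] for d in devices if d["%util"] > util_limit]


# Illustrative output in the older column layout, not captured from a real system.
SAMPLE = """\
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          10.00    0.00    2.00   35.00    0.00   53.00

Device:   rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda         0.00     1.00    2.00    3.00    16.00    40.00    22.40     0.01    1.20    1.00    1.33   0.50   0.25
sdb         0.00    12.00  150.00  220.00  9600.00 28160.00   204.11     9.80   26.50   20.00   30.93   2.36  87.50
"""

if __name__ == "__main__":
    devices = parse_iostat(SAMPLE)
    for name in busy_devices(devices):
        print(name, "looks saturated")
```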

7. free -m

free -m

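Shows total, used and free memory in megabytes (-m), along with buffers and page cache. On older versions of free, the “-/+ buffers/cache” line gives a more accurate view of memory actually used by applications, because cache can be reclaimed; newer versions drop that line and show an “available” column instead.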
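free reads its numbers from /proc/meminfo. The sketch below computes rough megabyte figures from that file: an application-used value in the spirit of the old “-/+ buffers/cache” line, and MemAvailable (kernel 3.14 and later). It approximates free's arithmetic rather than reproducing it exactly, and the sample values are made up.

```python
def parse_meminfo(text):
    """Return {field: value} from /proc/meminfo-formatted text (values in kB)."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, value = line.split(":", 1)
        info[name.strip()] = int(value.split()[0])   # drop the trailing "kB" unit
    return info


def memory_summary_mb(info):
    """Rough MB figures in the spirit of free -m (not free's exact arithmetic)."""
    total = info["MemTotal"]
    cache = info.get("Buffers", 0) + info.get("Cached", 0)
    apps = total - info["MemFree"] - cache           # like the old "-/+ buffers/cache" used value
    available = info.get("MemAvailable")             # kernel 3.14+; newer free shows "available"
    return {
        "total": total // 1024,
        "used_by_apps": apps // 1024,
        "available": None if available is None else available // 1024,
    }


# Made-up /proc/meminfo excerpt for illustration.
SAMPLE = """\
MemTotal:       16384000 kB
MemFree:         1024000 kB
MemAvailable:    9216000 kB
Buffers:          512000 kB
Cached:          7680000 kB
"""

if __name__ == "__main__":
    print(memory_summary_mb(parse_meminfo(SAMPLE)))
```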

8. sar -n DEV 1

sar -n DEV 1

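Reports network interface throughput as rxkB/s and txkB/s (kilobytes received and transmitted per second), which you can compare with the link speed to see whether the NIC is a bottleneck. Newer sysstat versions also print %ifutil, the interface utilization.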
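Comparing sar's throughput with the link speed is simple arithmetic. The sketch below assumes a full-duplex link (each direction gets the full speed, so the busier direction is what matters) and treats sar's kilobyte as 1024 bytes; both are stated assumptions. The link speed is read from /sys/class/net/IFACE/speed, which reports Mbit/s on Linux and may be unavailable for virtual or down interfaces.

```python
def nic_utilization(rxkb_s, txkb_s, link_mbit_s, bytes_per_kb=1024):
    """Approximate utilization (0.0-1.0) of a full-duplex NIC from sar's rxkB/s and txkB/s."""
    link_bytes_s = link_mbit_s * 1_000_000 / 8       # link capacity per direction, bytes/s
    busiest_bytes_s = max(rxkb_s, txkb_s) * bytes_per_kb
    return busiest_bytes_s / link_bytes_s


def link_speed_mbit(iface):
    """Link speed in Mbit/s from sysfs, or None if unknown (virtual NIC, link down, non-Linux)."""
    try:
        with open(f"/sys/class/net/{iface}/speed") as f:
            speed = int(f.read().strip())
    except (OSError, ValueError):
        return None
    return speed if speed > 0 else None


if __name__ == "__main__":
    # Illustrative sar values: about 22 MB/s received on a 1 Gbit/s link.
    print(f"{nic_utilization(21999.10, 482.56, 1000):.1%}")
```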

9. sar -n TCP,ETCP 1

sar -n TCP,ETCP 1

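Shows key TCP statistics: active/s (locally initiated connections per second, for example via connect()), passive/s (remotely initiated connections per second, for example via accept()) and retrans/s (TCP retransmits per second). The active and passive counts are a rough measure of server load; retransmits point to network problems or an overloaded server dropping packets.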
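These rates come from the kernel's cumulative TCP counters in /proc/net/snmp. The sketch below diffs two snapshots to get per-second values roughly matching sar's active/s, passive/s and retrans/s, plus retransmits as a percentage of segments sent. The mapping to sar's columns is an assumption, and the snapshot text is made up and trimmed (real files contain more fields and lines).

```python
def parse_tcp_snmp(text):
    """Return the Tcp: counters from /proc/net/snmp-formatted text as {name: value}."""
    tcp = [line.split()[1:] for line in text.splitlines() if line.startswith("Tcp:")]
    names, values = tcp[0], tcp[1]                   # first Tcp: line is names, second is values
    return dict(zip(names, map(int, values)))


def tcp_rates(before, after, seconds):
    """Per-second deltas between two snapshots, plus retransmit percentage."""
    rates = {k: (after[k] - before[k]) / seconds
             for k in ("ActiveOpens", "PassiveOpens", "OutSegs", "RetransSegs")}
    out = rates["OutSegs"]
    rates["retrans_pct"] = 100.0 * rates["RetransSegs"] / out if out else 0.0
    return rates


# Made-up, trimmed snapshots taken one second apart.
BEFORE = """\
Ip: Forwarding DefaultTTL
Ip: 1 64
Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs InErrs OutRsts
Tcp: 1 200 120000 -1 1000 5000 10 20 300 800000 900000 450 0 100
"""
AFTER = """\
Ip: Forwarding DefaultTTL
Ip: 1 64
Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs InErrs OutRsts
Tcp: 1 200 120000 -1 1010 5200 10 20 310 810000 910000 460 0 100
"""

if __name__ == "__main__":
    print(tcp_rates(parse_tcp_snmp(BEFORE), parse_tcp_snmp(AFTER), 1.0))
```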

10. top

top

Aggregates many of the above metrics in real time; useful for confirming whether the situation observed with other tools has changed.
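The CPU summary at the top of top's display is derived from deltas of the counters in /proc/stat. The sketch below shows that kind of calculation; it is not top's exact code. It counts everything except idle and iowait as busy (steal included, which is a choice) and sums only the first eight columns, because guest time is already included in user and nice.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counters."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError(f"not the aggregate cpu line: {line!r}")
    return [int(v) for v in fields[1:]]


def cpu_busy_percent(before, after):
    """CPU busy % between two samples (everything except idle and iowait)."""
    # Columns: user nice system idle iowait irq softirq steal [guest guest_nice].
    delta = [a - b for a, b in zip(after[:8], before[:8])]
    total = sum(delta)
    idle = delta[3] + delta[4]
    return 100.0 * (total - idle) / total if total else 0.0


def read_cpu_line(path="/proc/stat"):
    """Read and parse the first line of /proc/stat (Linux only)."""
    with open(path) as f:
        return parse_cpu_line(f.readline())
```

For a live reading, call `read_cpu_line()` twice about a second apart (for example with `time.sleep(1)` in between) and pass both results to `cpu_busy_percent()`.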


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

