
How to Diagnose Linux Server Performance Issues in the First 60 Seconds

This article walks through ten essential Linux command-line tools, including uptime, vmstat, iostat, and top, that Netflix’s performance engineers use to quickly assess system load, resource saturation, and errors within the critical first minute of troubleshooting.

Efficient Ops

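When a Linux server shows performance problems, the first minute of investigation is crucial. Netflix’s performance engineering team relies on a set of standard Linux command-line tools to gather key metrics quickly. Most ship with every distribution; mpstat, pidstat, iostat, and sar come from the sysstat package, which may need to be installed separately.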

The approach follows the USE method, checking Utilization, Saturation, and Errors for each resource: CPU, memory, disk, and network.

1. uptime

$ uptime

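Shows the load averages over the last 1, 5, and 15 minutes. On Linux these count tasks that are running or waiting for a CPU plus tasks blocked in uninterruptible sleep (usually disk I/O), so they give a rough measure of demand rather than a precise diagnosis. Comparing the 1-minute value with the 15-minute value shows whether load is rising or falling.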
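
As a rough aid to reading these numbers, the sketch below divides the load averages by the CPU count and compares the 1- and 15-minute values. It assumes the `/proc/loadavg` text format, and the function names are hypothetical. A per-CPU ratio above 1 is only a hint, because Linux load also counts tasks blocked on disk I/O.

```python
import os


def load_per_cpu(loadavg_text, ncpu):
    """Divide the 1-, 5- and 15-minute load averages by the number of CPUs.

    loadavg_text is the content of /proc/loadavg, e.g. "0.85 1.20 1.55 2/345 12345".
    """
    one, five, fifteen = (float(x) for x in loadavg_text.split()[:3])
    return one / ncpu, five / ncpu, fifteen / ncpu


def load_trend(loadavg_text):
    """Compare the 1-minute average with the 15-minute average."""
    one, _, fifteen = (float(x) for x in loadavg_text.split()[:3])
    if one > fifteen:
        return "rising"
    if one < fifteen:
        return "falling"
    return "steady"


if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as f:
        text = f.read()
    print(load_per_cpu(text, os.cpu_count() or 1), load_trend(text))
```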

2. dmesg | tail

$ dmesg | tail

Displays the last 10 kernel messages, helping to spot errors that can cause performance problems, such as OOM-killer events or TCP dropping requests.
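
The following is a minimal sketch of filtering kernel messages for the two problems named above. The regular expressions are assumptions based on common kernel wording (for example "invoked oom-killer" and "TCP: Possible SYN flooding"), which varies between kernel versions. `flag_kernel_messages` is a hypothetical helper.

```python
import re

# Assumed patterns for the two problems named above; exact wording varies by kernel version.
PATTERNS = {
    "oom": re.compile(r"Out of memory|oom-kill"),
    "tcp": re.compile(r"TCP: .*(SYN flooding|drop)", re.IGNORECASE),
}


def flag_kernel_messages(lines):
    """Return (category, line) pairs for kernel messages matching a pattern."""
    hits = []
    for line in lines:
        for category, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((category, line))
                break  # report each line only once
    return hits
```

Feed it the lines printed by `dmesg`. On systems that set `kernel.dmesg_restrict`, reading the kernel ring buffer requires root.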

3. vmstat 1

$ vmstat 1

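Provides virtual memory and CPU statistics every second. The first line of output shows averages since boot rather than the last second, so focus on the lines that follow. Important fields include: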

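r : number of tasks running on a CPU or waiting for a turn; a value persistently greater than the CPU count indicates CPU saturation (unlike the load average, it excludes tasks waiting on I/O)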

free : free memory in KB

si, so : memory swapped in from and out to disk per second; nonzero values mean the system is short of memory

us, sy, id, wa, st : percentage of CPU time, averaged across all CPUs, spent in user mode, system (kernel) mode, idle, I/O wait, and stolen by the hypervisor (steal).

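A constant, elevated wa suggests a disk I/O bottleneck, because the CPUs sit idle while tasks are blocked on pending I/O. System time (sy) above about 20% deserves a closer look, since the kernel may be handling I/O inefficiently.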
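
These rules of thumb can be applied mechanically. The sketch below parses `vmstat 1` output by column name and skips the first sample, which holds averages since boot. It then flags CPU saturation, swapping, I/O wait, and high system time. Treat `ncpu` and the `wa_limit` threshold as assumptions to tune. The helper names are hypothetical.

```python
def parse_vmstat(text):
    """Parse `vmstat 1` output into one dict per sample row, keyed by column name."""
    header, rows = None, []
    for line in text.splitlines():
        tokens = line.split()
        if tokens[:2] == ["r", "b"]:
            header = tokens  # column-name line; vmstat may repeat it periodically
        elif header and tokens and tokens[0].isdigit() and len(tokens) == len(header):
            rows.append(dict(zip(header, map(int, tokens))))
    return rows


def vmstat_warnings(rows, ncpu, wa_limit=10):
    """Flag the conditions described above; wa_limit is an arbitrary example threshold."""
    warnings = []
    # The first row holds averages since boot, so only later samples are checked.
    for i, row in enumerate(rows[1:], start=1):
        if row["r"] > ncpu:
            warnings.append(f"sample {i}: r={row['r']} > {ncpu} CPUs (CPU saturation)")
        if row["si"] or row["so"]:
            warnings.append(f"sample {i}: swapping (si={row['si']}, so={row['so']})")
        if row["wa"] >= wa_limit:
            warnings.append(f"sample {i}: wa={row['wa']}% (possible I/O bottleneck)")
        if row["sy"] > 20:
            warnings.append(f"sample {i}: sy={row['sy']}% (high kernel time)")
    return warnings
```

Because columns are looked up by name, extra columns that newer vmstat versions print do not break the parser.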

4. mpstat -P ALL 1

$ mpstat -P ALL 1

Shows per-CPU utilization, useful for detecting imbalance. A single CPU that is much busier than the rest often points to a single-threaded application.
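
The imbalance check can be sketched as follows. Given the %idle column for each CPU, it reports CPUs that are nearly saturated while the average is much lower. Busy time is approximated as 100 minus %idle. The `hot` and `gap` thresholds are illustrative values, not mpstat defaults, and `hot_cpus` is a hypothetical name.

```python
def hot_cpus(idle_by_cpu, hot=90.0, gap=50.0):
    """Return CPUs that are at least `hot`% busy and `gap` points above the average.

    idle_by_cpu maps a CPU number to its %idle value from `mpstat -P ALL 1`
    (leave out the "all" row). The thresholds are illustrative, not standard.
    """
    busy = {cpu: 100.0 - idle for cpu, idle in idle_by_cpu.items()}
    average = sum(busy.values()) / len(busy)
    return sorted(cpu for cpu, b in busy.items() if b >= hot and b - average >= gap)
```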

5. pidstat 1

$ pidstat 1

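Similar to top’s per-process summary, but prints a rolling snapshot every second instead of clearing the screen, so patterns over time stay visible. The %CPU column is summed across all CPUs, so a process showing 400% is using roughly four cores.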
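
Because pidstat's %CPU is summed across CPUs, dividing by 100 gives an approximate core count. This sketch parses `pidstat 1` output by locating the %CPU and Command columns in the header. That approach works whether or not the timestamps carry an AM/PM field. It assumes a locale that prints decimal points. The helper names are hypothetical, and the sample in the usage check is illustrative.

```python
def parse_pidstat(text):
    """Return (command, %CPU) pairs from `pidstat 1` output."""
    rows, cpu_idx, cmd_idx = [], None, None
    for line in text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0].startswith("Average"):
            continue  # skip blank lines and the final summary block
        if "%CPU" in tokens and "Command" in tokens:
            # Header: find columns by name, since the timestamp may or may not
            # include AM/PM and some sysstat versions add a %wait column.
            cpu_idx, cmd_idx = tokens.index("%CPU"), tokens.index("Command")
        elif cpu_idx is not None and len(tokens) > cmd_idx:
            rows.append((" ".join(tokens[cmd_idx:]), float(tokens[cpu_idx])))
    return rows


def busiest(rows, top_n=5):
    """Convert %CPU to approximate cores in use, busiest first."""
    cores = [(cmd, pct / 100.0) for cmd, pct in rows]
    return sorted(cores, key=lambda item: item[1], reverse=True)[:top_n]
```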

6. iostat -xz 1

$ iostat -xz 1

Reports block device statistics; -x adds extended columns and -z hides devices with no activity. Key metrics:

r/s, w/s, rkB/s, wkB/s : read/write request rates and throughput

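await : average I/O time in milliseconds, including both time spent queued and time being serviced; values larger than expected for the device indicate saturation or a device problem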

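avgqu-sz : average number of requests issued to the device, shown as aqu-sz in newer sysstat releases (values greater than 1 can indicate saturation, although many devices, especially virtual devices backed by several disks, serve requests in parallel)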

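%util : percentage of time the device was busy (values above 60% typically lead to poorer performance, visible in await, and values near 100% usually indicate saturation; the exact point depends on the device).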
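
The thresholds above can be checked with a short parser. This sketch reads the device sections of `iostat -x` output by column name and accepts either avgqu-sz or aqu-sz. The 60% and 1.0 defaults come from the rules of thumb above. The await limit is left to the caller, because expected latency depends on the device. Both function names are hypothetical.

```python
def parse_iostat_x(text):
    """Parse the device sections of `iostat -x` output into dicts keyed by column name."""
    devices, header = [], None
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            header = None  # a blank line ends the current section
        elif tokens[0].startswith("Device"):
            header = tokens  # "Device:" in older sysstat, "Device" in newer
        elif header and len(tokens) == len(header):
            row = {"Device": tokens[0]}
            row.update(zip(header[1:], map(float, tokens[1:])))
            devices.append(row)
    return devices


def iostat_flags(dev, util_limit=60.0, queue_limit=1.0, await_limit_ms=None):
    """Return which rules of thumb a device row trips."""
    flags = []
    if dev.get("%util", 0.0) > util_limit:
        flags.append("util")
    # The queue-length column is avgqu-sz in older sysstat and aqu-sz in newer.
    if dev.get("aqu-sz", dev.get("avgqu-sz", 0.0)) > queue_limit:
        flags.append("queue")
    # Expected latency depends on the device, so the await check is opt-in.
    awaits = [dev[k] for k in ("await", "r_await", "w_await") if k in dev]
    if await_limit_ms is not None and awaits and max(awaits) > await_limit_ms:
        flags.append("await")
    return flags
```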

7. free -m

$ free -m

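Shows memory usage in MiB, distinguishing used, free, buffers (the buffer cache used for block device I/O), and cached (the page cache used by file systems). Because cache can be reclaimed, the “-/+ buffers/cache” line printed by older versions of free gives a more accurate view of usable memory; newer procps-ng versions drop that line in favor of an “available” column.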
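
The usable-memory figure can also be read directly from `/proc/meminfo`. This sketch prefers the kernel's MemAvailable estimate (Linux 3.14 and later). On older kernels it falls back to MemFree + Buffers + Cached, the sum behind the old “-/+ buffers/cache” line. The function names are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo into {field: value}; most values are in kB."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


def usable_mib(info):
    """Estimate memory available to applications, in MiB."""
    kb = info.get("MemAvailable")  # kernel estimate, present since Linux 3.14
    if kb is None:
        # Older kernels: reclaimable caches count as usable, like "-/+ buffers/cache".
        kb = info["MemFree"] + info["Buffers"] + info["Cached"]
    return kb // 1024
```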

8. sar -n DEV 1

$ sar -n DEV 1

Reports per-interface throughput (rxkB/s, txkB/s) and utilization (%ifutil, based on the busier direction for full-duplex links) to show whether an interface is near its limit.
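
%ifutil depends on knowing the link speed. This sketch approximates how sysstat derives the column. For full-duplex links it uses the busier direction; for half-duplex links it uses the sum of both directions. Either figure is taken relative to the speed in Mbit/s. Input is in bytes per second, so convert sar's kB/s figures first. On Linux the speed is usually exposed in `/sys/class/net/<interface>/speed`, which may be unavailable for some virtual NICs. `ifutil_percent` is a hypothetical name.

```python
def ifutil_percent(rx_bytes_s, tx_bytes_s, speed_mbps, full_duplex=True):
    """Approximate %ifutil from throughput in bytes/s and link speed in Mbit/s."""
    if speed_mbps <= 0:
        return 0.0  # unknown link speed; sysstat also shows 0.00 in this case
    # Full duplex: each direction has its own capacity, so use the busier one.
    # Half duplex: both directions share the link, so add them.
    load = max(rx_bytes_s, tx_bytes_s) if full_duplex else rx_bytes_s + tx_bytes_s
    return load * 8 * 100 / (speed_mbps * 1_000_000)
```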

9. sar -n TCP,ETCP 1

$ sar -n TCP,ETCP 1

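Summarizes key TCP metrics: active/s (locally initiated connections per second, typically via connect()), passive/s (remotely initiated connections per second, typically via accept()), and retrans/s (TCP retransmits per second). The active and passive rates give a rough measure of server load, while retransmits point to a network or server problem, such as an unreliable network or an overloaded host dropping packets.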
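
These rates come from cumulative Tcp counters in `/proc/net/snmp`. This sketch parses two snapshots and computes per-second deltas for ActiveOpens, PassiveOpens, and RetransSegs. Those counters correspond to active/s, passive/s, and retrans/s. Take the two snapshots a known interval apart. The function and constant names are hypothetical.

```python
# sar column -> /proc/net/snmp Tcp counter it is derived from
SAR_FIELDS = {"active/s": "ActiveOpens", "passive/s": "PassiveOpens", "retrans/s": "RetransSegs"}


def parse_tcp_snmp(text):
    """Return the Tcp counters from /proc/net/snmp as {name: int}."""
    # The first "Tcp:" line holds counter names, the second holds their values.
    tcp = [line.split()[1:] for line in text.splitlines() if line.startswith("Tcp:")]
    names, values = tcp[0], tcp[1]
    return dict(zip(names, map(int, values)))


def tcp_rates(prev, cur, interval_s):
    """Per-second rates between two parse_tcp_snmp() snapshots."""
    return {label: (cur[k] - prev[k]) / interval_s for label, k in SAR_FIELDS.items()}
```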

10. top

$ top

Aggregates many of the above metrics in real time, allowing a quick sanity check of system load and process activity. Because top redraws the screen on every refresh, patterns over time are easy to miss. For spotting trends, vmstat and pidstat are preferred because their rolling output preserves history.
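
top and vmstat compute CPU percentages from deltas of the counters in `/proc/stat`, so logging your own samples is straightforward. This sketch computes the busy percentage between two snapshots of the aggregate `cpu` line and treats idle and iowait as not busy. The names are hypothetical.

```python
def read_cpu_times(stat_text):
    """Return the first eight fields of the aggregate "cpu" line in /proc/stat.

    Fields: user nice system idle iowait irq softirq steal (in clock ticks).
    guest and guest_nice are omitted because the kernel already counts them
    in user and nice.
    """
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            return [int(x) for x in line.split()[1:9]]
    raise ValueError("no aggregate cpu line in /proc/stat text")


def cpu_busy_percent(before, after):
    """Percentage of non-idle CPU time between two read_cpu_times() snapshots."""
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    idle = deltas[3] + deltas[4]  # idle + iowait
    return 100.0 * (total - idle) / total if total else 0.0
```

For a minimal trend log, read `/proc/stat` twice, one second apart. Then print `cpu_busy_percent` with a timestamp on each iteration.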

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends and regularly publishes widely read original technical articles. We focus on operations transformation and aim to support you throughout your operations career as we grow together.
