How to Diagnose Linux Server Performance Issues in the First 60 Seconds

This article walks you through the ten essential Linux command‑line tools—such as uptime, vmstat, iostat, and top—that Netflix’s performance engineers use to quickly assess system load, resource saturation, and errors within the critical first minute of troubleshooting.

Efficient Ops

When a Linux server shows performance problems, the first minute is crucial. Netflix’s performance engineering team relies on a set of standard Linux command‑line tools to gather key metrics quickly.

The approach follows the USE method: for each resource (CPU, memory, disk, and network), check Utilization, Saturation, and Errors.

1. uptime

<code>$ uptime</code>

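Shows the load averages over the last 1, 5, and 15 minutes. On Linux these count tasks that are running or waiting for a CPU as well as tasks blocked in uninterruptible I/O, so a high value alone does not say which resource is short. Comparing the three figures shows whether load is rising or falling.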
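A load figure is easier to judge once it is divided by the number of CPUs. The sketch below shows one way to do that. The function name `load_per_cpu` is hypothetical, and treating a result above 1.0 as a sign of possible saturation is a common rule of thumb, not something the tool reports.

```shell
# Hypothetical helper: normalise the 1-minute load average by CPU count.
#   $1 = a line in /proc/loadavg format, e.g. "8.00 4.00 2.00 3/512 12345"
#   $2 = number of CPUs
load_per_cpu() {
    printf '%s\n' "$1" | awk -v cpus="$2" '{ printf "%.2f\n", $1 / cpus }'
}

# Example on a live Linux host (nproc ships with GNU coreutils):
#   load_per_cpu "$(cat /proc/loadavg)" "$(nproc)"
```

Because Linux load also counts tasks blocked on I/O, a high per-CPU value is only a prompt to look at the CPU and disk tools that follow.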

2. dmesg | tail

<code>$ dmesg | tail</code>

Displays the most recent kernel messages, helping to spot errors such as OOM kills or TCP issues.
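When the last few lines are not enough, the kernel log can be filtered for the kinds of messages mentioned above. This is a minimal sketch: the function name `kernel_alerts` and the list of patterns are assumptions chosen for illustration, and kernel message wording varies between versions.

```shell
# Hypothetical helper: keep only kernel log lines that look like
# OOM-killer activity or TCP SYN-flood warnings. Reads stdin.
kernel_alerts() {
    grep -Ei 'out of memory|oom-killer|killed process|syn flooding'
}

# Example on a live host (may require root on some systems):
#   dmesg | kernel_alerts | tail
```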

3. vmstat 1

<code>$ vmstat 1</code>

Provides virtual memory and CPU statistics every second. Important fields include:

r : runnable tasks (CPU saturation indicator)

free : free memory in KB

si, so : memory swapped in from and out to disk per second (nonzero values indicate memory pressure)

us, sy, id, wa, st : CPU time spent in user, system, idle, I/O wait, and steal.

High wa suggests I/O bottlenecks; high sy (>20%) may indicate kernel overhead.
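The checks described for vmstat can be sketched as a small filter. The function name `vmstat_flags` and its output labels are hypothetical. The run-queue threshold (r greater than the CPU count) is an assumed rule of thumb, while the 20% system-time threshold comes from the text above. Column positions are read from the header row, not hard-coded. Note that the first sample vmstat prints reports averages since boot.

```shell
# Hypothetical helper: read `vmstat` output on stdin and print a note for
# each sample that looks unhealthy. $1 = CPU count (run-queue threshold).
vmstat_flags() {
    awk -v cpus="$1" '
        # Header row: remember which column holds each field name.
        $1 == "r" { for (i = 1; i <= NF; i++) col[$i] = i; next }
        # Skip the banner line and anything that is not a data row.
        !("r" in col) || $1 !~ /^[0-9]+$/ { next }
        $(col["r"]) + 0 > cpus + 0 { print "cpu-saturated: r=" $(col["r"]) }
        $(col["si"]) + 0 > 0 || $(col["so"]) + 0 > 0 {
            print "swapping: si=" $(col["si"]) " so=" $(col["so"])
        }
        $(col["sy"]) + 0 > 20 { print "high-system-time: sy=" $(col["sy"]) }
    '
}

# Example on a live host: five one-second samples.
#   vmstat 1 5 | vmstat_flags "$(nproc)"
```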

4. mpstat -P ALL 1

<code>$ mpstat -P ALL 1</code>

Shows per‑CPU utilization, useful for detecting uneven load or single‑threaded hotspots.
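A single busy CPU among many idle ones is easy to miss by eye. The following sketch lists CPUs whose busy time (100 minus %idle) exceeds a threshold. The function name `hot_cpus` and the default threshold of 90 are assumptions, and the parser expects the default English column headers.

```shell
# Hypothetical helper: read `mpstat -P ALL` output on stdin and print each
# CPU that is busier than the threshold. $1 = busy threshold in percent.
hot_cpus() {
    awk -v limit="${1:-90}" '
        # Header row: locate the CPU and %idle columns (time format varies).
        /%idle/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "CPU") c = i
                if ($i == "%idle") d = i
            }
            next
        }
        # Data rows for individual CPUs only; the "all" row is skipped.
        c && NF >= d && $c ~ /^[0-9]+$/ && (100 - $d) > limit + 0 {
            printf "cpu%s %.1f%% busy\n", $c, 100 - $d
        }
    '
}

# Example on a live host:
#   mpstat -P ALL 1 3 | hot_cpus 90
```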

5. pidstat 1

<code>$ pidstat 1</code>

Similar to top but provides periodic snapshots of per-process CPU usage, making it easier to track processes that consume many cores.

6. iostat -xz 1

<code>$ iostat -xz 1</code>

Reports block device statistics. Key metrics:

r/s, w/s, rkB/s, wkB/s : read/write request rates and throughput

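await : average I/O response time in milliseconds, including time spent queued (high values indicate saturation; newer sysstat releases split this into r_await and w_await)

avgqu-sz : average queue length, shown as aqu-sz in newer sysstat releases (values >1 suggest a bottleneck)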

%util : device utilization (values >60% often cause performance issues).
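The %util threshold above can be applied mechanically. This sketch prints devices whose utilization exceeds a limit. The function name `busy_devices` is hypothetical, and the default of 60 mirrors the figure in the text. Since iostat column layouts differ between sysstat versions, the %util column is located from the header row.

```shell
# Hypothetical helper: read `iostat -xz` output on stdin and print devices
# whose %util exceeds the threshold. $1 = threshold in percent.
busy_devices() {
    awk -v limit="${1:-60}" '
        NF == 0 { u = 0; next }          # blank line ends a device table
        $1 ~ /^Device/ {                 # header row: find the %util column
            for (i = 1; i <= NF; i++) if ($i == "%util") u = i
            next
        }
        u && NF >= u && $u + 0 > limit + 0 { print $1, $u }
    '
}

# Example on a live host:
#   iostat -xz 1 3 | busy_devices 60
```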

7. free -m

<code>$ free -m</code>

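Shows memory usage in megabytes, distinguishing between used, free, buffers, and cache. Buffers and cache can be reclaimed when applications need memory, so the “free” column alone understates what is usable. On older procps versions the “-/+ buffers/cache” line gives the more accurate view; newer versions drop that line and report an “available” column instead.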
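Since the layout of free differs between procps versions, a script that wants the usable-memory figure has to handle both. The sketch below is one way to do that, assuming the two layouts described above. The function name `usable_mb` is hypothetical.

```shell
# Hypothetical helper: read `free -m` output on stdin and print usable
# memory in MB, from the "available" column when present, otherwise from
# the free figure on the "-/+ buffers/cache" line.
usable_mb() {
    awk '
        # Header row has no leading label, so data columns are shifted by one.
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "available") a = i + 1 }
        $1 == "Mem:" && a { print $a; exit }
        $1 == "-/+"       { print $4; exit }
    '
}

# Example on a live host:
#   free -m | usable_mb
```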

8. sar -n DEV 1

<code>$ sar -n DEV 1</code>

Monitors network interface throughput (rxkB/s, txkB/s) and utilization (%ifutil) to identify possible network bottlenecks.
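Where %ifutil is not reported, a rough utilization figure can be worked out from throughput and link speed. The sketch below does that arithmetic. The function name `link_util` is hypothetical, the link speed is a value you supply, and the calculation assumes that sar's kB means 1024 bytes and that the link is full duplex, so only the busier direction matters.

```shell
# Hypothetical helper: estimate link utilization in percent.
#   $1 = the larger of rxkB/s and txkB/s (assumed to be units of 1024 bytes)
#   $2 = link speed in Mbit/s (assumed known, e.g. 1000 for gigabit)
link_util() {
    awk -v kb="$1" -v mbit="$2" \
        'BEGIN { printf "%.1f\n", kb * 1024 * 8 / (mbit * 1000000) * 100 }'
}

# Example: 50000 kB/s on a gigabit link.
#   link_util 50000 1000
```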

9. sar -n TCP,ETCP 1

<code>$ sar -n TCP,ETCP 1</code>

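Provides TCP statistics that help assess network health: active/s (locally initiated connections per second), passive/s (remotely initiated connections per second), and retrans/s (retransmitted segments per second). Active and passive rates give a rough measure of outbound and inbound load, while retransmissions point to network or server trouble such as an unreliable link or an overloaded host dropping packets.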
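Retransmissions are the figure most worth watching here, and they can be picked out of the two tables sar prints. This is a minimal sketch with a hypothetical function name, `tcp_retrans`. It reports every interval with a nonzero retrans/s and does not judge how many retransmissions are too many.

```shell
# Hypothetical helper: read `sar -n TCP,ETCP` output on stdin and print
# the retrans/s value for every interval where it is above zero.
tcp_retrans() {
    awk '
        NF == 0       { r = 0; next }    # blank line ends a table
        /active\/s/   { r = 0; next }    # TCP table header: no retrans column
        /retrans\/s/  {                  # ETCP table header: find the column
            for (i = 1; i <= NF; i++) if ($i == "retrans/s") r = i
            next
        }
        r && NF >= r && $r + 0 > 0 { print "retrans/s=" $r }
    '
}

# Example on a live host:
#   sar -n TCP,ETCP 1 5 | tcp_retrans
```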

10. top

<code>$ top</code>

Aggregates many of the above metrics in real time, allowing a quick sanity check of system load and process activity. For detailed trend analysis, tools like vmstat or pidstat are preferred because their rolling output keeps earlier samples on screen, whereas top redraws the display on every refresh.

Written by

Efficient Ops
