
Master Linux Performance Troubleshooting in the First 60 Seconds

This article shows how Netflix's performance engineering team uses ten essential Linux commands—such as uptime, vmstat, mpstat, iostat, and top—to quickly assess system load, resource saturation, and errors within the first minute of investigation, following the USE method.

Efficient Ops

First Minute: Overview

Netflix’s performance engineering team demonstrates how to diagnose Linux performance problems in the first 60 seconds using standard command‑line tools. The approach follows the USE method (Utilization, Saturation, Errors) to identify bottlenecks.

Key Commands

uptime

<code>uptime</code>

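Shows the system load averages for the past 1, 5, and 15 minutes, giving a quick sense of overall demand. On Linux these figures count tasks waiting for a CPU as well as tasks blocked in uninterruptible I/O, so a high value alone does not prove CPU saturation. Comparing the three numbers shows whether load is rising or falling.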
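A load average is easier to judge relative to the number of CPUs. The following is a minimal sketch, not part of the original checklist: `load_per_cpu` is a hypothetical helper name, and reading a ratio above 1.0 as "more demand than CPUs" is a rule of thumb rather than a hard limit.

```shell
# Hypothetical helper: divide a load average by the CPU count.
# Usage: load_per_cpu LOAD NCPU
load_per_cpu() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN { printf "%.2f\n", load / ncpu }'
}

# Example on a live Linux system (the first field of /proc/loadavg is the
# 1-minute load average; nproc prints the number of available CPUs):
#   load_per_cpu "$(cut -d ' ' -f 1 /proc/loadavg)" "$(nproc)"
```

A result well above 1.0 is a prompt to look at vmstat and mpstat next, not a diagnosis on its own.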

dmesg | tail

<code>dmesg | tail</code>

Displays the most recent kernel messages; look for errors such as OOM killer events or TCP drops.
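When the kernel log is noisy, a filter can help surface the lines worth reading first. This is a rough sketch: `kernel_red_flags` is a hypothetical name, and the pattern list is an assumption that should be adapted to your environment.

```shell
# Hypothetical filter: keep only kernel log lines that often signal trouble.
# The patterns below are illustrative, not an exhaustive list.
kernel_red_flags() {
    grep -i -E 'oom-killer|out of memory|dropping request|i/o error|segfault'
}

# Example on a live system (dmesg may require root on some distributions):
#   dmesg | tail -n 50 | kernel_red_flags
```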

vmstat 1

<code>vmstat 1</code>

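Provides per-second snapshots of processes, memory, swap, I/O, and CPU. Important columns: r (processes running on or waiting for a CPU; a value above the CPU count suggests saturation), free (free memory in kilobytes), si/so (swap-ins and swap-outs; nonzero values mean the system is short of memory), and us/sy/id/wa/st (CPU time split into user, system, idle, I/O wait, and stolen).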
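The two checks that matter most in vmstat output, run queue versus CPU count and any swap activity, can be sketched as a small filter. `vmstat_flags` is a hypothetical helper, and it assumes the default column order in which r is field 1 and si/so are fields 7 and 8.

```shell
# Hypothetical filter: read vmstat output on stdin and flag suspicious samples.
# Usage: vmstat 1 5 | vmstat_flags NCPU
# Assumes the default vmstat layout: r is column 1, si is 7, so is 8.
vmstat_flags() {
    awk -v ncpu="$1" '
        $1 !~ /^[0-9]+$/   { next }                              # skip the header lines
        $1 + 0 > ncpu + 0  { print "cpu_saturated r=" $1 }       # more runnable tasks than CPUs
        $7 + $8 > 0        { print "swapping si=" $7 " so=" $8 } # any swap-in or swap-out
    '
}

# Example on a live system:
#   vmstat 1 5 | vmstat_flags "$(nproc)"
```

Note that the first report vmstat prints contains averages since boot, so flags raised by that first sample deserve less weight than the ones that follow.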

mpstat -P ALL 1

<code>mpstat -P ALL 1</code>

Shows CPU usage per core; high usage on a single core may indicate a single‑threaded bottleneck.
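To pick out a single hot core from a long mpstat listing, one option is to filter on the %idle column. The sketch below uses a hypothetical helper, `hot_cpus`, and an assumed default threshold of 10% idle; it finds the CPU and %idle columns from the header row because their positions vary with the time format.

```shell
# Hypothetical filter: read mpstat -P ALL output on stdin and print CPUs
# whose %idle is below a threshold (default 10, an assumed value).
# Usage: mpstat -P ALL 1 5 | hot_cpus [MIN_IDLE]
hot_cpus() {
    awk -v min_idle="${1:-10}" '
        {
            cpu = 0; idle = 0
            for (i = 1; i <= NF; i++) {
                if ($i == "CPU")   cpu = i
                if ($i == "%idle") idle = i
            }
            # Header row: remember where the CPU and %idle columns are.
            if (cpu && idle) { c = cpu; d = idle; next }
        }
        # Data row for a numbered CPU (the "all" row is skipped).
        c && $c ~ /^[0-9]+$/ && $d + 0 < min_idle + 0 { print "cpu" $c, "idle=" $d }
    '
}
```

One busy core alongside many idle ones is the pattern that points to a single-threaded bottleneck.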

pidstat 1

<code>pidstat 1</code>

Works like top for each process, but prints a rolling summary every second instead of clearing the screen, which makes it easier to spot patterns over time and to identify processes that consume excessive CPU.

iostat -xz 1

<code>iostat -xz 1</code>

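Reports extended block device statistics, omitting devices with no activity (-z). Key metrics: r/s and w/s (reads and writes per second), rkB/s and wkB/s (throughput), await (average I/O time in milliseconds, including time spent queued), avgqu-sz (average queue size, shown as aqu-sz in newer sysstat releases), and %util (device utilization).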
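A quick way to scan iostat output for busy devices is to filter on %util. This is a sketch under assumptions: `busy_devices` is a hypothetical helper, and the default limit of 60% is an illustrative threshold, not a value taken from this article. The column is located by its header name because the set of columns differs between sysstat versions.

```shell
# Hypothetical filter: read iostat -xz output on stdin and print devices
# whose %util exceeds a limit (default 60, an assumed threshold).
# Usage: iostat -xz 1 5 | busy_devices [LIMIT]
busy_devices() {
    awk -v limit="${1:-60}" '
        # Device header row: find which column holds %util.
        /^Device/ { for (i = 1; i <= NF; i++) if ($i == "%util") col = i; next }
        # A blank line or a new avg-cpu block ends the device table.
        NF == 0 || /^avg-cpu/ { col = 0; next }
        col && NF >= col && $col + 0 > limit + 0 { print $1, $col }
    '
}
```

For devices backed by several disks, such as virtual or RAID volumes, a high %util does not necessarily mean the device is saturated, so treat the output as a hint.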

free -m

<code>free -m</code>

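Shows total, used, and free memory in megabytes, including buffers and cache. On older versions of free, the “-/+ buffers/cache” line gives a clearer view of the memory actually available to applications; newer versions drop that line and report an available column instead.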
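Because the layout of free differs between versions, a script that wants "memory available to applications" has to handle both. The sketch below uses a hypothetical helper, `mem_available_mb`, and assumes English column headers: it prefers the available column and falls back to the free value on the “-/+ buffers/cache” line.

```shell
# Hypothetical filter: read free -m output on stdin and print the memory
# available to applications, in megabytes.
# Usage: free -m | mem_available_mb
mem_available_mb() {
    awk '
        # Header row has no label field, so data columns sit one position later.
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "available") col = i + 1 }
        $1 == "Mem:" && col  { print $col; found = 1 }
        # Older layout: "-/+ buffers/cache: USED FREE", so FREE is field 4.
        $1 == "-/+" && !found { print $4 }
    '
}
```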

sar -n DEV 1

<code>sar -n DEV 1</code>

Monitors network interface throughput (rxkB/s, txkB/s) and utilization (%ifutil).

sar -n TCP,ETCP 1

<code>sar -n TCP,ETCP 1</code>

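Provides TCP statistics: active/s (locally initiated connections per second), passive/s (remotely initiated connections per second), and retrans/s (retransmitted segments per second). A high retransmission rate can indicate network problems or an overloaded server.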

top

<code>top</code>

Aggregates many of the above metrics in a real‑time view, showing per‑process CPU and memory usage.

Using the Data

Combine the outputs to assess whether the system is saturated (high r in vmstat or high %util in iostat), whether errors are present (kernel messages, OOM events), and where the bottleneck lies (CPU, memory, disk, or network). The mpstat, pidstat, iostat, and sar commands come from the sysstat package; install it if any of them are missing.
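The whole checklist can be run in one pass with bounded sample counts, so that no command runs forever. This is a sketch rather than the team's own tooling: the function names are hypothetical, and the choice of five one-second samples and batch-mode top is an assumption made to keep the total run short.

```shell
# Hypothetical helper: the ten commands, each bounded so that it terminates.
first_minute_commands() {
    cat <<'EOF'
uptime
dmesg | tail
vmstat 1 5
mpstat -P ALL 1 5
pidstat 1 5
iostat -xz 1 5
free -m
sar -n DEV 1 5
sar -n TCP,ETCP 1 5
top -b -n 1 | head -n 20
EOF
}

# Hypothetical runner: execute each command, noting any tool that is missing.
run_first_minute() {
    first_minute_commands | while IFS= read -r cmd; do
        tool=${cmd%% *}                      # first word of the command line
        if command -v "$tool" >/dev/null 2>&1; then
            printf '\n=== %s ===\n' "$cmd"
            sh -c "$cmd" </dev/null          # keep commands from reading the list
        else
            printf '\n=== %s === (not installed)\n' "$cmd"
        fi
    done
}
```

Calling `run_first_minute > report.txt` would capture one snapshot to read or share; on some distributions dmesg needs root to produce output.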

Further Reading

For a deeper dive, see Brendan Gregg’s Velocity 2015 talk on Linux performance tools, which covers over 40 commands for observability, benchmarking, tuning, profiling, and tracing.

monitoring, Performance, Linux, command line, sysadmin
Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends and regularly publishes widely read original technical articles. We focus on operations transformation and aim to accompany you throughout your operations career.
