
Master Linux Performance Troubleshooting in the First 60 Seconds

This article shows how Netflix's performance engineering team uses ten standard Linux commands, including uptime, vmstat, mpstat, iostat, and top, to assess system load, resource saturation, and errors within the first minute of an investigation, following the USE method.

Efficient Ops

May 15, 2023


First Minute: Overview

Netflix’s performance engineering team shows how to diagnose Linux performance problems in the first 60 seconds using standard command-line tools. The approach follows the USE method (Utilization, Saturation, Errors): for each resource, such as CPU, memory, disk, and network, check how busy it is, whether work is queuing for it, and whether it is reporting errors.

Key Commands

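uptime: Shows the system load averages for the past 1, 5, and 15 minutes, giving a quick sense of overall demand. On Linux these averages count runnable processes (running or waiting for a CPU) as well as processes blocked in uninterruptible I/O, so a high value can point to either CPU or disk pressure.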
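
As a minimal Python sketch of reading these numbers, the function below compares the 1-minute average with the 15-minute average: a 1-minute value well above the 15-minute one suggests load is climbing, while one well below suggests the problem may already have passed. The name load_trend and its 1.5 ratio are illustrative assumptions, not anything uptime defines; dividing by the CPU count gives a rough per-CPU figure.

```python
import os


def load_trend(one: float, fifteen: float, ratio: float = 1.5) -> str:
    """Classify load as rising, falling, or steady from two load averages.

    `ratio` is an illustrative threshold (an assumption, not an uptime setting).
    """
    if one > fifteen * ratio:
        return "rising"   # recent load well above the 15-minute average
    if fifteen > one * ratio:
        return "falling"  # the spike may already be over
    return "steady"


if __name__ == "__main__" and hasattr(os, "getloadavg"):
    one, five, fifteen = os.getloadavg()  # the same three values uptime prints
    cpus = os.cpu_count() or 1
    print(f"load {one:.2f} {five:.2f} {fifteen:.2f} on {cpus} CPUs, "
          f"{one / cpus:.2f} per CPU, trend: {load_trend(one, fifteen)}")
```

For example, load_trend(8.0, 2.0) returns "rising", which would suggest the investigation started while the problem was still developing.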

dmesg | tail: Displays the most recent kernel messages. Look for errors such as OOM-killer events or dropped TCP requests.
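
A short Python sketch can automate this scan by matching recent kernel lines against a few patterns. The pattern list and the helper names (classify_kernel_lines, recent_kernel_lines) are hypothetical, and real kernel wording varies between versions, so a miss does not prove the log is clean.

```python
import re
import subprocess
from typing import Iterable, List, Tuple

# Illustrative patterns only: kernel wording differs between versions,
# so treat a miss as "not found by this sketch", not "no problem".
PATTERNS = {
    "oom": re.compile(r"oom-killer|out of memory", re.IGNORECASE),
    "tcp_drop": re.compile(r"SYN flooding|dropping (packet|request)", re.IGNORECASE),
}


def classify_kernel_lines(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (category, line) for each line matching one of PATTERNS."""
    hits = []
    for line in lines:
        for category, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((category, line.rstrip()))
                break  # one category per line is enough for triage
    return hits


def recent_kernel_lines(n: int = 50) -> List[str]:
    """Roughly `dmesg | tail -n N`; returns [] if dmesg is unavailable or denied."""
    try:
        out = subprocess.run(["dmesg"], capture_output=True, text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError):
        return []  # e.g. not Linux, or kernel.dmesg_restrict requires root
    return out.splitlines()[-n:]


if __name__ == "__main__":
    for category, line in classify_kernel_lines(recent_kernel_lines()):
        print(f"{category}: {line}")
```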

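vmstat 1: Prints one line of system-wide statistics per second covering processes, memory, swap, I/O, and CPU. Important columns: r (processes running or waiting for a CPU), free (free memory in kilobytes), si/so (swap-in and swap-out activity; nonzero values mean the system is swapping), and us/sy/id/wa/st (percentage of CPU time spent in user code, in the kernel, idle, waiting for I/O, and stolen by a hypervisor). The first line shows averages since boot, so focus on the lines that follow.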
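
The column checks above can be sketched in Python by parsing the header line and mapping each sample to its column names, which tolerates versions that add extra columns. The helper names and the specific checks (run queue above the CPU count, any swap activity, any steal time) are illustrative assumptions rather than fixed rules.

```python
from typing import Dict, List


def parse_vmstat(text: str) -> List[Dict[str, int]]:
    """Parse vmstat output into one dict per sample line, keyed by column name."""
    header: List[str] = []
    rows: List[Dict[str, int]] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "procs":
            continue  # blank line or the group banner ("procs ---memory--- ...")
        if fields[0] == "r":
            header = fields  # the column-name line
            continue
        if header and len(fields) == len(header):
            rows.append(dict(zip(header, map(int, fields))))
    return rows


def flag_sample(row: Dict[str, int], cpus: int) -> List[str]:
    """Apply the checks described above to one sample; the rules are illustrative."""
    flags = []
    if row["r"] > cpus:
        flags.append("run queue exceeds CPU count")  # CPU saturation
    if row.get("si", 0) or row.get("so", 0):
        flags.append("swapping")  # memory saturation
    if row.get("st", 0) > 0:
        flags.append("steal time")  # a hypervisor is taking CPU from this guest
    return flags
```

For example, capture the text printed by vmstat 1 5, pass it to parse_vmstat, and call flag_sample on every row after the first, since the first row is the since-boot average.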

mpstat -P ALL 1: Shows the CPU time breakdown for each CPU every second. One CPU running hot while the others stay mostly idle may indicate a single-threaded bottleneck.
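
To spot the single-hot-CPU pattern programmatically, one approach is to take each CPU's %idle from mpstat, convert it to a busy percentage, and check whether the busiest CPU sits far above the average of the rest. This Python sketch assumes the caller has already extracted the per-CPU %idle values; the function name find_hot_cpu and its 90% and 50% thresholds are hypothetical.

```python
from typing import Dict, Optional


def find_hot_cpu(idle_by_cpu: Dict[str, float], hot: float = 90.0,
                 cool: float = 50.0) -> Optional[str]:
    """Return the CPU that is busy while the others are mostly idle, else None.

    idle_by_cpu maps mpstat's CPU label to its %idle value (pass per-CPU rows
    only, not the "all" summary). `hot` and `cool` are illustrative
    busy-percentage thresholds, not standard values.
    """
    if len(idle_by_cpu) < 2:
        return None  # nothing to compare against
    busy = {cpu: 100.0 - idle for cpu, idle in idle_by_cpu.items()}
    hottest = max(busy, key=lambda cpu: busy[cpu])
    others = [pct for cpu, pct in busy.items() if cpu != hottest]
    if busy[hottest] >= hot and sum(others) / len(others) <= cool:
        return hottest
    return None
```

For example, find_hot_cpu({"0": 2.0, "1": 95.0, "2": 97.0, "3": 96.0}) returns "0": CPU 0 is 98% busy while the others average about 4%.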

pidstat 1: Similar to top, but prints a rolling per-process summary every second instead of redrawing the screen, which makes it easy to spot processes consuming excessive CPU and to keep a record of how usage changes over time.
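
Because pidstat's column set differs between sysstat releases, a Python sketch that locates columns by header name holds up better than one that uses fixed positions. Note that %CPU can exceed 100 for a multi-threaded process running on several CPUs at once. The function name top_cpu_processes is hypothetical, and this simple whitespace tokenizer skips processes whose names contain spaces.

```python
from typing import List, Tuple


def top_cpu_processes(text: str, n: int = 5) -> List[Tuple[float, str, str]]:
    """Return up to n (%CPU, PID, Command) tuples from pidstat output, busiest first."""
    header: List[str] = []
    found: List[Tuple[float, str, str]] = []
    for line in text.splitlines():
        fields = line.split()
        if "PID" in fields and "%CPU" in fields:
            header = fields  # locate columns by name; the set varies by sysstat release
            continue
        if not header or len(fields) != len(header):
            continue  # banner, blank line, or a command name containing spaces
        row = dict(zip(header, fields))
        try:
            found.append((float(row["%CPU"]), row["PID"], row.get("Command", "?")))
        except ValueError:
            continue  # not a numeric data row
    found.sort(reverse=True)
    return found[:n]
```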

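iostat -xz 1: Reports extended block-device statistics (-x) every second, skipping devices with no activity (-z). Key metrics: r/s and w/s (reads and writes per second), rkB/s and wkB/s (throughput), await (average I/O time in milliseconds, including time spent queued), avgqu-sz (average queue length, shown as aqu-sz in newer sysstat releases), and %util (percentage of time the device was busy).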
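
The Python sketch below reads iostat -x output and reports devices whose %util crosses a threshold, looking up the queue column under either avgqu-sz or aqu-sz. The 60% default and the function name busy_devices are illustrative assumptions. %util only measures how often a device was busy, so a device that serves many requests in parallel can show high %util without being saturated.

```python
from typing import List, Tuple


def busy_devices(text: str, util_threshold: float = 60.0) -> List[Tuple[str, float, float]]:
    """Return (device, %util, queue length) for devices at or above util_threshold."""
    header: List[str] = []
    result: List[Tuple[str, float, float]] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            header = []  # a blank line ends the current device table
            continue
        if fields[0].rstrip(":") == "Device":
            header = fields  # the header may be written "Device:" or "Device"
            continue
        if not header or len(fields) != len(header):
            continue  # avg-cpu lines and anything else outside a device table
        row = dict(zip(header, fields))
        util = float(row["%util"])
        if util >= util_threshold:
            # The queue column is "avgqu-sz" in older sysstat and "aqu-sz" in newer.
            queue = float(row.get("aqu-sz", row.get("avgqu-sz", "nan")))
            result.append((fields[0], util, queue))
    return result
```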

free -m: Shows total, used, and free memory in mebibytes, including buffers and page cache. Older versions print a “-/+ buffers/cache” line that gives a clearer view of memory actually available to applications; newer versions replace that line with an “available” column.
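
On kernels 3.14 and later, /proc/meminfo exposes a MemAvailable estimate, which newer versions of free report as “available”. This minimal Python sketch assumes a Linux /proc filesystem. It prefers that field and falls back to the old buffers/cache arithmetic (MemFree + Buffers + Cached) when the field is absent. The function names are hypothetical.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo text into {field name: number}; most values are in kB."""
    info: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            info[name.strip()] = int(parts[0])
    return info


def available_mib(info: Dict[str, int]) -> float:
    """Memory available to applications, in MiB.

    Uses the kernel's MemAvailable estimate when present; otherwise falls back
    to the old "-/+ buffers/cache" arithmetic (MemFree + Buffers + Cached).
    """
    if "MemAvailable" in info:
        kib = info["MemAvailable"]
    else:
        kib = info["MemFree"] + info["Buffers"] + info["Cached"]
    return kib / 1024


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            print(f"{available_mib(parse_meminfo(f.read())):.0f} MiB available")
    except OSError:
        print("/proc/meminfo not found; this sketch assumes Linux")
```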

sar -n DEV 1: Monitors network interface throughput (rxkB/s, txkB/s) and utilization (%ifutil) every second.
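
To sanity-check %ifutil by hand, compare throughput with link speed. This Python sketch makes three assumptions: sar's kB means 1024 bytes, a full-duplex link is limited by its busier direction, and a half-duplex link carries both directions in one shared budget. link_utilization_pct is a hypothetical helper, and the link speed in Mb/s must come from elsewhere, for example ethtool.

```python
def link_utilization_pct(rx_kb_per_s: float, tx_kb_per_s: float,
                         speed_mbps: float, full_duplex: bool = True) -> float:
    """Estimate interface utilization (%) from sar's rxkB/s and txkB/s.

    Assumptions: sar's kB is 1024 bytes; speed_mbps is the link speed in
    megabits per second (1 Mb/s = 1,000,000 bit/s).
    """
    if speed_mbps <= 0:
        return 0.0  # link speed unknown, so utilization cannot be computed
    rx_bits = rx_kb_per_s * 1024 * 8
    tx_bits = tx_kb_per_s * 1024 * 8
    # Full duplex: each direction gets the full link speed, so the busier one counts.
    # Half duplex: both directions share one channel, so they add up.
    used = max(rx_bits, tx_bits) if full_duplex else rx_bits + tx_bits
    return 100.0 * used / (speed_mbps * 1_000_000)
```

For example, a full-duplex 1 Gb/s link receiving 61035.15625 kB/s (500 Mbit/s) while sending 30000 kB/s gives link_utilization_pct(61035.15625, 30000, 1000) == 50.0.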

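sar -n TCP,ETCP 1: Reports TCP statistics every second, including active/s (connections initiated locally), passive/s (connections accepted from remote hosts), and retrans/s (retransmitted segments). A rising retransmit rate can indicate a network problem or an overloaded server.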
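
A retransmit count is easier to judge relative to traffic. This Python sketch divides retrans/s (from ETCP) by oseg/s, the segments sent per second (from TCP). The 1% limit and the helper names are illustrative assumptions, not a standard cut-off.

```python
def retransmit_pct(retrans_per_s: float, oseg_per_s: float) -> float:
    """Retransmitted segments as a percentage of segments sent.

    retrans/s is reported by `sar -n ETCP`; oseg/s by `sar -n TCP`.
    """
    if oseg_per_s <= 0:
        return 0.0  # nothing sent in this interval
    return 100.0 * retrans_per_s / oseg_per_s


def retransmits_suspicious(retrans_per_s: float, oseg_per_s: float,
                           limit_pct: float = 1.0) -> bool:
    """True when the retransmit share reaches limit_pct (an illustrative cut-off)."""
    return retransmit_pct(retrans_per_s, oseg_per_s) >= limit_pct
```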

top: Combines many of the metrics above in a real-time view of per-process CPU and memory usage.

Using the Data

Combine the outputs to answer three questions: is any resource saturated (for example, a vmstat r value above the CPU count, or a high iostat %util), are errors present (kernel messages such as OOM-killer events), and where does the bottleneck lie (CPU, memory, disk, or network)? If mpstat, pidstat, iostat, or sar is missing, install the sysstat package, which provides them.
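
These three questions can be turned into a small USE-style checklist. This Python sketch takes one set of readings gathered by hand from the commands above. The Snapshot fields, the function name use_findings, and the thresholds are all hypothetical and should be tuned to each system rather than treated as rules.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snapshot:
    """One set of readings taken from the commands above (field names are hypothetical)."""
    cpu_count: int
    run_queue: int          # vmstat "r"
    swap_in: int            # vmstat "si"
    swap_out: int           # vmstat "so"
    max_disk_util: float    # highest iostat "%util"
    kernel_errors: int      # suspicious dmesg lines found
    tcp_retrans_pct: float  # retrans/s divided by oseg/s, in percent


def use_findings(s: Snapshot, disk_util_limit: float = 60.0,
                 retrans_limit: float = 1.0) -> List[str]:
    """List likely saturation points and errors; the limits are illustrative."""
    findings = []
    if s.run_queue > s.cpu_count:
        findings.append("CPU: saturated (run queue > CPU count)")
    if s.swap_in or s.swap_out:
        findings.append("memory: saturated (swapping)")
    if s.max_disk_util >= disk_util_limit:
        findings.append("disk: high utilization")
    if s.tcp_retrans_pct >= retrans_limit:
        findings.append("network: TCP retransmissions")
    if s.kernel_errors:
        findings.append("errors: check dmesg")
    return findings
```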
