
Master Linux Performance Troubleshooting in the First 60 Seconds

This article shows how Netflix's performance engineering team uses ten essential Linux commands (including uptime, vmstat, mpstat, iostat, and top) to quickly assess system load, resource saturation, and errors within the first minute of an investigation, following the USE method.

Efficient Ops

First Minute: Overview

Netflix’s performance engineering team demonstrates how to diagnose Linux performance problems in the first 60 seconds using standard command‑line tools. The approach follows the USE method (Utilization, Saturation, Errors) to identify bottlenecks.

Key Commands

uptime: Shows the system load averages for the past 1, 5, and 15 minutes, giving a quick sense of overall demand. Comparing the three numbers shows whether load is rising or falling.
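Load averages are easier to judge relative to the number of CPUs. The following is a minimal sketch, assuming Python 3 on Linux. The helper name `load_per_cpu` is made up for illustration, and reading a per-CPU value above 1.0 as "demand exceeds capacity" is a rough rule of thumb, not something the commands themselves report.

```python
import os

def load_per_cpu(loads, ncpu):
    """Divide each load average by the CPU count (hypothetical helper)."""
    if ncpu < 1:
        raise ValueError("ncpu must be at least 1")
    return tuple(round(value / ncpu, 2) for value in loads)

if __name__ == "__main__":
    # os.getloadavg() returns the same 1-, 5-, and 15-minute figures as uptime.
    one, five, fifteen = load_per_cpu(os.getloadavg(), os.cpu_count() or 1)
    print(f"load per CPU: 1m={one} 5m={five} 15m={fifteen}")
```

On Linux the load average also counts tasks blocked in uninterruptible I/O, so a high value does not by itself prove CPU saturation.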

dmesg | tail: Displays the most recent kernel messages; look for errors such as OOM killer events or TCP drops.
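Scanning kernel messages for these error types can be sketched as below. This assumes Python 3; the pattern list, the labels, and the sample lines are illustrative assumptions rather than an exhaustive or authoritative set, and real kernel message wording varies by kernel version.

```python
import re

# Assumed patterns: extend these for the messages your kernels actually emit.
PATTERNS = {
    "oom": re.compile(r"out of memory|oom-killer", re.IGNORECASE),
    "tcp_drop": re.compile(r"SYN flooding|dropping request", re.IGNORECASE),
}

def scan_kernel_log(lines):
    """Return (label, line) pairs for lines that match a known pattern."""
    hits = []
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((label, line.strip()))
                break  # one label per line is enough for triage
    return hits

# Illustrative lines only, not captured from a real system.
SAMPLE_LINES = [
    "[1880957.563150] perl invoked oom-killer: gfp_mask=0x280da, order=0",
    "[2320864.954447] TCP: Possible SYN flooding on port 7001. Dropping request.",
    "[     12.000000] eth0: link up",
]

if __name__ == "__main__":
    for label, line in scan_kernel_log(SAMPLE_LINES):
        print(f"{label}: {line}")
```

In practice the input would be the output of `dmesg | tail`, split into lines.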

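vmstat 1: Provides per-second snapshots of processes, memory, swap, I/O, and CPU. The first line of output shows averages since boot, so read from the second line onward. Important columns: r (processes running or waiting for a CPU), free (free memory), si/so (swap-ins and swap-outs; non-zero values mean the system is short of memory), and us/sy/id/wa/st (CPU time breakdown: user, system, idle, I/O wait, and stolen).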
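The column checks described for vmstat can be sketched as a small parser. This assumes Python 3 and the usual vmstat layout with a column-name line starting with `r`; the function names, the flag wording, and the sample values are all made up for illustration.

```python
def parse_vmstat(text):
    """Turn vmstat output into a list of {column: int} dicts."""
    header, rows = None, []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":  # the column-name line
            header = fields
        elif header and fields[0].isdigit() and len(fields) == len(header):
            rows.append(dict(zip(header, map(int, fields))))
    return rows

def vmstat_flags(row, ncpu):
    """Flag CPU saturation (r above CPU count) and swap activity."""
    flags = []
    if row["r"] > ncpu:
        flags.append("cpu saturation: r exceeds CPU count")
    if row["si"] > 0 or row["so"] > 0:
        flags.append("swapping: si/so non-zero")
    return flags

# Illustrative values only, not captured from a real system.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
34  0      0 200889792  73708 591828    0    0     0     5    6   10 96  1  3  0  0
 2  0   1024  51200  7000  90000   12   40     0     5    6   10 20  5 70  5  0
"""

if __name__ == "__main__":
    for row in parse_vmstat(SAMPLE):
        print(row["r"], vmstat_flags(row, ncpu=32))
```

Because the columns are looked up by name from the header line, the sketch should tolerate versions of vmstat that add or reorder columns, though that has not been checked against every release.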

mpstat -P ALL 1: Shows CPU usage per core; high usage on a single core may indicate a single-threaded bottleneck.
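One way to spot the single-hot-core pattern is to compare the busiest CPU against the median. This is a sketch in Python 3 under stated assumptions: the input is a mapping from CPU label to its `%idle` value as read from mpstat, and both thresholds are arbitrary illustrative defaults.

```python
from statistics import median

def single_hot_cpu(idle_by_cpu, busy_threshold=90.0, quiet_threshold=30.0):
    """Return the label of one CPU that is busy while most others are quiet.

    Returns None when no CPU stands out. Thresholds are assumptions.
    """
    if not idle_by_cpu:
        return None
    busy = {cpu: 100.0 - idle for cpu, idle in idle_by_cpu.items()}
    hottest = max(busy, key=busy.get)
    if busy[hottest] >= busy_threshold and median(busy.values()) <= quiet_threshold:
        return hottest
    return None

if __name__ == "__main__":
    # Illustrative %idle values per CPU.
    print(single_hot_cpu({0: 2.0, 1: 95.0, 2: 97.0, 3: 96.0}))
```

When every CPU is busy the function returns None, since that points to general CPU demand rather than a single-threaded bottleneck.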

pidstat 1: Similar to top's per-process view, but prints a rolling summary instead of clearing the screen, which makes it easier to watch patterns over time and spot processes that consume excessive CPU.

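iostat -xz 1: Reports block device statistics. Key metrics: r/s and w/s (read and write requests per second), rkB/s and wkB/s (read and write throughput), await (average I/O time in milliseconds, including time spent queued), avgqu-sz (average queue length of requests issued to the device; shown as aqu-sz in newer sysstat releases), and %util (percentage of time the device was busy).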
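Picking out busy devices from iostat output can be sketched as follows. This assumes Python 3 and a device table whose first column is the device name; the function names and sample values are made up, and the 60% utilization threshold is a rule of thumb used here as an assumption, not a hard limit.

```python
def parse_iostat(text):
    """Parse iostat -x device tables into a list of dicts keyed by column name."""
    header, rows = None, []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            header = None  # a blank line ends a device table
        elif fields[0].rstrip(":") == "Device":
            header = ["Device"] + fields[1:]
        elif header and len(fields) == len(header):
            row = {"Device": fields[0]}
            row.update({k: float(v) for k, v in zip(header[1:], fields[1:])})
            rows.append(row)
    return rows

def busy_devices(rows, util_threshold=60.0):
    """Return names of devices whose %util meets the (assumed) threshold."""
    return [row["Device"] for row in rows if row["%util"] >= util_threshold]

# Illustrative values only, in the older sysstat column layout.
SAMPLE = """\
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          10.00    0.00    2.00    5.00    0.00   83.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.00     0.23    0.21    0.18     4.52     2.08    34.37     0.00    9.98   13.80    5.42   2.44   0.09
xvdb              0.01     0.00  850.00  120.00 54000.00  9000.00    64.00     7.50   12.40   12.00   15.00   1.00  97.00
"""

if __name__ == "__main__":
    print(busy_devices(parse_iostat(SAMPLE)))
```

Looking columns up by header name means the same sketch reads `aqu-sz` or `avgqu-sz`, whichever the installed sysstat prints. Note that %util can be misleading for devices that serve requests in parallel, such as arrays backed by several disks.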

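free -m: Shows total, used, and free memory in megabytes, including buffers and cache. In older versions of free, the "-/+ buffers/cache" line gives a clearer view of the memory actually available to applications; newer versions drop that line and report an "available" column instead.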
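The arithmetic behind the older "-/+ buffers/cache" free figure is free memory plus buffers plus page cache. A minimal sketch, assuming Python 3 and text in the `/proc/meminfo` format (values in kB); the function names and sample numbers are made up for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: kB} (hypothetical helper)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info

def free_plus_cache_mb(info):
    """MemFree + Buffers + Cached in MB: free memory if caches were dropped."""
    return (info["MemFree"] + info["Buffers"] + info["Cached"]) // 1024

# Illustrative values only.
SAMPLE = """\
MemTotal:        8388608 kB
MemFree:         1048576 kB
MemAvailable:    3000000 kB
Buffers:          102400 kB
Cached:          2097152 kB
"""

if __name__ == "__main__":
    print(free_plus_cache_mb(parse_meminfo(SAMPLE)), "MB free plus buffers/cache")
```

On a live system the input would be the contents of `/proc/meminfo`. Where the kernel reports `MemAvailable`, that estimate is generally the better one to trust, because not all cached memory can be reclaimed.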

sar -n DEV 1: Monitors network interface throughput (rxkB/s, txkB/s) and utilization (%ifutil).
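When %ifutil is missing or reads zero (for example, when the interface speed is unknown), utilization can be estimated from throughput and link speed. This is a sketch in Python 3 under stated assumptions: kB means 1024 bytes, the link speed is supplied by the caller in Mbit/s, and a full-duplex link is judged by its busier direction. The function name is hypothetical, and the exact calculation in a given sysstat version may differ.

```python
def if_util_percent(rx_kb_s, tx_kb_s, speed_mbit, full_duplex=True):
    """Estimate interface utilization as a percentage of link speed."""
    if speed_mbit <= 0:
        raise ValueError("speed_mbit must be positive")
    # Full duplex: each direction has its own capacity, so take the busier one.
    # Half duplex: both directions share the link, so add them.
    kb_s = max(rx_kb_s, tx_kb_s) if full_duplex else rx_kb_s + tx_kb_s
    bits_per_s = kb_s * 1024 * 8
    return 100.0 * bits_per_s / (speed_mbit * 1_000_000)

if __name__ == "__main__":
    # Illustrative figures: about 21999 kB/s received on a 1000 Mbit/s link.
    print(f"{if_util_percent(21999.10, 3958.46, 1000):.1f}% of link speed")
```

A result near 100% suggests the interface itself is the limit; a low figure with poor application throughput points elsewhere.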

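sar -n TCP,ETCP 1: Provides TCP statistics such as active/s (locally initiated connections per second), passive/s (remotely initiated connections per second), and retrans/s (retransmitted segments per second). A high retransmission rate can indicate a network problem or an overloaded server.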
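A raw retransmission rate is easier to interpret relative to outgoing traffic. The sketch below, in Python 3, divides `retrans/s` by `oseg/s` (segments sent per second, from the TCP section of the same sar output). The function name is made up, and no particular ratio is claimed here as a universal alarm level.

```python
def retransmit_ratio(retrans_per_s, oseg_per_s):
    """Fraction of sent segments that were retransmissions (0.0 if idle)."""
    if oseg_per_s <= 0:
        return 0.0
    return retrans_per_s / oseg_per_s

if __name__ == "__main__":
    # Illustrative figures only.
    print(f"{retransmit_ratio(5.0, 1000.0):.2%} of segments retransmitted")
```

Tracking this ratio over time on a given host is usually more informative than comparing a single reading against a fixed threshold.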

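top: Aggregates many of the metrics above in a real-time view, showing per-process CPU and memory usage. Because the screen refreshes continuously, patterns over time are harder to see than in the rolling output of vmstat or pidstat.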

Using the Data

Combine the outputs to assess whether the system is saturated (r in vmstat persistently above the CPU count, or high %util in iostat), whether errors are present (kernel messages such as OOM events), and where the bottleneck lies (CPU, memory, disk, or network). If any of the commands are missing, install the sysstat package, which provides mpstat, pidstat, iostat, and sar.

Further Reading

For a deeper dive, see Brendan Gregg's 2015 Velocity talk on Linux performance tools, which covers over 40 commands for observability, benchmarking, tuning, profiling, and tracing.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Efficient Ops
