Master Linux Performance Diagnosis in 60 Seconds with 10 Essential Commands
When troubleshooting a Linux server, this guide walks through ten essential commands built on nine standard tools (uptime, dmesg, vmstat, mpstat, pidstat, iostat, free, sar, and top; sar is used twice). Together they let you assess CPU, memory, disk, and network health within the first sixty seconds and identify saturation and bottlenecks.
Linux Performance Diagnosis in 60,000 Milliseconds
When you log into a Linux server to solve a performance problem, what should you check in the first minute?
At Netflix we run a massive EC2 Linux cloud and use many performance‑analysis tools such as Atlas for cloud monitoring and Vector for on‑demand instance analysis. While those tools solve most issues, sometimes you still need to log into a server instance and run standard Linux performance utilities.
In this article the Netflix Performance Engineering team explains what to do in the first 60 seconds of a command‑line performance investigation using only the standard Linux tools you should already have.
First 60 Seconds: Overview
Running the following ten commands gives you a quick view of the processes and resource usage on the system. By looking at error messages and resource saturation (both easy to understand) you can decide where to focus your optimization efforts.
uptime
dmesg | tail
vmstat 1
mpstat -P ALL 1
pidstat 1
iostat -xz 1
free -m
sar -n DEV 1
sar -n TCP,ETCP 1
top
Some of these commands (mpstat, pidstat, iostat, and sar) require the sysstat package to be installed. The output helps you apply the USE method (Utilization, Saturation, Errors) to locate bottlenecks: for every resource, such as CPU, memory, and disks, check its utilization, its saturation, and any error messages.
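The ten commands above can be captured in one pass. What follows is a minimal POSIX shell sketch. The names first60 and DRY_RUN are hypothetical. The sample counts (vmstat 1 5 and so on), top's batch mode (-b -n 1), and the head -n 20 truncation are assumptions, added so that every step ends on its own instead of running until interrupted. Missing tools, typically the sysstat ones, are reported and skipped.

```shell
#!/bin/sh
# Sketch: run the 60-second checklist once, non-interactively.
# ASSUMPTION: each sampling tool gets a count of 5 so it stops after five
# 1-second intervals; top runs in batch mode for one iteration.
# Set DRY_RUN=1 (hypothetical switch) to print the commands without running them.
first60() {
    while IFS= read -r cmd; do
        [ -z "$cmd" ] && continue
        tool=${cmd%% *}                      # first word is the program name
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf '%s\n' "$cmd"
        elif command -v "$tool" >/dev/null 2>&1; then
            printf '\n=== %s ===\n' "$cmd"
            sh -c "$cmd" </dev/null          # keep the command list off its stdin
        else
            printf '\n=== %s: %s not installed (sysstat?) ===\n' "$cmd" "$tool"
        fi
    done <<'EOF'
uptime
dmesg | tail
vmstat 1 5
mpstat -P ALL 1 5
pidstat 1 5
iostat -xz 1 5
free -m
sar -n DEV 1 5
sar -n TCP,ETCP 1 5
top -b -n 1 | head -n 20
EOF
}
```

For example, first60 > first60.txt 2>&1 saves one run for later comparison, and DRY_RUN=1 first60 only lists the commands.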
1. uptime
$ uptime
 23:51:26 up 21:31,  1 user,  load average: 30.02, 26.43, 19.02
uptime shows the system load averages: the average number of tasks that are runnable or in uninterruptible sleep (usually blocked on disk I/O). The three numbers are the 1-, 5-, and 15-minute averages, so comparing them shows the trend. In the example, the 1-minute value (30.02) is well above the 15-minute value (19.02), which indicates that load has increased recently.
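The trend comparison can be made mechanical. This sketch reads the first three fields of /proc/loadavg, the kernel file these averages come from. The helper names load_trend and load_per_cpu are hypothetical. The plus-or-minus 20% band that separates "rising" and "falling" from "steady" is an arbitrary assumption. Dividing by the CPU count from nproc gives only a rough sense of CPU demand, because Linux load averages also include tasks blocked on I/O.

```shell
#!/bin/sh
# Sketch: classify the load trend from "1min 5min 15min ..." on stdin.
# ASSUMPTION: a 20% band around the 15-minute value counts as "steady".
load_trend() {
    awk '{
        one = $1; fifteen = $3
        if (one > fifteen * 1.2)      print "rising"
        else if (one < fifteen * 0.8) print "falling"
        else                          print "steady"
    }'
}

# Sketch: 1-minute load divided by the CPU count passed as $1.
load_per_cpu() {
    awk -v ncpu="$1" '{ printf "%.2f\n", $1 / ncpu }'
}
```

Usage: load_trend < /proc/loadavg prints rising, falling, or steady. load_per_cpu "$(nproc)" < /proc/loadavg prints the 1-minute load per CPU.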
2. dmesg | tail
$ dmesg | tail
[1880957.563150] perl invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
...
[1880957.563400] Out of memory: Kill process 18694 (perl) score 246 or sacrifice child
[2320864.954447] TCP: Possible SYN flooding on port 7001. Dropping request.  Check SNMP counters.
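This command shows the last ten kernel messages, if there are any. Look for errors that can cause performance problems. The example shows the OOM killer terminating a perl process, and the kernel dropping TCP connection requests because of a possible SYN flood.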
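dmesg | tail only shows the last ten messages, so an earlier OOM kill can scroll out of view. Below is a sketch of a filter that searches the whole ring buffer for the two event types in the example. The name kernel_alerts is hypothetical, and its pattern list is an assumption that is far from exhaustive.

```shell
#!/bin/sh
# Sketch: keep only kernel messages matching known trouble patterns.
# ASSUMPTION: these three patterns are illustrative; extend them as needed.
kernel_alerts() {
    grep -E 'oom-killer|Out of memory|SYN flooding'
}
```

Usage: dmesg | kernel_alerts, or dmesg -T | kernel_alerts for human-readable timestamps. On systems with kernel.dmesg_restrict=1, dmesg needs root.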
3. vmstat 1
$ vmstat 1
procs ---------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd      free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
34  0      0 200889792  73708 591828    0    0     0     5    6   10 96  1  3  0  0
...
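vmstat 1 prints a one-line summary of virtual memory and CPU statistics every second. The first line reports averages since boot rather than the previous second, so skip it. Important columns include: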
r : number of processes running on a CPU or waiting for a turn (a value greater than the number of CPUs indicates CPU saturation).
free : free memory in kilobytes.
si and so : swap‑in and swap‑out rates (non‑zero values indicate memory pressure).
us, sy, id, wa, st : percentage of CPU time spent in user mode, system (kernel) mode, idle, waiting for I/O, and stolen (by the hypervisor or other guests, on virtual machines).
A high wa (I/O wait) suggests a disk bottleneck. A high sy (above 20%) is worth exploring further, because the kernel may be processing I/O inefficiently.
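The vmstat rules of thumb above can be checked automatically. This sketch assumes the classic 17-column vmstat layout shown in the example. Some newer procps versions append extra columns, which it ignores. The name vmstat_flags is hypothetical. The 20% sy threshold follows the text above, while the 20% wa threshold is an arbitrary assumption. The first sample is skipped because it reports averages since boot.

```shell
#!/bin/sh
# Sketch: flag saturation and swapping in vmstat output; $1 = number of CPUs.
# ASSUMPTION: column order r b swpd free buff cache si so bi bo in cs us sy id wa st.
vmstat_flags() {
    awk -v ncpu="$1" '
        $1 !~ /^[0-9]+$/ { next }      # skip the two header lines
        ++n == 1         { next }      # first sample = averages since boot
        {
            s = "sample " (n - 1) ":"
            if ($1 > ncpu)        s = s " CPU-saturated(r=" $1 ")"
            if ($7 > 0 || $8 > 0) s = s " swapping(si=" $7 ",so=" $8 ")"
            if ($14 > 20)         s = s " high-sys(" $14 "%)"
            if ($16 > 20)         s = s " high-iowait(" $16 "%)"
            print s
        }'
}
```

Usage: vmstat 1 5 | vmstat_flags "$(nproc)" prints one line per sample and lists any conditions found.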
4. mpstat -P ALL 1
$ mpstat -P ALL 1
Linux 3.13.0-49-generic (titanclusters-xxxxx)  07/14/2015  _x86_64_ (32 CPU)

07:38:49 PM  CPU   %usr  %nice   %sys %iowait   %irq  %soft  %steal  %guest  %gnice  %idle
07:38:50 PM  all  98.47   0.00   0.75    0.00   0.00   0.00    0.00    0.00    0.00   0.78
07:38:50 PM    0  96.04   0.00   2.97    0.00   0.00   0.00    0.00    0.00    0.00   0.99
...
This prints per-CPU time breakdowns, which help you spot an uneven load. A single hot CPU while the others are idle can be evidence of a single-threaded application.
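Spotting one hot CPU by eye gets hard on a 32-CPU machine. This sketch, hot_cpus (a hypothetical name), lists CPUs whose busy time (100 minus %idle) meets a threshold. It finds the CPU and %idle columns from each header line rather than hard-coding their positions. This rests on the assumption that the timestamp may take one field or two (with AM/PM), and that the column set varies between sysstat versions.

```shell
#!/bin/sh
# Sketch: print "cpu busy%" for CPUs at or above the threshold given as $1.
hot_cpus() {
    awk -v min="$1" '
        $1 == "Average:" { next }    # skip the summary block; it repeats the samples
        {
            # A header line names the columns: remember where CPU and %idle are.
            h = 0
            for (i = 1; i <= NF; i++) {
                if ($i == "CPU")   { c = i; h = 1 }
                if ($i == "%idle") d = i
            }
            if (h) next
        }
        # Data lines: ignore the "all" aggregate and anything before a header.
        c && d && $c != "all" && $d ~ /^[0-9.]+$/ {
            busy = 100 - $d
            if (busy >= min) printf "%s %.2f\n", $c, busy
        }'
}
```

Usage: mpstat -P ALL 1 1 | hot_cpus 90 lists the CPUs that were at least 90% busy during the sample.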
5. pidstat 1
$ pidstat 1
  UID   PID    %usr %system  %guest    %CPU   CPU  Command
    0     9    0.00    0.94    0.00    0.94     1  rcuos/0
    0  4214    5.66    5.66    0.00   11.32    15  mesos-slave
    0  6521 1596.23    1.89    0.00 1598.11    27  java
...
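pidstat provides a per-process summary similar to top's. Unlike top, it prints a rolling summary instead of clearing the screen, which makes patterns over time easier to see. The %CPU column is the total across all CPUs. In the example, the java process (PID 6521) is at 1598.11%, meaning it is consuming almost 16 CPU cores.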
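Because %CPU is summed across CPUs, dividing it by 100 gives an approximate count of cores in use. This sketch, cpu_hogs (a hypothetical name), prints PID, command, and cores for processes above a threshold. It locates the columns from the header line, assuming pidstat labels them PID, %CPU, and Command as in the example. That way, a leading timestamp or an extra column such as %wait in newer versions does not shift the positions.

```shell
#!/bin/sh
# Sketch: print "pid command cores" for processes using at least $1 cores.
cpu_hogs() {
    awk -v min="$1" '
        $1 == "Average:" { next }    # skip the summary block
        {
            # Header line: record the positions of PID, %CPU and Command.
            h = 0
            for (i = 1; i <= NF; i++) {
                if ($i == "%CPU")    { p = i; h = 1 }
                if ($i == "PID")     q = i
                if ($i == "Command") m = i
            }
            if (h) next
        }
        p && q && m && $q ~ /^[0-9]+$/ {
            cores = $p / 100
            if (cores >= min) printf "%s %s %.2f\n", $q, $m, cores
        }'
}
```

Usage: pidstat 1 1 | cpu_hogs 1 lists every process that used at least one full core during the interval.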
6. iostat -xz 1
$ iostat -xz 1
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          73.96    0.00    3.73    0.03    0.06   22.21

Device:  rrqm/s  wrqm/s   r/s   w/s  rkB/s  wkB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
xvda       0.00    0.23  0.21  0.18   4.52   2.08    34.37     0.00   9.98   13.80    5.42   2.44   0.09
...
iostat shows block-device (disk) statistics. The -x option adds extended columns, and -z hides devices with no activity. Key columns:
r/s, w/s, rkB/s, wkB/s : read/write operations and throughput.
await : average I/O latency in milliseconds, including time spent waiting in the queue.
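avgqu-sz : average number of requests issued to the device. Values greater than 1 can indicate saturation, although devices that serve requests in parallel, such as virtual devices fronting several disks, can tolerate more.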
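%util : device utilization (the percentage of time the device was busy). Values above 60% usually lead to poor performance, and values close to 100% usually indicate saturation.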
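The two saturation rules above can be applied per device. This sketch, busy_devices (a hypothetical name), prints devices with %util above 60 or a queue length above 1, using the thresholds from the list above. It accepts the queue column under either of two names: avgqu-sz, as in the example, or aqu-sz, which newer sysstat releases use. Handling both is an assumption about the versions you run. iostat's first report covers the time since boot, and sysstat's -y option omits it.

```shell
#!/bin/sh
# Sketch: flag devices with %util > 60 or queue length > 1 in iostat -x output.
busy_devices() {
    awk '
        $1 ~ /^Device/ {                     # "Device:" (older) or "Device" (newer)
            dev = 1; u = 0; a = 0
            for (i = 1; i <= NF; i++) {
                if ($i == "%util")                      u = i
                if ($i == "avgqu-sz" || $i == "aqu-sz") a = i
            }
            next
        }
        NF == 0 || $1 == "avg-cpu:" { dev = 0; next }   # end of a device block
        dev && u && a {
            if ($u > 60 || $a > 1)
                printf "%s util=%s%% queue=%s\n", $1, $u, $a
        }'
}
```

Usage: iostat -xz -y 1 1 | busy_devices prints nothing when every device is below both thresholds.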
7. free -m
$ free -m
             total       used       free     shared    buffers     cached
Mem:        245998      24545     221453         83         59        541
-/+ buffers/cache:      23944     222053
Swap:            0          0          0
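The buffers and cached columns show memory used by the buffer cache and the page cache. The -/+ buffers/cache line subtracts them and gives a more accurate view of the memory applications actually use. This matters because Linux uses otherwise free memory as cache and can reclaim it quickly when applications need it. Newer versions of free drop this line and report an available column instead.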
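The -/+ buffers/cache arithmetic can be reproduced from /proc/meminfo, the file free reads. This is a rough sketch: app_memory is a hypothetical name, and "used minus Buffers minus Cached" only approximates what older versions of free printed. If the kernel exposes MemAvailable (Linux 3.14 and later), the sketch also prints it, since it is the kernel's own estimate of memory available to applications.

```shell
#!/bin/sh
# Sketch: approximate the -/+ buffers/cache "used" figure from /proc/meminfo.
# All values are in kB, as in /proc/meminfo.
app_memory() {
    awk '
        { v[$1] = $2 }      # keys keep their trailing colon, e.g. "MemTotal:"
        END {
            used = v["MemTotal:"] - v["MemFree:"]
            app  = used - v["Buffers:"] - v["Cached:"]
            printf "used_kB=%.0f app_kB=%.0f", used, app
            if ("MemAvailable:" in v) printf " available_kB=%.0f", v["MemAvailable:"]
            printf "\n"
        }'
}
```

Usage: app_memory < /proc/meminfo.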
8. sar -n DEV 1
$ sar -n DEV 1
IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
eth0   18763.00   5032.00  20686.42    478.30      0.00      0.00      0.00      0.00
lo        14.00     14.00      1.36      1.36      0.00      0.00      0.00      0.00
...
sar -n DEV reports network interface throughput. In the sample shown, eth0 receives 20686.42 kB/s, roughly 20 MB/s (on the order of 170 Mbit/s). That is well below a 1 Gbit/s limit.
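Converting kB/s to Mbit/s is easy to get wrong by a factor of 8. This sketch, kbps_to_mbit (a hypothetical name), does the conversion and compares the result with a link speed. It assumes that sar's kB means 1,024 bytes and, by default, that the link runs at 1000 Mbit/s. Adjust both for your system.

```shell
#!/bin/sh
# Sketch: convert a sar rxkB/s or txkB/s value ($1) to Mbit/s and compare it
# with a link speed in Mbit/s ($2, default 1000).
# ASSUMPTION: 1 kB = 1024 bytes.
kbps_to_mbit() {
    awk -v kb="$1" -v speed="${2:-1000}" 'BEGIN {
        mbit = kb * 1024 * 8 / 1000000
        printf "%.2f Mbit/s (%.2f%% of %s Mbit/s)\n", mbit, mbit / speed * 100, speed
    }'
}
```

Usage: kbps_to_mbit 20686.42 converts the eth0 receive rate from the example. On many physical NICs, /sys/class/net/eth0/speed reports the link speed in Mbit/s and can be passed as the second argument. Virtual interfaces may not report a usable value.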
9. sar -n TCP,ETCP 1
$ sar -n TCP,ETCP 1
active/s passive/s    iseg/s    oseg/s
    1.00      0.00  10233.00  18846.00
...
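This summarizes key TCP metrics. active/s is the number of locally initiated TCP connections per second (for example, via connect()), and passive/s is the number of remotely initiated connections per second (via accept()). retrans/s, reported in the ETCP section (omitted above), is the number of TCP retransmits per second. Retransmits are a sign of a network or server issue. In the example, there is only about one new outbound connection per second.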
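Retransmits are easier to judge relative to traffic. This sketch, retrans_pct (a hypothetical name), sums oseg/s from the TCP section and retrans/s from the ETCP section of sar -n TCP,ETCP output. It then prints retransmitted segments as a percentage of segments sent. The Average: lines are skipped so samples are not counted twice. What percentage counts as a problem depends on the workload and is not defined here.

```shell
#!/bin/sh
# Sketch: retransmitted segments as a percentage of outbound segments.
retrans_pct() {
    awk '
        {
            # Header lines tell us which section follows and where the column is.
            h = 0
            for (i = 1; i <= NF; i++) {
                if ($i == "oseg/s")    { o = i; sec = "tcp";  h = 1 }
                if ($i == "retrans/s") { r = i; sec = "etcp"; h = 1 }
            }
            if (h) next
        }
        $1 == "Average:" { next }                      # summary repeats the samples
        sec == "tcp"  && $o ~ /^[0-9.]+$/ { out += $o }
        sec == "etcp" && $r ~ /^[0-9.]+$/ { re  += $r }
        END {
            if (out > 0) printf "%.2f%% of segments retransmitted\n", re / out * 100
            else         print "no outbound segments seen"
        }'
}
```

Usage: sar -n TCP,ETCP 1 5 | retrans_pct.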
10. top
$ top
Tasks: 871 total,   1 running, 868 sleeping,   0 stopped,   2 zombie
%Cpu(s): 96.8 us,  0.4 sy,  0.0 ni,  2.7 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
...
top provides a real-time view of many of the metrics checked by the previous commands. Because it refreshes the screen in place, patterns over time are harder to see than in the rolling output of vmstat or pidstat.
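One way to keep top's history is batch mode. In procps-ng top, -b writes plain text, -d sets the delay, and -n sets the number of iterations. This sketch, cpu_summary (a hypothetical name), then reduces each snapshot to its %Cpu(s) line. It assumes the procps-ng layout and spacing shown above. Older versions format this line differently, and a value of 100.0 can run into the preceding label, so treat the parsing as fragile.

```shell
#!/bin/sh
# Sketch: one "us= sy= id= wa=" line per top snapshot read from stdin.
# ASSUMPTION: lines look like "%Cpu(s): 96.8 us,  0.4 sy, ...".
cpu_summary() {
    awk '
        /^%Cpu\(s\):/ {
            # Fields alternate value, label ("96.8", "us,"): map label -> value.
            for (i = 2; i < NF; i += 2) {
                lab = $(i + 1); sub(/,$/, "", lab)
                val[lab] = $i
            }
            printf "us=%s sy=%s id=%s wa=%s\n", val["us"], val["sy"], val["id"], val["wa"]
        }'
}
```

Usage: top -b -d 1 -n 5 | cpu_summary prints five lines, one per second, that can be compared like vmstat output.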
Further Analysis
More commands and techniques are available for deeper investigation. See Brendan Gregg’s Linux performance‑tool tutorial from Velocity 2015, which covers over 40 commands spanning observability, benchmarking, tuning, static performance analysis, and tracing.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
MaGe Linux Operations
