How to Diagnose Linux Server Performance Issues in 60 Seconds with 10 Essential Commands
Learn to pinpoint Linux server bottlenecks in under a minute by running ten powerful commands (uptime, dmesg, vmstat, mpstat, pidstat, iostat, free, two views of sar, and top) and interpreting their output with the USE method to assess utilization, saturation, and errors across CPU, memory, disk, and network resources.
Mastering performance optimization tools and methods requires continuous practice; solid fundamentals such as networking and operating systems are essential to identify key performance issues.
While monitoring tools can solve many problems, sometimes you need to log into the instance and run standard Linux performance utilities.
https://netflixtechblog.com/linux-performance-analysis-in-60-000-milliseconds-accc10403c55
Netflix’s performance engineering team demonstrates how ten commands can diagnose machine performance problems within a minute. By running these commands you obtain a high‑level view of system resource usage, locate errors and saturation metrics, and assess utilization.
The output of these commands helps quickly pinpoint bottlenecks. The highlighted counters follow Brendan Gregg’s USE method (Utilization, Saturation, Errors).
https://www.brendangregg.com/usemethod.html
The following ten commands are recommended:
<code>uptime
dmesg | tail
vmstat 1
mpstat -P ALL 1
pidstat 1
iostat -xz 1
free -m
sar -n DEV 1
sar -n TCP,ETCP 1
top
</code>
1. uptime
This command shows the system load averages for the past 1, 5, and 15 minutes, helping you gauge whether the server is under sustained pressure.
<code>$ uptime
23:51:26 up 21:31, 1 user, load average: 30.02, 26.43, 19.02
</code>
The load average is the average number of processes that are runnable or in uninterruptible sleep (typically waiting on disk I/O).
Comparing the 1‑minute and 15‑minute values reveals whether high load is transient or persistent.
A high 1‑minute load combined with lower 15‑minute load suggests a recent spike that warrants deeper investigation.
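As a quick sanity check, the 1-minute load average can be compared against the core count. This is a minimal sketch, assuming a POSIX shell with nproc available and /proc/loadavg readable:

```shell
#!/bin/sh
# Compare the 1-minute load average against the CPU core count.
cores=$(nproc)
load1=$(cut -d ' ' -f 1 /proc/loadavg)
# Shell arithmetic is integer-only, so do the float comparison in awk.
if awk -v l="$load1" -v c="$cores" 'BEGIN { exit !(l > c) }'; then
    echo "load $load1 exceeds $cores cores: investigate"
else
    echo "load $load1 within $cores cores"
fi
```

A load above the core count is only a starting signal; the remaining commands tell you whether the pressure is CPU, memory, or I/O.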
2. dmesg | tail
<code>$ dmesg | tail
[1880957.563150] perl invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
[1880957.563408] Killed process 18694 (perl) total-vm:1972392kB, anon-rss:1953348kB, file-rss:0kB
[2320864.954447] TCP: Possible SYN flooding on port 7001. Dropping request. Check SNMP counters.
</code>
Shows the last ten kernel messages, useful for spotting OOM kills, driver errors, or network anomalies.
3. vmstat 1
<code>$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
34 0 0 200889792 73708 591828 0 0 0 5 6 10 96 1 3 0 0
</code>
Key columns:
r : runnable processes (running plus waiting for a CPU); a value above the number of CPU cores indicates saturation.
free : free memory in KB (excludes memory used for caches; see free -m below).
si/so : swap-ins and swap-outs; any non-zero value means the system is under memory pressure.
us, sy, id, wa, st : percentages of CPU time in user mode, kernel (system) mode, idle, I/O wait, and stolen time (taken by the hypervisor or other guests).
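The run-queue check above can be scripted as a quick filter; this is a rough sketch that assumes vmstat's output layout matches the sample shown (two header rows, then a since-boot summary row):

```shell
# Take two vmstat samples and flag CPU saturation from the run queue.
# NR > 3 skips the two header rows and the since-boot summary row,
# leaving only the fresh 1-second sample.
vmstat 1 2 | awk -v cores="$(nproc)" 'NR > 3 {
    if ($1 > cores) print "CPU saturated: r=" $1
    else            print "run queue ok: r=" $1
}'
```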
4. mpstat -P ALL 1
<code>$ mpstat -P ALL 1
Linux 3.13.0-49-generic (titanclusters-xxxxx) 07/14/2015 _x86_64_ (32 CPU)
07:38:49 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
07:38:50 PM all 98.47 0.00 0.75 0.00 0.00 0.00 0.00 0.00 0.00 0.78
</code>
Shows per-CPU utilization; a single CPU with very high usage while the rest sit idle may indicate a single-threaded hotspot.
5. pidstat 1
<code>$ pidstat 1
Linux 3.13.0-49-generic (titanclusters-xxxxx) 07/14/2015 _x86_64_ (32 CPU)
07:41:02 PM UID PID %usr %system %guest %CPU CPU Command
07:41:03 PM 0 6521 1596.23 1.89 0.00 1598.11 27 java
</code>
Continuously reports per-process CPU usage; percentages above 100% mean a process is using more than one core, and sustained high values reveal which processes dominate the CPU.
6. iostat -xz 1
<code>$ iostat -xz 1
avg-cpu: %user %nice %system %iowait %steal %idle
0.13 0.00 0.10 0.01 0.00 99.76
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda 0.00 0.62 0.03 0.89 0.57 7.97 18.52 0.00 0.68 1.96 0.64 0.60 0.06
</code>
Key metrics:
r/s, w/s, rkB/s, wkB/s : read/write operations and throughput.
await : average time (in ms) an I/O request spends queued plus being serviced; values above the device's expected latency indicate saturation or device problems.
avgqu-sz : average number of requests issued to the device (values >1 can suggest saturation).
%util : device utilization; >60% may impact performance, and 100% usually means the device is saturated (though this figure can mislead for virtual devices backed by multiple disks).
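The %util check can be automated with a small filter. A sketch, assuming sysstat's iostat is installed and that %util is the last column of the per-device table, as in the sample above:

```shell
# Parse the per-device table of iostat -xz and flag busy devices.
# Assumes %util is the last field of each device row.
iostat -xz | awk '
    /^Device/ { in_devices = 1; next }   # table starts after the Device header
    in_devices && NF {
        if ($NF + 0 > 60) print $1 ": " $NF "% util (possible saturation)"
    }'
```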
7. free -m
<code>$ free -m
total used free shared buffers cached
Mem: 245998 24545 221453 83 59 541
-/+ buffers/cache: 23944 222053
Swap: 0 0 0
</code>
Shows memory usage; the "-/+ buffers/cache" line reflects memory actually available to applications, because Linux uses otherwise-free memory for caching (newer versions of free report this figure in an "available" column instead).
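Modern kernels export this "actually available" figure directly; a minimal sketch reading it from /proc/meminfo:

```shell
# MemAvailable (kernel 3.14+) estimates how much memory new workloads
# can use without swapping; this is the figure behind the
# "-/+ buffers/cache" line and the "available" column of newer free.
awk '/^MemAvailable:/ { printf "available: %d MB\n", $2 / 1024 }' /proc/meminfo
```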
8. sar -n DEV 1
<code>$ sar -n DEV 1
12:16:48 AM IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s %ifutil
12:16:49 AM eth0 18763.00 5032.00 20686.42 478.30 0.00 0.00 0.00 0.00
</code>
Monitors network interface throughput; values far below the hardware limit indicate the network is not the bottleneck.
9. sar -n TCP,ETCP 1
<code>$ sar -n TCP,ETCP 1
12:17:19 AM  active/s passive/s    iseg/s    oseg/s
12:17:20 AM 1.00 0.00 10233.00 18846.00
</code>
Shows TCP connection statistics: active/s (locally initiated connections), passive/s (remotely initiated connections), and, in the ETCP report, retrans/s (retransmissions). High retransmission rates may point to network issues or server overload.
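A retransmission watch can be built on the ETCP report. This is a sketch that assumes sysstat's sar is installed and that retrans/s is the third value on the Average line:

```shell
# Average TCP retransmissions over three 1-second samples.
# Assumes retrans/s is the third value after "Average:" ($4 in awk).
sar -n ETCP 1 3 | awk '/^Average/ {
    if ($4 + 0 > 0) print "retransmissions: " $4 "/s"
    else            print "no retransmissions"
}'
```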
10. top
<code>$ top
top - 00:15:40 up 21:56, 1 user, load average: 31.09, 29.87, 29.92
Tasks: 871 total, 1 running, 868 sleeping, 0 stopped, 2 zombie
%Cpu(s): 96.8 us, 0.4 sy, 2.7 id, 0.1 wa
</code>
Provides a snapshot of CPU, memory, and process activity and can be sorted to find the most resource-intensive processes. Because it refreshes continuously, pausing the output (Ctrl-S to pause, Ctrl-Q to resume) may help detailed analysis.
Summary
These Linux tools—uptime, dmesg, vmstat, mpstat, pidstat, iostat, free, sar, and top—allow rapid identification of performance bottlenecks. By interpreting their outputs through the USE framework, you can determine whether CPU, memory, disk, or network resources are saturated or encountering errors, and then focus optimization efforts on the offending components.
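The whole checklist can be wrapped in one script. A sketch, assuming the sysstat package (mpstat, pidstat, iostat, sar) is installed; each sampling command takes five 1-second samples instead of running indefinitely:

```shell
#!/bin/sh
# Run the 60-second checklist end to end.
uptime
dmesg | tail
vmstat 1 5
mpstat -P ALL 1 5
pidstat 1 5
iostat -xz 1 5
free -m
sar -n DEV 1 5
sar -n TCP,ETCP 1 5
top -b -n 1 | head -20    # -b: batch mode, one non-interactive snapshot
```

Running the commands sequentially like this takes roughly half a minute of sampling; in practice you would stop early once one of the outputs points at the bottleneck.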
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.