
How to Diagnose Linux Server Performance Issues in 60 Seconds with 10 Essential Commands

Learn to pinpoint Linux server bottlenecks in about a minute by running ten powerful commands (uptime, dmesg, vmstat, mpstat, pidstat, iostat, free, two sar invocations, and top) and interpreting their output with the USE method to assess utilization, saturation, and errors across CPU, memory, disk, and network resources.

Ops Development Stories

Mastering performance-optimization tools and methods takes continuous practice, and solid fundamentals in networking and operating systems are essential for identifying key performance issues.

While monitoring tools can solve many problems, sometimes you need to log into the instance and run standard Linux performance utilities.

https://netflixtechblog.com/linux-performance-analysis-in-60-000-milliseconds-accc10403c55

Netflix’s performance engineering team demonstrates how ten commands can diagnose machine performance problems within a minute. By running these commands you obtain a high‑level view of system resource usage, locate errors and saturation metrics, and assess utilization.

The output of these commands helps quickly pinpoint bottlenecks. The highlighted counters follow Brendan Gregg’s USE method (Utilization, Saturation, Errors).

https://www.brendangregg.com/usemethod.html

The following ten commands are recommended:

<code>uptime
dmesg | tail
vmstat 1
mpstat -P ALL 1
pidstat 1
iostat -xz 1
free -m
sar -n DEV 1
sar -n TCP,ETCP 1
top
</code>

1. uptime

This command shows the system load averages for the past 1, 5, and 15 minutes, helping you gauge whether the server is under sustained pressure.

<code>$ uptime
23:51:26 up 21:31,  1 user,  load average: 30.02, 26.43, 19.02
</code>

The load average represents the average number of processes that are runnable (running or waiting for CPU) or in uninterruptible sleep (typically waiting on disk I/O).

Comparing the 1‑minute and 15‑minute values reveals whether high load is transient or persistent.

A high 1‑minute load combined with lower 15‑minute load suggests a recent spike that warrants deeper investigation.
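Load is easiest to judge relative to core count. A minimal sketch using the 1-minute figure from the sample output above and a hypothetical 32-core host (on a live system you would read /proc/loadavg and nproc instead):

```shell
# Hypothetical values: load1 taken from the sample output, cores assumed 32.
load1=30.02
cores=32
# A ratio near or above 1.0 means the CPUs are saturated on average.
awk -v l="$load1" -v c="$cores" 'BEGIN { printf "load per core: %.2f\n", l / c }'
# prints "load per core: 0.94"
```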

2. dmesg | tail

<code>$ dmesg | tail
[1880957.563150] perl invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
[1880957.563408] Killed process 18694 (perl) total-vm:1972392kB, anon-rss:1953348kB, file-rss:0kB
[2320864.954447] TCP: Possible SYN flooding on port 7001. Dropping request. Check SNMP counters.
</code>

Shows the last ten kernel messages, useful for spotting OOM kills, driver errors, or network anomalies.

3. vmstat 1

<code>$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
34  0      0 200889792  73708 591828   0    0     0     5   6   10 96  1  3  0  0
</code>

Key columns:

r : processes waiting for CPU (more than CPU cores indicates saturation).

free : idle memory in KB (Linux keeps this low by using memory for caches; run free -m for a fuller picture).

si/so : swap I/O; non‑zero values mean memory pressure.

us, sy, id, wa, st : CPU time in user, system, idle, I/O wait, and stolen (time taken by the hypervisor).
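The r column can be checked against the core count mechanically. A sketch that parses the sample vmstat data line above, assuming the 32-CPU host from the sample:

```shell
# Column 1 of a vmstat data line is r (runnable processes).
echo "34  0      0 200889792  73708 591828   0    0     0     5   6   10 96  1  3  0  0" |
  awk -v cores=32 '{ if ($1 > cores) print "run queue exceeds cores"; else print "run queue ok" }'
# prints "run queue exceeds cores" (34 > 32)
```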

4. mpstat -P ALL 1

<code>$ mpstat -P ALL 1
Linux 3.13.0-49-generic (titanclusters-xxxxx) 07/14/2015 _x86_64_ (32 CPU)
07:38:49 PM  CPU   %usr  %nice  %sys %iowait %irq %soft %steal %guest %gnice %idle
07:38:50 PM  all   98.47   0.00   0.75   0.00   0.00   0.00   0.00   0.00   0.00   0.78
</code>

Shows per‑CPU utilization; a single CPU with very high usage may indicate a single‑threaded hotspot.

5. pidstat 1

<code>$ pidstat 1
Linux 3.13.0-49-generic (titanclusters-xxxxx) 07/14/2015 _x86_64_ (32 CPU)
07:41:02 PM   UID   PID   %usr %system %guest %CPU CPU Command
07:41:03 PM   0    6521 1596.23  1.89   0.00 1598.11 27 java
</code>

Continuously reports per‑process CPU usage; high percentages (e.g., >100% on multi‑core) reveal which processes dominate CPU resources.
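Because pidstat sums %CPU across cores, the 1598.11% java figure above translates directly into a core count. A quick worked check:

```shell
# 1598.11 %CPU on a 32-CPU host: divide by 100 to get cores consumed.
awk 'BEGIN { printf "%.1f of 32 cores\n", 1598.11 / 100 }'
# prints "16.0 of 32 cores"
```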

6. iostat -xz 1

<code>$ iostat -xz 1
avg-cpu: %user %nice %system %iowait %steal %idle
          0.13   0.00   0.10   0.01   0.00  99.76
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda      0.00   0.62 0.03 0.89 0.57 7.97 18.52 0.00 0.68 1.96 0.64 0.60 0.06
</code>

Key metrics:

r/s, w/s, rkB/s, wkB/s : read/write operations and throughput.

await : average I/O wait time (high values indicate latency).

avgqu-sz : average queue length (values >1 suggest saturation).

%util : device utilization; >60% may impact performance, 100% means fully saturated.
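These thresholds are easy to apply mechanically. A sketch that filters iostat-style lines for busy devices, using the sample vda line plus a hypothetical saturated device vdb (its numbers are made up for illustration; the 60% threshold comes from the guidance above):

```shell
# The last field is %util; flag anything above 60%.
printf '%s\n' \
  "vda 0.00 0.62 0.03 0.89 0.57 7.97 18.52 0.00 0.68 1.96 0.64 0.60 0.06" \
  "vdb 0.00 1.10 210.00 95.00 26880.00 12160.00 128.00 4.20 12.40 10.10 17.50 3.20 87.30" |
  awk '$NF > 60 { print $1 " is busy: " $NF "%" }'
# prints "vdb is busy: 87.30%"
```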

7. free -m

<code>$ free -m
              total   used   free  shared  buffers  cached
Mem:        245998  24545 221453     83      59    541
-/+ buffers/cache: 23944 222053
Swap:            0      0      0
</code>

Shows memory usage; the “‑/+ buffers/cache” line reflects memory actually available to applications because Linux uses free memory for caching.
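The relationship can be verified from the sample numbers: memory available to applications is the free column plus buffers plus cached. A worked check, all values in MB from the output above:

```shell
# free + buffers + cached should match the "-/+ buffers/cache" free column.
awk 'BEGIN { free = 221453; buffers = 59; cached = 541; printf "%d MB\n", free + buffers + cached }'
# prints "222053 MB", matching the sample
```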

8. sar -n DEV 1

<code>$ sar -n DEV 1
12:16:48 AM IFACE   rxpck/s txpck/s  rxkB/s  txkB/s  rxcmp/s txcmp/s rxmcst/s %ifutil
12:16:49 AM eth0   18763.00 5032.00 20686.42 478.30   0.00   0.00   0.00   0.00
</code>

Monitors network interface throughput; values far below the hardware limit indicate the network is not the bottleneck.
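To judge "far below the hardware limit", compare throughput to line rate. A sketch using the rxkB/s figure from the sample above against an assumed 1 Gbit/s interface (roughly 125,000 kB/s):

```shell
# rx is the sample rxkB/s; cap is a hypothetical 1 Gbit/s link in kB/s.
awk 'BEGIN { rx = 20686.42; cap = 125000; printf "%.1f%% of line rate\n", 100 * rx / cap }'
# prints "16.5% of line rate"
```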

9. sar -n TCP,ETCP 1

<code>$ sar -n TCP,ETCP 1
12:17:19 AM  active/s passive/s    iseg/s    oseg/s
12:17:20 AM      1.00      0.00  10233.00  18846.00
</code>

Shows TCP connection statistics: active/s (outbound connections), passive/s (inbound), retrans/s (retransmissions). High retransmissions may point to network issues or server overload.
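A retransmission count is best read relative to output segments. A sketch using the oseg/s value from the sample and a hypothetical retrans/s of 9 (the sample output above does not include a retransmissions column):

```shell
# oseg from the sample above; retrans is a made-up value for illustration.
awk 'BEGIN { oseg = 18846; retrans = 9; printf "retrans: %.3f%% of oseg\n", 100 * retrans / oseg }'
# prints "retrans: 0.048% of oseg"
```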

10. top

<code>$ top
top - 00:15:40 up 21:56, 1 user, load average: 31.09, 29.87, 29.92
Tasks: 871 total, 1 running, 868 sleeping, 0 stopped, 2 zombie
%Cpu(s): 96.8 us, 0.4 sy, 2.7 id, 0.1 wa
</code>

Provides a snapshot of CPU, memory, and process activity; can be sorted to find the most resource‑intensive processes. Because it refreshes continuously, pausing the output may be necessary for detailed analysis.
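One way around the refresh problem, assuming procps top, is batch mode, which writes a fixed number of iterations to stdout so the snapshot can be saved or grepped:

```shell
# -b: batch (non-interactive) mode; -n 1: a single iteration.
# Redirect to a file to diff snapshots or search for a process later.
top -b -n 1 | head -n 5
```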

Summary

These Linux tools—uptime, dmesg, vmstat, mpstat, pidstat, iostat, free, sar, and top—allow rapid identification of performance bottlenecks. By interpreting their outputs through the USE framework, you can determine whether CPU, memory, disk, or network resources are saturated or encountering errors, and then focus optimization efforts on the offending components.

Tags: monitoring, performance, Linux, command-line, system administration, USE method
Written by

Ops Development Stories

Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.
