Diagnose Linux Server Bottlenecks in 60 Seconds with 10 Essential Commands

When a Linux server suddenly spikes in load, this guide shows how to pinpoint the root cause within a minute by running ten key commands that reveal CPU, memory, disk I/O, and network metrics, enabling rapid performance troubleshooting.

Liangxu Linux

Overview

If a Linux server’s load jumps dramatically and alerts flood your phone, you can identify the performance problem in under a minute by executing a short list of commands recommended by Netflix’s performance engineering team.

Command list

uptime

dmesg | tail

vmstat 1

mpstat -P ALL 1

pidstat 1

iostat -xz 1

free -m

sar -n DEV 1

sar -n TCP,ETCP 1

top

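Four of these tools (mpstat, pidstat, iostat, and sar) come from the sysstat package, which may need to be installed; uptime, vmstat, free, and top come from procps, and dmesg from util-linux. The metrics they expose map onto the USE method: for each resource (CPU, memory, disk, network), check utilization, saturation, and errors to quickly locate the bottleneck.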
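Before an incident it can help to confirm that every tool is actually installed. The following is a minimal sketch, assuming a POSIX shell; the function name missing_tools is made up for this illustration.

```shell
# missing_tools: print each named command that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# The ten commands rely on these nine binaries (sar is used twice).
missing_tools uptime dmesg vmstat mpstat pidstat iostat free sar top
```

An empty result means nothing is missing; any name printed has to be installed before the checklist can be run in full.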

uptime

$ uptime
23:51:26 up 21:31,  1 user,  load average: 30.02, 26.43, 19.02

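The three numbers are the 1-, 5-, and 15-minute load averages. On Linux they count processes that are running or waiting for a CPU plus processes blocked in uninterruptible I/O (usually disk). Comparing them shows the trend: a 1-minute value well above the 15-minute value, as in this example, means load is rising and needs further investigation, while a lower 1-minute value suggests the spike may already have passed.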
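A load figure only means something relative to the number of cores. As a rough sketch (the helper name load_per_core is hypothetical, and the 32 cores are taken from the later sample output), the 1-minute load can be normalized like this:

```shell
# load_per_core LOAD CORES: print LOAD divided by CORES, two decimals.
load_per_core() {
  awk -v avg="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", avg / cores }'
}

# Live usage on Linux: 1-minute load from /proc/loadavg, cores from nproc.
# load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"

load_per_core 30.02 32   # prints 0.94
```

A result near or above 1.00 suggests the machine has about as much runnable or I/O-blocked work as it has cores.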

dmesg | tail

$ dmesg | tail
[1880957.563150] perl invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
[1880957.563400] Out of memory: Kill process 18694 (perl) score 246 or sacrifice child
[1880957.563408] Killed process 18694 (perl) total-vm:1972392kB, anon-rss:1953348kB, file-rss:0kB
[2320864.954447] TCP: Possible SYN flooding on port 7001. Dropping request.  Check SNMP counters.

The last ten kernel log lines can reveal out‑of‑memory kills or network anomalies such as SYN floods, which are valuable clues during troubleshooting.
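When the kernel log is noisy, the two symptoms shown here can be filtered out of a longer window. This is only a sketch: the function name kernel_red_flags is hypothetical, and the pattern list covers just the messages from this example, not every kernel error.

```shell
# kernel_red_flags: read kernel log text on stdin and keep only lines
# that mention the OOM killer or SYN flooding.
kernel_red_flags() {
  grep -E -i 'oom-killer|out of memory|killed process|syn flooding'
}

# Usage (may need root on systems that restrict dmesg):
# dmesg | tail -n 200 | kernel_red_flags
```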

vmstat 1

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
34  0    0 200889792 73708 591828    0    0     0     5    6   10 96  1  3  0  0
...

Key columns:

r : processes running on or waiting for a CPU (if > number of cores, CPU is saturated).

free : free memory in kilobytes.

si/so : swap in/out (non‑zero indicates swapping).

us, sy, id, wa, st : user, system, idle, I/O wait, and stolen CPU time.

High r or low id points to CPU pressure; high wa suggests I/O bottlenecks.
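The two checks described above (r versus core count, and non-zero si/so) can be sketched as a small filter over vmstat output. The function name vmstat_flags is an assumption for illustration, and the core count has to be supplied by the caller.

```shell
# vmstat_flags CORES: read vmstat output on stdin and print a note for
# each sample where r exceeds CORES or where si/so are non-zero.
vmstat_flags() {
  awk -v cores="$1" '
    $1 == "r" {                        # column-name row: map names to positions
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    !("r" in col) || $1 !~ /^[0-9]+$/ { next }   # skip banner and non-data rows
    {
      r = $(col["r"]); si = $(col["si"]); so = $(col["so"])
      if (r + 0 > cores + 0) printf "cpu saturated: r=%d cores=%d\n", r, cores
      if (si + so > 0) printf "swapping: si=%d so=%d\n", si, so
    }
  '
}

# Usage: vmstat 1 5 | vmstat_flags "$(nproc)"
```

Keep in mind that the first line vmstat prints reports averages since boot, so the first sample is usually best ignored.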

mpstat -P ALL 1

$ mpstat -P ALL 1
Linux 3.13.0-49-generic (titanclusters-xxxxx) 07/14/2015 _x86_64_ (32 CPU)
07:38:49 PM  CPU   %usr  %nice  %sys %iowait %irq %soft %steal %guest %gnice %idle
all      98.47   0.00   0.75   0.00   0.00   0.00   0.00   0.00   0.00  0.78
 0       96.04   0.00   2.97   0.00   0.00   0.00   0.00   0.00   0.00  0.99
 1       97.00   0.00   1.00   0.00   0.00   0.00   0.00   0.00   0.00  2.00
 ...

This per‑CPU view highlights any core that is unusually busy, which often indicates a single‑threaded workload monopolizing that CPU.
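On machines with many cores the hot one is easy to miss by eye. A possible sketch for listing CPUs below an idle threshold follows; hot_cpus is a hypothetical name, and the code assumes that every data row begins with the same timestamp fields as the header row and that %idle is the last column, as mpstat normally prints them.

```shell
# hot_cpus MAX_IDLE: read "mpstat -P ALL" output on stdin and print each
# CPU whose %idle is below MAX_IDLE.
hot_cpus() {
  awk -v max_idle="$1" '
    /%idle/ {                          # header row: locate the CPU column
      for (i = 1; i <= NF; i++) if ($i == "CPU") cpu_col = i
      next
    }
    cpu_col && NF > cpu_col && $cpu_col != "all" && $NF + 0 < max_idle + 0 {
      printf "cpu %s idle %.2f\n", $cpu_col, $NF
    }
  '
}

# Usage: mpstat -P ALL 1 1 | hot_cpus 10
```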

pidstat 1

$ pidstat 1
Linux 3.13.0-49-generic (titanclusters-xxxxx) 07/14/2015 _x86_64_ (32 CPU)
07:41:02 PM   UID   PID   %usr %system %guest %CPU   CPU  Command
07:41:03 PM    0     9    0.00   0.94   0.00  0.94   1   rcuos/0
07:41:03 PM    0   4214    5.66   5.66   0.00 11.32  15   mesos-slave
07:41:03 PM    0   6521 1596.23  1.89   0.00 1598.11 27   java
07:41:03 PM    0   6564 1571.70  7.55   0.00 1579.25 28   java
...

Each line shows a process’s CPU usage, where 100 % equals one fully used core. Values above 100 % mean the process is using more than one core; in the example, each of the two Java processes consumes roughly 16 cores, so together they occupy nearly all 32 CPUs.
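The conversion from %CPU to cores can be sketched as a sum per command name. The name cores_by_command is hypothetical, and the sketch assumes the command name is the last field and contains no spaces; it is meant for a single sample such as the output of pidstat 1 1.

```shell
# cores_by_command: read pidstat output on stdin, sum %CPU per command,
# and print the totals as core equivalents (100 % = 1 core), largest first.
cores_by_command() {
  awk '
    /%CPU/ {                           # header row: locate the %CPU column
      for (i = 1; i <= NF; i++) if ($i == "%CPU") pct_col = i
      next
    }
    pct_col && NF > pct_col && $1 != "Average:" { total[$NF] += $pct_col }
    END { for (cmd in total) printf "%.2f %s\n", total[cmd] / 100, cmd }
  ' | sort -rn
}

# Usage: pidstat 1 1 | cores_by_command | head
```

Fed the four sample rows above, the first line of output is 31.77 java, which matches the two processes' combined 3177.36 %.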

iostat -xz 1

$ iostat -xz 1
Linux 3.13.0-49-generic (titanclusters-xxxxx) 07/14/2015 _x86_64_ (32 CPU)
avg-cpu:  %user %nice %system %iowait %steal %idle
          73.96   0.00   3.73   0.03   0.06  22.21
Device: rrqm/s wrqm/s   r/s   w/s  rkB/s  wkB/s  avgrq-sz avgqu-sz  await r_await w_await svctm %util
xvda    0.00   0.00  0.00  0.00  0.00  0.00   4.52    2.08   34.37   0.00   0.00  2.44   0.09
xvdb    0.00   0.01  0.00  1.02 127.97 598.53 145.79   0.00    1.78   0.00   0.28  0.25   0.25
...

Important columns:

r/s, w/s, rkB/s, wkB/s : read/write operations and throughput.

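await : average time (ms) an I/O request takes, including time spent queued and time being serviced.

avgqu-sz : average number of requests queued at the device; values above 1 can indicate saturation, although devices that serve requests in parallel tolerate deeper queues.

%util : percentage of time the device was busy; values above 60 % typically degrade performance, and values near 100 % usually mean saturation.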

High await or %util points to disk I/O bottlenecks.
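The %util rule of thumb can be sketched as a filter that lists only the busy devices. The name busy_devices is an assumption for this example; the column is located by its header because other iostat columns differ between sysstat versions.

```shell
# busy_devices MIN_UTIL: read "iostat -x" output on stdin and print each
# device whose %util is at or above MIN_UTIL.
busy_devices() {
  awk -v min_util="$1" '
    $1 ~ /^Device/ {                   # header row: locate the %util column
      for (i = 1; i <= NF; i++) if ($i == "%util") util_col = i
      next
    }
    NF == 0 || $1 ~ /^avg-cpu/ { util_col = 0; next }   # a new report starts
    util_col && NF >= util_col && $util_col + 0 >= min_util + 0 {
      printf "%s util %.2f\n", $1, $util_col
    }
  '
}

# Usage: iostat -xz 1 2 | busy_devices 60
```

As with vmstat, the first report covers the time since boot, so the second report is the one that reflects current activity.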

free -m

$ free -m
              total   used   free  shared  buffers  cached
Mem:         245998  24545 221453     83       59     541
-/+ buffers/cache: 23944 222053
Swap:            0      0      0

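The -/+ buffers/cache row (the third line of output) excludes buffers and page cache from used and adds them to free, so its free column (222053 MB here) is the memory truly available to applications; Linux uses otherwise idle RAM for cache and reclaims it when needed. Newer versions of free omit this row and report an available column instead.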
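Pulling that single figure out of the output can be sketched as below. The name available_mb is hypothetical; the code handles the older layout shown in this article and, as an assumption about newer procps releases, a header that contains an available column.

```shell
# available_mb: read "free -m" output on stdin and print the memory (MB)
# that applications can still claim.
available_mb() {
  awk '
    NR == 1 {                          # header row has no label, so data is shifted by one
      for (i = 1; i <= NF; i++) if ($i == "available") avail_col = i + 1
      next
    }
    $1 == "Mem:" && avail_col { print $avail_col; exit }   # newer layout
    $1 == "-/+" { print $NF; exit }                        # older layout
  '
}

# Usage: free -m | available_mb
```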

sar -n DEV 1

$ sar -n DEV 1
Linux 3.13.0-49-generic (titanclusters-xxxxx) 07/14/2015 _x86_64_ (32 CPU)
12:16:48 AM IFACE   rxpck/s txpck/s  rxkB/s  txkB/s  rxcmp/s txcmp/s  rxmcst/s %ifutil
12:16:49 AM eth0    18763.00 5032.00 20686.42 478.30   0.00    0.00    0.00   0.00
12:16:49 AM lo        14.00   14.00   1.36   1.36   0.00    0.00    0.00   0.00

Network interface statistics help determine whether the NIC is saturated. In the example, eth0 receives about 21 MB/s (rxkB/s of 20686.42, roughly 170 Mbit/s), far below the capacity of a 1 Gbps link.
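The comparison against link capacity is simple arithmetic, sketched here with a hypothetical helper named link_util_pct. It assumes sar reports kilobytes of 1024 bytes and that the link speed is given in Mbit/s (1 Gbps = 1000).

```shell
# link_util_pct KB_PER_SEC LINK_MBIT: percentage of a link consumed by a
# sar rxkB/s or txkB/s rate. Assumes 1 kB = 1024 bytes.
link_util_pct() {
  awk -v kbps="$1" -v link="$2" \
    'BEGIN { printf "%.1f\n", kbps * 1024 * 8 / (link * 1000000) * 100 }'
}

link_util_pct 20686.42 1000   # prints 16.9
```

So the receive side of eth0 in the sample uses about 17 % of an assumed 1 Gbps link.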

sar -n TCP,ETCP 1

$ sar -n TCP,ETCP 1
12:17:20 AM active/s passive/s iseg/s oseg/s
12:17:20 AM   1.00    0.00 10233.00 18846.00
12:17:20 AM atmptf/s estres/s retrans/s isegerr/s orsts/s
12:17:20 AM   0.00    0.00   0.00    0.00    0.00

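active/s counts locally initiated connections per second (outgoing) and passive/s counts remotely initiated ones (incoming), so together they give a rough measure of connection churn. retrans/s counts TCP retransmissions per second; a non-zero rate points to packet loss or an overloaded network or server.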
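One rough way to put retrans/s in proportion is to relate it to oseg/s, the segments sent per second. This ratio is an illustration rather than something the sar output reports itself, and the name retrans_pct is hypothetical.

```shell
# retrans_pct RETRANS_PER_SEC OSEG_PER_SEC: retransmitted segments as a
# percentage of segments sent; prints 0.00 when nothing was sent.
retrans_pct() {
  awk -v r="$1" -v o="$2" \
    'BEGIN { printf "%.2f\n", (o > 0) ? r / o * 100 : 0 }'
}

retrans_pct 0.00 18846.00   # prints 0.00 for the sample above
```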

top

$ top
top - 00:15:40 up 21:56, 1 user, load average: 31.09, 29.87, 29.92
Tasks: 871 total, 1 running, 868 sleeping, 0 stopped, 2 zombie
%Cpu(s): 96.8 us, 0.4 sy, 2.7 id, 0.1 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 25190241+total, 24921688 used, 22698073+free, 60448 buffers
KiB Swap: 0 total, 0 used, 0 free. 554208 cached Mem
...
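top aggregates many of the previous metrics (load, memory, CPU) and allows interactive sorting to find the most resource-hungry processes. However, because it redraws the screen on every refresh, patterns over time are harder to spot than in the rolling output of vmstat or pidstat, so pause it (Ctrl-S, resume with Ctrl-Q) or combine it with the other commands for a complete picture.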

Conclusion

These ten commands provide a rapid, low-overhead way to diagnose Linux server performance problems. By correlating their outputs (especially high CPU usage from pidstat, I/O saturation from iostat, and memory pressure from free), you can pinpoint the offending subsystem and focus subsequent tuning efforts on the relevant application or hardware component.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
