
Master Linux Performance Troubleshooting: From top to perf in One Complete Workflow

This guide presents a systematic workflow built on the USE (Utilization, Saturation, Errors) methodology, applied across four resource dimensions, for diagnosing Linux performance issues: it walks through a quick 60‑second overview with top, vmstat, iostat and ss, then dives into detailed CPU, memory, disk I/O and network investigations using tools such as mpstat, perf, bcc-tools (eBPF), and flame graphs.


Overview

The article presents a practical methodology based on USE (Utilization / Saturation / Errors) applied to four resource dimensions: CPU, memory, disk I/O and network. It explains why guessing is insufficient and shows how to locate bottlenecks across those layers using a reproducible step‑by‑step process.

Environment Requirements

Operating System : CentOS 7+ or Ubuntu 20.04+ (kernel ≥ 4.9 for full perf and eBPF support)

sysstat (v11.0+): provides mpstat, iostat, pidstat, sar

perf: version must match the running kernel (usually provided by linux-tools-$(uname -r))

bcc‑tools (v0.12+): eBPF‑based tracing utilities (requires kernel ≥ 4.9)

dstat (v0.7+): combined real‑time monitoring

Step‑by‑Step Procedure

1. Quick 60‑Second Overview

uptime – shows load average and uptime.

top -bn1 – snapshot of CPU, memory, swap and process states.

vmstat 1 10 – system‑level snapshot (processes, memory, swap, I/O, CPU).

dstat -tcmsdnl --top-cpu --top-io 5 – unified view of CPU, memory, disk, network.

iostat -xz 1 5 – detailed disk utilization and latency.

ss -s – summary of socket states.

Interpretation tips: high load average with low CPU usage often means many processes are in uninterruptible I/O wait (D state); %util > 80 % indicates a saturated device; wa > 5 % points to I/O bottlenecks.
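The interpretation rule above ("load high but CPU low usually means I/O wait") can be scripted as a tiny triage helper. A minimal bash sketch (check_load is our own helper name, not a standard tool), fed with values from /proc/loadavg, nproc, and a CPU busy percentage:

```shell
#!/bin/bash
# check_load: classify a host from 1-min load, CPU count, and %CPU busy (us+sy).
# High load with low CPU busy usually means processes stuck in D state (I/O wait).
check_load() {
  local load1=$1 cpus=$2 busy=$3
  # awk does the float comparison; bash arithmetic is integer-only.
  if awk -v l="$load1" -v c="$cpus" 'BEGIN { exit !(l > c) }'; then
    if [ "$busy" -lt 50 ]; then
      echo "suspect-io"   # load exceeds cores but CPUs are idle: check D state, iowait
    else
      echo "cpu-bound"    # load and CPU both high: profile with perf
    fi
  else
    echo "ok"
  fi
}

# Example with live values:
# check_load "$(awk '{print $1}' /proc/loadavg)" "$(nproc)" 30
```

The 50 % busy cutoff is an illustrative assumption; tune it to your baseline.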

2. CPU Deep Dive

mpstat -P ALL 1 5 – per‑core utilization.

pidstat -u 1 5 – per‑process CPU usage.

perf top -p <PID> – real‑time hotspot functions.

perf record -p <PID> -g -- sleep 30 – sampling for flame‑graph generation.

# Example: locate hot thread in a Java process
top -Hp 12345 -bn1 | head -20
printf "%x\n" 12378   # convert TID to hex for jstack
jstack 12345 > /tmp/jstack.txt
grep -A30 "nid=0x305a" /tmp/jstack.txt

Flame‑graph generation (requires Brendan Gregg’s FlameGraph repo):

# Install FlameGraph
git clone https://github.com/brendangregg/FlameGraph.git /opt/FlameGraph
# Generate flame graph
perf script | /opt/FlameGraph/stackcollapse-perf.pl | /opt/FlameGraph/flamegraph.pl > cpu_flame.svg

3. Memory Investigation

free -h – focus on the available column rather than free.

cat /proc/meminfo – detailed counters (Slab, Dirty, AnonPages, HugePages).

smem -rkt -s pss – process‑level proportional set size (more accurate than RSS).

pmap -x <PID> – memory map of a specific process.

cat /proc/<PID>/status – quick view of VmSize, VmRSS, VmSwap.

dmesg | grep -i oom – OOM Killer events.

# Check OOM logs
dmesg | grep -i "out of memory" -A20
# Show OOM score of all processes
for pid in $(ls /proc | grep -E '^[0-9]+$'); do
  name=$(cat /proc/$pid/comm 2>/dev/null)
  score=$(cat /proc/$pid/oom_score 2>/dev/null)
  [ -n "$score" ] && [ "$score" -gt 100 ] && echo "$pid $name $score"
done | sort -k3 -rn | head -10
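Once the high‑scoring processes are known, critical services can be shielded from the OOM Killer through the kernel's oom_score_adj interface (range ‑1000, never kill, to 1000, kill first). The helper names below (clamp_adj, protect_pid) are ours, and the mysqld example is purely illustrative:

```shell
#!/bin/bash
# clamp_adj: force a requested adjustment into the valid -1000..1000 range.
clamp_adj() {
  local v=$1
  [ "$v" -lt -1000 ] && v=-1000
  [ "$v" -gt 1000 ] && v=1000
  echo "$v"
}

# protect_pid: lower a process's OOM priority (negative = kill later). Root only.
protect_pid() {
  local pid=$1 adj
  adj=$(clamp_adj "${2:--500}")
  echo "$adj" > "/proc/$pid/oom_score_adj"
}

# protect_pid "$(pgrep -o mysqld)" -500   # hypothetical: shield the database
```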

4. Disk I/O Investigation

iostat -xz 1 5 – %util, await, queue depth, throughput.

iotop -oP -b -n 5 -d 1 – processes with active I/O.

pidstat -d 1 5 – per‑process I/O statistics.

blktrace -d /dev/sda -o trace -w 10 – low‑level block tracing (short capture).

df -hT and df -i – filesystem and inode usage.

When %util > 80 % (or > 95 % on SSD) and await is high, the disk is saturated; use blktrace or deeper iostat analysis.
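The %util comparison can be automated by parsing iostat output. A sketch (busiest_device is our own helper; the device-name prefixes cover common cases only):

```shell
#!/bin/bash
# busiest_device: read `iostat -xz`-style lines on stdin and print the device
# with the highest %util (last column). Matches common block device prefixes.
busiest_device() {
  awk '$1 ~ /^(sd|nvme|vd|xvd|dm-)/ {
         if ($NF + 0 > max) { max = $NF + 0; dev = $1 }
       }
       END { if (dev != "") printf "%s %.1f\n", dev, max }'
}

# Live usage:
# iostat -xz 1 2 | busiest_device
```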

5. Network Investigation

ss -s – socket state summary.

ss -tn state established – list of established TCP connections.

ss -tan state time-wait | wc -l – TIME_WAIT count.

sar -n DEV 1 5 – per‑interface bandwidth and errors.

nstat -az | grep -i tcp – TCP counters (retransmissions, listen overflows).

tcpdump -i eth0 port 80 -c 10000 -w /tmp/capture.pcap – packet capture with limits.

ethtool -S eth0 – NIC error statistics.

cat /proc/net/softnet_stat – per‑CPU soft‑interrupt counters (a growing second column means packets dropped from the kernel backlog queue, i.e. packet‑processing overload).

Key interpretation: high rx_dropped or a growing second column in /proc/net/softnet_stat suggests packet‑processing overload; tune net.core.netdev_max_backlog, ring buffers, or enable RPS/RFS.
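Reading /proc/net/softnet_stat by hand is error‑prone because the fields are hexadecimal. A small bash parser (softnet_drops is our own name) makes the per‑CPU drop and time_squeeze counters explicit:

```shell
#!/bin/bash
# softnet_drops: parse /proc/net/softnet_stat (one line per CPU, hex fields:
# col1=processed, col2=dropped, col3=time_squeeze) and report CPUs with
# nonzero drop or squeeze counts.
softnet_drops() {
  local cpu=0 processed dropped squeezed rest d s
  while read -r processed dropped squeezed rest; do
    d=$((16#$dropped))
    s=$((16#$squeezed))
    if [ "$d" -gt 0 ] || [ "$s" -gt 0 ]; then
      echo "cpu$cpu dropped=$d squeezed=$s"
    fi
    cpu=$((cpu + 1))
  done
}

# Live usage:
# softnet_drops < /proc/net/softnet_stat
```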

Automation Scripts

Performance Snapshot Script ( perf_snapshot.sh )

#!/bin/bash
set -euo pipefail
OUTPUT_DIR=${1:-/tmp/perf_snapshot_$(date +%Y%m%d_%H%M%S)}
mkdir -p "$OUTPUT_DIR"
log(){ echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*"; }
collect(){ name=$1; shift; log "Collect $name ..."; "$@" > "$OUTPUT_DIR/$name.txt" 2>&1 || echo "Failed $name: $?" >> "$OUTPUT_DIR/errors.log"; }
log "Start collection, output: $OUTPUT_DIR"
# Basic info
collect "uname" uname -a
collect "uptime" uptime
collect "date" date '+%Y-%m-%d %H:%M:%S %Z'
collect "hostname" hostname -f
# CPU
collect "top_snapshot" bash -c "top -bn1 | head -50"
collect "mpstat" mpstat -P ALL 1 5
collect "pidstat_cpu" pidstat -u 1 5
collect "pidstat_context" pidstat -w 1 5
# Memory
collect "free" free -h
collect "meminfo" cat /proc/meminfo
collect "slabtop" slabtop -o
collect "smem" bash -c "smem -rkt -s pss 2>/dev/null || echo 'smem not installed'"
# Disk I/O
collect "iostat" iostat -xz 1 5
collect "pidstat_io" pidstat -d 1 5
collect "df" df -hT
collect "df_inode" df -i
# Network
collect "ss_summary" ss -s
collect "ss_established" bash -c "ss -tn state established | head -100"
collect "ss_time_wait" bash -c "ss -tan state time-wait | wc -l"
collect "ss_listen" ss -tlnp
collect "netstat_stats" bash -c "nstat -az 2>/dev/null || netstat -s"
# vmstat & dmesg
collect "vmstat" vmstat 1 10
collect "dmesg_errors" bash -c "dmesg -T 2>/dev/null | tail -100"
collect "dmesg_oom" bash -c "dmesg | grep -i 'oom\|out of memory\|killed process' || echo 'No OOM events'"
# Process list
collect "ps_aux" bash -c "ps aux --sort=-%mem | head -30"
collect "ps_d_state" bash -c "ps aux | awk '\$8~/D/' || echo 'No D state processes'"
# Recent journal / syslog
collect "journal_recent" bash -c "journalctl --since '10 minutes ago' --no-pager 2>/dev/null | tail -200 || tail -200 /var/log/syslog 2>/dev/null || echo 'No syslog access'"
log "Collection finished. Files:"; ls -la "$OUTPUT_DIR"
# Package
tar czf "${OUTPUT_DIR}.tar.gz" -C "$(dirname "$OUTPUT_DIR")" "$(basename "$OUTPUT_DIR")"
log "Packaged: ${OUTPUT_DIR}.tar.gz (size: $(du -sh "${OUTPUT_DIR}.tar.gz" | awk '{print $1}'))"

Continuous Monitoring Script ( perf_monitor.sh )

#!/bin/bash
set -euo pipefail
INTERVAL=${1:-5}
DURATION=${2:-3600}
OUTPUT=/tmp/perf_monitor_$(date +%Y%m%d_%H%M%S).csv
echo "timestamp,load1,load5,load15,cpu_us,cpu_sy,cpu_wa,cpu_st,mem_used_pct,swap_used_mb,disk_util,net_rx_kb,net_tx_kb,tcp_estab,tcp_tw,context_switch,interrupts" > "$OUTPUT"
END_TIME=$((SECONDS + DURATION))
while [ $SECONDS -lt $END_TIME ]; do
  TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')
  read LOAD1 LOAD5 LOAD15 <<<$(awk '{print $1,$2,$3}' /proc/loadavg)
  read CPU_US CPU_SY CPU_WA CPU_ST <<<$(vmstat 1 2 | tail -1 | awk '{print $13,$14,$16,$17}')
  MEM_USED_PCT=$(free | awk '/Mem:/{printf "%.1f",($3/$2)*100}')
  SWAP_USED=$(free -m | awk '/Swap:/{print $3}')
  DISK_UTIL=$(iostat -xz 1 2 | awk '/^[a-z]/{if($NF+0>max)max=$NF+0} END{print max+0}')
  read NET_RX NET_TX <<<$(sar -n DEV 1 1 2>/dev/null | awk '/Average:/ && !/lo/{print $5,$6}' || echo "0 0")
  TCP_ESTAB=$(ss -tn state established 2>/dev/null | tail -n +2 | wc -l)
  TCP_TW=$(ss -tan state time-wait 2>/dev/null | tail -n +2 | wc -l)
  read CS INTR <<<$(vmstat 1 2 | tail -1 | awk '{print $12,$11}')
  echo "$TIMESTAMP,$LOAD1,$LOAD5,$LOAD15,$CPU_US,$CPU_SY,$CPU_WA,$CPU_ST,$MEM_USED_PCT,$SWAP_USED,$DISK_UTIL,$NET_RX,$NET_TX,$TCP_ESTAB,$TCP_TW,$CS,$INTR" >> "$OUTPUT"
  sleep $INTERVAL
done
echo "Monitoring finished, data file: $OUTPUT (lines: $(wc -l < "$OUTPUT"))"
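The CSV produced by perf_monitor.sh can be summarized without a spreadsheet. This awk sketch (summarize_monitor is our own helper) pulls peak load and average user CPU, keyed to the column order the script writes (load1 is field 2, cpu_us field 5):

```shell
#!/bin/bash
# summarize_monitor: read the perf_monitor.sh CSV on stdin, skip the header,
# and print peak 1-min load plus average user-CPU percentage.
summarize_monitor() {
  awk -F, 'NR > 1 {
             if ($2 + 0 > maxload) maxload = $2 + 0
             ussum += $5; n++
           }
           END { if (n) printf "peak_load1=%.2f avg_cpu_us=%.1f\n", maxload, ussum / n }'
}

# summarize_monitor < /tmp/perf_monitor_20240101_120000.csv   # hypothetical file
```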

Real‑World Cases

Case 1 – Java CPU 100 % caused by ReDoS regex

Identify the Java process: top -bn1 | grep java.

Find the hottest thread: top -Hp <PID> -bn1 | head -20.

Convert thread ID to hex for jstack: printf "%x\n" <TID>.

Dump stack traces: jstack <PID> > /tmp/jstack.txt and search for the hex ID.

The stack shows the thread stuck in a complex email‑validation regular expression, confirming a ReDoS vulnerability.

Solution: restart the service, replace the regex or add a timeout, and add regex ReDoS testing to CI.

Case 2 – MySQL I/O Wait

Confirm I/O saturation: iostat -xz 1 3 (e.g., %util 95 %, w_await 45 ms).

Identify the offending process: iotop -oP -b -n 3 -d 1 (shows mysqld writing 180 MB/s).

Inspect MySQL workload: mysql -e "SHOW PROCESSLIST\G" and the slow‑query log.

Analyze with pt-query-digest and add a composite index on (status, create_time).

Increase InnoDB buffer pool, tune innodb_io_capacity, and restart MySQL.

Case 3 – High Load Average with Low CPU Usage

Check for D‑state processes: vmstat 1 5 (column b high) and ps aux | awk '$8~/^D/{print}'.

Identify I/O bottleneck: iostat -xz 1 3 (high %util, large await).

Find the culprit: iotop -oP -b -n 3 (many rsync processes).

Root cause: simultaneous backup jobs from dozens of servers saturate the disk.

Fix: stagger backups, add --bwlimit to rsync, or upgrade storage.

Case 4 – Network Packet Loss

Verify loss: ping -c 100 10.0.0.2 (e.g., 5 % loss).

Check NIC errors: ethtool -S eth0 | grep -E "drop|error|fifo" (high rx_dropped).

Inspect ring buffer size: ethtool -g eth0 and enlarge with ethtool -G eth0 rx 4096.

Analyze soft‑interrupt distribution: cat /proc/net/softnet_stat (a growing second column counts packets dropped because the kernel backlog queue overflowed; the third column, time_squeeze, counts NAPI budget exhaustion).

Enable RPS/RFS: echo ff > /sys/class/net/eth0/queues/rx-0/rps_cpus.

Retest: ping shows 0 % loss.
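The ff mask written to rps_cpus above is just a bitmask of eligible CPUs. Rather than hard‑coding it, the mask can be derived from the CPU count (rps_mask is our helper; this sketch covers hosts with at most 32 CPUs, since larger masks need comma‑separated 32‑bit chunks):

```shell
#!/bin/bash
# rps_mask: print the hex CPU bitmask for rps_cpus given a CPU count.
# 8 CPUs -> "ff", 4 -> "f". Only valid up to 32 CPUs in this sketch.
rps_mask() {
  local cpus=$1
  printf '%x\n' $(( (1 << cpus) - 1 ))
}

# Apply to the interface under investigation (requires root):
# rps_mask "$(nproc)" > /sys/class/net/eth0/queues/rx-0/rps_cpus
```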

Best Practices and Pitfalls

Kernel Parameter Tuning

vm.swappiness = 10 – reduces swap usage; set to 1 for databases.

vm.dirty_ratio = 10 and vm.dirty_background_ratio = 5 – limit dirty page buildup.

net.core.somaxconn = 65535 and net.ipv4.tcp_max_syn_backlog = 65535 – enlarge connection queues for web services.

net.core.netdev_max_backlog = 50000 – prevent receive‑queue overflow on high‑throughput NICs.

net.ipv4.tcp_tw_reuse = 1 – allow TIME_WAIT reuse in non‑NAT environments.

fs.file-max = 2097152 – raise the global file descriptor limit.

Persist settings in /etc/sysctl.d/99-performance.conf and apply with sysctl --system (plain sysctl -p reads only /etc/sysctl.conf).
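Collected into the suggested drop‑in file, the settings above look like this (the values are this section's examples, not universal defaults; review each against your workload before applying):

```ini
# /etc/sysctl.d/99-performance.conf
vm.swappiness = 10
vm.dirty_ratio = 10
vm.dirty_background_ratio = 5
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.core.netdev_max_backlog = 50000
net.ipv4.tcp_tw_reuse = 1
fs.file-max = 2097152
```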

Security Hardening

Restrict perf to privileged users: echo 2 > /proc/sys/kernel/perf_event_paranoid (a value of 2 blocks unprivileged users from profiling the kernel; profiling their own processes remains allowed).

Audit execution of performance tools:

auditctl -a always,exit -F path=/usr/bin/perf -F perm=x -k perf_usage

Limit tcpdump captures with -c or -G and delete pcap files after analysis.

Constant Monitoring Stack

Deploy node_exporter on each host, scrape with Prometheus, and visualise in Grafana. Example node_exporter systemd unit is omitted for brevity.

# /etc/prometheus/rules/node_alerts.yml
groups:
- name: node_alerts
  rules:
  - alert: HighCpuUsage
    expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "CPU usage too high on {{ $labels.instance }}"
      description: "CPU usage {{ $value | printf \"%.1f\" }}% for more than 5 minutes"
  - alert: HighMemoryUsage
    expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 90
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Memory usage too high on {{ $labels.instance }}"
      description: "Available memory below 10% ({{ $value | printf \"%.1f\" }}%)"
  - alert: HighLoadAverage
    expr: node_load1 / count without(cpu,mode) (node_cpu_seconds_total{mode="idle"}) > 2
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Load average exceeds twice the CPU count on {{ $labels.instance }}"
      description: "Load average is high for 5 minutes"
  - alert: HighDiskUtilization
    expr: rate(node_disk_io_time_seconds_total[5m]) * 100 > 90
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Disk I/O saturation on {{ $labels.instance }}"
      description: "Device {{ $labels.device }} utilization > 90%"
  - alert: DiskSpaceRunningOut
    expr: (1 - node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes) * 100 > 85
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Disk space low on {{ $labels.instance }}"
      description: "Mount {{ $labels.mountpoint }} usage {{ $value | printf \"%.1f\" }}%"
  - alert: HighNetworkErrors
    expr: rate(node_network_receive_errs_total[5m]) > 10
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Network receive errors on {{ $labels.instance }}"
      description: "Device {{ $labels.device }} error rate {{ $value | printf \"%.1f\" }}/s"
  - alert: SwapUsageIncreasing
    expr: rate(node_memory_SwapFree_bytes[10m]) < -1048576
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Swap usage increasing on {{ $labels.instance }}"
      description: "Swap is being consumed, possible memory leak"

Monitoring Indicators

CPU Utilization (us+sy) < 70 % (alert > 85 % for 5 min).

Load Average (1 min) < CPU cores (alert > 2 × CPU cores for 5 min).

Memory Available > 20 % (alert < 10 %).

Swap Used = 0 (alert > 0 and growing).

Disk %util < 70 % (alert > 90 % for 1 min; SSD can tolerate up to 95 %).

IO await < 10 ms (HDD) / < 2 ms (SSD) (alert > 30 ms HDD, > 10 ms SSD).

Network %ifutil < 70 % (alert > 85 %).

TCP retransmission rate < 0.1 % (alert > 1 %).

TIME_WAIT count < 20 000 (alert > 50 000).

Context switches per second < 50 000 (alert > 100 000).
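These thresholds can feed a lightweight local check before full Prometheus alerting is in place. check_metric is our own helper, and the commented usage line shows one possible sample source:

```shell
#!/bin/bash
# check_metric: compare a sampled value against a warning limit and print
# "ok" or "alert". awk handles the float comparison.
check_metric() {
  local name=$1 value=$2 limit=$3
  if awk -v v="$value" -v l="$limit" 'BEGIN { exit !(v > l) }'; then
    echo "alert $name=$value (limit $limit)"
  else
    echo "ok $name=$value"
  fi
}

# Example: TIME_WAIT count against the table's alert threshold of 50000:
# check_metric time_wait "$(ss -tan state time-wait | tail -n +2 | wc -l)" 50000
```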

Conclusion

The USE methodology combined with a rapid 60‑second global snapshot and a layered deep‑dive toolbox (mpstat, pidstat, perf, eBPF, flame graphs, iostat, ss, etc.) enables engineers to locate and resolve Linux performance problems within minutes. Proper kernel tuning, security hardening, and continuous Prometheus‑based monitoring turn reactive troubleshooting into proactive reliability engineering.

Written by

MaGe Linux Operations

Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.
