Master Linux System Performance: Essential Monitoring Tools & Optimization Secrets
This comprehensive guide walks you through Linux system performance monitoring, covering essential tools such as top, htop, vmstat, iostat, and sar, and provides detailed explanations, command examples, scripts, and practical optimization strategies for CPU, memory, disk, and network resources.
Linux System Resource Exhaustion? Veteran Ops Performance Optimization Secrets Revealed!
1. Overview of System Performance Monitoring
1.1 Importance of Performance Monitoring
In modern IT operations, system performance monitoring is a key step to ensure service stability. Effective monitoring can:
Preventive Maintenance : Identify potential risks before problems occur.
Fast Fault Localization : Quickly pinpoint performance bottlenecks through monitoring data.
Capacity Planning : Reasonably plan resources based on historical data.
Cost Optimization : Avoid over‑provisioning and resource waste.
User Experience : Ensure application response time and availability.
1.2 Monitoring Dimensions
Linux system performance monitoring mainly focuses on the following dimensions:
CPU Performance
CPU usage, load average
User and kernel time distribution
Interrupt handling time
Context switch frequency
Memory Performance
Memory usage and available memory
Cache and buffer status
Swap activity
Memory leak detection
Disk I/O Performance
Disk read/write speed and IOPS
Disk utilization and queue length
Filesystem performance
Disk error rate
Network Performance
Network bandwidth usage
Network latency and packet loss
Connection and concurrency counts
Network error statistics
1.3 Monitoring Tool Classification
Real‑time Monitoring Tools
top, htop: real‑time process and resource monitoring
vmstat: virtual memory, process, CPU activity
netstat: network connection status
Historical Data Analysis Tools
sar: system activity reporter
atop: advanced real‑time monitoring
nmon: performance monitoring with export capability
Professional Monitoring Tools
Nagios, Zabbix: enterprise‑level monitoring
Grafana: data visualization
Prometheus: time‑series database monitoring
2. Detailed Guide to the top Command
2.1 Basic Usage
The top command is the most commonly used real‑time system monitoring tool in Linux, displaying running processes and resource usage.
Basic Syntax
top [options]Common Options
# Basic usage
top
# Set refresh interval (seconds)
top -d 2
# Set number of iterations
top -n 5
# Batch mode (suitable for scripts)
top -b
# Show processes of a specific user
top -u username
# Show specific PIDs
top -p 1234,56782.2 Interpreting the top Display
System Information Area
top - 14:25:30 up 10 days, 2:45, 3 users, load average: 0.15, 0.25, 0.20First line explanation:
Current time: 14:25:30
Uptime: 10 days 2 hours 45 minutes
Number of logged‑in users: 3
System load averages for 1, 5, and 15 minutes
Second line explanation:
Total processes: 245
Running: 1
Sleeping: 244
Stopped: 0
Zombie: 0
Third line explanation: %us: User‑mode CPU time percentage %sy: Kernel‑mode CPU time percentage %ni: Nice‑adjusted user‑mode CPU time %id: Idle CPU time percentage %wa: I/O wait time percentage %hi: Hardware interrupt CPU time %si: Software interrupt CPU time %st: Time stolen by other virtual machines
Fourth & Fifth line explanation:
Memory and swap usage total: Total memory free: Free memory used: Used memory buff/cache: Buffers and cache avail Mem: Available memory
Process Information Area
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1234 root 20 0 162384 4892 3456 S 2.3 0.1 0:12.34 systemd
5678 www 20 0 1234567 89012 45678 S 1.5 1.1 1:23.45 apache2Field meanings: PID: Process ID USER: Process owner PR: Process priority NI: Nice value VIRT: Virtual memory size RES: Resident (physical) memory size SHR: Shared memory size S: Process state (S=sleeping, R=running, Z=zombie) %CPU: CPU usage %MEM: Memory usage TIME+: Total CPU time consumed COMMAND: Command name
2.3 Interactive Operations
h: Help q: Quit Space: Immediate refresh k: Kill a process (requires PID) r: Change process priority f: Add/remove displayed fields o: Change field order z: Toggle color display
Sorting and Filtering
P: Sort by CPU usage M: Sort by memory usage T: Sort by runtime N: Sort by PID u: Show processes of a specific user n: Set number of displayed processes i: Hide idle and zombie processes
2.4 Advanced Usage
Batch Mode
# Output to file
top -b -n 1 > system_status.txt
# Show only specific processes
top -b -n 1 -p 1234
# Monitoring script example
#!/bin/bash
while true; do
echo "=== $(date) ===" >> monitor.log
top -b -n 1 | head -20 >> monitor.log
sleep 60
doneCustom Display
# Create custom configuration file
echo "RCfile for \"top with windows\"" > ~/.toprc
echo "Id:i, Mode_altscr=0, Mode_irixps=1" >> ~/.toprc
# Use custom configuration
top -c ~/.toprc3. Detailed Guide to the htop Command
3.1 Introduction to htop
htopis an enhanced version of top, offering a friendlier interface and more features.
Installation
# Ubuntu/Debian
sudo apt-get install htop
# CentOS/RHEL
sudo yum install htop
# or
sudo dnf install htop3.2 htop Features
Interface Advantages
Colorful display, more intuitive
Mouse support
Tree view of processes
Real‑time CPU and memory bar graphs
Horizontal scrolling to view full command lines
Functional Advantages
# Basic usage
htop
# Show process tree
htop -t
# Show processes of a specific user
htop -u username
# Batch mode
htop -b3.3 Interactive Operations in htop
Basic Operations
F1 # Help
F2 # Setup
F3 # Search process
F4 # Filter
F5 # Toggle tree view
F6 # Sort
F7 # Decrease nice value (increase priority)
F8 # Increase nice value (decrease priority)
F9 # Kill process
F10 # QuitAdvanced Operations
Space # Mark process
U # Unmark all
c # Mark process and its children
K # Hide kernel threads
H # Hide user threads
p # Toggle full path display4. Detailed Guide to the vmstat Command
4.1 Basics
vmstat(Virtual Memory Statistics) is used to monitor virtual memory, processes, and CPU activity.
Basic Syntax
vmstat [options] [interval] [count]Common Options
# Show overall averages since boot
vmstat
# Show every 2 seconds, 5 times
vmstat 2 5
# Show active and inactive memory
vmstat -a
# Show disk statistics
vmstat -d
# Show per‑device statistics
vmstat -p /dev/sda1
# Show detailed statistics
vmstat -s
# Show memory detailed statistics
vmstat -m4.2 Output Interpretation
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 0 0 1234567 89012 345678 0 0 10 25 156 234 5 2 93 0 0proc fields : r: Processes waiting for runtime b: Processes in uninterruptible sleep
memory fields (KB) : swpd: Used virtual memory free: Free memory buff: Buffer memory cache: Cached memory
swap fields : si: Swap in (KB/s) so: Swap out (KB/s)
I/O fields : bi: Blocks received from a block device (blocks/s) bo: Blocks sent to a block device (blocks/s)
system fields : in: Interrupts per second cs: Context switches per second
CPU fields : us: User‑mode CPU time sy: Kernel‑mode CPU time id: Idle CPU time wa: I/O wait st: Stolen time (virtualized environments)
5. Detailed Guide to the iostat Command
5.1 Basics
iostatis an important tool for monitoring disk I/O performance, part of the sysstat package.
Installation
# Ubuntu/Debian
sudo apt-get install sysstat
# CentOS/RHEL
sudo yum install sysstat
# or
sudo dnf install sysstatBasic Syntax
iostat [options] [interval] [count]5.2 Common Options
Basic Options
# Basic display
iostat
# Extended statistics
iostat -x
# CPU statistics only
iostat -c
# Disk statistics only
iostat -d
# NFS statistics
iostat -n
# Human‑readable format
iostat -h
# Every 2 seconds, 5 times
iostat -x 2 5Advanced Options
# Specific device
iostat -x sda
# Show all devices, including unused
iostat -x -z
# Detailed NFS stats
iostat -n -x
# Display in MB
iostat -m
# Display in KB
iostat -k5.3 Output Interpretation
CPU Statistics
avg-cpu: %user %nice %system %iowait %steal %idle
2.50 0.00 1.25 0.25 0.00 96.00 %user: User‑mode CPU usage %nice: Nice‑adjusted user‑mode CPU usage %system: Kernel‑mode CPU usage %iowait: I/O wait percentage %steal: Time stolen by other VMs %idle: Idle CPU percentage
Disk Statistics
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 1.25 2.50 50.00 100.00 0.10 0.20 7.41 7.41 8.00 12.00 0.05 40.00 40.00 4.00 1.50 r/s: Reads per second w/s: Writes per second rkB/s: Kilobytes read per second wkB/s: Kilobytes written per second rrqm/s: Read requests merged per second wrqm/s: Write requests merged per second %rrqm, %wrqm: Merge percentages r_await, w_await: Average read/write wait time (ms) aqu-sz: Average queue length svctm: Average service time (ms) %util: Device utilization percentage
6. Other Important Monitoring Tools
6.1 sar Command
sar(System Activity Reporter) is a powerful tool in the sysstat package for collecting and reporting system activity.
Basic Usage
# CPU usage
sar -u 2 5
# Memory usage
sar -r 2 5
# Disk I/O
sar -d 2 5
# Network statistics
sar -n DEV 2 5
# Load average
sar -q 2 5
# Swap usage
sar -S 2 5Historical Data Viewing
# Today's data
sar -u
# Specific date
sar -u -f /var/log/sysstat/sa15
# Specific time range
sar -u -s 09:00:00 -e 18:00:00Data Collection Configuration
# Edit crontab
sudo vi /etc/crontab
# Add collection tasks
*/10 * * * * root /usr/lib/sysstat/sa1
2,12,22,32,42,52 * * * * root /usr/lib/sysstat/sa1
59 23 * * * root /usr/lib/sysstat/sa2 -A
# Check sysstat service status
sudo systemctl status sysstat
sudo systemctl enable sysstat6.2 netstat Command
netstatdisplays network connections, routing tables, interface status, etc.
Basic Usage
# Show all connections
netstat -a
# Show listening ports
netstat -l
# Show TCP connections
netstat -t
# Show UDP connections
netstat -u
# Show process info
netstat -p
# Show numeric addresses
netstat -n
# Combined
netstat -tulnpPractical Examples
# Check specific port
netstat -tulnp | grep :80
# Count connection states
netstat -an | awk '/^tcp/ {print $6}' | sort | uniq -c
# Interface statistics
netstat -i
# Show routing table
netstat -r6.3 ps Command
psdisplays information about currently running processes.
Basic Usage
# Show processes of current terminal
ps
# Show all processes
ps aux
# Show process tree
ps auxf
# Show processes of a user
ps -u username
# Show specific PID
ps -p 1234
# Detailed view
ps -efProcess Analysis Script
#!/bin/bash
echo "=== System Process Analysis $(date) ==="
# Top 10 CPU‑intensive processes
echo "1. Top CPU processes:"
ps aux --sort=-%cpu | head -11
# Top 10 memory‑intensive processes
echo "2. Top memory processes:"
ps aux --sort=-%mem | head -11
# Show zombie processes
echo "3. Zombie processes:"
ps aux | awk '$8 ~ /^Z/ {print $2, $11}'
# Process statistics
echo "4. Process statistics:"
ps aux --no-headers | wc -l | xargs echo "Total processes:"
ps aux --no-headers | awk '$8 ~ /^Z/' | wc -l | xargs echo "Zombie processes:"6.4 free Command
freeshows system memory usage.
Basic Usage
# Simple display
free
# Human‑readable
free -h
# Show in MB
free -m
# Show in GB
free -g
# Continuous monitoring every 2 seconds
free -s 2
# Detailed information
free -wMemory Usage Analysis Script
#!/bin/bash
echo "=== Memory Usage Analysis $(date) ==="
# Basic memory info
echo "1. Memory status:"
free -h
# Memory usage percentage
total_mem=$(free | grep Mem | awk '{print $2}')
used_mem=$(free | grep Mem | awk '{print $3}')
mem_usage=$((used_mem * 100 / total_mem))
echo "2. Memory usage: ${mem_usage}%"
# Swap usage
swap_total=$(free | grep Swap | awk '{print $2}')
swap_used=$(free | grep Swap | awk '{print $3}')
if [ $swap_total -gt 0 ]; then
swap_usage=$((swap_used * 100 / swap_total))
echo "3. Swap usage: ${swap_usage}%"
else
echo "3. No swap configured"
fi
# Memory usage warning
if [ $mem_usage -gt 85 ]; then
echo "Warning: High memory usage!"
fi
if [ $swap_used -gt 0 ]; then
echo "Warning: System is using swap!"
fi7. Advanced Monitoring Tools
7.1 atop Command
atopis an advanced system monitoring tool that provides more detailed information than top.
Installation
# Ubuntu/Debian
sudo apt-get install atop
# CentOS/RHEL
sudo yum install atop
# or
sudo dnf install atopBasic Usage
# Simple display
atop
# Refresh every 2 seconds
atop 2
# Show disk info
atop -d
# Show memory info
atop -m
# Show network info
atop -n
# Show process info
atop -pLog Analysis
# View historical data
atop -r /var/log/atop/atop_20231215
# Analyze a specific time range
atop -r /var/log/atop/atop_20231215 -b 09:00 -e 18:00
# Export data
atop -P CPU,MEM,DSK -r /var/log/atop/atop_20231215 > analysis.txt7.2 nmon Command
nmon(Nigel's Monitor) is a powerful performance monitoring tool.
Installation
# Ubuntu/Debian
sudo apt-get install nmon
# CentOS/RHEL
sudo yum install nmon
# or
sudo dnf install nmonBasic Usage
# Start nmon
nmon
# Keyboard shortcuts inside nmon
c # CPU usage
m # Memory usage
d # Disk I/O
n # Network stats
t # Process info
r # Resource usage
q # QuitData Collection
# Collect data to a file
nmon -f -s 30 -c 120
# Convert to Excel format
nmon2rrd filename.nmon7.3 dstat Command
dstatis a versatile system resource statistics tool.
Installation
# Ubuntu/Debian
sudo apt-get install dstat
# CentOS/RHEL
sudo yum install dstat
# or
sudo dnf install dstatBasic Usage
# Simple display
dstat
# Show CPU, memory, disk, network
dstat -cdmn
# Show top CPU and memory consuming processes
dstat --top-cpu --top-mem
# Show specific disks
dstat -d -D sda,sdb
# Show specific network interfaces
dstat -n -N eth0,eth1
# Custom interval (every 5 seconds, 12 times)
dstat 5 12Plugins
# List available plugins
dstat --list
# Use plugins
dstat --tcp --udp
dstat --disk-util --disk-tps
dstat --proc-count --sys-load8. Performance Tuning Strategies
8.1 CPU Performance Tuning
CPU Utilization Optimization
# View CPU info
lscpu
cat /proc/cpuinfo
# View CPU usage
top -p 1 -n 1 | grep "Cpu(s)"
vmstat 1 5
# Optimize CPU affinity
taskset -c 0,1 command
taskset -p 0x3 PID
# Adjust process priority
nice -n 10 command
renice -n 5 -p PID
# Enable performance governor
echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governorLoad Balancing Optimization
# View system load
uptime
cat /proc/loadavg
# Analyze high load causes (CPU‑bound, I/O wait, too many processes)
# Optimization strategies
# 1. Distribute CPU‑intensive tasks
# 2. Optimize I/O performance
# 3. Limit concurrent processes8.2 Memory Performance Tuning
Memory Usage Optimization
# View detailed memory info
cat /proc/meminfo
free -h
vmstat -s
# Drop caches
echo 1 > /proc/sys/vm/drop_caches # page cache
echo 2 > /proc/sys/vm/drop_caches # dentries and inodes
echo 3 > /proc/sys/vm/drop_caches # all caches
# Adjust swap
swapon -s
swapoff /swapfile
mkswap /swapfile
swapon /swapfile
# Tune memory allocation
echo 10 > /proc/sys/vm/swappiness # reduce swap usage
echo 1 > /proc/sys/vm/overcommit_memory # allow memory overcommitMemory Leak Detection
# Monitor memory usage trends
ps aux --sort=-%mem | head -10
pmap -x PID
# Use valgrind for leak detection
valgrind --tool=memcheck --leak-check=full ./program
# System‑level monitoring
watch -n 1 'cat /proc/meminfo | grep -E "(MemTotal|MemFree|MemAvailable|Buffers|Cached)"'8.3 Disk I/O Performance Tuning
Disk Performance Optimization
# View disk info
lsblk
fdisk -l
df -h
# Optimize I/O scheduler
echo deadline > /sys/block/sda/queue/scheduler
echo noop > /sys/block/sda/queue/scheduler
# Adjust I/O priority
ionice -c 1 -n 4 command # real‑time
ionice -c 2 -n 7 command # best‑effort
ionice -c 3 command # idle
# Filesystem tuning (ext4 example)
tune2fs -o journal_data_writeback /dev/sda1
mount -o noatime,nodiratime /dev/sda1 /mount/point
# XFS example
mount -o noatime,nodiratime,nobarrier /dev/sda1 /mount/pointRAID Performance Optimization
# Check RAID status
cat /proc/mdstat
mdadm --detail /dev/md0
# Optimize stripe size
mdadm --create /dev/md0 --level=0 --raid-devices=2 --chunk=64 /dev/sda1 /dev/sdb1
# Adjust read‑ahead
blockdev --setra 8192 /dev/sda8.4 Network Performance Tuning
Network Configuration Optimization
# View interface status
ip addr show
ethtool eth0
# Adjust kernel buffers
echo 'net.core.rmem_max = 16777216' >> /etc/sysctl.conf
echo 'net.core.wmem_max = 16777216' >> /etc/sysctl.conf
echo 'net.ipv4.tcp_rmem = 4096 87380 16777216' >> /etc/sysctl.conf
echo 'net.ipv4.tcp_wmem = 4096 65536 16777216' >> /etc/sysctl.conf
# Apply changes
sysctl -p
# Optimize TCP parameters
echo 'net.ipv4.tcp_window_scaling = 1' >> /etc/sysctl.conf
echo 'net.ipv4.tcp_congestion_control = cubic' >> /etc/sysctl.confNetwork Monitoring Script
#!/bin/bash
INTERFACE="eth0"
LOG_FILE="/var/log/network_monitor.log"
echo "=== Network Performance Monitoring $(date) ===" >> $LOG_FILE
# Monitor bandwidth
sar -n DEV 1 5 | grep $INTERFACE >> $LOG_FILE
# Monitor connections
netstat -an | awk '/^tcp/ {print $6}' | sort | uniq -c >> $LOG_FILE
# Monitor errors
cat /proc/net/dev | grep $INTERFACE >> $LOG_FILE9. Comprehensive Monitoring Scripts
9.1 System Performance Monitoring Script
#!/bin/bash
# Comprehensive system performance monitoring script
LOG_DIR="/var/log/performance"
DATE=$(date +%Y%m%d_%H%M%S)
mkdir -p $LOG_DIR
echo "=== System Performance Report $(date) ===" > $LOG_DIR/report_$DATE.txt
# CPU monitoring
echo "1. CPU usage:" >> $LOG_DIR/report_$DATE.txt
top -b -n 1 | head -5 >> $LOG_DIR/report_$DATE.txt
vmstat 1 3 >> $LOG_DIR/report_$DATE.txt
# Memory monitoring
echo "2. Memory usage:" >> $LOG_DIR/report_$DATE.txt
free -h >> $LOG_DIR/report_$DATE.txt
vmstat -s | grep -E "(total|used|free|buffer|cache)" >> $LOG_DIR/report_$DATE.txt
# Disk monitoring
echo "3. Disk usage:" >> $LOG_DIR/report_$DATE.txt
df -h >> $LOG_DIR/report_$DATE.txt
iostat -x 1 3 >> $LOG_DIR/report_$DATE.txt
# Network monitoring
echo "4. Network status:" >> $LOG_DIR/report_$DATE.txt
netstat -i >> $LOG_DIR/report_$DATE.txt
ss -tuln >> $LOG_DIR/report_$DATE.txt
# Process monitoring
echo "5. Process status:" >> $LOG_DIR/report_$DATE.txt
ps aux --sort=-%cpu | head -10 >> $LOG_DIR/report_$DATE.txt
ps aux --sort=-%mem | head -10 >> $LOG_DIR/report_$DATE.txt
echo "Report generated: $LOG_DIR/report_$DATE.txt"9.2 Performance Alert Script
#!/bin/bash
# Performance alert script
CPU_THRESHOLD=80
MEMORY_THRESHOLD=85
DISK_THRESHOLD=85
LOAD_THRESHOLD=5.0
MAIL_TO="[email protected]"
HOSTNAME=$(hostname)
# CPU usage
cpu_usage=$(top -b -n 1 | grep "Cpu(s)" | awk '{print $2}' | cut -d'%' -f1)
if (( $(echo "$cpu_usage > $CPU_THRESHOLD" | bc -l) )); then
echo "Warning: $HOSTNAME CPU usage high: $cpu_usage%" | mail -s "CPU Alert" $MAIL_TO
fi
# Memory usage
memory_usage=$(free | grep Mem | awk '{printf "%.1f", $3*100/$2}')
if (( $(echo "$memory_usage > $MEMORY_THRESHOLD" | bc -l) )); then
echo "Warning: $HOSTNAME memory usage high: $memory_usage%" | mail -s "Memory Alert" $MAIL_TO
fi
# Disk usage (root partition)
disk_usage=$(df -h | awk '$NF=="/" {print $5}' | sed 's/%//')
if [ $disk_usage -gt $DISK_THRESHOLD ]; then
echo "Warning: $HOSTNAME disk usage high: $disk_usage%" | mail -s "Disk Alert" $MAIL_TO
fi
# Load average
load_avg=$(uptime | awk -F'load average:' '{print $2}' | cut -d',' -f1 | tr -d ' ')
if (( $(echo "$load_avg > $LOAD_THRESHOLD" | bc -l) )); then
echo "Warning: $HOSTNAME load average high: $load_avg" | mail -s "Load Alert" $MAIL_TO
fi10. Summary
10.1 Monitoring Skills Mastery
Core Tools : Proficient with top, htop, vmstat, iostat, sar, etc.
Metric Understanding : Deep grasp of CPU, memory, disk, network indicators.
Bottleneck Analysis : Quickly identify performance issues from monitoring data.
Monitoring System : Build a complete monitoring and alerting framework.
10.2 Tuning Skill Advancement
Targeted Optimization : Apply specific tweaks based on monitoring results.
Preventive Maintenance : Use continuous monitoring to avoid problems.
Capacity Planning : Use historical data for resource planning.
Fault Handling : Rapidly locate and resolve performance incidents.
10.3 Best‑Practice Principles
Continuous Monitoring : 24/7 monitoring system.
Data‑Driven Decisions : Base actions on real metrics.
Proactive Optimization : Regular performance tuning.
Documentation : Record monitoring and tuning processes.
Knowledge Sharing : Share insights with the team.
10.4 Advanced Learning Suggestions
Deep Dive : Study Linux kernel internals and system calls.
Tool Expansion : Learn Nagios, Zabbix, Prometheus, etc.
Automation : Integrate monitoring data into automated ops.
Performance Tuning : Explore advanced tuning techniques.
Continuous Learning : Keep up with new monitoring technologies.
By systematically learning and practicing, you can build an efficient, stable, and reliable Linux monitoring and optimization system, providing strong technical support for enterprise IT infrastructure.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
MaGe Linux Operations
Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
