Master Linux Ops: From Essential Commands to Deep System Optimization
This comprehensive guide walks Linux administrators through advanced command usage, performance metric tuning, real‑world troubleshooting cases, and a practical checklist, empowering readers to move from reactive firefighting to proactive system optimization and reliability.
Linux Operations Advanced: From Basic Commands to System Optimization
If you’ve ever been woken up at 2 a.m. by a timeout alert, struggled to remember countless Linux commands, or felt lost when trying to tune a server, this article is for you.
What you will gain:
20 high‑frequency ops commands with advanced usage (e.g., top -Hp, strace -c)
Five key performance‑metric optimization strategies covering CPU, memory, disk I/O, network, and process management
Three real‑world production cases with full troubleshooting steps and reusable scripts
A ready‑to‑use system‑optimization checklist for daily, weekly, and monthly maintenance
1. Advanced Basic Commands: From “Knowing” to “Mastering”
1.1 CPU investigation trio: top, htop, pidstat
Most people use top only to view overall CPU usage; experts drill down to thread‑level details.
Key command 1: Locate high‑CPU threads
# Find the PID of the high‑CPU process (e.g., 12345)
top -c
# Show thread‑level CPU usage for that PID
top -Hp 12345
# Convert the hot thread's ID (e.g., 12356, taken from the top -Hp output) to hex for jstack matching
printf "%x\n" 12356
Lesson: Looking only at process‑level CPU can miss a Java GC thread that hogs CPU, leading to hours of wasted investigation. Always drill down with the -Hp flag.
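The hex conversion can be wrapped in a small helper so it is repeatable during an incident. This is a minimal sketch: it assumes a JDK whose jstack output prints the native thread ID as nid=0x<hex> (some newer JDKs may print it differently, so check one dump first). The names tid_to_nid and dump.txt are illustrative, not part of any tool.

```shell
# Convert a decimal Linux thread ID (as shown by top -Hp) into the
# "nid=0x..." token that many HotSpot thread dumps use (assumed format).
tid_to_nid() {
  printf 'nid=0x%x\n' "$1"
}

# Hypothetical usage against a saved dump:
#   jstack 12345 > dump.txt
#   grep -A 20 "$(tid_to_nid 12356)" dump.txt
```

Saving the dump to a file first keeps the thread IDs and the stack traces from the same moment in time.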
Key command 2: Real‑time resource monitoring for a specific process
# Refresh every 2 seconds for PID 12345
pidstat -u -r -d -t -p 12345 2
1.2 Memory investigation beyond free -h
Key command 3: Pinpoint memory‑leaking processes
# Show the last lines of the memory map, including the total; rerun it to spot abnormal growth
pmap -x 12345 | tail -5
# Continuously log memory growth every 5 seconds
while true; do
  date >> mem_monitor.log
  # Select by PID rather than grepping, which can match unrelated lines
  ps -o pid,rss,vsz,cmd -p 12345 >> mem_monitor.log
  sleep 5
done
Lesson: free -h only shows system‑wide totals; smem -rs swap -p sorts processes by swap usage and can quickly reveal the offending process.
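For a log that is easier to graph, the resident set size can be read straight from /proc instead of parsing ps output. The sketch below assumes a Linux /proc filesystem; rss_kb and log_rss are hypothetical helper names, and the default interval and sample count are arbitrary.

```shell
# Print the resident set size (in kB) recorded in a /proc/<pid>/status file.
rss_kb() {
  awk '/^VmRSS:/ {print $2}' "$1"
}

# Print "timestamp rss_kb" for one PID, a fixed number of times.
# Arguments: PID, interval in seconds (default 5), sample count (default 12).
log_rss() {
  pid=$1; interval=${2:-5}; count=${3:-12}
  i=0
  # Stop early if the process exits and its status file disappears
  while [ "$i" -lt "$count" ] && [ -r "/proc/$pid/status" ]; do
    printf '%s %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$(rss_kb "/proc/$pid/status")"
    sleep "$interval"
    i=$((i + 1))
  done
}
```

A call such as log_rss 12345 5 720 >> mem_monitor.log would cover one hour; a steadily rising second column is the signal worth investigating.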
1.3 Disk I/O deep dive with iotop
Key command 4: Find hidden I/O killers
# Show only processes with I/O activity, refresh every 2 seconds
iotop -oP -d 2
# Inspect a specific process’s I/O details
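cat /proc/12345/io
Practical tip: A MySQL slowdown was traced to log‑file writes saturating disk I/O; changing innodb_flush_log_at_trx_commit from 1 to 2 boosted write performance five‑fold. The trade‑off is durability: with a value of 2, an OS crash or power loss can lose up to about one second of committed transactions.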
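The counters in /proc/<pid>/io are cumulative since the process started, so a single cat shows totals rather than a rate. A rough sketch of turning two samples into bytes per second follows; io_counter and write_rate are hypothetical names, and reading another user's /proc/<pid>/io typically requires root.

```shell
# Print one counter (e.g. read_bytes or write_bytes) from a /proc/<pid>/io file.
io_counter() {
  awk -v key="$2:" '$1 == key {print $2}' "$1"
}

# Print the average bytes written per second by a PID over a short window.
# Arguments: PID, window in seconds (default 2).
write_rate() {
  pid=$1; secs=${2:-2}
  before=$(io_counter "/proc/$pid/io" write_bytes)
  sleep "$secs"
  after=$(io_counter "/proc/$pid/io" write_bytes)
  echo $(( (after - before) / secs ))
}
```

Matching the field name exactly keeps write_bytes from being confused with cancelled_write_bytes, which appears in the same file.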
2. System Optimization in Practice: From Theory to Implementation
2.1 CPU optimization – beyond nice values
Solution 1: CPU affinity binding
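# nginx.conf: bind 4 workers to cores 0–3, one core per worker
worker_processes 4;
worker_cpu_affinity 0001 0010 0100 1000;
# Verify binding from the shell (taskset takes one PID at a time)
for pid in $(pgrep nginx); do taskset -cp "$pid"; done
Binding reduces cache misses and can improve performance by ~15%, depending on the workload.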
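The worker_cpu_affinity masks follow a one‑hot pattern: worker N gets a bit string with only bit N set, counted from the right. For servers with more cores, a helper can generate the list instead of typing it by hand. This is an illustrative sketch; affinity_masks is a hypothetical name, and nginx also accepts worker_cpu_affinity auto; if hand‑built masks are not needed.

```shell
# Print one-hot CPU masks for N workers, e.g. "0001 0010 0100 1000" for N=4.
affinity_masks() {
  n=$1
  out=""
  i=0
  while [ "$i" -lt "$n" ]; do
    mask=""
    j=$((n - 1))
    # Build the mask from the highest bit down to bit 0
    while [ "$j" -ge 0 ]; do
      if [ "$j" -eq "$i" ]; then mask="${mask}1"; else mask="${mask}0"; fi
      j=$((j - 1))
    done
    out="${out:+$out }$mask"
    i=$((i + 1))
  done
  printf '%s\n' "$out"
}

# Hypothetical usage: emit a config line for 8 workers
#   echo "worker_cpu_affinity $(affinity_masks 8);"
```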
Solution 2: Interrupt load balancing
# View current interrupt distribution
cat /proc/interrupts
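# Bind NIC interrupt (IRQ 24 in this example) to CPU 2.
# smp_affinity expects a hex bitmask (2 would mean CPU 1), so write the CPU number to smp_affinity_list instead
echo 2 > /proc/irq/24/smp_affinity_list
2.2 Memory optimization – scientific tuning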
Solution 3: Adjust swappiness
# Check current value
cat /proc/sys/vm/swappiness
# Temporary change (lost on reboot)
echo 10 > /proc/sys/vm/swappiness
# Permanent change
echo "vm.swappiness = 10" >> /etc/sysctl.conf
sysctl -p
Do not set swappiness to 0; it can cause OOM kills under memory pressure.
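Appending with >> adds a duplicate line every time the command is rerun, and a stale earlier line can make the file confusing to audit. One possible approach is to replace any existing entry before writing the new one. The sketch below assumes a plain key = value file; set_sysctl_conf is a hypothetical helper, not a standard tool.

```shell
# Set "key = value" in a sysctl-style file, replacing any existing entry
# for that key instead of appending a duplicate.
# Arguments: file, key, value.
set_sysctl_conf() {
  file=$1; key=$2; value=$3
  tmp=$(mktemp) || return 1
  if [ -f "$file" ]; then
    # Keep every line whose key (text before "=", spaces removed) differs
    awk -F= -v key="$key" '{ k = $1; gsub(/[ \t]/, "", k) } k != key' "$file" > "$tmp"
  fi
  printf '%s = %s\n' "$key" "$value" >> "$tmp"
  # Overwrite in place so the original file keeps its owner and mode
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Hypothetical usage (as root):
#   set_sysctl_conf /etc/sysctl.conf vm.swappiness 10 && sysctl -p
```

Commented‑out entries such as #vm.swappiness = 60 are left untouched, since their key does not match exactly.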
2.3 Network optimization – boosting concurrent connections
Solution 4: TCP parameter tuning
# Increase TCP listen queue
echo 'net.core.somaxconn = 65535' >> /etc/sysctl.conf
# Raise SYN backlog
echo 'net.ipv4.tcp_max_syn_backlog = 8192' >> /etc/sysctl.conf
# Reuse TIME_WAIT sockets for new outgoing connections (requires TCP timestamps)
echo 'net.ipv4.tcp_tw_reuse = 1' >> /etc/sysctl.conf
# Do NOT enable tcp_tw_recycle in NAT environments (the option was removed in Linux 4.12)
# Apply immediately
sysctl -p
3. Real‑World Cases: Full Fault‑Diagnosis Walkthroughs
Case 1: CPU shows 100 % but no offending process
Symptoms: Monitoring reports 100 % CPU, top shows no high‑CPU process.
# Step 1: List the top CPU consumers with their PIDs and commands
ps aux --sort=-%cpu | head -10
# Step 2: Check kernel threads (shown in square brackets)
ps aux | grep "\[.*\]"
# Step 3: Examine I/O wait
top # notice high wa value
# Step 4: Identify I/O bottleneck
iotop -oP # rsync backup process is the culprit
Root cause: An rsync backup generated massive I/O wait. Monitoring that counts wa as busy time then shows the CPU as saturated even though no process is actually computing.
Solution:
Switch rsync to incremental mode: rsync -avz --delete
Limit bandwidth: --bwlimit=10240 (10 MB/s)
Schedule backups during off‑peak hours
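To confirm that a "busy" CPU in a case like this is really I/O wait, the wa share can be computed from two readings of the aggregate cpu line in /proc/stat. This is a rough sketch, assuming the standard field order (user, nice, system, idle, iowait, irq, softirq, steal); cpu_sample and iowait_pct are hypothetical names.

```shell
# Print "<total> <iowait>" in jiffies from the aggregate "cpu" line.
# Argument: path to a stat file (defaults to /proc/stat).
cpu_sample() {
  awk '$1 == "cpu" {
    t = 0
    for (i = 2; i <= 9; i++) t += $i   # user .. steal
    print t, $6                        # $6 is the iowait column
  }' "${1:-/proc/stat}"
}

# Print the percentage of CPU time spent in iowait between two samples.
# Arguments: "total iowait" before, "total iowait" after.
iowait_pct() {
  echo "$1 $2" | awk '{
    dt = $3 - $1
    if (dt <= 0) { print 0; exit }
    printf "%.1f\n", 100 * ($4 - $2) / dt
  }'
}

# Hypothetical usage:
#   a=$(cpu_sample); sleep 5; b=$(cpu_sample); iowait_pct "$a" "$b"
```

A high value here, combined with a low user and system share, points at storage rather than at a runaway process.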
Case 2: Memory leak causing cascade failures
Full detection script (run as root):
#!/bin/bash
# check_memory_leak.sh – detect processes with significant memory growth
echo "=== Memory Leak Detection Script ==="
echo "Monitoring memory usage for 60 seconds..."
tmpfile=$(mktemp)
# Remove all temporary files on exit, even if the script is interrupted
trap 'rm -f "$tmpfile" "$tmpfile.1" "$tmpfile.2"' EXIT
# First sample: top 20 processes by %MEM (NR>1 skips the ps header line)
ps aux --sort=-%mem | awk 'NR>1 && NR<=21 {print $2,$4,$11}' > "$tmpfile.1"
sleep 60
# Second sample
ps aux --sort=-%mem | awk 'NR>1 && NR<=21 {print $2,$4,$11}' > "$tmpfile.2"
printf '\n=== Processes with significant memory growth ===\n'
echo "PID MEM_BEFORE MEM_AFTER GROWTH COMMAND"
while read -r pid mem1 cmd1; do
  # Look up the same PID in the second sample (exact match on the PID column)
  mem2=$(awk -v pid="$pid" '$1 == pid {print $2}' "$tmpfile.2")
  if [ -n "$mem2" ]; then
    growth=$(echo "$mem2 - $mem1" | bc)
    # Report processes whose %MEM grew by more than 0.5 percentage points
    if (( $(echo "$growth > 0.5" | bc -l) )); then
      printf "%-6s %-10s %-10s %-7s %s\n" "$pid" "$mem1%" "$mem2%" "+$growth%" "$cmd1"
    fi
  fi
done < "$tmpfile.1"
printf '\n=== Recommendation ===\n'
echo "For suspicious processes, use: pmap -x PID"
echo "Or check memory maps: cat /proc/PID/smaps"
Usage:
chmod +x check_memory_leak.sh
./check_memory_leak.sh
4. System‑Optimization Checklist (daily/weekly/monthly)
Daily checks
CPU load: uptime – keep 1/5/15‑minute load below 0.7 × CPU count
Memory usage: free -h – keep available memory (the "available" column, which accounts for reclaimable cache) > 20 %
Disk space: df -h – keep usage < 80 %
Critical services: systemctl status nginx mysql redis – ensure they are active
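The load and disk thresholds above are simple enough to script. The sketch below is one possible shape, using the 0.7 × CPU count and 80 % figures from the checklist; load_ok and full_mounts are hypothetical names, and mount points containing spaces are not handled.

```shell
# Succeed when a load value is below 0.7 x the CPU count.
# Arguments: load average, number of CPUs.
load_ok() {
  awk -v load="$1" -v cpus="$2" 'BEGIN { exit !(load < 0.7 * cpus) }'
}

# Print "mountpoint usage%" for filesystems at or above a usage threshold.
# Argument: threshold in percent (default 80).
full_mounts() {
  df -P | awk -v max="${1:-80}" 'NR > 1 {
    use = $5
    sub(/%/, "", use)
    if (use + 0 >= max) print $6, $5
  }'
}

# Hypothetical daily check:
#   load_ok "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)" || echo "WARN: load is high"
#   full_mounts 80
```

Wiring these into cron and alerting only on non‑empty output keeps the daily check quiet when everything is healthy.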
Weekly checks
Clean old logs: find /var/log -name "*.log" -mtime +30 -exec rm {} \;
Detect zombie processes: ps -eo pid,ppid,stat,cmd | awk '$3 ~ /^Z/'
Analyze MySQL slow query log
Update system packages: yum update --security or apt-get update && apt-get upgrade
Monthly deep‑dive
Defragment ext4 partitions: e4defrag /dev/sda1 (check the fragmentation score first with e4defrag -c /dev/sda1)
TCP connection analysis: ss -s
Review kernel parameters in /etc/sysctl.conf
Mastering this “command + optimization + script” combo transforms you from a reactive fire‑fighter into a proactive system architect. Excellent operations are not about the absence of failures, but about detecting warning signs early and recovering swiftly when incidents occur.
MaGe Linux Operations
