
Master Linux Ops: From Essential Commands to Deep System Optimization

This comprehensive guide walks Linux administrators through advanced command usage, performance metric tuning, real‑world troubleshooting cases, and a practical checklist, empowering readers to move from reactive firefighting to proactive system optimization and reliability.

MaGe Linux Operations

Linux Operations Advanced: From Basic Commands to System Optimization

If you’ve ever been woken up at 2 a.m. by a timeout alert, struggled to remember countless Linux commands, or felt lost when trying to tune a server, this article is for you.

What you will gain:

20 high‑frequency ops commands with advanced usage (e.g., top -Hp, strace -c)

Five key performance‑metric optimization strategies covering CPU, memory, disk I/O, network, and process management

Three real‑world production cases with full troubleshooting steps and reusable scripts

A ready‑to‑use system‑optimization checklist for daily, weekly, and monthly maintenance

1. Advanced Basic Commands: From “Knowing” to “Mastering”

1.1 CPU investigation trio: top, htop, pidstat

Most people use top only to view overall CPU usage; experts drill down to thread‑level details.

Key command 1: Locate high‑CPU threads

# Find the PID of the high‑CPU process (e.g., 12345)
top -c

# Show thread‑level CPU usage for that PID
top -Hp 12345

# Convert the busy thread's ID (e.g., 12356) to hex for jstack matching
printf "%x\n" 12356

Lesson: Looking only at process‑level CPU can miss a Java GC thread that hogs CPU, leading to hours of wasted investigation. Always use the -Hp flag.
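The hex conversion can be wrapped in a small helper so the result can be pasted straight into a search of the thread dump. This is a minimal sketch: the function name tid_to_nid is made up for illustration, and it assumes a JDK whose jstack output labels native thread IDs as nid=0x followed by hex digits.

```shell
# Convert a decimal thread ID (the PID column of `top -Hp`) into the
# "nid=0x..." token that jstack prints for each thread (assumed format).
tid_to_nid() {
  printf 'nid=0x%x\n' "$1"
}

# Hypothetical usage, with the PID and thread ID from the example above:
#   jstack 12345 | grep -A 20 "$(tid_to_nid 12356)"
```

The grep context of 20 lines is arbitrary; widen it if the stack trace of the matched thread is cut off.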

Key command 2: Real‑time resource monitoring for a specific process

# Refresh every 2 seconds for PID 12345
pidstat -u -r -d -t -p 12345 2

1.2 Memory investigation beyond free -h

Key command 3: Pinpoint memory‑leaking processes

# Show the tail of the memory map, including the totals line; compare across runs to spot growth
pmap -x 12345 | tail -5

# Log the process's memory (RSS and VSZ, in KiB) every 5 seconds
while true; do
  date >> mem_monitor.log
  ps -o pid,rss,vsz,cmd -p 12345 >> mem_monitor.log
  sleep 5
done

Lesson: free -h shows only system-wide totals, not which process is using the memory; smem -rs swap -p lists per-process usage sorted by swap and can quickly reveal the offending process.
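A log of periodic samples is easier to judge once it is reduced to a single growth figure. The sketch below assumes a simpler log format than the loop above: one line per sample holding an epoch timestamp and the RSS in KiB. The function name rss_growth_kb and the file name rss.log are illustrative only.

```shell
# Read "epoch_seconds rss_kb" lines on stdin and print the RSS difference
# (in KiB) between the last and the first sample.
rss_growth_kb() {
  awk 'NR == 1 { first = $2 } { last = $2 } END { if (NR) print last - first }'
}

# Hypothetical sampling loop for PID 12345, one sample every 5 seconds:
#   while sleep 5; do echo "$(date +%s) $(ps -o rss= -p 12345)"; done >> rss.log
#   rss_growth_kb < rss.log
```

A steadily positive result across long windows suggests a leak; a value that rises and then falls back is more likely normal cache or heap behaviour.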

1.3 Disk I/O deep dive with iotop

Key command 4: Find hidden I/O killers

# Show only processes with I/O activity, refresh every 2 seconds
iotop -oP -d 2

# Inspect a specific process’s I/O details
cat /proc/12345/io

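Practical tip: A MySQL slowdown was traced to log-file writes saturating disk I/O; changing innodb_flush_log_at_trx_commit from 1 to 2 boosted write performance five-fold. The trade-off is durability: with a value of 2 the log is flushed to disk only about once per second, so up to a second of committed transactions can be lost if the OS crashes or power fails.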
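The counters in /proc/PID/io are cumulative, so a single cat does not show the current write rate. One way to get a rate, sketched here with a hypothetical helper named proc_io_value, is to read one counter twice and divide the difference by the interval.

```shell
# Print the value of one counter (e.g. write_bytes) from /proc/<pid>/io
# content supplied on stdin. The match is exact, so "write_bytes" does not
# also match "cancelled_write_bytes".
proc_io_value() {
  awk -v key="$1:" '$1 == key { print $2 }'
}

# Hypothetical usage for PID 12345 over a 5-second window:
#   before=$(proc_io_value write_bytes < /proc/12345/io)
#   sleep 5
#   after=$(proc_io_value write_bytes < /proc/12345/io)
#   echo "$(( (after - before) / 5 )) bytes/s written"
```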

2. System Optimization in Practice: From Theory to Implementation

2.1 CPU optimization – beyond nice values

Solution 1: CPU affinity binding

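# nginx.conf: bind four workers to CPU cores 0-3, one core each
worker_processes 4;
worker_cpu_affinity 0001 0010 0100 1000;

# Shell: verify the binding of every nginx process
# (taskset -cp accepts one PID at a time)
for pid in $(pgrep nginx); do taskset -cp "$pid"; done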

Binding keeps each worker on one core, which reduces cache misses and can improve performance by roughly 15%; the actual gain depends on the workload.
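Each mask in worker_cpu_affinity is a bit string read right to left, with one bit set for the core a worker is pinned to. For more than a few cores the masks are tedious to write by hand; the sketch below generates them for N workers on cores 0 to N-1. The function name affinity_masks is an illustration, not part of nginx.

```shell
# Print N space-separated bitmasks of width N, pinning worker i to core i.
# The body runs in a subshell so its variables do not leak into the caller.
affinity_masks() (
  n=$1
  i=0
  out=""
  while [ "$i" -lt "$n" ]; do
    mask=""
    j=$((n - 1))
    # Build the mask from the highest core down to core 0.
    while [ "$j" -ge 0 ]; do
      if [ "$j" -eq "$i" ]; then mask="${mask}1"; else mask="${mask}0"; fi
      j=$((j - 1))
    done
    out="${out}${out:+ }${mask}"
    i=$((i + 1))
  done
  printf '%s\n' "$out"
)

# Hypothetical usage:
#   echo "worker_cpu_affinity $(affinity_masks 4);"
```

Recent nginx versions also accept worker_cpu_affinity auto; which may be simpler when a one-worker-per-core layout is all that is needed.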

Solution 2: Interrupt load balancing

# View current interrupt distribution
cat /proc/interrupts

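# Bind the NIC interrupt (IRQ 24 here) to CPU 2.
# smp_affinity takes a hex bitmask (CPU 2 = 4); smp_affinity_list takes CPU numbers.
echo 2 > /proc/irq/24/smp_affinity_list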
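The smp_affinity file expects a hexadecimal CPU bitmask rather than a CPU number, which is an easy source of off-by-one bindings. A minimal sketch of the conversion follows; the helper name cpu_to_mask is made up, and IRQ 24 is just the example number used above.

```shell
# Convert a zero-based CPU index into the hex bitmask that
# /proc/irq/<n>/smp_affinity expects (bit N set means CPU N).
cpu_to_mask() {
  printf '%x\n' "$((1 << $1))"
}

# Hypothetical usage (as root), binding IRQ 24 to CPU 2:
#   cpu_to_mask 2 > /proc/irq/24/smp_affinity
```

If the irqbalance daemon is running it may rewrite these settings, so check whether it is active before relying on a manual binding.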

2.2 Memory optimization – scientific tuning

Solution 3: Adjust swappiness

# Check current value
cat /proc/sys/vm/swappiness

# Temporary change (lost on reboot)
echo 10 > /proc/sys/vm/swappiness

# Permanent change
echo "vm.swappiness = 10" >> /etc/sysctl.conf
sysctl -p

Do not set swappiness to 0; it can cause OOM kills under memory pressure.
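Appending with echo >> adds a duplicate line every time the command is rerun, and the file then holds conflicting values. A possible alternative, sketched below with a hypothetical helper named set_conf_value, replaces any existing assignment of the key before appending the new one.

```shell
# Set "key = value" in a sysctl-style file, replacing earlier assignments
# of the same key so repeated runs leave exactly one line.
# The body runs in a subshell so its variables do not leak into the caller.
set_conf_value() (
  key=$1
  value=$2
  file=$3
  [ -f "$file" ] || : > "$file"
  tmp=$(mktemp) || exit 1
  # Escape dots so "vm.swappiness" is matched literally.
  pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
  grep -v -E "^[[:space:]]*${pattern}[[:space:]]*=" "$file" > "$tmp"
  printf '%s = %s\n' "$key" "$value" >> "$tmp"
  # Write back in place to keep the original file's owner and mode.
  cat "$tmp" > "$file"
  rm -f "$tmp"
)

# Hypothetical usage (as root):
#   set_conf_value vm.swappiness 10 /etc/sysctl.conf && sysctl -p
```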

2.3 Network optimization – boosting concurrent connections

Solution 4: TCP parameter tuning

# Increase TCP listen queue
echo 'net.core.somaxconn = 65535' >> /etc/sysctl.conf

# Raise SYN backlog
echo 'net.ipv4.tcp_max_syn_backlog = 8192' >> /etc/sysctl.conf

# Reuse TIME_WAIT sockets for new outbound connections (requires TCP timestamps)
echo 'net.ipv4.tcp_tw_reuse = 1' >> /etc/sysctl.conf

# Do NOT enable tcp_tw_recycle in NAT environments (the option was removed in Linux 4.12)

# Apply immediately
sysctl -p
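Before and after changing these parameters it helps to see how many sockets sit in each TCP state, since TIME_WAIT tuning only matters when that count is actually high. The sketch below summarises the output of ss -tan; the function name tcp_state_counts is illustrative.

```shell
# Read `ss -tan` output on stdin, skip the header row, and print
# "count state" pairs with the most common state first.
tcp_state_counts() {
  awk 'NR > 1 { count[$1]++ } END { for (s in count) print count[s], s }' | sort -rn
}

# Hypothetical usage:
#   ss -tan | tcp_state_counts
```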

3. Real‑World Cases: Full Fault‑Diagnosis Walkthroughs

Case 1: CPU shows 100 % but no offending process

Symptoms: Monitoring reports 100 % CPU, top shows no high‑CPU process.

# Step 1: List the top CPU consumers together with their PIDs
ps aux --sort=-%cpu | head -10

# Step 2: Check kernel threads
ps aux | grep "\[.*\]"

# Step 3: Examine I/O wait
top   # notice high wa value

# Step 4: Identify I/O bottleneck
iotop -oP   # rsync backup process is the culprit

Root cause: An rsync backup generated massive I/O wait, making CPU appear saturated.

Solution:

Switch rsync to incremental mode: rsync -avz --delete

Limit bandwidth: --bwlimit=10240 (about 10 MB/s; the unit is KiB per second)

Schedule backups during off‑peak hours
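The wa value that gave this case away can also be measured without top, which is handy in scripts and alerts. The sketch below computes the I/O-wait share from two snapshots of the aggregate cpu line in /proc/stat; the function name iowait_pct and the 5-second window are assumptions for illustration.

```shell
# Read two or more "cpu ..." lines from /proc/stat on stdin and print the
# percentage of CPU time spent in iowait between consecutive snapshots.
# Fields after "cpu": user nice system idle iowait irq softirq steal ...
iowait_pct() {
  awk '
    $1 == "cpu" {
      total = 0
      for (i = 2; i <= 9; i++) total += $i   # user through steal
      if (seen) {
        dt = total - prev_total
        if (dt > 0) printf "%.1f\n", 100 * ($6 - prev_wait) / dt
      }
      prev_total = total
      prev_wait = $6                          # field 6 is iowait
      seen = 1
    }'
}

# Hypothetical usage:
#   { grep '^cpu ' /proc/stat; sleep 5; grep '^cpu ' /proc/stat; } | iowait_pct
```

A high figure here with low user and system time points at storage rather than at a CPU-hungry process.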

Case 2: Memory leak causing cascade failures

Full detection script (run as root):

#!/bin/bash
# check_memory_leak.sh: detect processes with significant memory growth

echo "=== Memory Leak Detection Script ==="
echo "Monitoring memory usage for 60 seconds..."

tmpdir=$(mktemp -d) || exit 1
trap 'rm -rf "$tmpdir"' EXIT

# Print "PID %MEM COMMAND" for the top 20 processes by memory, without a header row
sample() {
  ps -eo pid,pmem,comm --sort=-pmem --no-headers | head -20 | awk '{print $1, $2, $3}'
}

# First sample
sample > "$tmpdir/before"

sleep 60

# Second sample
sample > "$tmpdir/after"

echo
echo "=== Processes with significant memory growth ==="
echo "PID    MEM_BEFORE MEM_AFTER  GROWTH  COMMAND"

while read -r pid mem1 cmd1; do
  # Look up the same PID in the second sample (exact match on the first field)
  mem2=$(awk -v p="$pid" '$1 == p {print $2}' "$tmpdir/after")
  if [ -n "$mem2" ]; then
    growth=$(echo "$mem2 - $mem1" | bc)
    # Report growth of more than 0.5 percentage points of total RAM
    if (( $(echo "$growth > 0.5" | bc -l) )); then
      printf "%-6s %-10s %-10s %-7s %s\n" "$pid" "$mem1%" "$mem2%" "+$growth%" "$cmd1"
    fi
  fi
done < "$tmpdir/before"

echo
echo "=== Recommendation ==="
echo "For suspicious processes, use: pmap -x PID"
echo "Or check memory maps: cat /proc/PID/smaps"

Usage:

chmod +x check_memory_leak.sh
./check_memory_leak.sh
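The smaps file that the script recommends is long, and the figure most people want from it is the total proportional set size (Pss), which divides shared pages among the processes that share them. A minimal sketch for summing it, using a hypothetical helper named pss_total_kb:

```shell
# Sum every "Pss:" line of /proc/<pid>/smaps content supplied on stdin
# and print the total in kB. Lines such as "Rss:" are ignored.
pss_total_kb() {
  awk '/^Pss:/ { sum += $2 } END { print sum + 0 }'
}

# Hypothetical usage (as root, PID taken from the script's report):
#   pss_total_kb < /proc/12345/smaps
```

Running it a few minutes apart on a suspect PID gives a growth figure that is not inflated by shared libraries.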

4. System‑Optimization Checklist (daily/weekly/monthly)

Daily checks

CPU load: uptime – keep 1/5/15‑minute load below 0.7 × CPU count

Memory usage: free -h – keep available memory (the "available" column, which accounts for reclaimable cache) > 20 %

Disk space: df -h – keep usage < 80 %

Critical services: systemctl status nginx mysql redis – ensure they are active
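The load rule above is easy to automate so that the daily check yields a pass or fail instead of a number to eyeball. The sketch below encodes the 0.7 × CPU count threshold; the function name load_ok is made up for illustration.

```shell
# Succeed (exit status 0) when the given load average is below
# 0.7 times the given CPU count, and fail otherwise.
load_ok() {
  awk -v load="$1" -v cpus="$2" 'BEGIN { exit !(load < 0.7 * cpus) }'
}

# Hypothetical usage with the 1-minute load average:
#   load_ok "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)" || echo "WARNING: load is high"
```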

Weekly checks

Clean old logs: find /var/log -name "*.log" -mtime +30 -exec rm {} \;

Detect zombie processes: ps aux | grep '[d]efunct' (the bracket keeps grep from matching itself)

Analyze the MySQL slow query log

Update system packages: yum update --security or apt-get upgrade
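Zombies cannot be killed directly; they disappear only when their parent reaps them or exits, so the useful output of the zombie check is the parent PID. The sketch below groups zombies by parent; the function name zombie_parents is illustrative.

```shell
# Read `ps -eo pid,ppid,stat,comm` output on stdin, skip the header row,
# and print "parent_pid zombie_count" for each parent that has zombies.
zombie_parents() {
  awk 'NR > 1 && $3 ~ /^Z/ { n[$2]++ } END { for (p in n) print p, n[p] }'
}

# Hypothetical usage:
#   ps -eo pid,ppid,stat,comm | zombie_parents
```

A parent that keeps accumulating zombies usually has a bug in how it waits for its children and is the process to restart or fix.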

Monthly deep‑dive

Defragment ext4 partitions: e4defrag /dev/sda1 (run e4defrag -c /dev/sda1 first to check whether defragmentation is needed)

TCP connection analysis: ss -s

Review kernel parameters in /etc/sysctl.conf

Mastering this “command + optimization + script” combo transforms you from a reactive fire-fighter into a proactive system architect. Excellent operations are not about the absence of failures, but about detecting warning signs early and recovering swiftly when incidents occur.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

MaGe Linux Operations

Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.
