20 Essential Linux Commands Every Ops Engineer Must Master
This guide presents twenty indispensable Linux commands—covering system monitoring, performance analysis, process management, networking, disk handling, and system tuning—along with practical examples, tips, and common pitfalls, empowering operations engineers to quickly diagnose and resolve production issues in modern cloud‑native environments.
Introduction: The command‑line pitfalls we’ve faced
When a production server suddenly spikes to 90% CPU and memory runs out, many newcomers panic and type ps -ef | grep java without knowing how to pinpoint the offending process.
Mastering Linux commands is a survival skill for operations engineers, akin to a doctor’s scalpel or a firefighter’s hose.
Why these commands matter
In cloud‑native, micro‑service environments, a mid‑size internet company may manage hundreds of servers. When a fault occurs, you have only minutes to locate and fix the problem; these commands give you "x‑ray vision" into the system.
Statistics show that about 70% of production incidents can be quickly identified using basic commands, and diagnosing the remaining 30% often starts with the same tools.
Core toolbox: 20 commands explained
System monitoring (the "vital signs" of ops)
1. top – the system’s health monitor
# Basic usage
top
# Show processes of a specific user
top -u nginx
# Monitor a specific PID
top -p 1234
Practical tip: Load average is often more telling than CPU or memory usage alone. On a single‑core server, a load above 1.0 is a warning sign; on a multi‑core server, watch for a load above roughly 70% of the core count.
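The load-average rule of thumb can be sketched as a small shell function. This is a minimal sketch, not a standard tool: the name load_status is hypothetical, and the thresholds are simply the ones from the tip (1.0 on a single core, 70% of the core count otherwise).

```shell
# load_status LOAD CORES
# Prints "warn" when LOAD exceeds the rule-of-thumb limit, "ok" otherwise.
# load_status is a hypothetical helper name; the thresholds follow the tip.
load_status() {
  awk -v load="$1" -v cores="$2" 'BEGIN {
    # Single core: limit is 1.0; multi-core: limit is 70% of the core count
    if (cores == 1) limit = 1.0
    else limit = 0.7 * cores
    if (load > limit) print "warn"
    else print "ok"
  }'
}
```

On a live host it could be fed the 1-minute load and the core count, for example: load_status "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"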
2. htop – an upgraded top
# Colorful UI with mouse support
htop
# Show only your own processes
htop -u $(whoami)
Pitfall: CentOS does not include htop by default; install the EPEL repository first.
3. iotop – a window into disk I/O
# Real‑time disk I/O view
iotop
# Show only processes doing I/O
iotop -o
Real case: A slow database turned out to be a noisy log process saturating disk I/O.
Performance analysis (deep kernel probing)
4. vmstat – virtual memory statistics
# Output every 2 seconds, 10 times
vmstat 2 10
# Detailed memory info
vmstat -s
5. iostat – I/O statistics wizard
# Show I/O every second
iostat 1
# Extended device stats
iostat -x 1
Best practice: Combine vmstat and iostat to quickly pinpoint whether the bottleneck is CPU, memory, or disk – the "three‑blade performance diagnosis".
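One way to turn that CPU / memory / disk triage into a repeatable first pass is to classify the last vmstat sample. This is a rough sketch under stated assumptions: classify_vmstat is a hypothetical name, the column positions assume the default procps vmstat layout, and the thresholds (any swap traffic, more than 20% I/O wait, more than 80% user plus system CPU) are illustrative rather than official limits.

```shell
# classify_vmstat: reads `vmstat` output on stdin and prints one of
# "memory", "disk", "cpu", or "ok" based on the last sample.
# Assumes the default procps column layout:
#   r b swpd free buff cache si so bi bo in cs us sy id wa st
# The thresholds are illustrative assumptions, not official limits.
classify_vmstat() {
  awk 'NR > 2 { si = $7; so = $8; us = $13; sy = $14; wa = $16 }
       END {
         if (si + so > 0)       print "memory"  # pages moving to or from swap
         else if (wa > 20)      print "disk"    # significant I/O wait
         else if (us + sy > 80) print "cpu"     # CPU busy in user/system time
         else                   print "ok"
       }'
}
```

A typical call would be vmstat 2 5 | classify_vmstat; when it prints "disk", iostat -x 1 is the natural next step to find the saturated device.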
6. sar – system activity reporter
# CPU usage history
sar -u
# Memory usage history
sar -r
# Network statistics
sar -n DEV
Process management (the "life‑and‑death" of processes)
7. ps – snapshot of processes
# All processes
ps aux
# Process tree
ps -ef --forest
# Find a specific process
ps aux | grep nginx
8. pstree – process family tree
# Show process tree
pstree
# Show tree for a specific user
pstree -u username
9. lsof – "everything open" file viewer
# Check which process uses port 80
lsof -i:80
# See which processes use a file
lsof /var/log/messages
# List files opened by a PID
lsof -p 1234
Lesson: If a large file is deleted but space isn’t freed, use lsof to see which process still holds it.
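The deleted-but-still-open case can be checked by filtering lsof output for entries marked "(deleted)". The helper below is a minimal sketch: deleted_open_files is a hypothetical name, and the field positions assume lsof's default column order without a TID column.

```shell
# deleted_open_files: reads `lsof` output on stdin and prints
# "PID COMMAND SIZE" for every open file whose name ends in "(deleted)".
# Assumes lsof's default columns:
#   COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
deleted_open_files() {
  awk '/\(deleted\)$/ { print $2, $1, $7 }'
}
```

Typical use is lsof -nP | deleted_open_files. The space is released only when the listed process closes the file or is restarted. lsof +L1 lists unlinked open files directly.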
Network diagnostics (the "stethoscope" for networking)
10. netstat – view network connections
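# Listening TCP and UDP sockets with their owning processes
netstat -tulnp
# TCP statistics
netstat -st
11. ss – modern replacement for netstat
# Listening TCP and UDP sockets with their owning processes
ss -tulnp
# Listening TCP sockets on a specific port (this pattern also matches ports such as :8080)
ss -tlnp | grep :80
Trend: Newer Linux distributions favor ss for speed and richer features.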
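A common follow-up with ss is to summarize connections by TCP state, which makes a pile-up of TIME-WAIT or CLOSE-WAIT sockets obvious at a glance. The function below is a small sketch: count_tcp_states is a hypothetical name, and it assumes the state is the first column of ss -tan output.

```shell
# count_tcp_states: reads `ss -tan` output on stdin and prints
# "STATE COUNT" per TCP state, sorted by state name.
# Assumes the first line is the header and column 1 is the state.
count_tcp_states() {
  awk 'NR > 1 { count[$1]++ }
       END { for (s in count) print s, count[s] }' | LC_ALL=C sort
}
```

Typical use: ss -tan | count_tcp_states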
12. tcpdump – packet sniffer
# Capture packets on port 80
tcpdump -i any port 80
# Capture packets to or from a host
tcpdump host 192.168.1.100
# Save capture to file
tcpdump -w capture.pcap
Disk management (the "steward" of storage)
13. df – view disk usage
# Human‑readable output
df -h
# Show inode usage
df -i
14. du – directory size analyzer
# Size of each item in the current directory
du -sh *
# Find top 10 largest directories
du -h | sort -hr | head -10
Tip: Use du to quickly locate which directory is filling the disk.
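When hunting for the directory that is filling a disk, it often helps to go one level at a time instead of recursing through everything. The sketch below wraps that step; largest_dirs is a hypothetical helper name, and it assumes a du that supports -d for depth (GNU, BSD, and BusyBox versions do).

```shell
# largest_dirs DIR [N]
# Prints the N largest directories directly under DIR (default 10),
# largest first, as "SIZE_KB<TAB>PATH". DIR itself is left out of the list.
# largest_dirs is a hypothetical helper name.
largest_dirs() {
  dir=${1:-.}
  n=${2:-10}
  du -k -d 1 "$dir" 2>/dev/null \
    | sort -nr \
    | awk -F'\t' -v top="$dir" '$2 != top' \
    | head -n "$n"
}
```

Run largest_dirs / 5, then repeat on the biggest result until the culprit is found.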
15. find – file search "detective"
# Files larger than 100M
find / -size +100M -type f
# Log files older than 7 days
find /var/log -name "*.log" -mtime +7
# Delete empty files
find /tmp -empty -type f -delete
System information (the "ID card" of the OS)
16. uname – quick system info
# All system info
uname -a
# Kernel version only
uname -r
17. uptime – system run time and load
# Show uptime, users, load average
uptime
# Pretty output
uptime -p
18. free – memory usage
# Human‑readable memory info
free -h
# Update every second
free -s 1
System tuning (the "catalyst" for performance)
19. sysctl – kernel parameter tuner
# List all kernel parameters
sysctl -a
# Change a parameter
sysctl -w net.ipv4.ip_forward=1
# Load from config file
sysctl -p
20. crontab – scheduled task manager
# List current user’s cron jobs
crontab -l
# Edit cron jobs
crontab -e
# View cron execution log (RHEL/CentOS path; Debian and Ubuntu log cron activity to /var/log/syslog)
tail -f /var/log/cron
Practical experience: avoid common traps
Pitfall 1 – overusing kill -9
Instead of immediately killing a stuck process, try kill -15 (SIGTERM) first to allow graceful shutdown.
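The "SIGTERM first, SIGKILL only as a last resort" habit can be captured in a small helper. This is a minimal sketch: stop_gracefully is a hypothetical name, and the 10-second default grace period is an assumption to tune per service.

```shell
# stop_gracefully PID [SECONDS]
# Sends SIGTERM, waits up to SECONDS (default 10) for the process to exit,
# and only then falls back to SIGKILL.
# stop_gracefully is a hypothetical helper name.
stop_gracefully() {
  target=$1
  tries=${2:-10}
  # Ask politely first; if the signal cannot be sent, the process is gone
  kill -15 "$target" 2>/dev/null || return 0
  while [ "$tries" -gt 0 ]; do
    # kill -0 sends no signal; it only checks that the PID still exists
    kill -0 "$target" 2>/dev/null || return 0
    sleep 1
    tries=$((tries - 1))
  done
  # Grace period is over: force termination
  kill -9 "$target" 2>/dev/null
  return 0
}
```

For example, stop_gracefully 1234 30 gives PID 1234 thirty seconds to shut down cleanly before it is force-killed.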
Pitfall 2 – ignoring system logs
Commands tell you "what" happened; logs reveal "why". Use journalctl -f or tail -f /var/log/messages to follow logs.
Pitfall 3 – operating without backups
Always back up before any delete or modify operation; many disasters stem from an unchecked rm.
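One simple guard against an unchecked delete is to list exactly what would be removed, review or archive that list, and only then delete. The sketch below covers the listing step; list_old_logs is a hypothetical helper name, and the *.log pattern is just an example.

```shell
# list_old_logs DIR DAYS
# Prints the *.log files under DIR last modified more than DAYS days ago.
# It only lists files; nothing is deleted.
# list_old_logs is a hypothetical helper name.
list_old_logs() {
  find "$1" -type f -name '*.log' -mtime +"$2" -print
}
```

After reviewing the output of list_old_logs /var/log 7 and backing up anything that matters, the same find expression can be rerun with -delete in place of -print.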
Advanced tricks: make commands more powerful
Combination examples:
# Find top CPU‑hungry processes
ps aux | sort -k3 -nr | head -5
# Real‑time network connection count
watch -n 1 "netstat -an | wc -l"
# Batch kill specific processes (the [n] bracket keeps grep from matching its own command line)
ps aux | grep '[n]ginx' | awk '{print $2}' | xargs kill
Future outlook: the command line’s new mission in the cloud‑native era
With Kubernetes, Docker, and containers, traditional Linux commands are merging with modern tools:
kubectl + ps: troubleshoot processes inside Pods
docker stats + top: monitor container resource usage
prometheus + sar: combine a monitoring system with historical analysis
Regardless of new technologies, these foundational commands remain the "inner kung fu" of ops engineers.
Summary and call to action
These 20 commands are the "twenty‑four arts" of a Linux ops engineer. To get the most out of them:
Practice daily: pick 3‑5 commands and dive deep.
Build a cheat sheet: record useful parameter combinations.
Apply in real scenarios: experiment boldly in non‑critical environments.
Which commands do you use most? Share your own pitfalls and tips in the comments to help the community grow.
Ops Community
A leading IT operations community where professionals share and grow together.
