
20 Essential Linux Commands Every Ops Engineer Must Master

This guide presents twenty indispensable Linux commands—covering system monitoring, performance analysis, process management, networking, disk handling, and system tuning—along with practical examples, tips, and common pitfalls, empowering operations engineers to quickly diagnose and resolve production issues in modern cloud‑native environments.

Ops Community

Introduction: The command‑line pitfalls we’ve faced

When a production server suddenly spikes to 90% CPU and memory runs out, many newcomers panic and type ps -ef | grep java without knowing how to pinpoint the offending process.

Mastering Linux commands is a survival skill for operations engineers, akin to a doctor’s scalpel or a firefighter’s hose.

Why these commands matter

In cloud-native, microservice environments, a mid-size internet company may manage hundreds of servers. When a fault occurs, you have only minutes to locate and fix the problem; these commands give you "x-ray vision" into the system.

As a rough rule of thumb, about 70% of production incidents can be quickly identified using basic commands, and diagnosing the remaining 30% often starts with the same tools.

Core toolbox: 20 commands explained

System monitoring (the "vital signs" of ops)

1. top – the system’s health monitor

# Basic usage
 top

# Show processes of a specific user
 top -u nginx

# Monitor a specific PID
 top -p 1234

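Practical tip: Load average is often a better first signal than raw CPU or memory usage. On a single-core server, a sustained load above 1.0 is a warning sign; on a multi-core server, watch for load above roughly 70% of the core count. On Linux, load also counts tasks blocked in uninterruptible I/O, so a high load with idle CPUs usually points to storage rather than compute.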
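The rule of thumb in this tip can be sketched as a small shell function. This is only an illustration: the name load_status is hypothetical, and the cut-offs (1.0 on one core, 0.7 times the core count otherwise) are taken from the tip, not from any standard.

```shell
# load_status LOAD CORES -> prints "ok" or "warn"
# Thresholds are assumptions taken from the tip, not a standard:
# 1 core: warn above 1.0; N cores: warn above 0.7 * N.
load_status() {
    awk -v avg="$1" -v cores="$2" 'BEGIN {
        limit = (cores + 0 == 1) ? 1.0 : 0.7 * cores
        print (avg + 0 > limit ? "warn" : "ok")
    }'
}

# Check the 1-minute load average of the local machine (Linux only).
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    load_status "$(cut -d ' ' -f 1 /proc/loadavg)" "$(nproc)"
fi
```

A single reading says little; it is the sustained value over several minutes that matters, so in practice this kind of check would run repeatedly.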

2. htop – an upgraded top

# Colorful UI with mouse support
 htop

# Show only your own processes
 htop -u $(whoami)

Pitfall: CentOS does not include htop by default; install the EPEL repository first.

3. iotop – a window into disk I/O

# Real‑time disk I/O view
 iotop

# Show only processes doing I/O
 iotop -o

Real case: A slow database turned out to be a noisy log process saturating disk I/O.

Performance analysis (deep kernel probing)

4. vmstat – virtual memory statistics

# Output every 2 seconds, 10 times
 vmstat 2 10

# Detailed memory info
 vmstat -s

5. iostat – I/O statistics wizard

# Show I/O every second
 iostat 1

# Extended device stats
 iostat -x 1

Best practice: Combine vmstat and iostat to quickly pinpoint whether the bottleneck is CPU, memory, or disk – the "three‑blade performance diagnosis".
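One way to make this triage repeatable is to let a script read the last vmstat sample and name the most likely bottleneck. The sketch below is a simplification under stated assumptions: classify_vmstat is a hypothetical name, the column positions are those of procps vmstat (r=1, si=7, so=8, wa=16), and the 20% iowait cut-off is arbitrary.

```shell
# classify_vmstat CORES: reads vmstat output on stdin, judges the last sample.
# Prints one of: memory, disk, cpu, none.
# Column numbers assume procps vmstat; the 20% iowait limit is an assumption.
classify_vmstat() {
    awk -v cores="$1" '
        $1 ~ /^[0-9]+$/ { r = $1 + 0; swap = $7 + $8; wa = $16 + 0 }
        END {
            if (swap > 0)            print "memory"   # pages swapped in or out
            else if (wa > 20)        print "disk"     # CPUs waiting on I/O
            else if (r > cores + 0)  print "cpu"      # run queue longer than core count
            else                     print "none"
        }'
}
```

A typical call would be `vmstat 1 5 | classify_vmstat "$(nproc)"`. The first vmstat line reports averages since boot, which is why only the last sample is judged. When the answer is "disk", `iostat -x 1` then shows which device is saturated.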

6. sar – system activity reporter

# CPU usage history
 sar -u

# Memory usage history
 sar -r

# Network statistics
 sar -n DEV

Process management (the "life‑and‑death" of processes)

7. ps – snapshot of processes

# All processes
 ps aux

# Process tree
 ps -ef --forest

# Find a specific process (the [n] bracket keeps grep from matching itself)
 ps aux | grep '[n]ginx'

8. pstree – process family tree

# Show process tree
 pstree

# Show only the trees rooted at one user's processes (-u also marks UID changes)
 pstree -u username

9. lsof – "everything open" file viewer

# Check which process uses port 80
 lsof -i:80

# See which processes use a file
 lsof /var/log/messages

# List files opened by a PID
 lsof -p 1234

Lesson: If a large file is deleted but space isn’t freed, use lsof to see which process still holds it.
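With lsof, `lsof +L1` lists open files whose link count has dropped below one, which is exactly the deleted-but-still-open case. On Linux the same information is visible under /proc, where the symlink for such a descriptor ends in " (deleted)". The function below is a minimal sketch of that second approach; its name, deleted_fds, is hypothetical.

```shell
# deleted_fds FD_DIR: print "FD TARGET" for each deleted-but-open file.
# FD_DIR is a process's descriptor directory, e.g. /proc/1234/fd (Linux only).
deleted_fds() {
    for fd in "$1"/*; do
        # Entries that are not symlinks (or an unmatched glob) are skipped.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            *' (deleted)') printf '%s %s\n' "${fd##*/}" "$target" ;;
        esac
    done
}
```

For example, `deleted_fds /proc/1234/fd` would show which descriptors of PID 1234 still pin deleted files. The space is released once the process closes the descriptor, which in practice usually means restarting or reloading it.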

Network diagnostics (the "stethoscope" for networking)

10. netstat – view network connections

# Listening TCP/UDP sockets with owning process (use -a instead of -l for all connections)
 netstat -tulnp

# TCP statistics
 netstat -st

11. ss – modern replacement for netstat

# Listening TCP/UDP sockets with owning process
 ss -tulnp

# Listeners on exactly port 80 (a plain grep :80 would also match :8080)
 ss -tlnp 'sport = :80'

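Trend: Most current distributions ship ss (part of iproute2) by default and treat netstat (net-tools) as legacy; ss is faster on busy hosts and offers richer filtering.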
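A common follow-up is to summarize sockets by TCP state, for instance to spot a pile-up of TIME-WAIT or CLOSE-WAIT connections. The helper below is a small sketch; count_states is a hypothetical name, and it assumes the default `ss -tan` layout where the first line is a header and the first column is the state.

```shell
# count_states: reads `ss -tan` output on stdin and prints "STATE COUNT",
# most frequent state first.
count_states() {
    awk 'NR > 1 { n[$1]++ } END { for (s in n) print s, n[s] }' |
        sort -k2,2nr -k1,1
}
```

Usage would be `ss -tan | count_states`.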

12. tcpdump – packet sniffer

# Capture packets on port 80
 tcpdump -i any port 80

# Capture packets from a host
 tcpdump host 192.168.1.100

# Save capture to file
 tcpdump -w capture.pcap

Disk management (the "steward" of storage)

13. df – view disk usage

# Human‑readable output
 df -h

# Show inode usage
 df -i
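For routine checks it can help to print only the filesystems that are close to full. The sketch below filters `df -P` output (the POSIX format, one line per filesystem); full_filesystems is a hypothetical name, and mount points containing spaces are not handled.

```shell
# full_filesystems LIMIT: reads `df -P` output on stdin and prints
# "MOUNT USE%" for every filesystem at or above LIMIT percent.
full_filesystems() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%$/, "", use)          # strip the trailing percent sign
        if (use + 0 >= limit + 0) print $6, $5
    }'
}
```

Usage would be `df -P | full_filesystems 80`. Feeding it `df -P -i` instead applies the same threshold to inode usage, which catches disks that are "full" because of millions of tiny files.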

14. du – directory size analyzer

# Size of each item in the current directory
 du -sh *

# Find top 10 largest directories
 du -h | sort -hr | head -10

Tip: Use du to quickly locate which directory is filling the disk.
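A minimal sketch of that drill-down: list the largest directories one level below a starting point, then repeat on the biggest one until the culprit is found. The name top_dirs and the default of 10 entries are illustrative choices.

```shell
# top_dirs DIR [N]: the N largest directories one level below DIR, sizes in KiB.
# -x keeps du on a single filesystem; the first line is the total for DIR itself.
top_dirs() {
    du -x -k -d 1 "${1:-.}" 2>/dev/null | sort -nr | head -n "${2:-10}"
}
```

For example, `top_dirs /var 5` followed by `top_dirs /var/log 5` narrows the search in two steps. Sorting numeric KiB values avoids depending on `sort -h`, which not every sort implementation provides.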

15. find – file search "detective"

# Files larger than 100M
 find / -size +100M -type f

# Log files older than 7 days
 find /var/log -name "*.log" -mtime +7

# Delete empty files
 find /tmp -empty -type f -delete
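Because -delete is irreversible, a safer habit is to run the same expression with -print first and delete only after reviewing the list. The function below sketches the preview half; old_logs is a hypothetical name.

```shell
# old_logs DIR DAYS: list *.log files under DIR whose content was last
# modified more than DAYS days ago. Listing only; nothing is removed.
old_logs() {
    find "$1" -type f -name '*.log' -mtime +"$2" -print
}
```

For example, `old_logs /var/log 7` shows what a cleanup would touch. Note that find counts age in whole 24-hour periods and drops the fraction, so `-mtime +7` matches files that are at least 8 days old.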

System information (the "ID card" of the OS)

16. uname – quick system info

# All system info
 uname -a

# Kernel version only
 uname -r

17. uptime – system run time and load

# Show uptime, users, load average
 uptime

# Pretty output
 uptime -p

18. free – memory usage

# Human‑readable memory info
 free -h

# Update every second
 free -s 1

System tuning (the "catalyst" for performance)

19. sysctl – kernel parameter tuner

# List all kernel parameters
 sysctl -a

# Change a parameter at runtime (lost on reboot)
 sysctl -w net.ipv4.ip_forward=1

# Reload settings from /etc/sysctl.conf
 sysctl -p
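It can help to know that each sysctl key is backed by a file under /proc/sys, with the dots in the key turned into slashes. The helper below is a small sketch of that mapping; sysctl_path is a hypothetical name, and keys that embed an interface name containing a dot are not handled.

```shell
# sysctl_path KEY: print the /proc/sys file that backs a sysctl key.
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}
```

For example, `cat "$(sysctl_path net.ipv4.ip_forward)"` reads the current value without calling sysctl at all, which is handy in minimal containers.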

20. crontab – scheduled task manager

# List current user’s cron jobs
 crontab -l

# Edit cron jobs
 crontab -e

# View cron execution log (RHEL/CentOS path; Debian/Ubuntu log cron to /var/log/syslog)
 tail -f /var/log/cron

Practical experience: avoid common traps

Pitfall 1 – overusing kill -9

Instead of immediately killing a stuck process, try kill -15 (SIGTERM) first to allow graceful shutdown.
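That escalation can be sketched as a function that sends SIGTERM, waits for a grace period, and falls back to SIGKILL only if the process is still alive. The name graceful_kill and the 10-second default are assumptions for illustration.

```shell
# graceful_kill PID [SECONDS]: SIGTERM first, SIGKILL only after the grace period.
graceful_kill() {
    pid=$1
    grace=${2:-10}
    kill -TERM "$pid" 2>/dev/null || return 0    # nothing to do: already gone
    waited=0
    while [ "$waited" -lt "$grace" ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # exited cleanly
        sleep 1
        waited=$((waited + 1))
    done
    kill -KILL "$pid" 2>/dev/null || true        # last resort
}
```

Usage would be `graceful_kill 1234 30` for a service that needs up to 30 seconds to flush its state.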

Pitfall 2 – ignoring system logs

Commands tell you "what" happened; logs reveal "why". Use journalctl -f or tail -f /var/log/messages to follow logs.

Pitfall 3 – operating without backups

Always back up before any delete or modify operation; many disasters stem from an unchecked rm.

Advanced tricks: make commands more powerful

Combination examples:

# Find the top 5 CPU-hungry processes (header line plus 5 rows)
 ps aux --sort=-%cpu | head -6

# Live count of TCP sockets in all states
 watch -n 1 "ss -tan | tail -n +2 | wc -l"

# Batch kill processes by exact name (preview the PIDs with pgrep -x nginx first)
 pkill -x nginx

Future outlook: the command line’s new mission in the cloud‑native era

With Kubernetes, Docker, and containers, traditional Linux commands are merging with modern tools:

kubectl + ps: troubleshoot processes inside Pods

docker stats + top: monitor container resource usage

prometheus + sar: combine a monitoring system with historical analysis

Regardless of new technologies, these foundational commands remain the "inner kung fu" of ops engineers.

Summary and call to action

These 20 commands are the "twenty‑four arts" of a Linux ops engineer. To get the most out of them:

Practice daily: pick 3-5 commands and dive deep.

Build a cheat sheet: record useful parameter combinations.

Apply in real scenarios: experiment boldly in non-critical environments.

Which commands do you use most? Share your own pitfalls and tips in the comments to help the community grow.
