
20 Essential Linux Commands Every Ops Engineer Must Master

This article presents twenty indispensable Linux command‑line tools—covering system monitoring, performance analysis, process management, network diagnostics, disk handling, and kernel tuning—explaining their syntax, practical tips, common pitfalls, and how they integrate with modern cloud‑native environments.

Liangxu Linux

Introduction

When a production server spikes to 90% CPU and memory alerts sound, the ability to quickly pinpoint the offending process with commands like ps -ef | grep java becomes a survival skill for any operations engineer.

Why These Commands Matter

In cloud‑native, micro‑service architectures, a medium‑size internet company may run hundreds of servers, leaving only a few minutes to diagnose and resolve incidents. Mastery of core Linux commands provides the "X‑ray vision" needed to inspect every corner of the system.

As a rule of thumb, roughly 70% of production failures can be located with basic command‑line utilities, and investigation of the remaining 30% usually starts with the same tools.

Core Toolbox: 20 Commands Explained

System Monitoring ("Health Check Devices")

1. top – System "health monitor"

# Basic usage
top

# Show only processes of a specific user
top -u nginx

# Monitor a specific PID
top -p 1234

Tip: Watch the load average. On a single‑core server, a load above 1.0 is a warning sign; on a multi‑core server, a load above roughly 70% of the core count deserves attention.
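The rule in the tip can be turned into a small check. The sketch below is one possible reading of it, not a standard: the function name `load_status` and the extra CRITICAL level (load above the core count) are assumptions added for illustration.

```shell
#!/bin/sh
# Classify a load average against the core count.
# Assumption: WARN above 70% of the cores, CRITICAL above 100% of the cores.
load_status() {
    # $1 = load average (e.g. 3.20), $2 = number of cores (e.g. 4)
    awk -v l="$1" -v c="$2" 'BEGIN {
        if (l > c)            print "CRITICAL"
        else if (l > c * 0.7) print "WARN"
        else                  print "OK"
    }'
}

# Live check: the first field of /proc/loadavg is the 1-minute average.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    read -r one_min _rest < /proc/loadavg
    n_cores=$(nproc)
    echo "load=$one_min cores=$n_cores status=$(load_status "$one_min" "$n_cores")"
fi
```

The 1-minute figure reacts quickly to spikes; passing the 5- or 15-minute value from `uptime` instead gives a calmer signal.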

2. htop – Enhanced top

# Colorful UI with mouse support
htop

# Show only your own processes
htop -u $(whoami)

Note: htop is not installed by default on CentOS; enable the EPEL repository first, then install the htop package.

3. iotop – Disk I/O "microscope"

# Real‑time I/O view
iotop

# Show only processes doing I/O
iotop -o

Case: A slow database query was traced to a log‑writer saturating the disk, discovered via iotop.

Performance Analysis ("Kernel Probes")

4. vmstat – Virtual memory statistics

# Output every 2 seconds, 10 times
vmstat 2 10

# Detailed memory info
vmstat -s

5. iostat – I/O statistics

# Per‑second I/O stats
iostat 1

# Extended device statistics
iostat -x 1

Best practice: Combine vmstat and iostat to quickly identify whether the bottleneck lies in CPU, memory, or disk.
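As a rough illustration of reading vmstat output for a first verdict, the sketch below classifies each sample line. The function name `classify_vmstat` and the thresholds (any swap activity, 20% iowait, 80% user plus system CPU) are assumptions chosen for the example; the column positions assume the default procps layout.

```shell
#!/bin/sh
# Read `vmstat` output on stdin and print a rough verdict per sample.
# Assumed default column layout:
#   r b swpd free buff cache si so bi bo in cs us sy id wa st
classify_vmstat() {
    awk '
        $1 !~ /^[0-9]+$/  { next }                          # skip header lines
        ($7 + $8) > 0     { print "memory (swapping)"; next } # si/so non-zero
        $16 >= 20         { print "disk (iowait)"; next }     # wa column
        ($13 + $14) >= 80 { print "cpu"; next }               # us + sy
                          { print "no obvious bottleneck" }
    '
}

# Usage: vmstat 2 5 | classify_vmstat
```

The first line vmstat prints reports averages since boot, so it is usually best to judge from the second sample onward. When the verdict points at disk, `iostat -x 1` shows which device is saturated.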

6. sar – System activity reporter

# CPU usage history
sar -u

# Memory usage history
sar -r

# Network statistics
sar -n DEV

Process Management ("Process Authority")

7. ps – Process snapshot

# List all processes
ps aux

# Show process tree
ps -ef --forest

# Find a specific process
ps aux | grep nginx

8. pstree – Process hierarchy

# Display process tree
pstree

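# Show only the trees rooted at a specific user's processes
# (the -u flag merely annotates UID changes; the user name does the filtering)
pstree username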

9. lsof – "Everything open" inspector

# Check which process uses a port
lsof -i:80

# Find which process holds a file
lsof /var/log/messages

# List files opened by a PID
lsof -p 1234

Lesson: After deleting a large file, use lsof to ensure no process still holds it, otherwise disk space won’t be reclaimed.
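With lsof, `lsof +L1` lists open files whose link count has dropped below one, which is the deleted-but-still-open case. Where lsof is not installed, the same information can be read from /proc. The sketch below assumes a Linux /proc filesystem, and the function name `deleted_fds` is hypothetical.

```shell
#!/bin/sh
# List file descriptors of one process that still point at deleted files.
# Relies on Linux /proc: readlink on such a descriptor ends in " (deleted)".
deleted_fds() {
    for fd in /proc/"$1"/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            *" (deleted)") printf '%s %s\n' "${fd##*/}" "$target" ;;
        esac
    done
}

# Usage: deleted_fds 1234
```

Each output line is a descriptor number followed by the original path. Restarting the process, or having it reopen its log file, releases the space.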

Network Diagnostics ("Network Stethoscope")

10. netstat – Network connections

# Listening TCP and UDP sockets with the owning process (use -a instead of -l for all connections)
netstat -tulnp

# TCP statistics
netstat -st

11. ss – Modern replacement for netstat

# Listening TCP and UDP sockets with the owning process
ss -tulnp

# Listeners on a specific port (exact match; grep :80 would also match :8080)
ss -tlnp 'sport = :80'

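Trend: netstat belongs to the legacy net-tools package, which many newer distributions no longer install by default; ss (part of iproute2) is the recommended replacement and offers faster, richer output.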
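One common use of ss during an incident is a quick tally of TCP states, for example to spot a pile-up of TIME-WAIT or CLOSE-WAIT sockets. The helper below is a minimal sketch; the name `count_tcp_states` is an assumption, and it expects `ss -tan` style output with the state in the first column.

```shell
#!/bin/sh
# Summarise TCP socket states from `ss -tan` style output on stdin.
# Prints "<count> <state>" lines, most frequent state first.
count_tcp_states() {
    awk '$1 != "State" && NF { n[$1]++ }
         END { for (s in n) print n[s], s }' | sort -rn
}

# Usage: ss -tan | count_tcp_states
```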

12. tcpdump – Packet sniffer

# Capture packets on port 80
tcpdump -i any port 80

# Capture packets from a host
tcpdump host 192.168.1.100

# Save capture to file
tcpdump -w capture.pcap

Disk Management ("Storage Steward")

13. df – Disk usage overview

# Human‑readable output
df -h

# Show inode usage
df -i
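Both views can feed a simple threshold check. The sketch below assumes POSIX-format output (`df -P`), where the usage percentage is column 5 and the mount point is column 6; the function name `full_filesystems` and the default limit of 90 are illustrative choices.

```shell
#!/bin/sh
# Print mount points whose usage is at or above a percentage threshold.
# Expects `df -P` output on stdin (one line per filesystem).
full_filesystems() {
    awk -v limit="${1:-90}" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)                  # "95%" -> "95"
        if (pct + 0 >= limit + 0) print $6, $5
    }'
}

# Usage: df -P  | full_filesystems 85     # block usage
#        df -Pi | full_filesystems 85     # inode usage
```

Mount points that contain spaces break the column assumption, so treat this as a quick check rather than a monitoring agent.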

14. du – Directory size

# Size of each item in the current directory (use "du -sh ." for the directory total)
du -sh *

# Top 10 largest directories
du -h | sort -hr | head -10

15. find – File finder

# Files larger than 100 MiB
find / -size +100M -type f

# Logs older than 7 days
find /var/log -name "*.log" -mtime +7

# Delete empty files
find /tmp -empty -type f -delete
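Because `-delete` is irreversible, a common habit is to print the matches first and delete only after reviewing them. The sketch below wraps the log search in a function; the name `old_logs` and its default of 7 days are assumptions for illustration.

```shell
#!/bin/sh
# List regular *.log files under a directory that were last modified
# more than N days ago. Printing first gives a reviewable dry run.
old_logs() {
    find "$1" -type f -name '*.log' -mtime +"${2:-7}" -print
}

# Typical flow:
#   old_logs /var/log 7                                        # review
#   find /var/log -type f -name '*.log' -mtime +7 -delete      # then remove
```

Note that `-mtime +7` counts whole 24-hour periods, so it matches files that are at least eight days old.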

System Information ("System ID Card")

16. uname – System info

# All system info
uname -a

# Kernel version only
uname -r

17. uptime – Load and uptime

# Show uptime and load averages
uptime

# Pretty output
uptime -p

18. free – Memory usage

# Human‑readable memory stats
free -h

# Update every second
free -s 1
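The "available" column in free output comes from the MemAvailable field of /proc/meminfo, which is usually a better health signal than "free" because it accounts for reclaimable cache. As a minimal sketch (the function name `mem_available_pct` is hypothetical), the percentage can be computed directly:

```shell
#!/bin/sh
# Percentage of RAM still available, from /proc/meminfo-formatted input.
# MemAvailable is the kernel's estimate of memory usable without swapping.
mem_available_pct() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END { if (total > 0) printf "%d\n", avail * 100 / total }
    '
}

# Usage: mem_available_pct < /proc/meminfo
```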

System Tuning ("Performance Catalyst")

19. sysctl – Kernel parameter tuning

# List all parameters
sysctl -a

# Modify a parameter at runtime (not persistent; lost on reboot)
sysctl -w net.ipv4.ip_forward=1

# Load from config file
sysctl -p
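Each sysctl key is backed by a file under /proc/sys, with the dots in the key replaced by slashes. The sketch below shows that mapping; the function name `sysctl_path` is an assumption, and keys that embed dots inside an interface name are not handled.

```shell
#!/bin/sh
# Map a sysctl key to the file under /proc/sys that backs it.
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

# Usage: cat "$(sysctl_path net.ipv4.ip_forward)"
```

To keep a setting across reboots, a typical approach is to put a line such as `net.ipv4.ip_forward = 1` in a file under /etc/sysctl.d/ (the file name is up to you) and apply it with `sysctl --system`.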

20. crontab – Scheduled tasks

# List current user's jobs
crontab -l

# Edit jobs
crontab -e

# View cron logs (RHEL/CentOS path; on Debian/Ubuntu, cron messages go to /var/log/syslog)
tail -f /var/log/cron
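`crontab -e` is interactive, which is awkward in provisioning scripts. One way to add a job non-interactively is to filter the existing table and feed the result back to `crontab -`. The sketch below is an assumption-laden example: the helper name `add_cron_line` and the backup script path are hypothetical.

```shell
#!/bin/sh
# Read an existing crontab on stdin and print it with one extra job,
# skipping the append when an identical line is already present.
add_cron_line() {
    current=$(cat)
    [ -n "$current" ] && printf '%s\n' "$current"
    printf '%s\n' "$current" | grep -Fxq -- "$1" || printf '%s\n' "$1"
}

# Fields: minute hour day-of-month month day-of-week command
# Usage (hypothetical job, runs daily at 03:00):
#   crontab -l 2>/dev/null \
#     | add_cron_line '0 3 * * * /usr/local/bin/backup.sh' \
#     | crontab -
```

Because the check is an exact line match, running the same command twice leaves a single entry.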

Common Pitfalls

Pitfall 1: Overusing kill -9

Instead of immediately sending SIGKILL, try kill -15 (SIGTERM) first to allow graceful shutdown.
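The escalation described here can be scripted so that SIGKILL is only a last resort. The sketch below is one possible approach; the function name `graceful_stop` and the default 10-second grace period are assumptions.

```shell
#!/bin/sh
# Send SIGTERM, wait up to N seconds for the process to exit,
# and fall back to SIGKILL only if it is still alive.
graceful_stop() {
    stop_pid="$1"
    grace="${2:-10}"
    kill -TERM "$stop_pid" 2>/dev/null || return 0   # already gone
    waited=0
    while [ "$waited" -lt "$grace" ]; do
        kill -0 "$stop_pid" 2>/dev/null || return 0  # exited cleanly
        sleep 1
        waited=$((waited + 1))
    done
    kill -KILL "$stop_pid" 2>/dev/null               # last resort
    return 0
}

# Usage: graceful_stop 1234 15
```

`kill -0` sends no signal; it only tests whether the PID still exists, which makes it a convenient polling check.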

Pitfall 2: Ignoring System Logs

# Follow system journal
journalctl -f

# Traditional log tail (RHEL/CentOS path; Debian/Ubuntu use /var/log/syslog)
tail -f /var/log/messages

Logs explain the "why" behind the symptoms shown by commands.

Pitfall 3: Operating Without Backups

Always back up data before running destructive commands like rm to avoid irreversible loss.

Advanced Tips: Combining Commands

# Top 5 CPU‑hungry processes
ps aux | sort -k3 -nr | head -5

# Rough real-time socket count (the total includes header lines and UNIX domain sockets)
watch -n 1 "netstat -an | wc -l"

# Batch kill nginx processes (pgrep matches by process name, so the pipeline never targets itself)
pgrep -x nginx | xargs -r kill

Future Outlook: Commands in the Cloud‑Native Era

kubectl + ps: Diagnose processes inside Pods.

docker stats + top: Monitor container resource usage.

Prometheus + sar: Historical performance analysis.

Regardless of new tools, these foundational commands remain the "inner kung fu" of operations engineers.

Conclusion & Call to Action

These twenty commands form a sysadmin's core repertoire. Practice daily, document useful parameter combinations, and experiment in non‑critical environments to turn knowledge into muscle memory.

What are your go‑to commands? Share your experiences in the comments!

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Liangxu Linux

Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge: fundamentals, applications, and tools, plus Git, databases, Raspberry Pi, and more.
