
10 Essential Linux Commands to Diagnose Slow Servers and Crashes

When servers become sluggish, fail to start, or run out of disk space, blindly restarting only masks the problem; this guide compiles ten critical Linux commands with usage scenarios to help you quickly pinpoint CPU, memory, port, disk, swap, and network issues for effective troubleshooting.

Xiao Liu Lab

Server slowdown, startup failures, and disk‑space exhaustion are common symptoms of underlying resource problems; instead of restarting blindly, systematic diagnosis using built‑in Linux tools can reveal the root cause within minutes.

1. Identify high‑CPU‑usage processes

Command: ps aux --sort=-%cpu | head -n 11

Custom version:

ps -eo pid,comm,%cpu --no-headers | sort -k3 -nr | head -n 11 | awk '{printf "PID: %6s | Process: %-20s | CPU: %6.1f%%\n", $1, $2, $3}'

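Usage: the first form prints the header line plus the ten busiest processes by CPU; the custom version drops the header and prints eleven. Watch the %CPU and COMMAND columns; a process that stays above 80 % deserves deeper analysis. ps reports CPU time divided by the time the process has been running, so the figure is a lifetime average rather than an instantaneous reading. On multi-core systems a single multi-threaded process can exceed 100 % (up to 400 % on a 4-core box).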
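The 80 % rule of thumb above can be turned into a small filter. This is a minimal sketch, not part of the original command set: the function name flag_hot_procs is hypothetical, and it assumes the input is ordered as PID, %CPU, COMMAND so that command names containing spaces stay intact.

```shell
# Print processes whose %CPU is at or above a threshold (default 80).
# Expects lines of "PID %CPU COMMAND" on stdin; COMMAND comes last
# because it may contain spaces.
flag_hot_procs() {
    local threshold="${1:-80}"
    awk -v t="$threshold" '$2 + 0 >= t + 0 {
        cmd = $3
        for (i = 4; i <= NF; i++) cmd = cmd " " $i
        printf "PID %s uses %.1f%% CPU (%s)\n", $1, $2, cmd
    }'
}

# Live usage (assumes procps ps):
#   ps -eo pid,%cpu,comm --no-headers | flag_hot_procs 80
```

Because ps reports a lifetime average, running the filter a few times some seconds apart gives a better picture than a single reading.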

2. Find high‑memory‑usage processes

Command: ps aux --sort=-%mem | head -n 11

Custom version:

ps -eo pid,comm,rss --no-headers | sort -k3 -nr | head -n 5 | awk '{printf "PID: %6s | Process: %-20s | MEM: %7.2f GiB\n", $1, $2, $3/1024/1024}'

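Usage: sorts processes by memory share, which tracks resident set size (RSS), to identify memory hogs. ps reports RSS in KiB, hence the division by 1024 twice in the custom version. Combine with free -h to see overall memory status and avoid mistaking cached memory (buff/cache) for real pressure; the available column is the better indicator.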
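To separate real memory pressure from cache, one option is to read MemAvailable straight from /proc/meminfo. The sketch below is illustrative only: mem_available_pct is a hypothetical helper name, and it assumes a kernel that exposes the MemAvailable field.

```shell
# Print available memory as a percentage of total memory.
# Reads a file in /proc/meminfo format (defaults to /proc/meminfo).
mem_available_pct() {
    local meminfo="${1:-/proc/meminfo}"
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%.1f\n", avail * 100 / total }' "$meminfo"
}

# Live usage:
#   mem_available_pct
```

A low percentage here points to genuine pressure even when the free column looks small only because of cache.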

3. Check which process occupies a specific port

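Command: ss -tulnp | grep ':<port> '

Example for port 80: ss -tulnp | grep ':80 '

Usage: when a service fails with “Address already in use”, this command quickly shows the PID and program name of the process holding the port. The trailing space in the pattern keeps port 80 from also matching ports such as 8080. Run it as root; otherwise ss omits process details for sockets owned by other users. lsof -i :80 provides similar information, but ss is generally faster on hosts with many sockets.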
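An exact-port match can also be done on the local address column rather than on the whole line. This is a sketch under assumptions: port_owner is a hypothetical name, and it relies on the column layout that ss -tulnp prints (Netid, State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port, Process).

```shell
# Show protocol, local address, and owning process for one exact port.
# Reads `ss -tulnp` output on stdin; field 5 is the local address:port.
# Without root the process column may be empty, in which case the last
# field printed is the peer address instead.
port_owner() {
    local port="$1"
    awk -v p="$port" '$5 ~ (":" p "$") { print $1, $5, $NF }'
}

# Live usage:
#   ss -tulnp | port_owner 80
```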

4. List all listening ports and their processes

Command: ss -tulnp

Usage: displays every listening TCP and UDP socket with its owning process, which is useful for security audits or service verification.

Parameters:

-t: TCP sockets

-u: UDP sockets

-l: listening sockets only

-n: numeric output (no name resolution)

-p: show process info
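For the audit use case, the listening ports can be compared against a list of ports that are expected to be open. The helper below is a hypothetical sketch (unexpected_ports is not a standard tool) and assumes headerless input such as ss -H -tuln produces.

```shell
# Print listening ports that are NOT in the allowed list.
# Allowed ports are passed as arguments; `ss -H -tuln` output is read
# on stdin (field 5 is the local address:port, IPv4 or [IPv6]).
unexpected_ports() {
    local allowed=" $* "
    awk '{ n = split($5, parts, ":"); print parts[n] }' | sort -un |
    while read -r port; do
        case "$allowed" in
            *" $port "*) ;;           # expected, stay quiet
            *) echo "$port" ;;        # not on the list
        esac
    done
}

# Live usage:
#   ss -H -tuln | unexpected_ports 22 80 443
```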

5. Analyse Swap usage and the processes consuming it

Step 1 – View overall Swap: free -h

If the used column in the Swap row is well above zero, pages have been pushed out to swap, and performance may degrade whenever they have to be read back in.

Step 2 – Locate processes using Swap (requires root):

for file in /proc/[0-9]*/status; do awk '/^Name:/{name=$2} /^VmSwap:/{print name, $2, $3}' "$file" 2>/dev/null; done | sort -k2 -nr | head

Typical scenario: a Java application whose heap is larger than the available RAM has part of that heap swapped out; each Full GC then touches the whole heap, forcing heavy swap-in activity and I/O spikes.
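Swap that is merely occupied is less worrying than swap that is actively being read and written. One way to tell the two apart is to sample the kernel's swap counters twice; the sketch below assumes the pswpin and pswpout fields of /proc/vmstat, and both function names are hypothetical.

```shell
# Print the cumulative "pages swapped in" and "pages swapped out"
# counters from a file in /proc/vmstat format.
swap_counters() {
    awk '$1 == "pswpin"  { i = $2 }
         $1 == "pswpout" { o = $2 }
         END { print i + 0, o + 0 }' "${1:-/proc/vmstat}"
}

# Print the difference between two samples: in1 out1 in2 out2.
swap_delta() {
    echo "$(( $3 - $1 )) $(( $4 - $2 ))"
}

# Live usage (values are pages moved during the 5-second window):
#   before=$(swap_counters); sleep 5; after=$(swap_counters)
#   swap_delta $before $after
```

Deltas that stay at zero suggest the swapped pages are idle; steadily growing deltas suggest the system is swapping under load.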

6. Check disk‑space usage

Command: df -h

Usage: shows human-readable usage for each mounted filesystem. Pay special attention to /, /var, and /home.

Alert thresholds: ≥90 % requires immediate action; at ≥95 % writes may start to fail and services may crash.
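The thresholds above can be checked mechanically. This is a minimal sketch assuming the POSIX output format of df -P (capacity in column 5, mount point in column 6); flag_full_filesystems is a hypothetical name, and mount points containing spaces are not handled.

```shell
# Print mount points whose usage is at or above a threshold (default 90).
# Reads `df -P` output on stdin and skips its header line.
flag_full_filesystems() {
    local threshold="${1:-90}"
    awk -v t="$threshold" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)
        if (pct + 0 >= t + 0) print $6, $5
    }'
}

# Live usage:
#   df -P | flag_full_filesystems 90
```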

7. Locate large files or directories

Command (run inside the target directory): du -sh * | sort -hr

Note that * skips hidden entries (names starting with a dot).

Workflow:

Identify the cramped partition with df -h (e.g., /var).

Enter it: cd /var.

Run the command above to sort sub‑directories by size.

Drill down until the offending large file (log, cache, core dump, etc.) is found.

Extended tip – find all files >100 MB system‑wide (ignore permission errors):

find / -type f -size +100M -exec ls -lh {} \; 2>/dev/null | head -n 20
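A variant of the search above sorts the results by size so the biggest files come first. It is a sketch that assumes GNU find (for -printf); largest_files is a hypothetical name, and -xdev is added here to keep the search on one filesystem.

```shell
# List the largest regular files under a directory, biggest first.
# Arguments: directory (default .), minimum size in KiB (default
# 102400, i.e. 100 MiB), number of results (default 20).
# Output: size in bytes, a tab, then the path.
largest_files() {
    local dir="${1:-.}" min_kb="${2:-102400}" count="${3:-20}"
    find "$dir" -xdev -type f -size +"${min_kb}"k -printf '%s\t%p\n' 2>/dev/null |
        sort -nr | head -n "$count"
}

# Live usage:
#   largest_files /var
```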

8. View system load averages

Command: uptime

Sample output:

19:30:01 up 10 days, load average: 4.20, 3.80, 2.90

The three numbers represent the 1‑, 5‑, and 15‑minute load averages. On an N‑core CPU, a sustained load > N indicates a bottleneck. High load does not always mean high CPU; I/O‑wait (D‑state) processes can also inflate the metric.

Auxiliary commands:

Check core count:

nproc
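The "load greater than N" rule can be expressed as load per core, where a value above 1 means more runnable or I/O-blocked tasks than cores. A minimal sketch, with load_per_core as a hypothetical helper name:

```shell
# Divide a load average by the core count and print two decimals.
# Arguments: load average, number of cores.
load_per_core() {
    awk -v l="$1" -v n="$2" 'BEGIN { printf "%.2f\n", l / n }'
}

# Live usage (1-minute load from /proc/loadavg, cores from nproc):
#   load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```

With the sample output above on a 4-core host, load_per_core 4.20 4 prints 1.05, slightly over capacity.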

9. Detect zombie processes

Command: ps aux | awk '$8 ~ /^[Zz]/'

Explanation: zombies are terminated processes whose parent has not yet reaped them (state Z). Each one holds a process-table entry but no CPU or memory. In large numbers they can exhaust the available PIDs and prevent new processes from being created.

Remediation: a zombie cannot be killed directly, so fix or restart the parent process so that it reaps its children. Once the parent exits, its zombies are re-parented to init (PID 1), which reaps them automatically.
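Since the fix targets the parent, it helps to group zombies by parent PID. The sketch below assumes input in the form produced by ps -eo stat,ppid; zombie_parents is a hypothetical name.

```shell
# Count zombie processes per parent PID, most affected parent first.
# Reads lines of "STAT PPID" on stdin.
# Output: number of zombies, then the parent PID.
zombie_parents() {
    awk '$1 ~ /^Z/ { count[$2]++ }
         END { for (p in count) print count[p], p }' | sort -nr
}

# Live usage (assumes procps ps):
#   ps -eo stat,ppid --no-headers | zombie_parents
```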

10. Identify abnormal network connections (potential attacks)

Command:

ss -tn | tail -n +2 | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr | head

Purpose: counts TCP connections per remote IP, helping spot brute-force attempts, DDoS, or aggressive crawlers. The cut -d: -f1 step assumes IPv4 peers; IPv6 addresses are cut at their first colon.

Mitigation: block suspicious IPs with iptables / firewalld or cloud security groups, and review authentication logs (e.g., grep "Failed" /var/log/secure on RHEL-family systems or /var/log/auth.log on Debian-family systems).
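A variant of the per-IP count that also copes with IPv6 peers might look like the following. It is a sketch under assumptions: connections_per_ip is a hypothetical name, the input is headerless ss -Htn output with the peer in field 5, and IPv6 peers are printed in brackets.

```shell
# Count TCP connections per remote address, busiest first.
# Reads `ss -Htn` output on stdin; optional argument limits the number
# of rows (default 10). Strips the trailing :port and IPv6 brackets.
connections_per_ip() {
    awk '{
        peer = $5
        sub(/:[0-9]+$/, "", peer)   # drop the port
        sub(/^\[/, "", peer)        # drop IPv6 brackets
        sub(/\]$/, "", peer)
        print peer
    }' | sort | uniq -c | sort -nr | head -n "${1:-10}"
}

# Live usage:
#   ss -Htn | connections_per_ip 10
```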

General Troubleshooting Workflow

Observe the symptom: service unavailable, slow response, write failure, etc.

Check overall load: uptime to gauge system pressure.

Drill down by resource:

CPU → ps aux --sort=-%cpu

Memory/Swap → free -h + memory-sorted ps output

Disk → df -h + du -sh

Network → ss -tulnp + connection statistics

Pinpoint the offending process: use its PID for deeper analysis with logs, strace, lsof, etc.

Fix and verify: after remediation, continue monitoring the metrics to ensure they return to normal.

Key principle: avoid relying on blind restarts; only by understanding the root cause can true system stability be achieved.
