Essential Linux Server Troubleshooting Checklist: 13 Practical Steps
This guide walks you through a 13-step checklist for a failing Linux server. It covers problem context, user activity, processes, network services, resource usage, hardware, I/O performance, filesystems, kernel settings, logs, and scheduled tasks, so you can quickly pinpoint and resolve the root cause.
When a server fault occurs, the cause is rarely obvious; start with this systematic 13‑step checklist.
1. Clarify the problem context
Before running any command, establish what the failure looks like (no response, error messages), when it was first noticed, whether it can be reproduced, and whether it follows a pattern (e.g., every hour). Also find out what changed recently on the platform, which users or user groups are affected, whether infrastructure documentation exists, which monitoring tools are in place (Munin, Zabbix, Nagios, New Relic), and where logs are collected (Loggly, Airbrake, Graylog).
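One concrete way to answer "what changed recently?" on a Debian or Ubuntu host is to read the dpkg log. The sketch below assumes the standard `/var/log/dpkg.log` line format (date, time, action, package, versions); `recent_pkg_changes` is a hypothetical helper name, and Red Hat-family systems would use `rpm -qa --last` instead.

```shell
# Hypothetical helper: list package installs, upgrades, and removals recorded
# in a dpkg log (default: /var/log/dpkg.log), optionally only since a date.
# Assumes dpkg's "YYYY-MM-DD HH:MM:SS action package versions..." line format.
recent_pkg_changes() {
  local log="${1:-/var/log/dpkg.log}" since="${2:-0000-00-00}"
  # ISO dates compare correctly as plain strings.
  awk -v since="$since" '
    $1 >= since && $3 ~ /^(install|upgrade|remove|purge)$/ {
      print $1, $2, $3, $4
    }' "$log"
}
```

For example, `recent_pkg_changes /var/log/dpkg.log 2024-01-12` would list only the changes made on or after that date, which you can then line up against the time the failure was first noticed.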
2. Who is logged in?
$ w
$ last
Check who is logged in right now and who has accessed the system recently, so you don't start debugging while someone else is working on the same machine.
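On a busy host `last` can run to hundreds of lines. A small filter like the following condenses it into a per-user session count. It is a sketch: `logins_by_user` is a hypothetical name, and it assumes the usual whitespace-separated `last` output with the username in the first column.

```shell
# Hypothetical helper: count sessions per user in `last` output read from stdin,
# skipping reboot/shutdown pseudo-entries, the trailing "wtmp begins" line, and blanks.
logins_by_user() {
  awk 'NF && $1 !~ /^(reboot|shutdown|wtmp|btmp)$/ { n[$1]++ }
       END { for (u in n) print n[u], u }' |
    sort -k1,1nr -k2,2   # most sessions first, ties by name
}
```

Usage: `last | logins_by_user`. An unexpected account near the top of the list is worth a closer look before you go further.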
3. What happened previously?
$ history
Review the commands recently run on the server. Set HISTTIMEFORMAT to see when each command ran, and keep in mind that history only shows the current user's shell history.
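A minimal bash sketch for both parts of this step: enabling timestamps and skimming the saved histories of other accounts. `recent_histories` is a hypothetical helper; it assumes bash history files live at `~/.bash_history` under `/home` (root's own file in `/root` is not covered).

```shell
# Bash: show a date and time next to each entry printed by `history`.
# Entries loaded from a history file saved without timestamps may show
# misleading times.
export HISTTIMEFORMAT='%F %T '

# Hypothetical helper: print the last N lines of every readable
# .bash_history under a home-directory root (default /home), by owner.
recent_histories() {
  local root="${1:-/home}" n="${2:-20}" f
  for f in "$root"/*/.bash_history; do
    [ -r "$f" ] || continue   # also skips the literal pattern when nothing matches
    echo "== ${f%/.bash_history}"
    tail -n "$n" "$f"
  done
}
```

Run it as root (for example `recent_histories /home 50`) so that every user's file is readable.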
4. What processes are running now?
$ pstree -a
$ ps aux
ps aux gives detailed but verbose output; pstree -a gives a condensed view of what is running and which process started it.
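When `ps aux` is too long to read, summing its columns per user quickly shows who owns the load. This is a sketch: `usage_by_user` is a hypothetical name, and it assumes the standard BSD-style `ps aux` columns (USER, PID, %CPU, %MEM, ...).

```shell
# Hypothetical helper: total %MEM and %CPU per user from `ps aux` output on stdin.
# Output columns: %MEM %CPU USER, highest memory first.
usage_by_user() {
  awk 'NR > 1 { cpu[$1] += $3; mem[$1] += $4 }
       END { for (u in cpu) printf "%.1f %.1f %s\n", mem[u], cpu[u], u }' |
    LC_ALL=C sort -k1,1nr   # C locale so "." is always the decimal point
}
```

Usage: `ps aux | usage_by_user`. To see the individual heaviest processes instead, procps `ps` can sort directly, e.g. `ps aux --sort=-%cpu | head`.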
5. Which network services are listening?
$ netstat -ntlp
$ netstat -nulp
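$ netstat -nxlp
Run the commands separately (TCP, UDP, and UNIX sockets, respectively) rather than as one overwhelming list, and verify that each listening port belongs to the expected service and PID. Run them as root; otherwise the PID/program column stays empty for processes you don't own.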
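Checking listeners by eye gets error-prone on a crowded host. The sketch below compares `netstat -ntlp` output against an allowlist of ports you expect; `unexpected_listeners` is a hypothetical helper, it handles only TCP listeners in the netstat column layout shown above, and `ss -tlnp` (a newer equivalent) prints a different layout that it does not parse.

```shell
# Hypothetical helper: read `netstat -ntlp` output on stdin and report TCP
# listeners whose port is not in the allowlist given as arguments.
unexpected_listeners() {
  local allowed=" $* " port prog
  # Port = text after the last ":" of the local address (works for IPv4 and IPv6).
  awk '$1 ~ /^tcp/ && $6 == "LISTEN" { n = split($4, a, ":"); print a[n], $7 }' |
    sort -u |   # collapse identical IPv4/IPv6 listeners
    while read -r port prog; do
      case "$allowed" in
        *" $port "*) ;;                              # expected port: ignore
        *) echo "unexpected: port $port ($prog)" ;;
      esac
    done
}
```

Usage: `netstat -ntlp | unexpected_listeners 22 80 443`, with the port list adjusted to what the server is supposed to run.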
6. CPU and memory status
$ free -m
$ uptime
$ top
$ htop
Check how much memory is free, whether the machine is swapping, how many CPU cores are available and whether any one of them is overloaded, and what the load averages are.
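Load averages only mean something relative to the core count: a 1-minute load that stays above the number of cores means work is queuing. On Linux the load also counts tasks blocked in uninterruptible (usually disk) waits, so a high value can point at I/O rather than CPU. The helpers below are a hypothetical sketch that reads `/proc/loadavg` and `/proc/meminfo`; whether the machine is actively swapping is better judged from the si/so columns of vmstat (step 8).

```shell
# Hypothetical helper: print load averages next to the core count, plus a
# per-core 1-minute figure. Reads /proc/loadavg unless another file is given.
load_check() {
  local loadfile="${1:-/proc/loadavg}" cores="${2:-$(nproc)}"
  awk -v c="$cores" '{ printf "load 1m=%s 5m=%s 15m=%s cores=%d per-core(1m)=%.2f\n", $1, $2, $3, c, $1 / c }' "$loadfile"
}

# Hypothetical helper: swap currently in use, in MiB, from /proc/meminfo
# (SwapTotal and SwapFree are reported in kB).
swap_used_mb() {
  awk '/^SwapTotal:/ { t = $2 } /^SwapFree:/ { f = $2 } END { printf "%d\n", (t - f) / 1024 }' "${1:-/proc/meminfo}"
}
```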
7. Hardware inspection
$ lspci
$ dmidecode
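$ ethtool eth0
Identify RAID cards, CPU details, and free memory slots (dmidecode needs root), and check each NIC's configuration: duplex mode, link speed, and TX/RX errors. ethtool takes an interface name (replace eth0 with yours), and ethtool -S eth0 prints the driver's statistics, usually including error counters.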
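To check every interface at once without naming each one, the same link details are exposed in sysfs. This sketch assumes the standard `/sys/class/net/<iface>/` layout; `nic_report` is a hypothetical helper, and virtual or down links may report n/a or -1 for speed.

```shell
# Hypothetical helper: print link speed, duplex, and RX/TX error counters for
# every interface, read from sysfs (default root /sys/class/net).
nic_report() {
  local root="${1:-/sys/class/net}" dir iface
  for dir in "$root"/*/; do
    iface=$(basename "$dir")
    [ -r "$dir/statistics/rx_errors" ] || continue
    printf '%s speed=%s duplex=%s rx_errors=%s tx_errors=%s\n' "$iface" \
      "$(cat "$dir/speed" 2>/dev/null || echo n/a)" \
      "$(cat "$dir/duplex" 2>/dev/null || echo n/a)" \
      "$(cat "$dir/statistics/rx_errors")" \
      "$(cat "$dir/statistics/tx_errors" 2>/dev/null || echo n/a)"
  done
}
```

A half-duplex link, an unexpectedly low speed, or error counters that keep rising between two runs of `nic_report` are the red flags this step is looking for.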
8. I/O performance
$ iostat -kx 2
$ vmstat 2 10
$ mpstat 2 10
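$ dstat --top-io --top-bio
Use these tools to see whether the disks are saturated, whether the machine is swapping heavily, how CPU time is split between user, system, I/O wait, and steal (time taken by the hypervisor on a virtual machine), and which process (e.g., MySQL, PHP) is generating the most I/O.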
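The vmstat output can be filtered so that only worrying samples remain. This is a sketch: `vmstat_flags` is a hypothetical helper, and the 20% I/O-wait default is an illustrative assumption, not a universal limit. Columns are located by their header names, so extra columns in newer procps versions do not matter. Remember that the first vmstat row reports averages since boot.

```shell
# Hypothetical helper: read `vmstat` output on stdin and flag samples where I/O
# wait reaches a threshold (percent, default 20) or any swap-in/out occurred.
vmstat_flags() {
  local max_wa="${1:-20}"
  awk -v max_wa="$max_wa" '
    NR == 2 { for (i = 1; i <= NF; i++) col[$i] = i; next }   # column header row
    NR > 2 {
      wa = $col["wa"]; si = $col["si"]; so = $col["so"]
      st = ("st" in col) ? $col["st"] : 0
      if (wa + 0 >= max_wa + 0 || si + so > 0)
        printf "sample %d: wa=%s%% st=%s%% si=%s so=%s\n", NR - 2, wa, st, si, so
    }'
}
```

Usage: `vmstat 2 10 | vmstat_flags 20`.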
9. Mount points and filesystems
$ mount
$ cat /etc/fstab
$ vgs
$ pvs
$ lvs
$ df -h
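$ lsof +D /   # slow: walks the entire filesystem tree and can bog down a busy server
Check how many filesystems are mounted, whether services such as MySQL have a dedicated filesystem, which mount options are in use (e.g., noatime), how much disk space is left, and whether large deleted files are still held open and occupying space.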
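Two of these checks are easy to script: filesystems close to full, and filesystems mounted read-only (which can happen after the kernel detects errors). The helpers below are a hypothetical sketch; `full_filesystems` assumes POSIX `df -P` output without spaces in mount points, and `readonly_mounts` reads `/proc/mounts`, where some entries (e.g., squashfs images) may be read-only by design.

```shell
# Hypothetical helper: list filesystems at or above a usage threshold (percent),
# reading POSIX-format `df -P` output on stdin. Mount points with spaces are not handled.
full_filesystems() {
  local limit="${1:-90}"
  awk -v limit="$limit" 'NR > 1 { use = $5; sub(/%$/, "", use); if (use + 0 >= limit + 0) print $6, $5 }'
}

# Hypothetical helper: list mount points whose options include "ro"
# (read from /proc/mounts unless another file is given).
readonly_mounts() {
  awk '$4 ~ /(^|,)ro(,|$)/ { print $2 }' "${1:-/proc/mounts}"
}
```

Usage: `df -P | full_filesystems 85` and `readonly_mounts`. For deleted files that still hold space, `lsof +L1` (open files whose link count is below 1) is a far lighter check than `lsof +D /`; restarting the process that holds such a file releases the space.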
10. Kernel, interrupts, and network tuning
$ sysctl -a | grep …
$ cat /proc/interrupts
$ cat /proc/net/ip_conntrack
$ netstat
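$ ss -s
Verify that interrupts are balanced across CPUs rather than piling up on one core, and review swap behavior (vm.swappiness), whether the conntrack table limit is high enough for your traffic, and TCP timeout settings. ss -s gives a much faster connection summary than netstat. On newer kernels the conntrack table is exposed as /proc/net/nf_conntrack instead of /proc/net/ip_conntrack.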
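Two of these checks reduce to a single number each. The sketch below is hypothetical: `conntrack_usage` assumes the `nf_conntrack_count` and `nf_conntrack_max` sysctls under `/proc/sys/net/netfilter` (present only when the nf_conntrack module is loaded), and `irq_per_cpu` simply sums every per-CPU column of `/proc/interrupts`.

```shell
# Hypothetical helper: conntrack table usage from the nf_conntrack sysctls.
conntrack_usage() {
  local dir="${1:-/proc/sys/net/netfilter}" count max
  if [ ! -r "$dir/nf_conntrack_count" ] || [ ! -r "$dir/nf_conntrack_max" ]; then
    echo "nf_conntrack sysctls not found under $dir" >&2
    return 1
  fi
  count=$(cat "$dir/nf_conntrack_count")
  max=$(cat "$dir/nf_conntrack_max")
  echo "conntrack: $count/$max ($(( count * 100 / max ))%)"
}

# Hypothetical helper: total interrupts handled by each CPU, summed over the
# rows of /proc/interrupts (or another file). A heavily skewed total suggests
# IRQs are not being balanced.
irq_per_cpu() {
  awk 'NR == 1 { n = NF; next }   # header: one "CPUn" field per CPU
       { for (i = 2; i <= n + 1 && i <= NF; i++) if ($i ~ /^[0-9]+$/) s[i - 1] += $i }
       END { for (c = 1; c <= n; c++) printf "CPU%d %.0f\n", c - 1, s[c] }' "${1:-/proc/interrupts}"
}
```

A conntrack usage close to 100% means new connections can be dropped (the kernel usually logs "table full" in dmesg), and one CPU with a far larger interrupt total than the others points at IRQ affinity.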
11. System and kernel logs
$ dmesg
$ less /var/log/messages
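$ less /var/log/secure
$ less /var/log/auth.log
Look for errors and warnings, hardware or filesystem problems, and correlate their timestamps with your earlier findings. Red Hat-family systems write to /var/log/messages and /var/log/secure; Debian and Ubuntu use /var/log/syslog and /var/log/auth.log.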
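Paging through large logs is slow, so a first keyword pass helps. This is a sketch: the keyword list in `scan_log` is an illustrative assumption to tune for your systems, and `oom_victims` assumes the kernel's "Out of memory: Kill(ed) process PID (name)" message format.

```shell
# Hypothetical helper: show log lines (with line numbers) that mention common
# trouble keywords. Reads the files given as arguments, or stdin.
scan_log() {
  grep -nEi 'error|fail|warn|segfault|out of memory|i/o error|read-only' "$@"
}

# Hypothetical helper: extract "PID NAME" of processes killed by the kernel
# OOM killer from kernel log text on stdin (e.g. `dmesg` output).
oom_victims() {
  sed -nE 's/.*Out of memory: Kill(ed)? process ([0-9]+) \(([^)]*)\).*/\2 \3/p'
}
```

Usage: `dmesg -T | oom_victims` and `scan_log /var/log/messages`. On systemd machines, `journalctl -k` (kernel messages) and `journalctl -p err -b` (errors since the last boot) give similar views.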
12. Scheduled tasks
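$ ls -l /etc/cron*
$ cat /etc/crontab /etc/cron.d/*
$ for user in $(cut -d: -f1 /etc/passwd); do crontab -l -u "$user" 2>/dev/null | sed "s/^/$user: /"; done
Run the loop as root. Look for jobs that run too often, user crontabs you didn't know existed, and backup jobs that may have been running when the failure occurred.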
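To cut through the system cron files and spot jobs that fire every minute, a small filter is enough. This is a hypothetical sketch: `system_cron_jobs` reads `/etc/crontab` and `/etc/cron.d/*`, and `every_minute_jobs` is a heuristic that only recognizes the literal schedules `* * * * *` and `*/1 * * * *`.

```shell
# Hypothetical helper: print every active line of the system crontab and
# /etc/cron.d files (comments and blank lines removed), prefixed by file name.
system_cron_jobs() {
  local root="${1:-/etc}" f
  for f in "$root/crontab" "$root"/cron.d/*; do
    [ -f "$f" ] || continue
    grep -Ev '^[[:space:]]*(#|$)' "$f" | sed "s|^|$f: |"
  done
}

# Heuristic filter for "file: schedule ..." lines: keep only jobs whose
# schedule is "* * * * *" or "*/1 * * * *".
every_minute_jobs() {
  grep -E ': (\*|\*/1)([[:space:]]+\*){4}[[:space:]]'
}
```

Usage: `system_cron_jobs | every_minute_jobs`. Because the user-crontab loop above prefixes lines the same way ("user: ..."), its output can be piped through `every_minute_jobs` too.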
13. Application logs
Apache/Nginx: check access and error logs for 5xx errors or limit_zone issues.
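MySQL: inspect mysql.log for corrupted tables or an InnoDB recovery in progress.
PHP-FPM: if logging is not set up, enable it, including the slowlog (which records backtraces of slow requests), then look for PHP, MySQL, or memcache errors.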
Varnish: use varnishlog and varnishstat to check hit/miss ratios.
HAProxy: verify backend health checks and queue sizes.
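For the web-server check, counting 5xx responses per status code shows at a glance whether the problem is, say, a flood of 502s from a dead backend. This sketch assumes access logs in the common/combined format, where the status code is the ninth whitespace-separated field; `count_5xx` is a hypothetical name, and malformed request lines can shift the fields.

```shell
# Hypothetical helper: count 5xx responses per status code in access logs that
# use the common/combined format (status is field 9). Reads files or stdin.
count_5xx() {
  awk '$9 ~ /^5[0-9][0-9]$/ { n[$9]++ } END { for (s in n) print s, n[s] }' "$@" | sort
}
```

Usage: `count_5xx /var/log/nginx/access.log` (a common default path; adjust for your setup).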
Conclusion
After following these steps you should know what processes are running, whether the issue relates to I/O, hardware, network, or system configuration, and you will have enough information to dig deeper and ultimately locate the root cause.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
MaGe Linux Operations
Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.
