Top 10 Linux Ops Troubleshooting Tips Every Sysadmin Should Know
This article compiles ten common Linux operational problems—from shell script failures and cron output issues to disk space leaks and MySQL storage errors—detailing their causes and step‑by‑step solutions to help engineers quickly diagnose and resolve system faults.
As a Linux operations engineer, encountering various problems and failures is inevitable; summarizing experiences, investigating root causes, and documenting solutions is a good habit that turns practice into valuable knowledge.
The following list gathers ten typical issues you may meet during projects, along with their causes and fixes.
Common Linux Issues and Solutions
1. Shell script does not execute
Problem: A colleague reports a simple shell script fails with "bad interpreter: No such file or directory".
Cause: The script was edited on Windows, introducing CRLF line endings (\r\n) which appear as ^M in Linux, so the kernel looks for an interpreter literally named /bin/sh^M.
Solution:
Rewrite the script directly on Linux.
Remove Windows line endings in vi with :%s/\r//g, or with :%s/^M//g (type ^M with Ctrl+V, Ctrl+M).
Use sh -x script.sh for step‑by‑step execution and debugging.
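The line-ending fix can also be scripted. The following is a minimal sketch, assuming a POSIX sh with sed and mktemp available; the function names has_crlf and strip_crlf are made up for illustration and do not come from any standard tool.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of any standard tool.
cr=$(printf '\r')   # a literal carriage return character

# Succeeds if the file contains at least one carriage return.
has_crlf() {
    grep -q "$cr" "$1"
}

# Strip a trailing carriage return from every line, in place.
# Writing back with cat keeps the original file's permissions and owner.
strip_crlf() {
    tmp=$(mktemp) || return 1
    sed "s/${cr}\$//" "$1" > "$tmp" && cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

A typical call would be has_crlf script.sh && strip_crlf script.sh. Where the dos2unix utility is installed, it performs the same conversion.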
2. Crontab output fills /var/spool/clientmqueue
Problem: The /var/spool/clientmqueue directory exceeds 100 GB.
Cause: Cron jobs produce output that is mailed to the cron user; because sendmail is not running, the messages accumulate as files.
Solution:
Manually delete the files: cd /var/spool/clientmqueue && ls | xargs rm -f
Suppress output in cron entries by appending >/dev/null 2>&1 to the command.
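Both steps can be sketched in shell. This is one possible approach, not the original author's script: the helper name purge_dir_files and the job path are placeholders, and find is swapped in for ls | xargs so the target directory is always explicit.

```shell
#!/bin/sh
# Hypothetical helper: delete the regular files directly inside one directory.
# find copes with very large directories without building a huge argument list.
purge_dir_files() {
    dir=$1
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    find "$dir" -maxdepth 1 -type f -delete
}

# Example crontab entry with all output discarded (the script path is a placeholder):
#   0 2 * * * /path/to/job.sh >/dev/null 2>&1
```

For the case above the call would be purge_dir_files /var/spool/clientmqueue. On common cron implementations, setting MAILTO="" at the top of the crontab is another way to stop cron from mailing job output.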
3. Telnet/SSH is slow
Problem: Telnet to a remote host is sluggish, while ping works and DNS lookup fails.
Cause: Reverse DNS lookup on the client’s IP is timing out.
Solution:
Add the correct hostname ‑ IP mapping to /etc/hosts.
Comment out the non‑functional nameserver in /etc/resolv.conf or use a reliable one.
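Before editing either file, it can help to confirm what /etc/hosts already maps and how long a lookup really takes. A small sketch, assuming a hosts-format file and awk; the function name hosts_names_for_ip and the address 192.0.2.10 are placeholders for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the names a hosts-format file maps to an IP.
hosts_names_for_ip() {
    ip=$1
    file=${2:-/etc/hosts}
    awk -v ip="$ip" '
        $1 == ip {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break   # rest of the line is a comment
                print $i
            }
        }' "$file"
}

# To see how long a reverse lookup takes (192.0.2.10 is a placeholder address):
#   time getent hosts 192.0.2.10
```

If the function prints nothing for the client's address and the timed lookup hangs for several seconds, the missing mapping is a likely cause of the delay.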
4. Read‑only file system error (MySQL)
Problem: MySQL fails to create a table, reporting "ERROR 1005 (HY000): Can't create table … (errno: 30)" which indicates a read‑only file system.
Possible causes:
File system corruption.
Bad disk sectors.
Incorrect /etc/fstab entries (e.g., wrong file‑system type).
Solution: Reboot the test machine or remount the file system; in some cases mount -o remount,rw /dev/… resolves the issue.
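To confirm which file system has gone read-only before remounting, the mount table can be checked. A minimal sketch, assuming the Linux /proc/mounts layout (device, mount point, type, options); the function name ro_mounts is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print "mountpoint device" for every mount whose
# option list contains the exact option "ro".
ro_mounts() {
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") print $2, $1
    }' "${1:-/proc/mounts}"
}
```

Matching the exact option avoids false positives from options such as errors=remount-ro. If a data partition shows up here unexpectedly, the kernel log (dmesg) usually records the I/O or file-system error that triggered the read-only remount.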
5. Deleted file does not free disk space
Problem: df -h shows 90 GB used, but du -sh * accounts for only 30 GB.
Cause: A process still holds an open file descriptor to a deleted file.
Solution:
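Identify the offending process: /usr/sbin/lsof | grep deleted
Restart or terminate the process to release the space, or truncate the deleted file through its descriptor, e.g., echo > /proc/25575/fd/33.
To empty a file that a running process still has open, truncate it instead of deleting it: cat /dev/null > file.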
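Where lsof is not installed, the same information is available from /proc. The sketch below assumes a Linux /proc file system and readlink; the function name deleted_fds is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list the descriptors of one PID that point at deleted files.
# The kernel appends " (deleted)" to the symlink target of such descriptors.
deleted_fds() {
    pid=$1
    for fd in /proc/"$pid"/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            *" (deleted)") printf '%s %s\n' "$fd" "$target" ;;
        esac
    done
}
```

Each output line pairs a /proc/<pid>/fd/<n> path with the deleted file it still holds; that path is what the echo > /proc/<pid>/fd/<n> truncation acts on. Inspecting another user's process requires root.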
6. Improve performance of find cleanup script
Problem: A nightly find command that deletes old picture_* files causes high load.
Cause: Scanning a directory with many entries is resource‑intensive.
Solution: Use a more efficient shell pipeline:
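#!/bin/sh
# Delete picture* files in /tmp last modified two days ago, matched on the ls -l date column.
cd /tmp || exit 1
# %e pads the day with a space, as ls -l does in the C locale (e.g. "Jan  5").
time=$(LC_ALL=C date -d "2 days ago" "+%b %e")
LC_ALL=C ls -l | grep "picture" | grep "$time" | awk '{print $NF}' | xargs rm -rf
7. Unable to obtain gateway MAC address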
Problem: ARP fails to retrieve the MAC address of the gateway.
Solution:
Bind a static ARP entry:
arp -s 192.168.3.254 00:00:5e:00:01:64
8. HTTP service fails to start (port 7080)
Problem: Starting httpd reports "Address already in use" for port 7080.
Cause:
Port appears occupied; netstat -npl | grep 7080 shows nothing.
The same port is defined in multiple configuration files.
Solution: Comment out the duplicate Listen 7080 line in /etc/httpd/conf.d/t.10086.cn.conf and restart the service.
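Duplicate Listen directives can be located with a recursive grep. A minimal sketch, assuming the configuration lives under /etc/httpd (other distributions use a different directory, such as /etc/apache2); the function name find_listen is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: show every active Listen directive for a port
# under a configuration tree, as file:line:directive.
find_listen() {
    port=$1
    confdir=${2:-/etc/httpd}
    grep -rniE "^[[:space:]]*Listen[[:space:]]+([^[:space:]]+:)?${port}([[:space:]]|\$)" "$confdir"
}
```

Running find_listen 7080 and seeing more than one line of output points to the duplicate definition. Commented-out directives and other ports such as 70800 are not matched.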
9. "Too many open files" error
Problem: System reports "too many open files".
Solution: Raise the open-file (nofile) and process (nproc) limits:
echo "" >> /etc/security/limits.conf
echo "* soft nproc 65535" >> /etc/security/limits.conf
echo "* hard nproc 65535" >> /etc/security/limits.conf
echo "* soft nofile 65535" >> /etc/security/limits.conf
echo "* hard nofile 65535" >> /etc/security/limits.conf
echo "" >> /root/.bash_profile
echo "ulimit -n 65535" >> /root/.bash_profile
echo "ulimit -u 65535" >> /root/.bash_profile
Then log in again (or reboot), or run ulimit -u 65535 && ulimit -n 65535 in the current shell.
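A service started before the change keeps its old limits until it is restarted, so it is worth checking what a running process actually has. A small sketch, assuming the Linux /proc layout; the function names count_fds and soft_nofile are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers that read Linux /proc; run as root to inspect
# processes owned by other users.

# Number of file descriptors a process currently has open.
count_fds() {
    ls /proc/"$1"/fd 2>/dev/null | wc -l | tr -d ' '
}

# Soft "Max open files" limit that applies to a running process.
soft_nofile() {
    awk '/^Max open files/ { print $4 }' /proc/"$1"/limits
}
```

Comparing count_fds <pid> with soft_nofile <pid> shows how close a process is to its limit and whether the new value has taken effect for it.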
10. ibdata1 and mysql‑bin logs consume disk space
Problem: Disk usage alarm; ibdata1 >120 GB and mysql‑bin >80 GB.
Cause: ibdata1 is the shared InnoDB system tablespace; it holds table data and indexes and does not shrink when rows are deleted. Binary logs also accumulate over time.
Solution:
For oversized ibdata1, dump databases, delete the file, and recreate the tablespace.
To prune binary logs:
mysql> PURGE MASTER LOGS TO 'mysql-bin.010';
mysql> PURGE MASTER LOGS BEFORE '2010-12-22 13:00:00';
Or set expire_logs_days=30 in /etc/my.cnf for automatic cleanup.
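Before purging, it is useful to measure how much space the binary logs occupy. The sketch below assumes GNU find, the mysql-bin base name used above, and a data directory of /var/lib/mysql (an assumption; check datadir in my.cnf); the function name binlog_bytes is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: total size in bytes of the numbered binary logs
# (mysql-bin.000001 and so on) in a data directory. mysql-bin.index is skipped.
binlog_bytes() {
    datadir=${1:-/var/lib/mysql}
    find "$datadir" -maxdepth 1 -type f -name 'mysql-bin.[0-9]*' -printf '%s\n' |
        awk '{ total += $1 } END { printf "%.0f\n", total }'
}
```

Running it before and after a PURGE statement shows how much space was reclaimed. Binary logs should be removed through PURGE rather than rm so that the index file stays consistent.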
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
