
Essential Ops Troubleshooting: 10 Quick Fixes and 22 Common Failure Cases

This guide compiles the Linux and network problems that operations engineers hit most often, from shell scripts that will not execute and runaway cron output to read-only filesystems, disk space leaks, and service start failures, with a clear cause and step-by-step solution for each case.

Efficient Ops

Operations engineers run into the same classes of problems again and again; keeping a record of common faults and their fixes builds good troubleshooting habits.

10 Problem‑Solving Techniques

1. Shell script does not execute

Problem: Running the script fails with "bad interpreter: No such file or directory".

Cause: The script was edited on Windows, so every line ends in CRLF. Linux treats the extra carriage return (\r, displayed as ^M) as part of the interpreter path on the shebang line.

Solution:

Rewrite the script directly on Linux, or

Open it in vi and run :%s/\r//g or :%s/^M//g (enter ^M with Ctrl+V, Ctrl+M) to remove the stray characters.

Tip: Use sh -x script.sh to trace each command as it executes and see where the script fails.
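When several scripts are affected, the same cleanup can be done without opening an editor. The following is a minimal sketch rather than part of the original procedure; strip_cr is a hypothetical helper name, and it assumes a POSIX shell with sed and mktemp available.

```shell
#!/bin/sh
# strip_cr FILE: remove a trailing carriage return from every line of FILE.
# (Hypothetical helper for illustration.)
strip_cr() {
    cr=$(printf '\r')                  # a literal CR character
    tmp=$(mktemp) || return 1
    # Delete one CR at the end of each line; all other bytes are untouched.
    # Writing back with cat keeps the original file's permissions.
    sed "s/${cr}\$//" "$1" > "$tmp" && cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Demo on a throwaway script written with Windows line endings.
demo=$(mktemp)
printf '#!/bin/sh\r\necho hello\r\n' > "$demo"
strip_cr "$demo"
sh "$demo"
rm -f "$demo"
```

Where the dos2unix utility is installed, it does the same job in one command.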

2. Controlling crontab output

Problem: /var/spool/clientmqueue grows beyond 100 GB.

Cause: Cron mails any output a job produces to the job's owner. sendmail is not running, so the undelivered messages accumulate in the queue directory.

Solution:

Manually delete the accumulated files from inside that directory with ls | xargs rm -f, and

Append >/dev/null 2>&1 to cron commands to discard their output so that no further mail is generated.
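The order of the two redirections matters, which is easy to get wrong when editing a crontab by hand. The sketch below illustrates the difference; noisy_job is a hypothetical stand-in for any cron command.

```shell
#!/bin/sh
# noisy_job stands in for any cron command (hypothetical name).
noisy_job() {
    echo "to stdout"
    echo "to stderr" >&2
}

# Correct order: stdout goes to /dev/null first, then stderr follows it.
silenced=$(noisy_job >/dev/null 2>&1)

# Reversed order: stderr is pointed at the old stdout before stdout is
# redirected, so error output still escapes (and cron would still mail it).
leaky=$(noisy_job 2>&1 >/dev/null)

echo "silenced: [$silenced]"
echo "leaky:    [$leaky]"
```

In a crontab the redirection goes at the end of the command field. On cron implementations that support it (Vixie cron and cronie do), setting MAILTO="" at the top of the crontab is another way to stop the mail.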

3. Telnet/SSH is slow

Problem: Telnet from host 10.50 to 10.52 is very slow, while ping works.

Cause: The server performs a reverse DNS lookup on the client address, and the lookup stalls until it times out because the configured nameserver is not reachable.

Solution:

Add the correct hostname-IP mapping to /etc/hosts, and

Comment out the non-working nameserver in /etc/resolv.conf or replace it with a functional one.
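Before editing anything, it can help to confirm whether the client address is already mapped locally. This is a small sketch under stated assumptions: has_hosts_entry is a hypothetical helper, and the 192.0.2.x addresses are documentation placeholders, not the hosts from the case above.

```shell
#!/bin/sh
# has_hosts_entry IP [FILE]: succeed if IP appears as the address field of
# a line in FILE (default /etc/hosts). Hypothetical helper.
has_hosts_entry() {
    awk -v ip="$1" '$1 == ip { found = 1 } END { exit !found }' "${2:-/etc/hosts}"
}

# Demo against a sample file; commented-out entries do not count.
sample=$(mktemp)
printf '127.0.0.1 localhost\n# 192.0.2.52 old-entry\n192.0.2.50 client50\n' > "$sample"
for ip in 192.0.2.50 192.0.2.52; do
    if has_hosts_entry "$ip" "$sample"; then
        echo "$ip: mapped locally"
    else
        echo "$ip: not mapped, lookups may fall through to DNS"
    fi
done
rm -f "$sample"
```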

4. Read‑only filesystem error

Problem: MySQL fails to create a table, reporting "ERROR 1005 (HY000): Can't create table … (errno: 30)".

Cause: The operating system returns error code 30 (EROFS, read-only filesystem), possibly due to filesystem corruption, bad disk sectors, or incorrect fstab entries.

Solution:

Reboot the test machine to recover, or

Remount the filesystem read-write (e.g., mount -o remount,rw /dev/sdX).
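To confirm which mount has gone read-only before remounting, the mount options can be read from /proc/mounts. The helper below is an illustrative sketch; mount_is_ro is a hypothetical name and the device names in the sample data are made up.

```shell
#!/bin/sh
# mount_is_ro MOUNTPOINT [FILE]: succeed if the last entry for MOUNTPOINT in
# FILE (default /proc/mounts) carries the "ro" option. Hypothetical helper.
mount_is_ro() {
    awk -v mp="$1" '
        $2 == mp {
            ro = 0
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) if (opts[i] == "ro") ro = 1
        }
        END { exit !ro }
    ' "${2:-/proc/mounts}"
}

# Demo against sample data in /proc/mounts format.
sample=$(mktemp)
cat > "$sample" <<'EOF'
/dev/sda1 / ext4 rw,relatime,errors=remount-ro 0 0
/dev/sdb1 /data ext4 ro,relatime 0 0
EOF
for mp in / /data; do
    if mount_is_ro "$mp" "$sample"; then
        echo "$mp: read-only"
    else
        echo "$mp: read-write"
    fi
done
rm -f "$sample"
```

Options are compared as whole comma-separated tokens, so errors=remount-ro on a writable mount is not mistaken for ro.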

5. Deleted file does not free disk space

Problem: df -h shows 90 GB used, but du -sh /* totals only 30 GB.

Cause: A process still holds an open file descriptor to a deleted file, so the kernel cannot free its blocks.

Solution:

Restart the system or the affected service, or

Identify the holding process with /usr/sbin/lsof | grep deleted and release the space by truncating the deleted file through its descriptor, e.g., echo > /proc/25575/fd/33, or kill the process.
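On a Linux host without lsof, the same information is available from the symlinks under /proc. The following is a rough sketch, not a replacement for lsof; list_deleted_fds is a hypothetical helper, and it only sees processes the current user is allowed to inspect.

```shell
#!/bin/sh
# list_deleted_fds: print every open descriptor visible under /proc whose
# target has been unlinked. Linux only. Hypothetical helper.
list_deleted_fds() {
    for fd in /proc/[0-9]*/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            *' (deleted)') printf '%s -> %s\n' "$fd" "$target" ;;
        esac
    done
}

# Demo: hold a file open on descriptor 9, delete it, then find it again.
scratch=$(mktemp)
exec 9> "$scratch"
echo "some data" >&9
rm -f "$scratch"
list_deleted_fds | grep "$scratch"
# Truncate through the descriptor to free the blocks. /proc/self is this
# shell; for another process, substitute its PID.
: > /proc/self/fd/9
exec 9>&-               # closing the descriptor releases the inode itself
```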

6. Improving performance of find cleanup

Problem: A nightly find /tmp -name "picture_*" -mtime +1 -exec rm -f {} \; script causes high load.

Cause: Scanning a directory with many files is resource‑intensive.

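Solution: Change to the directory first and select the files by the date shown in the ls listing, e.g.:

<code>#!/bin/sh
# Remove picture_* files in /tmp that were last modified two days ago.
cd /tmp || exit 1
# GNU date; %e space-pads the day so it lines up with ls -l (e.g. "Dec  5").
day=$(LC_ALL=C date -d "2 days ago" "+%b %e")
# Keep only entries dated that day (assumed intent of the date variable).
LC_ALL=C ls -l | grep "picture_" | grep "$day" | awk '{print $NF}' | xargs rm -f</code>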
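Part of the cost of the original find command is that -exec with \; starts one rm process per file. An alternative worth testing is to keep find but batch the deletions with -exec ... {} + and stop it from descending into subdirectories. This is a sketch under those assumptions, not the author's method; cleanup_old_pictures is a hypothetical helper.

```shell
#!/bin/sh
# cleanup_old_pictures DIR: delete picture_* files directly inside DIR whose
# age, rounded down to whole days, exceeds one (at least 48 hours old).
# Hypothetical helper; -maxdepth is a GNU/BSD find extension, not POSIX.
cleanup_old_pictures() {
    find "$1" -maxdepth 1 -type f -name 'picture_*' -mtime +1 -exec rm -f {} +
}

# Demo in a scratch directory (touch -d is a GNU option).
dir=$(mktemp -d)
touch -d '3 days ago' "$dir/picture_old.jpg" "$dir/notes_old.txt"
touch "$dir/picture_new.jpg"
cleanup_old_pictures "$dir"
ls "$dir"         # picture_old.jpg is gone; the other two remain
rm -rf "$dir"
```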

7. Unable to obtain gateway MAC address

Problem: The ARP table shows an incomplete entry for the gateway, so the host cannot learn the gateway's MAC address.

Solution: Bind the correct MAC address manually, e.g., arp -s 192.168.3.254 00:00:5e:00:01:64.
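On Linux, incomplete entries can also be picked out of /proc/net/arp, where a Flags value of 0x0 means no MAC address has been resolved. The helper below is an illustrative sketch; incomplete_arp_entries is a hypothetical name and the sample table is made up.

```shell
#!/bin/sh
# incomplete_arp_entries [FILE]: print IPs whose ARP entry has no resolved
# MAC address. FILE defaults to /proc/net/arp (Linux). Hypothetical helper.
incomplete_arp_entries() {
    awk 'NR > 1 && $3 == "0x0" { print $1 }' "${1:-/proc/net/arp}"
}

# Demo against sample data in /proc/net/arp format.
sample=$(mktemp)
cat > "$sample" <<'EOF'
IP address       HW type     Flags       HW address            Mask     Device
192.168.3.254    0x1         0x0         00:00:00:00:00:00     *        eth0
192.168.3.10     0x1         0x2         52:54:00:12:34:56     *        eth0
EOF
incomplete_arp_entries "$sample"
rm -f "$sample"
```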

8. HTTP service fails to start

Problem: Starting httpd reports "address already in use" on port 7080.

Cause: Port 7080 is defined in multiple configuration files (/etc/httpd/conf/httpd.conf and /etc/httpd/conf.d/t.10086.cn.conf), so httpd tries to bind it twice.

Solution: Comment out the duplicate Listen 7080 line in the second file and restart the service.
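Duplicate Listen directives are quick to locate with a recursive grep over the configuration tree. The sketch below assumes GNU or BSD grep (for -r); find_listen is a hypothetical helper and the file names in the demo are made up.

```shell
#!/bin/sh
# find_listen PORT DIR: print file:line:text for each active (uncommented)
# Listen directive for PORT under DIR. Hypothetical helper.
find_listen() {
    grep -rniE "^[[:space:]]*Listen[[:space:]]+([^[:space:]]*:)?$1([[:space:]]|\$)" "$2"
}

# Demo with a scratch tree standing in for /etc/httpd.
root=$(mktemp -d)
mkdir -p "$root/conf" "$root/conf.d"
printf 'Listen 80\nListen 7080\n' > "$root/conf/httpd.conf"
printf '#Listen 7080\nListen 7080\nListen 17080\n' > "$root/conf.d/site.conf"
find_listen 7080 "$root"
rm -rf "$root"
```

Two matching lines for the same port, as in the demo, indicate the conflict described above. Commented-out directives and other ports such as 17080 are ignored.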

9. "Too many open files" error

Problem: Applications hit the "too many open files" limit.

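Solution: Raise the limits in /etc/security/limits.conf for new login sessions and in /root/.bash_profile for root's shell:

<code># /etc/security/limits.conf
* soft nproc 65535
* hard nproc 65535
* soft nofile 65535
* hard nofile 65535
# /root/.bash_profile
ulimit -n 65535
ulimit -u 65535</code>

Reboot or log in again so the new limits apply, or run the ulimit commands in the current shell.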
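Before and after changing the limits, it is worth checking what the current session actually has and how close a process is to the ceiling. This is a minimal sketch assuming Linux with /proc mounted; open_fd_count is a hypothetical helper.

```shell
#!/bin/sh
# open_fd_count [PID]: number of descriptors PID has open, read from
# /proc/PID/fd (Linux). Defaults to the current shell. Hypothetical helper.
open_fd_count() {
    ls "/proc/${1:-$$}/fd" 2>/dev/null | wc -l | tr -d ' '
}

echo "soft nofile limit: $(ulimit -Sn)"
echo "hard nofile limit: $(ulimit -Hn)"
echo "descriptors open in this shell: $(open_fd_count)"
```

Passing the PID of the failing application instead of relying on the default shows how many descriptors that process holds, provided the current user may read its /proc entry.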

10. ibdata1 and mysql‑bin logs consume disk space

Problem: ibdata1 >120 GB and mysql‑bin >80 GB fill the disk.

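Cause: The InnoDB shared tablespace (ibdata1) grows but never shrinks automatically, and binary logs accumulate until they are purged.

Solution:

Dump the data, recreate the database, and reload it to shrink ibdata1.

Manually purge old binary logs with PURGE MASTER LOGS TO 'mysql-bin.010'; or PURGE MASTER LOGS BEFORE '2010-12-22 13:00:00'; (PURGE BINARY LOGS is the equivalent spelling on current MySQL releases).

Set expire_logs_days=30 in /etc/my.cnf for automatic cleanup (on MySQL 8.0 and later, binlog_expire_logs_seconds replaces it).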
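Before purging, it can be useful to preview which binary logs are older than the intended retention period. The sketch below only lists files and deletes nothing; old_binlogs is a hypothetical helper, and it assumes the default mysql-bin.NNNNNN naming and GNU or BSD find.

```shell
#!/bin/sh
# old_binlogs DIR DAYS: list mysql-bin.NNNNNN files in DIR whose age, rounded
# down to whole days, exceeds DAYS. Read-only preview; hypothetical helper.
old_binlogs() {
    find "$1" -maxdepth 1 -type f -name 'mysql-bin.[0-9]*' -mtime +"$2" | sort
}

# Demo in a scratch directory (touch -d is a GNU option).
dir=$(mktemp -d)
touch -d '40 days ago' "$dir/mysql-bin.000001"
touch -d '10 days ago' "$dir/mysql-bin.000002"
touch "$dir/mysql-bin.000003" "$dir/mysql-bin.index"
old_binlogs "$dir" 30
rm -rf "$dir"
```

The actual removal is best left to the PURGE statement rather than rm, so that the server's binary log index stays consistent with the files on disk.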

22 Common Failure Cases

1. Linux installer cannot find hard disk. Fix: Enter the BIOS/CMOS settings and set the disk mode to compatible.

2. Installation stops after partitioning. Fix: Ensure both root and swap partitions are created.

3. Missing or unwanted packages after installation. Fix: Learn what each package group provides, then reinstall with the appropriate selection as needed.

4. Proxy server filter rules not taking effect. Fix: Verify module loading, correct default policies, correct iptables syntax, and rule order.

5. After proxy/firewall setup, Internet works but DMZ services do not. Fix: Disable iptables temporarily to test; adjust rules if needed.

6. iptables rules disappear after service restart. Fix: Set IPTABLES_SAVE_ON_RESTART="yes" in /etc/sysconfig/iptables-config and save rules with iptables-save > /etc/sysconfig/iptables.

7. VLAN cannot access external network. Fix: Configure the correct gateway for the VLAN.

8. named service fails to start. Fix: Ensure required files exist in /etc/named and /var/named, and that the named user has proper permissions.

9. DNS resolution fails. Fix: Check forward/reverse zone files, /etc/named.conf syntax, bind-chroot locations, and /etc/resolv.conf nameserver entries.

10. dhcpd reports "No subnet declaration for eth0". Fix: Assign an IP to eth0 that falls within a defined DHCP subnet.

11. Multiple DHCP scopes but only one distributes addresses. Fix: Provide a separate network interface for each scope (eth0, eth1, eth2) or use a super‑scope.

12. MySQL installation fails due to dependency issues. Fix: Install required libraries and follow the dependency chain in the correct order.

13. Web service returns no page despite connection. Fix: Correct the DocumentRoot path in httpd.conf (remove trailing slash).

14. Remote client cannot access Samba share. Fix: Disable iptables.

15. Samba returns "NT_STATUS_BAD_NETWORK_NAME". Fix: Ensure the shared directory exists.

16. Samba returns "NT_STATUS_ACCESS_DENIED". Fix: Verify username/password and disable firewall if needed.

17. Samba returns "NT_STATUS_LOGON_FAILURE". Fix: Grant the user access to the share.

18. FTP upload rejected. Fix: Grant write permission on the target directory for the FTP user.

19. root cannot log into FTP ("500 OOPS: cannot change directory:/root"). Fix: Disable SELinux or set SELINUX=disabled in /etc/selinux/config.

20. Mail client can send but not receive mail. Fix: Ensure the POP3 service is running.

21. NFS mount hangs. Fix: Start the portmap service (named rpcbind on newer distributions).

22. NFS mount works locally but not from other clients. Fix: Disable iptables on the server.

Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
