How to Diagnose and Fix Common Linux System Failures
This guide walks through typical Linux operational problems: boot failures, network issues, MBR and GRUB errors, forgotten root passwords, and read‑only file‑system symptoms. For each one it explains the likely causes, a step‑by‑step diagnostic method, and practical recovery commands to restore a healthy system.
1. Linux System Boot Failure
Boot problems often stem from misconfigured system files, filesystem corruption, missing kernel files, or hardware faults. The most common cause is an incorrect /etc/fstab that prevents the system from mounting essential partitions.
Cause 1: Wrong or missing entries in /etc/fstab. Diagnosis: The system stops after "starting system logger". Solution: Boot into a rescue environment and restore /etc/fstab from a backup, or rebuild it with the correct devices and mount points.
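A quick structural check can catch malformed entries before the next reboot. The sketch below is a hypothetical helper, check_fstab (not a standard tool), that flags non-comment lines with fewer than four whitespace-separated fields. It does not verify that the devices or mount points exist.

```shell
# Hypothetical helper: report fstab lines that have fewer than four fields
# (device, mount point, type, options). Fields five and six are optional.
check_fstab() {
    awk '
        /^[ \t]*(#|$)/ { next }    # skip comments and blank lines
        NF < 4 {
            printf "line %d: only %d field(s): %s\n", NR, NF, $0
            bad = 1
        }
        END { exit bad }           # non-zero exit if any line was flagged
    ' "${1:-/etc/fstab}"
}

# Example, from a rescue environment with the system mounted at /mnt/sysimage:
#   check_fstab /mnt/sysimage/etc/fstab
```

The function prints nothing and returns 0 when every entry has at least four fields.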
Cause 2: Filesystem inconsistency after sudden power loss. Journaling on ext3/ext4 reduces the risk but does not eliminate it. Diagnosis: Error messages such as "checking root filesystem" and "UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY" appear during boot. Solution: Unmount the affected partition (for the root filesystem, work from the maintenance shell or rescue media) and run fsck to repair it.
# umount /dev/sdb5
# fsck.ext3 -y /dev/sdb5
Cause 3: Missing or corrupted kernel files in the /boot partition (e.g., vmlinuz or initrd.img). Solution: Boot from rescue media, mount /boot, copy the missing files from a backup or installation media, and update grub.cfg accordingly.
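To see which kernel files are missing from /boot, one option is to pair each kernel image with its initrd. The following is a minimal sketch with a hypothetical helper, check_boot_files. It assumes the common naming schemes vmlinuz-VERSION with either initramfs-VERSION.img or initrd.img-VERSION; other layouts are not covered.

```shell
# Hypothetical helper: for every kernel image in a boot directory, report
# whether a matching initramfs/initrd file is present.
check_boot_files() {
    bootdir=${1:-/boot}
    missing=0
    for kernel in "$bootdir"/vmlinuz-*; do
        # An unmatched glob stays literal, meaning no kernel image exists.
        if [ ! -e "$kernel" ]; then
            echo "no kernel image found in $bootdir"
            return 1
        fi
        version=${kernel##*/vmlinuz-}
        if [ -e "$bootdir/initramfs-$version.img" ] ||
           [ -e "$bootdir/initrd.img-$version" ]; then
            echo "ok: $version"
        else
            echo "missing initrd for: $version"
            missing=1
        fi
    done
    return "$missing"
}

# Example, after mounting the boot partition from rescue media:
#   check_boot_files /mnt/sysimage/boot
```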
Cause 4: Hardware failures such as faulty motherboard, power supply, or disk. Solution: Replace the defective hardware component.
2. Linux Network Faults
Network issues are usually caused by hardware problems, misconfigured interfaces, or incorrect routing tables.
Step 1: Verify physical hardware (NIC, cable, switch, router). Replace any faulty component.
Step 2: Check that the NIC is recognized, which implies its driver is loaded, using ifconfig -a or ip addr. Use ethtool to inspect link speed and status.
Step 3: Ensure IP settings are correct and do not conflict with other hosts.
Step 4: Examine the routing table with route -n or ip route. Remove incorrect default routes and add the proper ones, e.g.:
# route delete default
# route add default gw 10.10.1.254
Step 5: Check the name-resolution settings in /etc/host.conf and /etc/nsswitch.conf. Typical values are "order hosts,bind" in /etc/host.conf and "hosts: files dns" in /etc/nsswitch.conf.
Step 6: Verify that required services (e.g., SSH on port 22) are listening using telnet or netstat -tlnp.
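Steps 4 and 5 can be partly scripted. The sketch below uses two hypothetical helpers, default_routes and hosts_lookup_order. Both only read text, so they can be tried on saved command output or on a copy of a configuration file before being used on a live system.

```shell
# Print the default routes found in `ip route` output supplied on stdin.
# More than one line may be intentional, but it is worth a closer look.
default_routes() {
    awk '$1 == "default"'
}

# Print the lookup sources from the "hosts:" line of an nsswitch.conf file.
hosts_lookup_order() {
    awk '$1 == "hosts:" { $1 = ""; sub(/^ +/, ""); print }' \
        "${1:-/etc/nsswitch.conf}"
}

# Typical use on a live system:
#   ip route | default_routes
#   hosts_lookup_order
```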
3. MBR Sector Failure
Symptoms include "Operating system not found" or a black screen after power‑on. Causes are virus damage, incorrect partitioning, or physical disk failure.
Solution: Boot from rescue media, identify the damaged disk, and restore the MBR using dd or a backup image. Example steps:
Attach a new disk, identify it with fdisk -l, and partition it with fdisk /dev/sdb.
Create a filesystem, mount it, and copy the backup MBR data.
Reboot and verify the system boots normally.
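The dd-based backup and restore mentioned above can be sketched as follows. The MBR is the first 512-byte sector of the disk: 446 bytes of boot code, a 64-byte partition table, and the 2-byte signature 0x55 0xAA. The function names and the example paths are assumptions for illustration. Writing to the wrong device is destructive, so try the helpers on a disk image first.

```shell
# Save the whole first sector of a disk (or disk image) to a file.
backup_mbr() {
    dd if="$1" of="$2" bs=512 count=1 2>/dev/null
}

# Restore only the boot code, leaving the current partition table untouched.
# conv=notrunc keeps dd from truncating the target when it is an image file.
restore_boot_code() {
    dd if="$1" of="$2" bs=446 count=1 conv=notrunc 2>/dev/null
}

# Succeed if the first sector of the file or device ends with 55 aa.
has_boot_signature() {
    sig=$(od -An -tx1 -j510 -N2 "$1" | tr -d ' \n')
    [ "$sig" = "55aa" ]
}

# Example (assumed device and backup path; run as root):
#   backup_mbr /dev/sda /root/sda-mbr.bin
#   restore_boot_code /root/sda-mbr.bin /dev/sda
```

Restoring 446 bytes rather than 512 is a deliberate choice here: it avoids overwriting a partition table that may have changed since the backup was taken.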
4. GRUB Boot Problems
When the boot process stops at a grub> prompt, the issue is usually a mis‑configured grub.cfg or a missing GRUB file.
Fix for mis‑configuration: Boot from rescue media, mount /boot, edit /boot/grub2/grub.cfg to point to the correct vmlinuz and initrd.img, then reboot.
Fix for missing files: Reinstall GRUB with grub2-install /dev/sda and regenerate the configuration with grub2-mkconfig -o /boot/grub2/grub.cfg.
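Before editing grub.cfg by hand, it can help to list the kernel and initrd paths it currently refers to and compare them with the files in /boot. The following is a minimal sketch with a hypothetical helper, grub_boot_paths. It assumes menu entries use the linux/linux16/linuxefi and initrd/initrd16/initrdefi commands with the path as the first argument, so configurations that load entries from separate files will print nothing.

```shell
# Hypothetical helper: print each kernel/initrd command in a grub.cfg
# together with the path it loads.
grub_boot_paths() {
    awk '$1 ~ /^(linux|initrd)(16|efi)?$/ { print $1, $2 }' \
        "${1:-/boot/grub2/grub.cfg}"
}

# Example, from rescue media with the system mounted at /mnt/sysimage:
#   grub_boot_paths /mnt/sysimage/boot/grub2/grub.cfg
```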
5. Forgotten Linux Root Password
Two common recovery methods are:
Boot into rescue mode, chroot into the system, and run passwd root to set a new password:
# chroot /mnt/sysimage
# passwd root
# exit
# reboot
Boot to the GRUB menu, edit the kernel line to add single, boot into single‑user mode, and use passwd to reset the password.
6. Read‑Only File System Error
When commands like cp, mv, or chmod fail with "Read‑only file system", possible causes are filesystem damage, disk errors, or an incorrect /etc/fstab entry.
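To confirm which filesystems are currently mounted read-only, the mount options in /proc/mounts can be inspected. The sketch below uses a hypothetical helper, ro_mounts, and assumes the usual /proc/mounts layout (device, mount point, type, options).

```shell
# Hypothetical helper: print the device and mount point of every filesystem
# whose option list contains exactly "ro". Options such as
# errors=remount-ro are not mistaken for it.
ro_mounts() {
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") { print $1, $2; break }
    }' "${1:-/proc/mounts}"
}

# Example:
#   ro_mounts
```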
Remedy: If the configuration is correct, remount the filesystem as read‑write:
# mount -o rw,remount /system
If the filesystem is corrupted, run fsck on the unmounted device:
# nohup fsck -y /dev/VolGroup00/LogVol00 > /dev/shm/fscklog &
Hardware failures require disk replacement.
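When fsck is run in the foreground, its exit status indicates whether errors were found and repaired. The status is a sum of condition bits; the sketch below decodes it using the values listed in the fsck(8) manual page. The name explain_fsck_status is hypothetical.

```shell
# Hypothetical helper: decode an fsck exit status into readable messages.
explain_fsck_status() {
    fsck_status=$1
    if [ "$fsck_status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    for entry in \
        "1:filesystem errors corrected" \
        "2:system should be rebooted" \
        "4:filesystem errors left uncorrected" \
        "8:operational error" \
        "16:usage or syntax error" \
        "32:checking canceled by user request" \
        "128:shared-library error"
    do
        bit=${entry%%:*}
        # Print the message for every bit that is set in the status.
        if [ $(( fsck_status & bit )) -ne 0 ]; then
            echo "${entry#*:}"
        fi
    done
    return 0
}

# Example (foreground run on an unmounted device):
#   fsck -y /dev/sdb5; explain_fsck_status $?
```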
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.