How to Diagnose and Fix Common Linux System Failures
This guide walks through typical Linux operational problems—including boot failures, network issues, MBR and GRUB errors, forgotten root passwords, and read‑only file‑system symptoms—explaining their causes, step‑by‑step diagnostic methods, and practical recovery commands to restore a healthy system.
1. Linux System Boot Failure
Boot problems often stem from misconfigured system files, filesystem corruption, missing kernel files, or hardware faults. The most common cause is an incorrect /etc/fstab that prevents the system from mounting essential partitions.
Cause 1: Wrong or missing entries in /etc/fstab. Diagnosis: The system stops after "starting system logger". Solution: Boot into a rescue environment and restore /etc/fstab from a backup, or rebuild it with the correct devices and mount points.
Cause 2: Filesystem inconsistency after sudden power loss, which can occur even on journaling filesystems such as ext3/ext4. Diagnosis: Error messages such as "checking root filesystem" and "UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY" appear during boot. Solution: Unmount the affected partition and run the fsck variant that matches its filesystem type (fsck.ext3 for ext3, fsck.ext4 for ext4).
<code># umount /dev/sdb5
# fsck.ext3 -y /dev/sdb5</code>
Cause 3: Missing or corrupted kernel files in the /boot partition (e.g., vmlinuz or initrd.img). Solution: Boot from rescue media, mount /boot, copy the missing files from a backup or installation media, and update grub.cfg accordingly.
Cause 4: Hardware failures such as faulty motherboard, power supply, or disk. Solution: Replace the defective hardware component.
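For Cause 1, a quick sanity check of the fstab file from the rescue environment can narrow down the bad entry before editing. The following is only a minimal sketch: `check_fstab` is a hypothetical helper name, and the `/mnt/sysimage` path in the comment assumes the installed system is mounted there. It only flags lines with too few fields and does not verify that devices or mount points exist.

```shell
# check_fstab FILE: hypothetical helper that flags fstab lines with fewer
# than four fields (device, mount point, filesystem type, options).
# Comment lines and blank lines are skipped.
check_fstab() {
    awk '
        /^[ \t]*(#|$)/ { next }
        NF < 4 {
            printf "line %d: expected at least 4 fields, found %d\n", NR, NF
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}

# Example (assumes the installed system is mounted at /mnt/sysimage):
#   check_fstab /mnt/sysimage/etc/fstab
```

The function returns a non-zero status when it finds a short line, so it can be used in a script condition. Where available, `findmnt --verify` from util-linux performs a more thorough check.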
2. Linux Network Faults
Network issues are usually caused by hardware problems, misconfigured interfaces, or incorrect routing tables.
Step 1: Verify physical hardware (NIC, cable, switch, router). Replace any faulty component.
Step 2: Check that the NIC driver is loaded using ifconfig or ip addr. Use ethtool to inspect link speed and status.
Step 3: Ensure IP settings are correct and do not conflict with other hosts.
Step 4: Examine the routing table with route -n or ip route. Remove incorrect default routes and add the proper ones, e.g.:
<code># route delete default
# route add default gw 10.10.1.254</code>
Step 5: Test DNS resolution by checking /etc/host.conf and /etc/nsswitch.conf. The typical configuration is order hosts,bind in /etc/host.conf and hosts: files dns in /etc/nsswitch.conf.
Step 6: Verify that required services (e.g., SSH on port 22) are listening using telnet or netstat -tlnp.
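For the routing check in Step 4, one common symptom is more than one default route. The sketch below counts them from `ip route` output. `count_default_routes` is a hypothetical helper name, and the gateway address in the comments is the one used in the example above.

```shell
# count_default_routes: hypothetical helper that reads `ip route` output
# on stdin and prints how many default routes it contains.
count_default_routes() {
    awk '$1 == "default" { n++ } END { print n + 0 }'
}

# Example on a live system:
#   ip route | count_default_routes
#
# iproute2 equivalents of the route commands (gateway from the example):
#   ip route del default
#   ip route add default via 10.10.1.254
```

A result of 0 means no default gateway is set. A result above 1 is worth reviewing, although multiple default routes with different metrics can be intentional on multi-homed hosts.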
3. MBR Sector Failure
Symptoms include "Operating system not found" or a black screen after power‑on. Causes are virus damage, incorrect partitioning, or physical disk failure.
Solution: Boot from rescue media, identify the damaged disk, and restore the MBR using dd or a backup image. Example steps:
Attach a new disk, identify it with fdisk -l, and partition it with fdisk /dev/sdb.
Create a filesystem, mount it, and copy the backup MBR data.
Reboot and verify the system boots normally.
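The dd-based backup and restore mentioned above can be sketched as follows. The MBR occupies the first 512-byte sector of the disk. `backup_mbr` and `restore_mbr` are hypothetical helper names, and the device and file paths in the comments are placeholders rather than values from a specific system.

```shell
# backup_mbr DEVICE FILE: save the first 512-byte sector (boot code,
# partition table, and signature) to FILE.
backup_mbr() {
    dd if="$1" of="$2" bs=512 count=1 2>/dev/null
}

# restore_mbr FILE DEVICE: write the saved sector back. conv=notrunc keeps
# dd from truncating the target when it is a regular image file.
restore_mbr() {
    dd if="$1" of="$2" bs=512 count=1 conv=notrunc 2>/dev/null
}

# Example (placeholder paths):
#   backup_mbr /dev/sda /mnt/backup/sda-mbr.bin
#   restore_mbr /mnt/backup/sda-mbr.bin /dev/sda
```

Restoring all 512 bytes also overwrites the partition table. If the partition layout has changed since the backup, restoring only the first 446 bytes (the boot code) is the safer choice.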
4. GRUB Boot Problems
When the boot process stops at a grub> prompt, the issue is usually a misconfigured grub.cfg or a missing GRUB file.
Fix for misconfiguration: Boot from rescue media, mount /boot, edit /boot/grub2/grub.cfg to point to the correct vmlinuz and initrd.img, then reboot.
Fix for missing files: Reinstall GRUB with grub2-install /dev/sda and regenerate the configuration with grub2-mkconfig -o /boot/grub2/grub.cfg.
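The reinstall procedure might look like the sketch below when run from a rescue environment on a BIOS/MBR system. `reinstall_grub` is a hypothetical wrapper, and `/mnt/sysimage` as the mount point of the installed system is an assumption. The grub2-* command names follow the RHEL-family naming used in this article; Debian-family systems name them grub-install and grub-mkconfig.

```shell
# reinstall_grub ROOT DISK: hypothetical wrapper around the GRUB 2 tools.
# ROOT is where the installed system is mounted in the rescue environment.
# Set RUN=echo to print the commands instead of executing them.
reinstall_grub() {
    root=$1
    disk=$2
    # The chrooted tools need /dev, /proc, and /sys from the rescue system.
    for fs in dev proc sys; do
        ${RUN:-} mount --bind "/$fs" "$root/$fs" || return 1
    done
    ${RUN:-} chroot "$root" grub2-install "$disk" || return 1
    ${RUN:-} chroot "$root" grub2-mkconfig -o /boot/grub2/grub.cfg
}

# Dry run (prints the five commands without touching the disk):
#   RUN=echo reinstall_grub /mnt/sysimage /dev/sda
```

Running the dry run first makes it easy to confirm the target disk before anything is written to it.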
5. Forgotten Linux Root Password
Two common recovery methods are:
Boot into rescue mode, chroot into the system, and run passwd root to set a new password:
<code># chroot /mnt/sysimage
# passwd root
# exit
# reboot</code>
Boot to the GRUB menu, edit the kernel line to add single, boot into single‑user mode, and use passwd to reset the password.
6. Read‑Only File System Error
When commands like cp, mv, or chmod fail with "Read‑only file system", possible causes are filesystem damage, disk errors, or an incorrect /etc/fstab entry.
Remedy: If the configuration is correct, remount the filesystem as read‑write:
<code># mount -o rw,remount /system</code>
If the filesystem is corrupted, run fsck on the unmounted device:
<code># nohup fsck -y /dev/VolGroup00/LogVol00 > /dev/shm/fscklog &</code>
Hardware failures require disk replacement.
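Before remounting, it can help to see which filesystems are currently mounted read-only. The sketch below parses the mount table for entries whose option list contains `ro`. `list_ro_mounts` is a hypothetical helper name, and reading `/proc/mounts` by default is an assumption about where the live mount table is available.

```shell
# list_ro_mounts [FILE]: print device and mount point for every entry
# whose option list contains "ro". FILE defaults to /proc/mounts.
list_ro_mounts() {
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") { print $1, $2; break }
    }' "${1:-/proc/mounts}"
}

# Example on a live system:
#   list_ro_mounts
```

The option list is split on commas so that an option such as `errors=remount-ro` on a read-write mount is not mistaken for `ro`. Some pseudo-filesystems are mounted read-only by design, so only unexpected entries need attention. The kernel log (dmesg) usually records why a filesystem was switched to read-only.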
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.