
How to Diagnose and Fix Common Linux System Failures

This guide walks through typical Linux operational problems—including boot failures, network issues, MBR and GRUB errors, forgotten root passwords, and read‑only file‑system symptoms—explaining their causes, step‑by‑step diagnostic methods, and practical recovery commands to restore a healthy system.

Efficient Ops

1. Linux System Boot Failure

Boot problems often stem from misconfigured system files, filesystem corruption, missing kernel files, or hardware faults. The most common cause is an incorrect <code>/etc/fstab</code> that prevents the system from mounting essential partitions.

Cause 1: Wrong or missing entries in <code>/etc/fstab</code>. Diagnosis: The system stops after "starting system logger". Solution: Boot into a rescue environment and restore <code>/etc/fstab</code> from a backup, or rebuild it with the correct devices and mount points.
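Before rebooting with a rebuilt file, a quick sanity check can catch the most obvious mistakes. The following is a minimal sketch, not a full validator: the function name `check_fstab` is hypothetical, and treating four fields (device, mount point, type, options) as the minimum is an assumption of this sketch. The optional second argument is the prefix under which the installed system is mounted in the rescue environment.

```shell
# check_fstab FILE [ROOT]
# Prints one line per suspicious entry and returns non-zero if any are found.
# ROOT is an optional prefix (for example /mnt/sysimage) prepended to each
# mount point before checking that the directory exists.
check_fstab() {
    fstab=$1
    root=${2:-}
    status=0
    while read -r spec mnt fstype opts _rest || [ -n "$spec" ]; do
        # Skip blank lines and comments.
        case $spec in ''|'#'*) continue ;; esac
        # This sketch expects at least device, mount point, type and options.
        if [ -z "$opts" ]; then
            echo "too few fields: $spec $mnt $fstype"
            status=1
            continue
        fi
        # Swap entries have no directory to check.
        case $mnt in none|swap) continue ;; esac
        if [ ! -d "$root$mnt" ]; then
            echo "missing mount point: $mnt"
            status=1
        fi
    done < "$fstab"
    return $status
}
```

In a rescue environment this might be called as `check_fstab /mnt/sysimage/etc/fstab /mnt/sysimage`. On systems with a recent util-linux, `findmnt --verify` performs a more thorough check.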

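Cause 2: Filesystem inconsistency after a sudden power loss, which can affect even journaled filesystems such as ext3/ext4. Diagnosis: Error messages such as "checking root filesystem" and "UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY" appear during boot. Solution: Unmount the affected partition and run <code>fsck</code> to repair it. A root filesystem cannot be unmounted while it is in use, so repair it from the maintenance shell or from rescue media.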

<code># umount /dev/sdb5
# fsck.ext3 -y /dev/sdb5</code>
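After the repair, the exit status of `fsck` indicates what happened. The status is a bitmask documented in the fsck(8) manual page; the helper below, with the hypothetical name `explain_fsck_status`, is a small sketch that spells out each bit.

```shell
# explain_fsck_status CODE
# Prints one line for each bit set in an fsck exit status (see fsck(8)).
explain_fsck_status() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    [ $((code & 1)) -ne 0 ]   && echo "filesystem errors corrected"
    [ $((code & 2)) -ne 0 ]   && echo "system should be rebooted"
    [ $((code & 4)) -ne 0 ]   && echo "filesystem errors left uncorrected"
    [ $((code & 8)) -ne 0 ]   && echo "operational error"
    [ $((code & 16)) -ne 0 ]  && echo "usage or syntax error"
    [ $((code & 32)) -ne 0 ]  && echo "checking canceled by user request"
    [ $((code & 128)) -ne 0 ] && echo "shared-library error"
    return 0
}

# Example (run as root on an unmounted partition):
#   fsck.ext3 -y /dev/sdb5; explain_fsck_status $?
```

A status with bit 4 set means the filesystem still has errors, so the partition should not be put back into service until a further repair or a restore from backup succeeds.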

Cause 3: Missing or corrupted kernel files in the <code>/boot</code> partition (e.g., <code>vmlinuz</code> or <code>initrd.img</code>). Solution: Boot from rescue media, mount <code>/boot</code>, copy the missing files from a backup or installation media, and update <code>grub.cfg</code> accordingly.
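To confirm which files are actually missing once `/boot` is mounted from rescue media, a check along these lines may help. It is a sketch under assumptions: the function names are hypothetical, and the file-name patterns (`vmlinuz*`, `initrd*`, `initramfs*`) cover common distribution naming but not every layout.

```shell
# has_file PATH...
# True if at least one argument exists. An unmatched glob is passed through
# literally by the shell and therefore fails the -e test.
has_file() {
    for f in "$@"; do
        [ -e "$f" ] && return 0
    done
    return 1
}

# check_boot_files DIR
# Reports whether DIR holds at least one kernel image and one initial ramdisk.
check_boot_files() {
    bootdir=$1
    status=0
    if ! has_file "$bootdir"/vmlinuz*; then
        echo "no kernel image (vmlinuz*) in $bootdir"
        status=1
    fi
    if ! has_file "$bootdir"/initrd* "$bootdir"/initramfs*; then
        echo "no initial ramdisk (initrd* or initramfs*) in $bootdir"
        status=1
    fi
    return $status
}
```

For example, `check_boot_files /mnt/sysimage/boot` prints nothing and returns 0 when both kinds of file are present.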

Cause 4: Hardware failures such as a faulty motherboard, power supply, or disk. Solution: Replace the defective hardware component.

2. Linux Network Faults

Network issues are usually caused by hardware problems, misconfigured interfaces, or incorrect routing tables.

Step 1: Verify physical hardware (NIC, cable, switch, router). Replace any faulty component.

Step 2: Check that the NIC driver is loaded using <code>ifconfig</code> or <code>ip addr</code>. Use <code>ethtool</code> to inspect link speed and status.
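As a complement to those tools, the kernel exposes each interface's operational state under `/sys/class/net`. The sketch below reads it directly; the function name `link_report` is hypothetical, and the optional directory argument exists only so the function can be pointed at a different sysfs tree.

```shell
# link_report [SYSFS_NET_DIR]
# Prints "<interface> <operstate>" for every interface except loopback.
link_report() {
    netdir=${1:-/sys/class/net}
    for dev in "$netdir"/*; do
        # Skip entries that are not interfaces (no operstate file).
        [ -e "$dev/operstate" ] || continue
        name=${dev##*/}
        [ "$name" = lo ] && continue
        state=$(cat "$dev/operstate")
        echo "$name $state"
    done
}
```

An interface that is missing from the output usually points at a driver problem, while one reported as `down` points at cabling, the switch port, or an interface that has not been brought up.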

Step 3: Ensure IP settings are correct and do not conflict with other hosts.

Step 4: Examine the routing table with <code>route -n</code> or <code>ip route</code>. Remove incorrect default routes and add the proper ones, e.g.:

<code># route delete default
# route add default gw 10.10.1.254</code>
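To spot a missing or duplicated default route quickly, the output of `ip route` can be filtered. This is a minimal sketch: the function name `check_default_route` is hypothetical, and it treats more than one default gateway as suspicious, which is an assumption that does not hold for deliberate multi-homed setups with route metrics.

```shell
# check_default_route
# Reads `ip route` output on stdin and reports the default gateway(s).
# Returns non-zero when there is no default route or more than one.
check_default_route() {
    gws=$(awk '$1 == "default" && $2 == "via" { print $3 }' | tr '\n' ' ')
    gws=${gws% }
    case $gws in
        '')    echo "no default route"; return 1 ;;
        *' '*) echo "multiple default routes: $gws"; return 1 ;;
        *)     echo "default gateway: $gws" ;;
    esac
}

# Example:
#   ip route | check_default_route
```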

Step 5: Check name resolution by reviewing <code>/etc/host.conf</code> and <code>/etc/nsswitch.conf</code>. The typical configuration is <code>order hosts,bind</code> in the former and <code>hosts: files dns</code> in the latter.
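The lookup order in `/etc/nsswitch.conf` can be extracted with a one-line filter, which is handy when comparing several hosts. The function name `hosts_lookup_order` below is hypothetical; the sketch prints whatever follows the first uncommented `hosts:` key.

```shell
# hosts_lookup_order FILE
# Prints the sources listed on the first uncommented "hosts:" line of FILE.
hosts_lookup_order() {
    awk '$1 == "hosts:" { $1 = ""; sub(/^ +/, ""); print; exit }' "$1"
}

# Example:
#   hosts_lookup_order /etc/nsswitch.conf
```

If `dns` is absent from the result, names that are not in `/etc/hosts` will not be resolved through the DNS servers at all.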

Step 6: Verify that required services (e.g., SSH on port 22) are listening using <code>telnet</code> or <code>netstat -tlnp</code>.
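The listening check in Step 6 can be scripted by filtering the `netstat` output. This is a sketch with a hypothetical function name, `is_listening`, and it assumes the usual column layout in which the local address is the fourth field and the state is the sixth.

```shell
# is_listening PORT
# Reads `netstat -tln` (or `netstat -tlnp`) output on stdin and succeeds if
# some TCP socket in LISTEN state is bound to PORT.
is_listening() {
    awk -v port="$1" '
        $6 == "LISTEN" {
            # The port is the last colon-separated piece of the local
            # address, which handles both 0.0.0.0:22 and :::22.
            n = split($4, part, ":")
            if (part[n] == port) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Example:
#   netstat -tln | is_listening 22 && echo "sshd is listening"
```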

3. MBR Sector Failure

Symptoms include "Operating system not found" or a black screen after power‑on. Causes are virus damage, incorrect partitioning, or physical disk failure.

Solution: Boot from rescue media, identify the damaged disk, and restore the MBR using <code>dd</code> or a backup image. Example steps:

Attach a new disk, list the available disks with <code>fdisk -l</code>, and partition the new one with <code>fdisk /dev/sdb</code>.

Create a filesystem, mount it, and copy the backup MBR data.

Reboot and verify the system boots normally.
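Restoring with `dd` presupposes that a copy of the first sector was saved while the disk was healthy. The helpers below sketch that backup and restore cycle; the function names are hypothetical. The MBR occupies the first 512 bytes of the disk: 446 bytes of boot code, a 64-byte partition table, and the two-byte signature `55 AA`.

```shell
# backup_mbr DISK FILE
# Copies the first 512-byte sector of DISK to FILE.
backup_mbr() {
    dd if="$1" of="$2" bs=512 count=1 2>/dev/null
}

# has_boot_signature FILE_OR_DISK
# Succeeds if bytes 510-511 hold the MBR signature 55 AA.
has_boot_signature() {
    sig=$(dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$sig" = "55aa" ]
}

# restore_boot_code FILE DISK
# Writes back only the 446 bytes of boot code, leaving the partition table
# on DISK untouched. conv=notrunc keeps the rest of the target intact when
# the target is a regular image file.
restore_boot_code() {
    dd if="$1" of="$2" bs=446 count=1 conv=notrunc 2>/dev/null
}

# Example (as root, device names are placeholders):
#   backup_mbr /dev/sda /mnt/backup/sda-mbr.bin
#   has_boot_signature /mnt/backup/sda-mbr.bin && restore_boot_code /mnt/backup/sda-mbr.bin /dev/sda
```

Restoring only the boot code is the safer default, because writing back all 512 bytes also overwrites the current partition table with the one saved at backup time.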

4. GRUB Boot Problems

When the boot process stops at a <code>grub&gt;</code> prompt, the issue is usually a mis‑configured <code>grub.cfg</code> or a missing GRUB file.

Fix for mis‑configuration: Boot from rescue media, mount <code>/boot</code>, edit <code>/boot/grub2/grub.cfg</code> to point to the correct <code>vmlinuz</code> and <code>initrd.img</code>, then reboot.

Fix for missing files: Reinstall GRUB with <code>grub2-install /dev/sda</code> and regenerate the configuration, for example with <code>grub2-mkconfig -o /boot/grub2/grub.cfg</code>.
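Before rebooting, it can be worth confirming that every kernel and ramdisk named in `grub.cfg` actually exists. The sketch below uses a hypothetical function name, `check_grub_kernels`, and assumes menu entries use plain `linux`/`initrd` lines (or their `16`/`efi` variants) with literal paths. Configurations that rely on variables or on separate boot-loader entry files are not covered.

```shell
# check_grub_kernels CFG BOOTROOT
# For each linux*/initrd* line in CFG, checks that the first path on the
# line exists under BOOTROOT. Returns non-zero if any file is missing.
check_grub_kernels() {
    cfg=$1
    bootroot=$2
    status=0
    for path in $(awk '$1 ~ /^(linux|initrd)(16|efi)?$/ { print $2 }' "$cfg" | sort -u); do
        if [ ! -e "$bootroot$path" ]; then
            echo "missing: $path"
            status=1
        fi
    done
    return $status
}
```

With a separate `/boot` partition mounted at `/mnt/sysimage/boot`, paths in `grub.cfg` are relative to that partition, so the call would be `check_grub_kernels /mnt/sysimage/boot/grub2/grub.cfg /mnt/sysimage/boot`.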

5. Forgotten Linux Root Password

Two common recovery methods are:

Boot into rescue mode, chroot into the system, and run <code>passwd root</code> to set a new password.

Boot to the GRUB menu, edit the kernel line to add <code>single</code>, boot into single‑user mode, and use <code>passwd</code> to reset the password.

<code># chroot /mnt/sysimage
# passwd root
# exit
# reboot</code>

6. Read‑Only File System Error

When commands like <code>cp</code>, <code>mv</code>, or <code>chmod</code> fail with "Read‑only file system", possible causes are filesystem damage, disk errors, or an incorrect <code>/etc/fstab</code> entry.

Remedy: If the configuration is correct, remount the filesystem as read‑write:

<code># mount -o rw,remount /system</code>

If the filesystem is corrupted, run <code>fsck</code> on the unmounted device:

<code># nohup fsck -y /dev/VolGroup00/LogVol00 &gt; /dev/shm/fscklog &amp;</code>

Hardware failures require disk replacement.
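To find out which filesystems are currently mounted read‑only, the mount table can be filtered. A minimal sketch, with the hypothetical function name `readonly_mounts`: it reads `/proc/mounts` by default and only considers entries whose device field is a path, so pseudo-filesystems such as `tmpfs` are skipped.

```shell
# readonly_mounts [MOUNTS_FILE]
# Prints the mount point of every device-backed filesystem whose mount
# options include "ro". Reads /proc/mounts unless another file is given.
readonly_mounts() {
    awk '$1 ~ /^\// {
        n = split($4, opt, ",")
        for (i = 1; i <= n; i++)
            if (opt[i] == "ro") { print $2; break }
    }' "${1:-/proc/mounts}"
}
```

A filesystem that appears here although `/etc/fstab` mounts it read‑write was most likely remounted by the kernel after an I/O or consistency error, in which case the `fsck` route applies rather than a simple remount.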

Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
