Understanding Linux Kernel Oops Messages and Debugging Techniques

This article explains how Linux kernel Oops messages are generated, their format, the relationship between BUG, Oops and panic, and provides step‑by‑step debugging methods with code examples for ARM, PowerPC and MIPS architectures.

Deepin Linux

Jan 29, 2024

Preface: When using Linux, the system may crash and display an "Oops" message, which is the kernel‑level equivalent of a segmentation fault and indicates a kernel error such as an illegal address access.

1. Overview

An Oops is produced when the processor, which works with virtual addresses, cannot translate an invalid pointer to a physical address and raises a page fault. If the fault occurs in supervisor (kernel) mode and cannot be handled, the kernel prints an Oops.

Oops information source and format: The term "Oops" means "surprise"; when the kernel encounters an error it prints an Oops containing several parts:

A textual description, e.g., "Unable to handle kernel NULL pointer dereference at virtual address 00000000".

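An error code and an occurrence counter, e.g., "Internal error: Oops: 805 [#1]", where 805 is the hexadecimal error code and [#1] marks the first Oops since boot.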

Names of loaded modules, if any, prefixed by "Modules linked in:".

CPU number (0 for single‑processor systems).

Current process name and PID, indicating which process was running when the fault occurred.

Stack trace.

Machine code around the offending instruction (the instruction itself is shown in parentheses on ARM; some architectures, such as MIPS, use angle brackets instead).

Code: e24cb004 e24dd010 e59f34e0 e3a07000 (e5873000)

Stack back‑trace information shows the function call chain.

Backtrace:
[<c001a6f4>] (s3c2410fb_probe+0x0/0x560) from [<c01bf4e8>] (platform_drv_probe+0x20/0x24)
...
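The "Code:" line can be checked by hand or with a few lines of C. The following is a minimal sketch, assuming 32-bit ARM (A32) encoding and handling only loads and stores with an immediate offset; oops_faulting_word(), arm_decode_ldst_imm() and struct arm_ldst are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Extract the word in parentheses from an ARM "Code:" line.
 * Returns 0 on success, -1 if no "(xxxxxxxx)" group is found. */
static int oops_faulting_word(const char *code_line, uint32_t *word)
{
    const char *open;
    char *end;
    unsigned long v;

    assert(code_line != NULL && word != NULL);
    open = strchr(code_line, '(');
    if (!open)
        return -1;
    v = strtoul(open + 1, &end, 16);
    if (end == open + 1 || *end != ')')
        return -1;
    *word = (uint32_t)v;
    return 0;
}

struct arm_ldst {
    int is_load;   /* 1 = LDR, 0 = STR */
    int is_byte;   /* 1 = byte access, 0 = word access */
    unsigned rd;   /* register being loaded or stored */
    unsigned rn;   /* base register */
    int offset;    /* signed immediate offset */
};

/* Decode an A32 single data transfer with a 12-bit immediate offset
 * (bits 27..25 == 0b010). Returns -1 for any other instruction class. */
static int arm_decode_ldst_imm(uint32_t insn, struct arm_ldst *out)
{
    assert(out != NULL);
    if (((insn >> 25) & 0x7) != 0x2)
        return -1;
    out->is_load = (insn >> 20) & 1;
    out->is_byte = (insn >> 22) & 1;
    out->rn = (insn >> 16) & 0xf;
    out->rd = (insn >> 12) & 0xf;
    out->offset = (int)(insn & 0xfff);
    if (!((insn >> 23) & 1))   /* U bit clear: offset is subtracted */
        out->offset = -out->offset;
    return 0;
}
```

For the sample line, the word in parentheses, e5873000, decodes as a store of r3 to the address in r7 with offset 0. The word before it, e3a07000, is mov r7, #0, which is consistent with a write to virtual address 00000000.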

Making Oops Stack Traces More Readable

Linux 2.6.22 includes debugging options that make Oops output easier to read. Enabling frame pointers (CONFIG_FRAME_POINTER) at build time lets the kernel print a stack back‑trace, and enabling debug info (CONFIG_DEBUG_INFO) lets gdb map addresses back to source lines. Verify both options in .config, or enable them via make menuconfig under "Kernel hacking" → "Compile the kernel with frame pointers" and "Compile the kernel with debug info".

After enabling, rebuild the kernel with:

make ARCH=arm CROSS_COMPILE=arm-none-linux-gnueabi- menuconfig
make ARCH=arm CROSS_COMPILE=arm-none-linux-gnueabi-

Load the resulting vmlinux into arm-none-linux-gnueabi-gdb to inspect the Oops, e.g., to resolve the instruction address shown in the PC or LR field to a function and source line.
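Back-trace entries such as platform_drv_probe+0x20/0x24 can be turned into gdb commands mechanically. The sketch below assumes the symbol+0xOFFSET/0xSIZE form shown in the back-trace above; parse_bt_entry(), gdb_list_command() and struct bt_entry are hypothetical helpers, not part of any kernel tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct bt_entry {
    char symbol[64];
    unsigned long offset;  /* bytes from the start of the function */
    unsigned long size;    /* total size of the function in bytes */
};

/* Parse "symbol+0xOFF/0xSIZE". Returns 0 on success, -1 on malformed input. */
static int parse_bt_entry(const char *text, struct bt_entry *e)
{
    assert(text != NULL && e != NULL);
    if (sscanf(text, "%63[^+]+0x%lx/0x%lx",
               e->symbol, &e->offset, &e->size) != 3)
        return -1;
    return 0;
}

/* Format a gdb "list" command for the entry.
 * Returns 0 on success, -1 if the buffer is too small. */
static int gdb_list_command(const struct bt_entry *e, char *buf, size_t len)
{
    int n;

    assert(e != NULL && buf != NULL);
    n = snprintf(buf, len, "list *(%s+0x%lx)", e->symbol, e->offset);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The resulting string, for example list *(platform_drv_probe+0x20), can be typed at the gdb prompt once vmlinux is loaded. gdb only reports a source line if the kernel was built with debug info.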

2. Kernel Exceptions

Kernel exceptions are classified into three levels: BUG, Oops, and panic.

2.1 BUG

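BUG is a kernel‑level assertion that aborts execution when a fatal logic error is detected. On ARM64, BUG() emits a trapping instruction through __BUG_FLAGS(), and the resulting exception is reported like an Oops. Architectures that do not define HAVE_ARCH_BUG fall back to the generic definition, which prints a message and calls panic().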

arch/arm64/include/asm/bug.h
#ifndef _ARCH_ARM64_ASM_BUG_H
#define _ARCH_ARM64_ASM_BUG_H
#include <linux/stringify.h>
#include <asm/asm-bug.h>
#define __BUG_FLAGS(flags) \
    asm volatile (__stringify(ASM_BUG_FLAGS(flags)));

#define BUG() do { \
    __BUG_FLAGS(0); \
    unreachable(); \
} while (0)
#define __WARN_FLAGS(flags) __BUG_FLAGS(BUGFLAG_WARNING|(flags))
#define HAVE_ARCH_BUG
#include <asm-generic/bug.h>
#endif /* ! _ARCH_ARM64_ASM_BUG_H */

Architectures that do not provide their own BUG() (that is, do not define HAVE_ARCH_BUG) use the generic definition, which prints a message and calls panic("BUG!").

include/asm-generic/bug.h
#ifndef HAVE_ARCH_BUG
#define BUG() do { \
    printk("BUG: failure at %s:%d/%s()!\n", __FILE__, __LINE__, __func__); \
    barrier_before_unreachable(); \
    panic("BUG!"); \
} while (0)
#endif
#ifndef HAVE_ARCH_BUG_ON
#define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)
#endif
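A userspace analogue can make the BUG_ON() pattern concrete. The following is a minimal sketch, not kernel code: USER_BUG(), USER_BUG_ON(), user_bug_handler, user_bug_ignore() and checked_len() are hypothetical names, and abort() stands in for panic().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Branch hint as used in the kernel; __builtin_expect is a GCC/Clang builtin. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hook standing in for panic(); by default the process aborts. */
static void (*user_bug_handler)(void) = abort;
static int user_bug_count;

/* Alternative handler that lets a test observe a BUG and keep running. */
static void user_bug_ignore(void)
{
}

#define USER_BUG() do { \
    fprintf(stderr, "BUG: failure at %s:%d/%s()!\n", \
            __FILE__, __LINE__, __func__); \
    user_bug_count++; \
    user_bug_handler(); \
} while (0)

#define USER_BUG_ON(condition) \
    do { if (unlikely(condition)) USER_BUG(); } while (0)

/* Example caller: a length larger than the capacity is a logic error. */
static size_t checked_len(size_t len, size_t cap)
{
    assert(cap > 0);
    USER_BUG_ON(len > cap);
    return len;
}
```

As in the kernel, the condition is expected to be false on every correct path, so the check documents an invariant rather than handling an ordinary error.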

2.2 Oops

When the kernel encounters an exception such as an illegal pointer dereference, it prints an Oops containing the error reason, CPU state, instruction address, register dump, and call trace. What happens next depends on context: in process context the offending process is killed, while in interrupt context, or when panic_on_oops is set, the whole system panics.

arch/arm64/mm/fault.c
static void die_kernel_fault(const char *msg, unsigned long addr,
                 unsigned int esr, struct pt_regs *regs)
{
    bust_spinlocks(1);
    pr_alert("Unable to handle kernel %s at virtual address %016lx\n", msg, addr);
    mem_abort_decode(esr);
    show_pte(addr);
    die("Oops", regs, esr);
    bust_spinlocks(0);
    do_exit(SIGKILL);
}

The die() function invokes oops_enter(), prints module information, registers, and a stack dump, then calls oops_exit(). If the system is in an interrupt context or panic_on_oops is enabled, it proceeds to panic().

2.3 die()

arch/arm64/kernel/traps.c
static DEFINE_RAW_SPINLOCK(die_lock);

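void die(const char *str, struct pt_regs *regs, int err)
{
    int ret;
    unsigned long flags;
    /* Serialize Oops output across CPUs. */
    raw_spin_lock_irqsave(&die_lock, flags);
    oops_enter();
    console_verbose();
    bust_spinlocks(1);
    ret = __die(str, err, regs);
    /* crash_kexec() and add_taint() handling omitted in this excerpt */
    bust_spinlocks(0);
    oops_exit();
    /* An Oops in interrupt context, or with panic_on_oops set, is fatal. */
    if (in_interrupt())
        panic("Fatal exception in interrupt");
    if (panic_on_oops)
        panic("Fatal exception");
    raw_spin_unlock_irqrestore(&die_lock, flags);
    /* Otherwise only the current task is killed. */
    if (ret != NOTIFY_STOP)
        do_exit(SIGSEGV);
}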

static int __die(const char *str, int err, struct pt_regs *regs)
{
    static int die_counter;
    int ret;
    pr_emerg("Internal error: %s: %x [#%d]\n", str, err, ++die_counter);
    /* Let registered die notifiers see the Oops first; one may claim it. */
    ret = notify_die(DIE_OOPS, str, regs, err, 0, SIGSEGV);
    if (ret == NOTIFY_STOP)
        return ret;
    print_modules();
    show_regs(regs);
    dump_kernel_instr(KERN_EMERG, regs);
    return ret;
}
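The decision made at the end of die() can be summarized as a small pure function. This is a sketch that models only the control flow of die(); the enum and classify_oops() are hypothetical names, and the three booleans stand for in_interrupt(), panic_on_oops, and a die notifier having returned NOTIFY_STOP.

```c
#include <stdbool.h>

enum oops_outcome {
    OOPS_PANIC_IN_INTERRUPT,   /* panic("Fatal exception in interrupt") */
    OOPS_PANIC_ON_OOPS,        /* panic("Fatal exception") */
    OOPS_HANDLED_BY_NOTIFIER,  /* notifier returned NOTIFY_STOP, task survives */
    OOPS_KILL_TASK             /* do_exit(SIGSEGV) */
};

/* Mirror the order of checks in die(): both panic conditions are tested
 * before the notifier result is considered. */
static enum oops_outcome classify_oops(bool in_interrupt,
                                       bool panic_on_oops,
                                       bool notifier_stopped)
{
    if (in_interrupt)
        return OOPS_PANIC_IN_INTERRUPT;
    if (panic_on_oops)
        return OOPS_PANIC_ON_OOPS;
    if (notifier_stopped)
        return OOPS_HANDLED_BY_NOTIFIER;
    return OOPS_KILL_TASK;
}
```

The ordering matters: an Oops in interrupt context panics even when panic_on_oops is 0, because there is no process that could be killed in its place.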

notify_die() walks the die_chain notifier list, allowing interested modules to react to the Oops.

kernel/notifier.c
static ATOMIC_NOTIFIER_HEAD(die_chain);

int notrace notify_die(enum die_val val, const char *str,
               struct pt_regs *regs, long err, int trap, int sig)
{
    struct die_args args = {
        .regs = regs,
        .str = str,
        .err = err,
        .trapnr = trap,
        .signr = sig,
    };
    return atomic_notifier_call_chain(&die_chain, val, &args);
}

int register_die_notifier(struct notifier_block *nb)
{
    vmalloc_sync_mappings();
    return atomic_notifier_chain_register(&die_chain, nb);
}
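How a notifier chain behaves can be shown with a small userspace model. This is a simplified sketch, not the kernel implementation: it has no locking or RCU, and chain_register(), chain_call(), count_cb() and stop_cb() are hypothetical names. The return codes follow the values in include/linux/notifier.h.

```c
#include <assert.h>
#include <stddef.h>

#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_STOP      (NOTIFY_OK | NOTIFY_STOP_MASK)

struct notifier_block {
    int (*notifier_call)(struct notifier_block *nb,
                         unsigned long action, void *data);
    struct notifier_block *next;
    int priority;
};

/* Insert nb so that callbacks with higher priority run first. */
static void chain_register(struct notifier_block **head,
                           struct notifier_block *nb)
{
    assert(head != NULL && nb != NULL);
    while (*head && (*head)->priority >= nb->priority)
        head = &(*head)->next;
    nb->next = *head;
    *head = nb;
}

/* Call each callback in order; stop early once one sets NOTIFY_STOP_MASK. */
static int chain_call(struct notifier_block *head,
                      unsigned long action, void *data)
{
    int ret = NOTIFY_DONE;

    for (; head; head = head->next) {
        ret = head->notifier_call(head, action, data);
        if (ret & NOTIFY_STOP_MASK)
            break;
    }
    return ret;
}

/* Example callbacks: one only observes the event, one claims it. */
static int callbacks_run;

static int count_cb(struct notifier_block *nb, unsigned long action, void *data)
{
    (void)nb; (void)action; (void)data;
    callbacks_run++;
    return NOTIFY_OK;
}

static int stop_cb(struct notifier_block *nb, unsigned long action, void *data)
{
    (void)nb; (void)action; (void)data;
    callbacks_run++;
    return NOTIFY_STOP;
}
```

A callback that returns NOTIFY_STOP prevents lower-priority callbacks from running, and die() checks for that same value before deciding whether to kill the current task.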

2.4 panic

When a fatal error cannot be recovered, the kernel calls panic(), which prints a message, flushes logs, optionally triggers a kexec crash dump, and finally halts or reboots the system.

kernel/panic.c
void panic(const char *fmt, ...)
{
    static char buf[1024];
    va_list args;
    local_irq_disable();
    preempt_disable_notrace();
    console_verbose();
    bust_spinlocks(1);
    va_start(args, fmt);
    vscnprintf(buf, sizeof(buf), fmt, args);
    va_end(args);
    pr_emerg("Kernel panic - not syncing: %s\n", buf);
    // optional stack dump, kexec, notifier calls, etc.
    while (1) {
        touch_softlockup_watchdog();
        mdelay(PANIC_TIMER_STEP);
    }
}
EXPORT_SYMBOL(panic);

The panic behavior can be tuned via /proc/sys/kernel/panic_on_oops and /proc/sys/kernel/panic (timeout before reboot).
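Both files hold a single integer, so they are easy to read from a diagnostic tool. The sketch below uses hypothetical helper names (read_sysctl_int(), panic_timeout_action()). The interpretation of the timeout follows the kernel's sysctl documentation: zero loops forever, a negative value reboots immediately, and a positive value reboots after that many seconds.

```c
#include <assert.h>
#include <stdio.h>

/* Read one integer from a sysctl-style file such as
 * /proc/sys/kernel/panic. Returns 0 on success, -1 on any error. */
static int read_sysctl_int(const char *path, int *value)
{
    FILE *f;
    int ok;

    assert(path != NULL && value != NULL);
    f = fopen(path, "r");
    if (!f)
        return -1;
    ok = (fscanf(f, "%d", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

enum panic_action {
    PANIC_LOOP_FOREVER,       /* timeout == 0: no automatic reboot */
    PANIC_REBOOT_NOW,         /* timeout < 0: reboot immediately */
    PANIC_REBOOT_AFTER_DELAY  /* timeout > 0: reboot after that many seconds */
};

/* Interpret the value of /proc/sys/kernel/panic. */
static enum panic_action panic_timeout_action(int timeout)
{
    if (timeout == 0)
        return PANIC_LOOP_FOREVER;
    if (timeout < 0)
        return PANIC_REBOOT_NOW;
    return PANIC_REBOOT_AFTER_DELAY;
}
```

On an unattended embedded board, a positive timeout combined with panic_on_oops set to 1 is one way to make every Oops end in a reboot instead of leaving the system in a half-working state.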

3. Common Linux Kernel Oops Analysis Methods

Check whether the faulting address is near zero (null‑pointer dereference).

Determine if the address lies in the I/O memory region (possible bus error).

Verify whether the address is close to the stack pointer (possible stack overflow).

Compare stack traces from successive watchdog reports to see whether the PC is moving (a PC that never changes suggests a deadlock).

Use SysRq to distinguish real hangs from soft lockups.

Disassemble the kernel image (vmlinux) to map the offending instruction back to C code and search for upstream patches.
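The first and third checks in the list above are simple address comparisons. The following sketch shows one way to express them. The 4 KiB page size and 8 KiB kernel stack size are assumptions that vary by architecture and configuration, and classify_fault() is a hypothetical helper.

```c
#include <stdint.h>

#define ASSUMED_PAGE_SIZE   4096u  /* assumption: 4 KiB pages */
#define ASSUMED_THREAD_SIZE 8192u  /* assumption: 8 KiB kernel stacks */

enum fault_hint {
    HINT_NULL_DEREF,  /* address in the first page: NULL plus a small offset */
    HINT_NEAR_STACK,  /* address within one stack size of the stack pointer */
    HINT_OTHER        /* needs further analysis, e.g. I/O region or wild pointer */
};

/* Classify a faulting address relative to zero and to the stack pointer. */
static enum fault_hint classify_fault(uint64_t fault_addr, uint64_t sp)
{
    uint64_t dist = fault_addr > sp ? fault_addr - sp : sp - fault_addr;

    if (fault_addr < ASSUMED_PAGE_SIZE)
        return HINT_NULL_DEREF;
    if (dist < ASSUMED_THREAD_SIZE)
        return HINT_NEAR_STACK;
    return HINT_OTHER;
}
```

A small nonzero address such as 0x8 usually means that a structure member was accessed through a NULL pointer, with the address giving the member's offset.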

3.1 PowerPC Example

Sample Oops output shows a bad address access, registers, and call trace. The analysis identifies the offending function free_block() called from drain_array(), with suspicious arguments indicating a corrupted stack.

Unable to handle kernel paging request for data at address 0x36fef31e
Faulting instruction address: 0xc0088b8c
Oops: Kernel access of bad area, sig: 11 [#1]
...
NIP: C0088B8C LR: C0088CF8 CTR: 00000000
...
Call Trace:
[CE283ED0] [C06ABEC0] 0xc06abec0(unreliable)
[CE283EF0] [C0088CF8] drain_array+0xc4/0x100
[CE283F10] [C008A70C] cache_reap+0x94/0x13c
...

In the register dump, NIP is the next instruction pointer (here equal to the faulting instruction address), LR is the link register holding the return address, and CTR is the count register.
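The symbol+offset/size notation in the call trace gives enough information to work out where a function starts and ends, which helps decide whether NIP lies in the same function as LR. The helpers below are a hypothetical sketch that assumes 32-bit addresses, as in this PowerPC trace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct func_range {
    uint32_t start;  /* first byte of the function */
    uint32_t end;    /* one past the last byte */
};

/* Derive a function's address range from a trace entry of the form
 * "[addr] symbol+offset/size". */
static struct func_range range_from_trace(uint32_t addr, uint32_t offset,
                                          uint32_t size)
{
    struct func_range r;

    assert(offset <= addr);
    r.start = addr - offset;
    r.end = r.start + size;
    return r;
}

/* True if addr falls inside the function described by r. */
static bool in_range(struct func_range r, uint32_t addr)
{
    return addr >= r.start && addr < r.end;
}
```

For drain_array+0xc4/0x100 at 0xC0088CF8 the function spans 0xC0088C34 to 0xC0088D34. LR falls inside that range, while NIP (0xC0088B8C) lies 0xa8 bytes below its start, so the faulting instruction belongs to a different function that drain_array() called.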

3.2 MIPS Example

The MIPS Oops shows a BadVA of 0x00000008, indicating an illegal address access. The stack trace points to _bcore_cleanup() where a null pointer bde is dereferenced.

0:Oops[#1]:
...
0:epc   : ffffffffc087c4b4 _bcore_cleanup+0x34/0x190
0:ra    : ffffffffc087bf40 _init+0x3e8/0x480
...
Code: ffbf0028 0000802d 663142b8 <dc420008> 0040f809 ...

The analysis maps the offending instruction <dc420008> to the line for (unit = 0; unit < bde->num_devices(...); unit++), confirming that bde is null.

Both examples demonstrate the typical workflow: locate the faulting address, examine registers, decode the instruction, trace the call stack, and finally fix the root cause in the source code.
