Understanding Linux Kernel Oops, BUG, and Panic: Debugging Techniques and Tools
This article explains Linux kernel crash diagnostics: Oops messages, the BUG() and BUG_ON() macros, the die() and panic() paths, preparation steps for reproducing bugs, kernel configuration options for debugging, and user-space debugging tools such as MEMWATCH, YAMD, Electric Fence, and strace.
When a Linux kernel crashes, it often prints an "Oops" message, which is the kernel‑level equivalent of a segmentation fault and provides details such as the faulting address, register state, and call stack.
1. Preparation Before Debugging
Confirm that the bug is real and identify the kernel version that introduced it (a binary search over versions, for example with git bisect, can help).
Deeply understand the relevant kernel code and ensure the bug is reproducible.
Minimize the system to eliminate unrelated factors.
2. Kernel Exceptions
2.1 BUG
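BUG() and BUG_ON() are kernel assertions for conditions that should never occur; when one fires, the kernel reports an Oops and may panic. On arm64, BUG() expands to __BUG_FLAGS(0) followed by unreachable(). The brk instruction it emits raises an exception whose handler reports the bug through the Oops path, and the system panics if it is configured to panic on Oops or if the fault occurs in interrupt context. On 32‑bit ARM, BUG() executes an undefined instruction to raise the exception.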
arch/arm64/include/asm/bug.h
#ifndef _ARCH_ARM64_ASM_BUG_H
#define _ARCH_ARM64_ASM_BUG_H
#include <linux/stringify.h>
#include <asm/asm-bug.h>
#define __BUG_FLAGS(flags) \
    asm volatile (__stringify(ASM_BUG_FLAGS(flags)));
#define BUG() do { \
    __BUG_FLAGS(0); \
    unreachable(); \
} while (0)
#define __WARN_FLAGS(flags) __BUG_FLAGS(BUGFLAG_WARNING|(flags))
#define HAVE_ARCH_BUG
#include <asm-generic/bug.h>
#endif /* ! _ARCH_ARM64_ASM_BUG_H */
When HAVE_ARCH_BUG is not defined, the generic implementation prints a message and calls panic("BUG!").
2.2 Oops
An Oops records the CPU state, faulting instruction, and stack trace, then either kills the offending process or, if configured, proceeds to panic. The die() function drives Oops handling:
arch/arm64/kernel/traps.c
static DEFINE_RAW_SPINLOCK(die_lock);
void die(const char *str, struct pt_regs *regs, int err)
{
    int ret;
    unsigned long flags;
    raw_spin_lock_irqsave(&die_lock, flags);
    oops_enter();
    console_verbose();
    bust_spinlocks(1);
    ret = __die(str, err, regs);    // result of the die notifier chain
    // ... additional handling ...
    raw_spin_unlock_irqrestore(&die_lock, flags);
    if (ret != NOTIFY_STOP)
        do_exit(SIGSEGV);           // kill the offending task
}
__die() prints the internal error, notifies interested modules, dumps registers, and shows the offending instruction.
2.3 panic
A panic indicates a fatal kernel error that cannot be recovered. The panic() function disables interrupts, ensures only one CPU runs the panic path, prints the panic message, optionally triggers a kexec crash dump, and finally reboots or halts the system.
kernel/panic.c
void panic(const char *fmt, ...)
{
    local_irq_disable();
    preempt_disable_notrace();
    // ... format message into buf ...
    pr_emerg("Kernel panic - not syncing: %s\n", buf);
    // optional kexec dump, notify chain, flush logs, etc.
    emergency_restart();    // reached when panic_timeout is nonzero; otherwise panic() loops forever
}
3. Kernel Debug Configuration Options
Enabling debugging options such as slab layer debugging, CONFIG_PREEMPT, and dynamic debug (CONFIG_DYNAMIC_DEBUG) provides additional diagnostic output. Most of these options are set via make menuconfig under the Kernel hacking menu.
4. Memory Debugging Tools
4.1 MEMWATCH
MEMWATCH tracks allocations, double frees, and leaks. Example usage:
#include <stdlib.h>
#include <stdio.h>
#include "memwatch.h"
int main(void)
{
    char *ptr1 = malloc(512);
    char *ptr2 = malloc(512);
    ptr2 = ptr1;    // the second block is now unreachable: a leak
    free(ptr2);     // frees the first block
    free(ptr1);     // frees the first block again: a double free
    return 0;
}
4.2 YAMD
YAMD (Yet Another Malloc Debugger) detects leaks, double frees, and out-of-bounds writes. Compile the program with -g so that reports include source lines, then run the binary through the run-yamd script.
4.3 Electric Fence
Electric Fence replaces malloc with guarded allocations that cause an immediate segmentation fault on buffer overrun, making it easy to locate the offending line with gdb.
4.4 strace
strace records system calls made by a user‑space program, useful for diagnosing issues such as failed ioctl calls during filesystem creation.
5. Additional Notes
Use early_printk() for early‑boot messages before the console is initialized.
Adjust kernel log levels via /proc/sys/kernel/printk to control verbosity.
Dynamic debug allows enabling/disabling pr_debug() messages at runtime by writing patterns to /sys/kernel/debug/dynamic_debug/control.
Deepin Linux
Research areas: Windows & Linux platforms, C/C++ backend development, embedded systems and Linux kernel, etc.