
Master Core Dumps to Diagnose Linux Crashes Like a Pro

This guide explains what a Core Dump is, why it is essential for Linux crash analysis, how the kernel creates it, common signals that trigger it, typical causes of crashes, and step‑by‑step configuration and GDB techniques to capture and debug Core Dump files effectively.

Deepin Linux

What Is a Core Dump?

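In Linux, a Core Dump is a snapshot of a process’s memory at the moment of a fatal crash. It records the contents of the stack, heap, and other writable memory along with each thread’s register state. From these, a debugger can rebuild the function call chain and reconstruct the execution context that led to the failure. Program code itself is normally not stored in the dump, which is why the debugger also needs the original executable.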

The Core Dump file follows the ELF format (file type ET_CORE) and consists of four kinds of parts: the ELF header, the program header table, a NOTE segment (PT_NOTE), and one or more LOAD segments (PT_LOAD). The ELF header describes basic metadata such as file type and architecture. The program header table indexes each segment’s layout. The NOTE segment stores auxiliary metadata (e.g., signal information, PID, per‑thread register sets, and the list of mapped files). The LOAD segments contain the actual memory image of the process.
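The layout described above can be checked programmatically. The following is a minimal sketch, assuming a 64‑bit core written in the host's byte order and already read into memory; is_elf64_core and count_segments are hypothetical helper names, not part of any library, and only the structures from <elf.h> are real.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if buf starts with a 64-bit ELF header of type ET_CORE,
 * 0 otherwise. Hypothetical helper; 32-bit cores are not handled. */
int is_elf64_core(const unsigned char *buf, size_t len)
{
    Elf64_Ehdr hdr;

    if (len < sizeof hdr)
        return 0;
    memcpy(&hdr, buf, sizeof hdr);              /* avoids unaligned reads */

    if (memcmp(hdr.e_ident, ELFMAG, SELFMAG) != 0)
        return 0;                               /* no "\177ELF" magic */
    if (hdr.e_ident[EI_CLASS] != ELFCLASS64)
        return 0;
    return hdr.e_type == ET_CORE;
}

/* Counts program headers of the given type (e.g. PT_NOTE or PT_LOAD).
 * Returns 0 if the buffer is not a 64-bit core image. */
size_t count_segments(const unsigned char *buf, size_t len, Elf64_Word type)
{
    Elf64_Ehdr hdr;
    size_t count = 0;

    if (!is_elf64_core(buf, len))
        return 0;
    memcpy(&hdr, buf, sizeof hdr);

    for (size_t i = 0; i < hdr.e_phnum; i++) {
        Elf64_Phdr ph;
        size_t off = (size_t)hdr.e_phoff + i * (size_t)hdr.e_phentsize;

        if (off > len || len - off < sizeof ph)
            break;                              /* header table is truncated */
        memcpy(&ph, buf + off, sizeof ph);
        if (ph.p_type == type)
            count++;
    }
    return count;
}
```

On a real core file, count_segments(buf, len, PT_NOTE) would normally report one NOTE segment, while the PT_LOAD count depends on how many memory regions the process had mapped.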

Why Core Dumps Matter

Without a Core Dump, developers must rely on sparse log messages and guesswork, which is inefficient for intermittent bugs such as segmentation faults, wild pointers, or out‑of‑bounds accesses. With a Core Dump, tools like gdb can load the file, show the exact function where the crash occurred, display the call stack, and inspect variable values, dramatically speeding up root‑cause analysis.

How the Kernel Generates a Core Dump

When a fatal signal (e.g., SIGSEGV, SIGABRT, SIGFPE) is delivered, the kernel freezes the process.

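The kernel then writes a Core Dump file as part of terminating the process. By default the file is named core (or core.<pid>) and is placed in the process’s working directory; the name and location are controlled by /proc/sys/kernel/core_pattern.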

After writing, the kernel releases all resources held by the process.

The following simplified pseudocode, loosely modeled on the signal‑delivery path of older kernels, illustrates where this happens:

static void do_signal(struct pt_regs *regs) {
    siginfo_t info;
    int signr;
    struct k_sigaction ka;
    signr = get_signal_to_deliver(&info, &ka, regs, NULL);
    if (signr > 0) {
        // Handle the signal, possibly generating a core dump
    }
}

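The get_signal_to_deliver function extracts a pending signal from the process’s signal queues and prepares it for handling. The helpers find_signal_queue, should_generate_coredump, and prepare_coredump are illustrative simplifications; in the real kernel the corresponding steps are performed by sig_kernel_coredump() and do_coredump():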

int get_signal_to_deliver(siginfo_t *info, struct k_sigaction *return_ka,
                         struct pt_regs *regs, void *cookie) {
    sigset_t *mask = &current->blocked;
    int signr = 0;
    // Take the next unblocked signal: thread-private queue first, then the shared queue
    while ((signr = dequeue_signal(mask, &current->pending)) ||
           (signr = dequeue_signal(mask, &current->shared_pending))) {
        struct sigqueue *q = find_signal_queue(signr, &current->pending);
        if (!q) q = find_signal_queue(signr, &current->shared_pending);
        if (q) *info = q->info;                      // copy siginfo only if a queue entry exists
        *return_ka = current->sigaction[signr - 1];  // signal numbers are 1-based
        if (should_generate_coredump(signr)) {
            prepare_coredump();                      // default action for this signal includes a dump
        }
        return signr;
    }
    return 0;  // nothing pending
}
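The result of this kernel path is visible from user space through the exit status of the crashed process. The sketch below assumes a parent that forks a child and inspects the status returned by waitpid(); run_and_report, crash_with_abort, and struct crash_report are hypothetical names introduced for illustration. WCOREDUMP is not specified by POSIX, so its use is guarded.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct crash_report {
    int signaled;   /* 1 if the child was killed by a signal */
    int signo;      /* terminating signal, valid when signaled is 1 */
    int dumped;     /* 1 if the kernel reports that a core was written */
};

/* Forks, runs fn() in the child, and records how the child ended.
 * Returns 0 on success, -1 if fork() or waitpid() failed. */
int run_and_report(void (*fn)(void), struct crash_report *out)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        fn();
        _exit(0);                   /* reached only if fn() returns */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    out->signaled = WIFSIGNALED(status) ? 1 : 0;
    out->signo = out->signaled ? WTERMSIG(status) : 0;
    out->dumped = 0;
#ifdef WCOREDUMP
    if (out->signaled && WCOREDUMP(status))
        out->dumped = 1;            /* depends on ulimit -c and core_pattern */
#endif
    return 0;
}

/* Example child body: abort() raises SIGABRT. */
void crash_with_abort(void)
{
    abort();
}
```

Calling run_and_report(crash_with_abort, &report) should yield signo equal to SIGABRT; whether dumped is set depends on the core size limit and core_pattern in effect.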

Signals That Trigger Core Dumps

SIGSEGV : Illegal memory access (e.g., null‑pointer dereference, array out‑of‑bounds).

SIGABRT : Raised by abort(), which a failed assert() also calls.

SIGFPE : Fatal arithmetic errors such as integer division by zero.
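These three are the most common, but they are not the only signals whose default action writes a core. As a rough sketch based on the table in the Linux signal(7) manual page, the classification could be expressed as follows; default_action_dumps_core is a hypothetical helper name, and a process that installs its own handler or ignores a signal overrides this default.

```c
#include <signal.h>

/* Returns 1 if the default action for sig on Linux is
 * "terminate and dump core" according to signal(7), else 0. */
int default_action_dumps_core(int sig)
{
    switch (sig) {
    case SIGQUIT:   /* quit from keyboard (Ctrl-\) */
    case SIGILL:    /* illegal instruction */
    case SIGTRAP:   /* trace or breakpoint trap */
    case SIGABRT:   /* abort() */
    case SIGBUS:    /* bus error (bad memory access) */
    case SIGFPE:    /* arithmetic exception */
    case SIGSEGV:   /* invalid memory reference */
    case SIGXCPU:   /* CPU time limit exceeded */
    case SIGXFSZ:   /* file size limit exceeded */
    case SIGSYS:    /* bad system call */
        return 1;
    default:
        return 0;   /* e.g. SIGTERM and SIGKILL terminate without a dump */
    }
}
```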

Example that causes SIGSEGV:

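#include <stdio.h>
int main() {
    volatile int arr[10];
    // A small overrun such as arr[100] often stays inside the mapped stack and may not fault.
    arr[1000000] = 10; // far out of bounds: undefined behavior, typically SIGSEGV → core dump
    return 0;
}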

Example that causes SIGABRT via an assertion:

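#include <assert.h>
#include <stdlib.h>
int main() {
    int num = 0;
    assert(num != 0); // assertion fails → abort() → SIGABRT (unless built with -DNDEBUG)
    abort();           // explicit abort → SIGABRT; reached only if the assert is compiled out
    return 0;
}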

Example that causes SIGFPE (division by zero):

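#include <stdio.h>
int main() {
    int a = 10;
    volatile int b = 0;  // volatile keeps the compiler from folding the division away
    int result = a / b;  // integer division by zero → SIGFPE (on x86)
    printf("%d\n", result);
    return 0;
}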

Common Root Causes of Crashes

Null or wild pointer dereference – accessing memory through an invalid pointer.

Buffer overflow – writing beyond the bounds of an array (e.g., strcpy(buffer, "123456789012345")).

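Unhandled signals – when no handler is installed, the default action for signals such as SIGSEGV terminates the process and generates a dump.

Resource limits – these do not cause crashes but stop the dump from being written: ulimit -c 0 disables core files, and insufficient disk space also prevents writing.

Permission issues – likewise, the process must have write permission in the target directory, or no core file appears.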

Multithreading problems – race conditions, deadlocks, or unsynchronized access to shared resources.

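Dynamic memory errors – double free, use‑after‑free, or memory leaks that eventually make allocations fail and leave the program dereferencing a NULL result. (A process killed by the OOM killer receives SIGKILL and leaves no core dump.)

Logical errors – infinite recursion that overflows the stack (SIGSEGV), or an uncaught C++ exception that ends in abort() (SIGABRT).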

Hardware faults – faulty RAM or CPU anomalies can also produce core dumps.
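For the buffer overflow case in the list above, one common defensive pattern is to bound every copy by the destination size. The following is a minimal sketch of that idea; copy_bounded is a hypothetical helper name, and it treats truncation as an error rather than silently accepting it.

```c
#include <stddef.h>
#include <stdio.h>

/* Copies src into dst without writing past dst_size bytes.
 * Returns 0 if the whole string fit, -1 if it was truncated
 * or the arguments were invalid. dst is always NUL-terminated
 * when dst_size is at least 1. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    int written;

    if (dst == NULL || src == NULL || dst_size == 0)
        return -1;

    /* snprintf never writes more than dst_size bytes, including the NUL */
    written = snprintf(dst, dst_size, "%s", src);
    if (written < 0 || (size_t)written >= dst_size)
        return -1;      /* source did not fit */
    return 0;
}
```

With a 10‑byte buffer, copying "123456789012345" through this helper returns -1 and leaves the first nine characters in the buffer instead of overwriting adjacent memory as strcpy would.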

Quick Configuration to Enable Core Dumps

1. Set the core file size limit:

ulimit -c unlimited   # allow unlimited core size for the current shell

Check with ulimit -c – it should output unlimited.
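A program can also adjust this limit for itself at startup instead of relying on the shell. The sketch below uses the standard getrlimit()/setrlimit() calls with RLIMIT_CORE; enable_core_dumps and core_dumps_allowed are hypothetical helper names. An unprivileged process can raise its soft limit only up to the hard limit, so if the hard limit is 0 this has no effect.

```c
#include <sys/resource.h>

/* Raises the soft core-size limit to the hard limit for the calling
 * process. Returns 0 on success, -1 on failure (errno is set). */
int enable_core_dumps(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;
    lim.rlim_cur = lim.rlim_max;    /* soft limit may go up to the hard limit */
    return setrlimit(RLIMIT_CORE, &lim);
}

/* Returns 1 if the current soft limit permits a core file, else 0. */
int core_dumps_allowed(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return 0;
    return lim.rlim_cur != 0;
}
```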

2. Configure the core file naming pattern (e.g., include program name, PID, and timestamp):

echo '/tmp/core.%e.%p.%t' | sudo tee /proc/sys/kernel/core_pattern

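Ensure the target directory exists and is writable. /tmp normally is; if you prefer a dedicated directory such as /tmp/coredumps, create it and use it as the path prefix in core_pattern: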

sudo mkdir -p /tmp/coredumps && sudo chmod 1777 /tmp/coredumps
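To see what file name a given pattern produces, the substitution can be imitated in user space. This is only a sketch of the %e, %p, %t, and %% specifiers, not the kernel's implementation: expand_core_pattern is a hypothetical helper, other specifiers are copied through unchanged, and the real kernel substitutes the task's command name for %e, which may be truncated.

```c
#include <stddef.h>
#include <stdio.h>

/* Expands %e (executable name), %p (PID), %t (UNIX timestamp) and %%
 * in pattern into out. Returns 0 on success, -1 if out is too small. */
int expand_core_pattern(const char *pattern, const char *exe, long pid,
                        long timestamp, char *out, size_t out_size)
{
    size_t used = 0;

    if (out_size == 0)
        return -1;
    out[0] = '\0';

    for (const char *p = pattern; *p != '\0'; p++) {
        size_t room = out_size - used;   /* always >= 1 here */
        int n;

        if (p[0] == '%' && p[1] == 'e') {
            n = snprintf(out + used, room, "%s", exe);
            p++;
        } else if (p[0] == '%' && p[1] == 'p') {
            n = snprintf(out + used, room, "%ld", pid);
            p++;
        } else if (p[0] == '%' && p[1] == 't') {
            n = snprintf(out + used, room, "%ld", timestamp);
            p++;
        } else if (p[0] == '%' && p[1] == '%') {
            n = snprintf(out + used, room, "%%");
            p++;
        } else {
            n = snprintf(out + used, room, "%c", *p);
        }

        if (n < 0 || (size_t)n >= room)
            return -1;                   /* output would be truncated */
        used += (size_t)n;
    }
    return 0;
}
```

For the pattern /tmp/core.%e.%p.%t, a program named crash with PID 1234 and timestamp 1700000000 expands to /tmp/core.crash.1234.1700000000.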

3. Enable useful kernel parameters:

echo 1 | sudo tee /proc/sys/kernel/core_uses_pid   # append the PID to the filename when the pattern has no %p
echo 2 | sudo tee /proc/sys/kernel/suid_dumpable   # allow core dumps for setuid programs (dump readable by root only)

4. Systemd service configuration (if the program runs as a service):

[Service]
LimitCORE=infinity
MemoryLimit=infinity

LimitCORE=infinity is the setting that enables core dumps. MemoryLimit=infinity only lifts the service's memory cap and is named MemoryMax= on current systemd versions.

After editing, reload and restart the service:

sudo systemctl daemon-reload
sudo systemctl restart servicename.service

Verifying Core Dump Generation

Create a tiny test program that deliberately crashes:

#include <stdio.h>
int main() {
    char *p = NULL;
    return *p; // null‑pointer dereference → SIGSEGV
}

Compile with debugging symbols:

gcc -g -o crash crash.c

Run the program and then check for a core file. With the pattern configured in step 2, the name has the form /tmp/core.crash.<pid>.<timestamp>. Load it with GDB:

gdb ./crash /tmp/core.crash.<pid>.<timestamp>

Analyzing Core Dumps with GDB

bt / bt full – display the call stack, with bt full also showing local variables.

frame N – switch to stack frame N for detailed inspection.

print – examine the value of a variable or dereference a pointer.

list – show the source code around the current execution point.

Example session:

(gdb) bt
#0  0x08048567 in func3 ()
#1  0x080484e2 in func2 ()
#2  0x08048476 in func1 ()
#3  0x080483d9 in main ()
(gdb) frame 2
(gdb) print variable_name
(gdb) list

Real‑World Debugging Example

A multithreaded file‑processing service crashes under high load. After loading the core file, bt shows the crash inside process_file, and thread apply all bt shows several other threads executing the same function. Using print reveals that multiple threads share a file descriptor without synchronization, leading to race conditions.

The fix adds a mutex around file operations:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

pthread_mutex_t file_mutex;

void process_file(const char *filename) {
    pthread_mutex_lock(&file_mutex);
    FILE *file = fopen(filename, "r");
    if (file == NULL) {
        perror("Failed to open file");
        pthread_mutex_unlock(&file_mutex);
        return;
    }
    char buffer[1024];
    while (fgets(buffer, sizeof(buffer), file) != NULL) {
        // process each line
    }
    fclose(file);
    pthread_mutex_unlock(&file_mutex);
}

void *thread_function(void *arg) {
    const char *filename = (const char *)arg;
    process_file(filename);
    return NULL;
}

int main() {
    pthread_mutex_init(&file_mutex, NULL);
    pthread_t t1, t2;
    const char *f1 = "file1.txt";
    const char *f2 = "file2.txt";
    pthread_create(&t1, NULL, thread_function, (void *)f1);
    pthread_create(&t2, NULL, thread_function, (void *)f2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_mutex_destroy(&file_mutex);
    return 0;
}

Rebuilding and redeploying the service eliminates the crashes, demonstrating how Core Dumps and GDB can pinpoint and resolve complex concurrency bugs.
