Master Core Dumps: From Generation to Debugging with GDB on Linux
This article explains what a Core Dump is, how Linux generates the ELF‑based snapshot when a program crashes, common causes such as memory errors or signal mishandling, essential system configurations, and step‑by‑step GDB techniques for analyzing and fixing the underlying bugs.
Ever had a program that runs fine locally but crashes in production with nothing more than a vague "Segmentation fault"? The key to diagnosing such crashes is the Core Dump: a snapshot of the program’s memory, registers, and stack at the moment of failure, which lets you locate the bug precisely instead of guessing.
1. What is a Core Dump?
1.1 Core Dump file introduction
On Linux, a Core Dump (core file) is created when a process receives a fatal signal (e.g., SIGSEGV, SIGABRT) and the kernel writes the process’s address space and state to disk. The file acts like a "snapshot" of the crash, essential for post‑mortem debugging.
A core file is an ELF file with four main parts: the ELF header, the program header table, a NOTE segment (PT_NOTE), and one or more LOAD segments (PT_LOAD).
1.2 Kernel view of Core Dump generation
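Generation proceeds in four steps:
Step 1: The process receives a fatal signal. For a segmentation fault, a hardware exception (an invalid memory access) is what causes the kernel to raise SIGSEGV, not the other way around.
Step 2: The kernel stops every thread of the process and begins the dump while the process’s memory is still intact.
Step 3: The Core Dump file is written, containing the contents of the virtual address space, CPU registers, per-thread state, and signal details.
Step 4: The kernel releases the process’s resources and completes its termination; the parent then sees a "killed by signal" wait status.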
(1) Signal handling phase: do_signal
Before returning to user space, the kernel checks for pending signals and calls do_signal to process them. The listing below is a simplified sketch in the style of older 2.6-era kernels, not verbatim kernel source.
static void fastcall do_signal(struct pt_regs *regs) {
    siginfo_t info;
    int signr;
    struct k_sigaction ka;
    sigset_t *oldset;
    // Get the signal to deliver
    signr = get_signal_to_deliver(&info, &ka, regs, NULL);
    if (signr > 0) {
        // Signal-specific handling, possibly generating a Core Dump
    }
}
If the signal’s default action includes a Core Dump, the kernel prepares the dump.
(2) Signal acquisition phase: get_signal_to_deliver
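This function dequeues a pending signal from the process’s private and shared queues and decides whether a Core Dump should be generated. The code below is simplified pseudocode: the helpers find_signal_queue, should_generate_coredump, and prepare_coredump are illustrative names. In mainline kernels the check is the sig_kernel_coredump() macro and the dump itself is written by do_coredump().
int get_signal_to_deliver(siginfo_t *info, struct k_sigaction *return_ka,
                          struct pt_regs *regs, void *cookie) {
    sigset_t *mask = &current->blocked;
    int signr = 0;
    // Try the thread-private queue first, then the queue shared by the thread group
    while ((signr = dequeue_signal(mask, &current->pending)) ||
           (signr = dequeue_signal(mask, &current->shared_pending))) {
        struct sigqueue *q;
        q = find_signal_queue(signr, &current->pending);
        if (!q)
            q = find_signal_queue(signr, &current->shared_pending);
        *info = q->info;
        *return_ka = current->sigaction[signr - 1];
        // Fatal signals whose default action dumps core trigger the dump here
        if (should_generate_coredump(signr)) {
            prepare_coredump();
        }
        return signr;
    }
    return 0;
}
(3) Memory information recording phase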
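After deciding to generate a dump, the kernel records the process’s memory layout. The Core Dump is an ELF file whose PT_NOTE segment holds process metadata (per-thread register sets and signal information in NT_PRSTATUS notes, process information in NT_PRPSINFO, the auxiliary vector, and the list of mapped files) and whose PT_LOAD segments hold the dumped memory regions (heap, stack, data, and so on). VMCOREINFO notes belong to kernel crash dumps (vmcore), not to process core files.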
2. Core Dump generation mechanism
2.1 Trigger conditions
Core Dumps are typically triggered by the following signals:
SIGSEGV (signal 11): illegal memory access, such as a null-pointer dereference.
#include <stdio.h>
#include <stdlib.h>
int main() {
    int *ptr = NULL;
    *ptr = 10; // triggers SIGSEGV
    return 0;
}
SIGABRT (signal 6): raised by abort() or a failed assert.
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
int main() {
    int num = 0;
    assert(num > 0); // triggers SIGABRT
    return 0;
}
SIGFPE (signal 8): fatal arithmetic error, such as integer division by zero. Floating-point division by zero does not raise SIGFPE by default.
#include <stdio.h>
int main() {
    int a = 10;
    volatile int b = 0; // volatile keeps the compiler from folding the division away
    int c = a / b; // triggers SIGFPE on x86
    return 0;
}
2.2 Configuration points
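Run ulimit -c unlimited to remove the core size limit. The setting applies to the current shell and the processes it starts; a value of 0 disables core files.
If core_pattern is a relative name (the kernel default is core), the file is written to the process’s current working directory, so that directory must be writable.
If the program is setuid/setgid or changes its effective UID/GID, the kernel suppresses the dump unless /proc/sys/fs/suid_dumpable is nonzero. A value of 1 enables such dumps and is meant for debugging only; 2 ("suidsafe") is the safer choice and requires core_pattern to be an absolute path or a pipe.
Adjust /proc/sys/kernel/core_pattern to control dump location and naming, for example /var/core/core.%e.%p (%e is the executable name, %p the PID). On many distributions the pattern starts with | and pipes the dump to a handler such as systemd-coredump or apport.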
3. Common causes of Core Dumps
3.1 Memory access errors
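Null or wild pointer dereference: accessing memory through a null or invalid pointer triggers SIGSEGV.
#include <iostream>
int main() {
    int *ptr = nullptr;
    *ptr = 10; // SIGSEGV
    return 0;
}
Buffer overflow: writing past an array’s bounds corrupts adjacent memory. The crash may come later and far from the faulty write, for example as SIGABRT from the stack protector when the function returns.
#include <stdio.h>
#include <string.h> // for strcpy
int main() {
    char buffer[10];
    strcpy(buffer, "123456789012345"); // overflow: 16 bytes (with the terminator) into a 10-byte buffer
    return 0;
}
3.2 Improper signal handling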
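For SIGSEGV, SIGABRT, and SIGFPE, the default action is to terminate the process and dump core. A handler that catches one of these signals and then calls exit(), or simply returns, suppresses or distorts the dump; if a handler is needed for logging, it should restore the default action and re-raise the signal.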
3.3 Resource limits and configuration
If ulimit -c is 0 or disk space is insufficient, no Core Dump will be written.
3.4 Multithreading issues
Race conditions: unsynchronized access to shared data is undefined behavior. The counter below merely ends up with a wrong value, but the same kind of race on a pointer or a container can corrupt memory and crash.
#include <iostream>
#include <thread>
int sharedVariable = 0;
void increment() {
    for (int i = 0; i < 1000; ++i) {
        sharedVariable++; // no lock → race
    }
}
int main() {
    std::thread t1(increment), t2(increment);
    t1.join(); t2.join();
    std::cout << "Final: " << sharedVariable << std::endl;
    return 0;
}
Deadlock: two threads each waiting on the other’s mutex hang forever. A deadlock does not crash by itself; a core appears only if the hung process is killed with a core-dumping signal (for example, kill -ABRT <pid>) or captured with gcore.
#include <chrono> // for std::chrono::milliseconds
#include <iostream>
#include <mutex>
#include <thread>
std::mutex m1, m2;
void f1() {
    m1.lock();
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    m2.lock(); // blocks forever once f2 holds m2
    m2.unlock();
    m1.unlock();
}
void f2() {
    m2.lock();
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    m1.lock(); // blocks forever once f1 holds m1
    m1.unlock();
    m2.unlock();
}
int main() { std::thread t1(f1), t2(f2); t1.join(); t2.join(); return 0; }
3.5 Dynamic memory management errors
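Double free: freeing the same pointer twice is undefined behavior; glibc usually detects it and calls abort() (SIGABRT).
#include <stdio.h>
#include <stdlib.h>
int main() {
    int *ptr = (int *)malloc(sizeof(int));
    free(ptr);
    free(ptr); // double free → crash
    return 0;
}
Memory leak: continuous allocation without release eventually exhausts memory. In C++, a failed new throws std::bad_alloc, which aborts the program if uncaught; on Linux the OOM killer may instead terminate the process with SIGKILL, which produces no core.
#include <iostream>
void leak() { while (true) { int *p = new int; } } // never freed
int main() { leak(); return 0; }
3.6 Program logic errors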
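Infinite recursion: each call consumes a stack frame until the stack overflows, which shows up as SIGSEGV. With optimization enabled, the compiler may turn this particular tail call into a loop that never crashes.
#include <stdio.h>
void recur() { recur(); }
int main() { recur(); return 0; }
Uncaught exception: if no handler matches, std::terminate() calls abort() and the process dies with SIGABRT.
#include <iostream>
void thrower() { throw 1; }
int main() {
    thrower(); // no try/catch, so the exception is uncaught → SIGABRT
    return 0;
}
3.7 Hardware problems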
Faulty RAM can cause SIGSEGV‑like crashes.
CPU overheating or defects may also trigger crashes.
4. Core Dump analysis and debugging
4.1 Using GDB to analyze a Core Dump
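Load the executable together with the core file:
gdb ./my_program core.1234
Then use bt to view the backtrace, p to inspect variables, and info registers to see the CPU state.
(gdb) bt
#0 func3 (arg1=0x7fffffffde10, arg2=42) at my_file.c:123
#1 0x00005555555552b5 in func2 (arg=0x7fffffffde10) at main.c:234
#2 0x0000555555555350 in main () at main.c:345
4.2 Check system configuration
Check the current limit; if it prints 0, raise it:
ulimit -c
ulimit -c unlimited
Verify that the dump directory (for example /var/core, if core_pattern points there) exists and is writable by the crashing process. Mode 1777 makes it world-writable with the sticky bit, like /tmp; use something tighter if only one service writes there, since core files can contain sensitive data.
ls -ld /var/core
sudo chmod 1777 /var/core
4.3 GDB loading Core Dump
gdb my_program core.1234
Pass the same binary that produced the core, ideally built with -g; a mismatched binary yields misleading or missing symbols. After loading, use where / bt, p, and info registers as needed.
4.4 Useful GDB commands
where / bt: display the call stack.
frame N: select stack frame N.
info locals: print the local variables of the selected frame.
p VAR: print a variable’s value.
info registers: show the CPU registers at the time of the crash.
thread apply all bt: print the backtrace of every thread.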
5. Core Dump practical cases
5.1 Simple case analysis
#include <stdio.h>
#include <stdlib.h>
void func() {
    int *ptr = NULL;
    *ptr = 10; // line 5: write through a null pointer
}

int main() {
    func();
    return 0;
}
Compile with -g (for example, gcc -g -o test test.c), run the program with core dumps enabled, then analyze the result with GDB:
gdb test core.12345
(gdb) bt
#0 func () at test.c:5
#1 main () at test.c:9
(gdb) p ptr
$1 = (int *) 0x0
The backtrace shows the crash at line 5, and ptr is null.
5.2 Complex multithreaded scenario
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#define ARRAY_SIZE 100
int shared_array[ARRAY_SIZE];
pthread_mutex_t mutex;
void *write_thread(void *arg) {
    for (int i = 0; i < ARRAY_SIZE; i++) {
        pthread_mutex_lock(&mutex);
        shared_array[i] = i;
        pthread_mutex_unlock(&mutex);
    }
    return NULL;
}
void *read_thread(void *arg) {
    for (int i = 0; i < ARRAY_SIZE; i++) {
        pthread_mutex_lock(&mutex);
        int value = shared_array[i];
        printf("Read value: %d at index %d\n", value, i);
        pthread_mutex_unlock(&mutex);
    }
    return NULL;
}
int main() {
    pthread_t w, r;
    pthread_mutex_init(&mutex, NULL);
    pthread_create(&w, NULL, write_thread, NULL);
    pthread_create(&r, NULL, read_thread, NULL);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    pthread_mutex_destroy(&mutex);
    return 0;
}
After a crash, load the core file:
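gdb multi_thread_test core.67890
(gdb) info threads
(gdb) thread 2 # switch to the read thread
(gdb) bt
#0 read_thread (arg=0x0) at multi_thread_test.c:20
#1 pthread_mutex_lock ()
#2 main () at multi_thread_test.c:28
(gdb) p i
$1 = 120
Here the index i exceeds ARRAY_SIZE, revealing an out-of-bounds read as the cause of the Core Dump. The session is illustrative and abbreviated: in the listing above both loops are bounded by ARRAY_SIZE, so i could only reach 120 in a variant whose read loop uses a larger bound.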
Deepin Linux
Research areas: Windows & Linux platforms, C/C++ backend development, embedded systems and Linux kernel, etc.
