
Mastering Native Crash Handling on Android: Signals, Stacks, and Debugging Techniques

This article explains how to capture and analyze native crashes on Android by using signal handlers, alternate stacks, and tools like dladdr and ptrace, while also showing how to retrieve Java stack traces and integrate crash information for robust mobile debugging.

Tencent TDS Service

Background

On Android, native crashes dominate crash reports. They lack full context, have vague error messages, and are harder to fix than Java crashes, so a robust exception capture component must be able to log logcat and app logs, report crash counts, apply different recovery strategies, and evolve with business needs.

Existing Solutions

The three existing solutions share similar implementation principles on Android; a coffeecatch‑based approach can be improved.

Signal Mechanism

Program Crash

In Unix‑like systems, crashes are caused by programming or hardware errors such as division by zero or invalid memory access.

When an exception occurs, the CPU raises an interrupt and the kernel handles it.

Linux normalizes these interrupts as signals, which can be handled via a signal vector.

A signal is a soft interrupt used for inter‑process messaging.

Signal Flow

When a user‑space function invokes a system call, interrupt, or exception, the process switches to kernel mode. The kernel places the signal in the process’s signal queue and sends an interrupt to bring the process back into kernel mode.

(1) Receiving a Signal

The kernel proxies signal reception, placing the signal in the queue and interrupting the process; the process does not immediately know the signal has arrived.

(2) Detecting a Signal

After entering kernel mode, the process checks for signals either before returning to user mode or when waking from sleep.

(3) Handling a Signal

The kernel saves the interrupted context (registers and signal mask) in a signal frame on the user-space stack and sets the instruction pointer to the handler. When the handler returns, the kernel restores the saved context and resumes execution at the point of interruption.

(4) Common Signal Types

Capturing Native Crashes

Registering a Signal Handler

The first step is to catch native crashes (e.g., SIGSEGV, SIGBUS) using sigaction() on POSIX systems.

#include <signal.h>
int sigaction(int signum, const struct sigaction *act, struct sigaction *oldact);

signum : signal number (any except SIGKILL and SIGSTOP).

act : pointer to a sigaction struct describing the handler.

oldact : optional storage for the previous handler.

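struct sigaction sa;
struct sigaction sa_old;
memset(&sa, 0, sizeof(sa));
sigemptyset(&sa.sa_mask);
sa.sa_sigaction = my_handler;
/* SA_SIGINFO selects the three-argument handler; SA_ONSTACK runs it on the
   alternate stack when one has been set with sigaltstack(). */
sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
if (sigaction(sig, &sa, &sa_old) == 0) {
    …
}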

Providing an Alternate Stack

#include <signal.h>
int sigaltstack(const stack_t *ss, stack_t *oss);

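SIGSEGV is often the result of a stack overflow. In that case the thread's own stack has no room left, so a handler that runs on it may fault again or corrupt the crash context.

Allocating a separate stack with sigaltstack() gives the handler a safe place to run. The handler only uses that stack if it is registered with the SA_ONSTACK flag.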

stack_t stack;
memset(&stack, 0, sizeof(stack));
stack.ss_size = SIGSTKSZ;
stack.ss_sp = malloc(stack.ss_size);
stack.ss_flags = 0;
if (stack.ss_sp != NULL && sigaltstack(&stack, NULL) == 0) {
    …
}

Compatibility with Existing Handlers

static void my_handler(const int code, siginfo_t *const si, void *const sc) {
    …
    /* Call previous handler. */
    old_handler.sa_sigaction(code, si, sc);
}

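If another component has already installed a handler, sigaction() replaces it. Saving the old handler through the oldact argument and invoking it after our own processing preserves compatibility. The saved handler should be called only when it is a real function, not SIG_DFL or SIG_IGN.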

Precautions

Avoiding Deadlocks

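Signal handlers must call only async-signal-safe functions; calling anything else (malloc(), printf(), most locking code) results in undefined behavior and can deadlock. Resetting the signal to its default action means a second fault inside the handler terminates the process instead of re-entering the handler, and alarm() schedules a SIGALRM that kills the process if the handler hangs.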

static void signal_handler(const int code, siginfo_t *const si, void *const sc) {
    signal(code, SIG_DFL);
    signal(SIGALRM, SIG_DFL);
    (void) alarm(8);
    …
}
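Because printf() and friends are not async-signal-safe, logging from inside a handler is usually done with write() and hand-rolled formatting. The sketch below illustrates that idea; format_decimal and safe_log_signal are hypothetical helpers, not functions from the original component.

```c
#include <stddef.h>
#include <unistd.h>

/* Writes the decimal form of value into buf (NUL-terminated, truncated to
   fit cap) and returns the number of characters written. Uses no stdio,
   no heap, and no locks. */
size_t format_decimal(int value, char *buf, size_t cap) {
    char tmp[12]; /* 10 digits plus sign for a 32-bit int */
    size_t n = 0;
    size_t len = 0;
    unsigned int u = value < 0 ? 0u - (unsigned int)value
                               : (unsigned int)value;
    if (cap == 0) {
        return 0;
    }
    do {
        tmp[n++] = (char)('0' + u % 10); /* digits come out reversed */
        u /= 10;
    } while (u != 0);
    if (value < 0) {
        tmp[n++] = '-';
    }
    while (n > 0 && len + 1 < cap) {
        buf[len++] = tmp[--n];
    }
    buf[len] = '\0';
    return len;
}

/* Emits "caught signal N" on fd using only write(), which is
   async-signal-safe. */
void safe_log_signal(int fd, int signum) {
    static const char prefix[] = "caught signal ";
    char num[16];
    size_t len = format_decimal(signum, num, sizeof(num));
    if (write(fd, prefix, sizeof(prefix) - 1) < 0) return;
    if (write(fd, num, len) < 0) return;
    if (write(fd, "\n", 1) < 0) return;
}
```

A handler could call safe_log_signal(STDERR_FILENO, code) or pass a file descriptor that was opened during initialization, since opening files inside the handler is best avoided.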

Printing Stack Traces

(1) Child Process

Because of signal‑handler restrictions, a common approach is to fork a child process that uses ptrace to unwind the crashed thread’s stack while the parent waits.

(2) Child Thread

Alternatively, a dedicated thread can be created at initialization and awakened by the signal handler to dump the stack and forward it to Java.

static void nativeInit(JNIEnv* env, jclass javaClass, jstring packageNameStr,
                       jstring tombstoneFilePathStr, jobject obj) {
    …
    pthread_t thd;
    int ret = pthread_create(&thd, NULL, DumpThreadEntry, NULL);
    if (ret) {
        qmlog("%s", "pthread_create error");
    }
}
void* DumpThreadEntry(void *argv) {
    …
    while (true) {
        waitForSignal();
        throw_exception(env);
        notifyThrowException();
    }
    …
}

Collecting Crash Information

Signal Code

Logcat prints lines such as signal 11 (SIGSEGV), code 0 (SI_USER), fault addr 0x0. Mapping the code value to a table reveals the crash reason.

Program Counter (PC)

The third argument of the handler is a pointer to a ucontext_t whose uc_mcontext member holds the register state at the time of the crash, including the PC. On x86-64 the PC is uc_mcontext.gregs[REG_RIP]; on 32-bit ARM it is uc_mcontext.arm_pc; on ARM64 it is uc_mcontext.pc.

Shared Library Name and Offset

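dladdr() returns the path and base address of the loaded library that contains an address. Subtracting the base address from the PC gives an offset relative to the library, which a symbolizer such as addr2line can resolve to a source line when given an unstripped copy of the library.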

Dl_info info;
if (dladdr(addr, &info) != 0 && info.dli_fname != NULL) {
    const uintptr_t addr_relative = (uintptr_t)addr - (uintptr_t)info.dli_fbase;
    …
}

Process Memory Layout

Reading /proc/self/maps

Parsing /proc/self/maps reveals the load ranges of each module, allowing us to locate the shared library base address.
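A minimal sketch of such a parser is shown below. The mapping_t type and find_mapping function are hypothetical names. The sketch uses stdio, which is not async-signal-safe, so it is meant to run at initialization or from a helper thread rather than directly inside the signal handler.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One entry of /proc/self/maps (hypothetical helper type). */
typedef struct {
    uintptr_t start;      /* first address of the mapping */
    uintptr_t end;        /* one past the last address */
    unsigned long offset; /* offset of the mapping within the file */
    char path[256];       /* backing file, empty for anonymous mappings */
} mapping_t;

/* Finds the mapping that contains addr. Returns 1 and fills *out on
   success, 0 if no mapping contains the address or the file is missing. */
int find_mapping(uintptr_t addr, mapping_t *out) {
    FILE *maps = fopen("/proc/self/maps", "r");
    char line[1024];

    if (maps == NULL) {
        return 0;
    }
    while (fgets(line, sizeof(line), maps) != NULL) {
        uintptr_t start, end;
        unsigned long offset;
        int name_at = 0;

        line[strcspn(line, "\n")] = '\0';
        /* Line format: start-end perms offset dev inode [path] */
        if (sscanf(line, "%" SCNxPTR "-%" SCNxPTR " %*s %lx %*s %*u %n",
                   &start, &end, &offset, &name_at) < 3) {
            continue;
        }
        if (addr < start || addr >= end) {
            continue;
        }
        out->start = start;
        out->end = end;
        out->offset = offset;
        snprintf(out->path, sizeof(out->path), "%s",
                 name_at > 0 ? line + name_at : "");
        fclose(maps);
        return 1;
    }
    fclose(maps);
    return 0;
}
```

For an address inside a shared library, out->path names the library, and addr - out->start + out->offset gives the position of that address within the file.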

Obtaining Java Stack

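By retrieving the thread name in the signal handler and passing it to Java, we can ask the Java runtime to dump the corresponding Java stack. Note that the kernel truncates comm to 15 characters, so a longer Java thread name will not compare equal to the name read here.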

char* getThreadName(pid_t tid) {
    if (tid <= 1) return NULL;
    char path[80];
    char* line = calloc(1, THREAD_NAME_LENGTH);
    if (line == NULL) return NULL;
    /* The path must be absolute, so the leading slash is required. */
    snprintf(path, sizeof(path), "/proc/%d/comm", tid);
    FILE* commFile = fopen(path, "r");
    if (commFile) {
        fgets(line, THREAD_NAME_LENGTH, commFile);
        fclose(commFile);
    }
    /* Strip the trailing newline; guard against an empty string. */
    size_t len = strlen(line);
    if (len > 0 && line[len - 1] == '\n')
        line[len - 1] = '\0';
    return line; /* caller frees */
}
@Keep
public static Thread getThreadByName(String threadName) {
    if (TextUtils.isEmpty(threadName)) return null;
    Set<Thread> threadSet = Thread.getAllStackTraces().keySet();
    for (Thread thread : threadSet) {
        if (thread.getName().equals(threadName)) {
            return thread;
        }
    }
    return null;
}

Result Presentation

The final crash report combines native stack, PC, library name, and Java stack, enabling business‑level handling such as rolling back hot‑patches for specific native crashes.

java.lang.Error: signal 11 (Address not mapped to object) at address 0x0
    at dalvik.system.NativeStart.run(Native Method)
Caused by: java.lang.Error: signal 11 (Address not mapped to object) at address 0x0
    at /data/app-lib/com.tencent.moai.crashcatcher.demo-1/libQMCrashGenerator.so.0xd8e(dangerousFunction:0x5:0)
    …

Logcat logs often provide the missing context; for example, a WebView crash was traced to a NullPointerException in Java code that propagated to native code.

Note: This component is not yet publicly released.
