Mastering Native Crash Handling on Android: Signals, Stacks, and Debugging Techniques
This article explains how to capture and analyze native crashes on Android by using signal handlers, alternate stacks, and tools like dladdr and ptrace, while also showing how to retrieve Java stack traces and integrate crash information for robust mobile debugging.
Background
On Android, native crashes dominate crash reports. They lack full context, have vague error messages, and are harder to fix than Java crashes, so a robust exception capture component must be able to log logcat and app logs, report crash counts, apply different recovery strategies, and evolve with business needs.
Existing Solutions
The existing solutions share similar implementation principles on Android; this article takes a coffeecatch‑based approach and improves on it.
Signal Mechanism
Program Crash
In Unix‑like systems, crashes are caused by programming or hardware errors such as division by zero or invalid memory access.
When an exception occurs, the CPU raises an interrupt and the kernel handles it.
Linux normalizes these interrupts as signals, which can be handled via a signal vector.
A signal is a soft interrupt used for inter‑process messaging.
Signal Flow
A process switches to kernel mode whenever it makes a system call or is stopped by an interrupt or exception. When a signal is sent, the kernel places it in the target process’s signal queue and interrupts the process so that it enters kernel mode.
(1) Receiving a Signal
The kernel proxies signal reception, placing the signal in the queue and interrupting the process; the process does not immediately know the signal has arrived.
(2) Detecting a Signal
After entering kernel mode, the process checks for signals either before returning to user mode or when waking from sleep.
(3) Handling a Signal
The kernel saves the interrupted context in a signal frame on the user‑space stack and sets the instruction pointer to the handler. After the handler returns, the kernel restores the saved context and resumes execution.
(4) Common Signal Types
Capturing Native Crashes
Registering a Signal Handler
The first step is to catch native crashes (e.g., SIGSEGV, SIGBUS) using sigaction() on POSIX systems.
#include <signal.h>
int sigaction(int signum, const struct sigaction *act, struct sigaction *oldact);
signum : signal number (any except SIGKILL and SIGSTOP).
act : pointer to a sigaction struct describing the handler.
oldact : optional storage for the previous handler.
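struct sigaction sa;
struct sigaction sa_old;
memset(&sa, 0, sizeof(sa));
sigemptyset(&sa.sa_mask);
sa.sa_sigaction = my_handler;
/* SA_ONSTACK makes the handler run on the alternate stack, if one is set. */
sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
if (sigaction(sig, &sa, &sa_old) == 0) {
    …
}
Providing an Alternate Stack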
#include <signal.h>
int sigaltstack(const stack_t *ss, stack_t *oss);
SIGSEGV is often caused by stack overflow. In that case the thread's own stack has no room left, so a handler running on it would fault again or corrupt the crash context.
Allocating a separate stack with sigaltstack() ensures the handler runs on a safe stack.
stack_t stack;
memset(&stack, 0, sizeof(stack));
stack.ss_size = SIGSTKSZ;
stack.ss_sp = malloc(stack.ss_size);
stack.ss_flags = 0;
if (stack.ss_sp != NULL && sigaltstack(&stack, NULL) == 0) {
    …
}
Compatibility with Existing Handlers
static void my_handler(const int code, siginfo_t *const si, void *const sc) {
    …
    /* Call the previous handler, if one was installed with SA_SIGINFO. */
    if ((old_handler.sa_flags & SA_SIGINFO) && old_handler.sa_sigaction != NULL) {
        old_handler.sa_sigaction(code, si, sc);
    }
}
If another component has already installed a handler, sigaction() replaces it. Saving the old handler and invoking it after custom processing preserves compatibility.
Precautions
Avoiding Deadlocks
Signal handlers must only call async‑signal‑safe functions; otherwise the program may enter undefined behavior. Using alarm() and resetting the signal to default helps prevent deadlock or infinite loops.
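Logging is a common place where this rule is broken, because printf() and friends are not async‑signal‑safe. A minimal sketch of an alternative, assuming only write(2) is used for output: the helper names format_hex, write_str and safe_log_address are hypothetical and not part of the original component.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Formats `value` as "0x<hex>" into buf without any printf-family call.
 * Returns the string length, or 0 if buf is too small. */
size_t format_hex(uintptr_t value, char *buf, size_t buf_size) {
    static const char digits[] = "0123456789abcdef";
    char tmp[2 * sizeof(uintptr_t)];
    size_t n = 0;
    do {
        tmp[n++] = digits[value & 0xf];  /* least significant digit first */
        value >>= 4;
    } while (value != 0);
    if (buf_size < n + 3) return 0;      /* "0x" + digits + NUL */
    size_t out = 0;
    buf[out++] = '0';
    buf[out++] = 'x';
    while (n > 0) buf[out++] = tmp[--n]; /* reverse into the output */
    buf[out] = '\0';
    return out;
}

/* Writes a NUL-terminated string with write(2), measuring it by hand. */
static void write_str(int fd, const char *s) {
    size_t len = 0;
    while (s[len] != '\0') ++len;
    if (write(fd, s, len) < 0) {
        /* Nothing safe to do about a failed write inside a handler. */
    }
}

/* Logs "<msg><address>\n" to fd. */
void safe_log_address(int fd, const char *msg, uintptr_t addr) {
    char hex[2 * sizeof(uintptr_t) + 3];
    format_hex(addr, hex, sizeof(hex));
    write_str(fd, msg);
    write_str(fd, hex);
    write_str(fd, "\n");
}
```

Inside a handler this could be called as safe_log_address(STDERR_FILENO, "fault addr ", (uintptr_t)si->si_addr), with the file descriptor opened ahead of time.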
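static void signal_handler(const int code, siginfo_t *const si, void *const sc) {
    /* Restore the default action so a second fault inside this handler
       terminates the process instead of looping. */
    signal(code, SIG_DFL);
    /* Watchdog: if the handler hangs, SIGALRM's default action ends the
       process after 8 seconds. */
    signal(SIGALRM, SIG_DFL);
    (void) alarm(8);
    …
}
Printing Stack Traces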
(1) Child Process
Because of signal‑handler restrictions, a common approach is to fork a child process that uses ptrace to unwind the crashed thread’s stack while the parent waits.
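The fork‑and‑wait skeleton of this approach might look like the sketch below. It deliberately leaves out the ptrace part: the dump_fn callback is a hypothetical hook where a real implementation would attach to the crashed thread and unwind its stack. The names dump_in_child and example_dump are assumptions for the example.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical callback run in the child; returns 0 on success. */
typedef int (*dump_fn)(pid_t crashed_pid, pid_t crashed_tid);

/* Placeholder dump routine: only verifies that it runs in a child of the
 * crashed process. A real one would ptrace crashed_tid here. */
static int example_dump(pid_t crashed_pid, pid_t crashed_tid) {
    (void)crashed_tid;
    return (getppid() == crashed_pid) ? 0 : 1;
}

/* Forks a child that runs `dump` while the parent waits for it.
 * Returns the child's exit status, or -1 on failure. */
int dump_in_child(dump_fn dump, pid_t crashed_tid) {
    const pid_t crashed_pid = getpid();
    const pid_t child = fork();
    if (child < 0) return -1;
    if (child == 0) {
        /* Child: leave with _exit() so no atexit handlers of the
         * crashed process run a second time. */
        _exit(dump(crashed_pid, crashed_tid) & 0xff);
    }
    int status = 0;
    while (waitpid(child, &status, 0) < 0) {
        if (errno != EINTR) return -1;  /* retry only when interrupted */
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```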
(2) Child Thread
Alternatively, a dedicated thread can be created at initialization and awakened by the signal handler to dump the stack and forward it to Java.
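void* DumpThreadEntry(void *argv);

static void nativeInit(JNIEnv* env, jclass javaClass, jstring packageNameStr,
                       jstring tombstoneFilePathStr, jobject obj) {
    …
    /* Start the dump thread once, at initialization time. */
    pthread_t thd;
    int ret = pthread_create(&thd, NULL, DumpThreadEntry, NULL);
    if (ret) {
        qmlog("%s", "pthread_create error");
    }
}

void* DumpThreadEntry(void *argv) {
    …
    while (true) {
        waitForSignal();         /* block until the signal handler wakes us */
        throw_exception(env);    /* forward the crash to Java */
        notifyThrowException();  /* signal that the exception was thrown */
    }
    …
}
Collecting Crash Information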
Signal Code
Logcat prints lines such as signal 11 (SIGSEGV), code 0 (SI_USER), fault addr 0x0. Looking up the code value (si_code in siginfo_t) for the given signal reveals the crash reason.
Program Counter (PC)
The third argument of the handler points to a ucontext_t whose uc_mcontext field holds the register state, including the PC. On x86‑64 the PC is uc_mcontext.gregs[REG_RIP]; on 32‑bit ARM it is uc_mcontext.arm_pc, and on AArch64 it is uc_mcontext.pc.
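Because the register layout differs per architecture, the PC lookup is usually wrapped in a helper with conditional compilation. The following sketch assumes Linux‑style ucontext_t definitions (glibc or bionic); the names pc_from_context, capture_handler and demo_capture_pc are hypothetical, and SIGUSR1 is used so the demo can run without crashing.

```c
#define _GNU_SOURCE  /* needed on glibc for the REG_* register indices */
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <ucontext.h>

static volatile uintptr_t g_captured_pc = 0;

/* Extracts the program counter from a handler's third argument.
 * Returns 0 on architectures this sketch does not cover. */
uintptr_t pc_from_context(void *context) {
    const ucontext_t *uc = (const ucontext_t *)context;
#if defined(__x86_64__)
    return (uintptr_t)uc->uc_mcontext.gregs[REG_RIP];
#elif defined(__i386__)
    return (uintptr_t)uc->uc_mcontext.gregs[REG_EIP];
#elif defined(__arm__)
    return (uintptr_t)uc->uc_mcontext.arm_pc;
#elif defined(__aarch64__)
    return (uintptr_t)uc->uc_mcontext.pc;
#else
    (void)uc;
    return 0;
#endif
}

static void capture_handler(int signo, siginfo_t *info, void *context) {
    (void)signo;
    (void)info;
    g_captured_pc = pc_from_context(context);
}

/* Raises SIGUSR1 and returns the PC at which the signal interrupted us. */
uintptr_t demo_capture_pc(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = capture_handler;
    sa.sa_flags = SA_SIGINFO;
    if (sigaction(SIGUSR1, &sa, NULL) != 0) return 0;

    g_captured_pc = 0;
    raise(SIGUSR1);
    return g_captured_pc;
}
```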
Shared Library Name and Offset
Using dladdr() we can obtain the name and base address of the library that contains the PC. Subtracting the base address from the PC gives a library‑relative offset, which can be resolved to a source line with the library's debug symbols.
Dl_info info;
if (dladdr(addr, &info) != 0 && info.dli_fname != NULL) {
    const uintptr_t addr_relative = (uintptr_t)addr - (uintptr_t)info.dli_fbase;
    …
}
Process Memory Layout
Reading /proc/self/maps
Parsing /proc/self/maps reveals the load ranges of each module, allowing us to locate the shared library base address.
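A minimal sketch of such a parser is shown below, assuming the usual Linux line format of start-end perms offset dev inode path. The map_entry type and the find_mapping function are hypothetical names. The code uses stdio, so it is meant for a helper thread or child process rather than for the signal handler itself, and it does not handle paths that contain spaces.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uintptr_t start;        /* first address of the mapping */
    uintptr_t end;          /* one past the last address */
    uintptr_t file_offset;  /* offset of the mapping in the backing file */
    char path[256];         /* backing file, empty for anonymous memory */
} map_entry;

/* Finds the mapping in /proc/self/maps that contains addr. */
bool find_mapping(uintptr_t addr, map_entry *out) {
    FILE *fp = fopen("/proc/self/maps", "r");
    if (fp == NULL) return false;

    char line[512];
    bool found = false;
    while (fgets(line, sizeof(line), fp) != NULL) {
        uintptr_t start = 0, end = 0, offset = 0;
        char perms[5] = "";
        char path[256] = "";
        /* The dev and inode columns are skipped with %*s. */
        int n = sscanf(line,
                       "%" SCNxPTR "-%" SCNxPTR " %4s %" SCNxPTR " %*s %*s %255s",
                       &start, &end, perms, &offset, path);
        if (n < 4) continue;  /* not a well-formed mapping line */
        if (addr >= start && addr < end) {
            out->start = start;
            out->end = end;
            out->file_offset = offset;
            snprintf(out->path, sizeof(out->path), "%s", path);
            found = true;
            break;
        }
    }
    fclose(fp);
    return found;
}
```

For a matching entry, addr - start + file_offset gives the position of the address within the backing file, which equals the ELF virtual address only when a segment's file offset and virtual address coincide.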
Obtaining Java Stack
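By retrieving the thread name in the signal handler and passing it to Java, we can ask the Java runtime to dump the corresponding Java stack. Note that the kernel limits the name stored in /proc/<tid>/comm to 15 characters, so a longer Java thread name appears truncated there.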
char* getThreadName(pid_t tid) {
    if (tid <= 1) return NULL;
    char path[80];
    char* line = calloc(1, THREAD_NAME_LENGTH);
    if (line == NULL) return NULL;
    /* The path must be absolute: /proc/<tid>/comm. */
    snprintf(path, sizeof(path), "/proc/%d/comm", tid);
    FILE* commFile = fopen(path, "r");
    if (commFile) {
        fgets(line, THREAD_NAME_LENGTH, commFile);
        fclose(commFile);
    }
    /* Strip the trailing newline; guard against an empty string. */
    size_t len = strlen(line);
    if (len > 0 && line[len - 1] == '\n')
        line[len - 1] = '\0';
    return line;  /* the caller must free() the result */
}
@Keep
public static Thread getThreadByName(String threadName) {
    if (TextUtils.isEmpty(threadName)) return null;
    Set<Thread> threadSet = Thread.getAllStackTraces().keySet();
    for (Thread thread : threadSet) {
        if (thread.getName().equals(threadName)) {
            return thread;
        }
    }
    return null;
}
Result Presentation
The final crash report combines native stack, PC, library name, and Java stack, enabling business‑level handling such as rolling back hot‑patches for specific native crashes.
java.lang.Error: signal 11 (Address not mapped to object) at address 0x0
at dalvik.system.NativeStart.run(Native Method)
Caused by: java.lang.Error: signal 11 (Address not mapped to object) at address 0x0
at /data/app-lib/com.tencent.moai.crashcatcher.demo-1/libQMCrashGenerator.so.0xd8e(dangerousFunction:0x5:0)
…
Logcat logs often provide the missing context; for example, a WebView crash was traced to a NullPointerException in Java code that propagated to native code.
Note: This component is not yet publicly released.