
How to Build a Robust iOS Crash Monitoring System with KSCrash

This article explains the layered iOS exception architecture, common crash causes, and a comprehensive monitoring solution that captures Mach exceptions, Unix signals, runtime NSException and C++ exceptions, and application‑level issues like deadlocks and zombie objects, with detailed implementation steps and code examples.

Alibaba Cloud Observability

Background

After an app is released, crashes that never appeared during pre‑release testing become a major concern. Understanding how crash logs are collected and why crashes happen is essential for reliable iOS development.

Common Crash Causes

Array out‑of‑bounds access

Multithreading issues (UI updates on background threads, data races)

Main‑thread unresponsiveness (Watchdog termination)

Wild pointers (accessing deallocated objects)

iOS Exception Architecture

The iOS exception system is divided into four layers, each responsible for capturing different types of failures.

1. Hardware Layer

CPU exceptions such as illegal instructions or memory‑access errors

2. System Layer

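Mach exceptions – the kernel's lowest‑level exception mechanism

Unix signals – unhandled Mach exceptions are translated into signals like SIGSEGV or SIGBUS; some signals, such as the SIGABRT raised by abort(), do not originate from a Mach exception at all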

3. Runtime Layer

NSException (Objective‑C runtime errors like array out‑of‑bounds)

C++ exceptions (thrown by native code, eventually invoking std::terminate)

4. Application Layer

Business‑logic errors and performance problems (deadlocks, memory leaks, zombie objects)
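The translation from Mach exceptions to Unix signals described under the system layer can be pictured as a small lookup table. The sketch below is a simplified illustration, not the kernel's or KSCrash's actual code: the `SKETCH_EXC_*` constants are local stand-ins that mirror the values in `<mach/exception_types.h>` so the example compiles on any POSIX system, and `signal_for_mach_exception` is a hypothetical name.

```c
#include <signal.h>

/* Local stand-ins for the Mach exception type values, declared here only so
 * the sketch compiles without <mach/exception_types.h>. */
enum sketch_mach_exception {
    SKETCH_EXC_BAD_ACCESS      = 1, /* invalid memory access */
    SKETCH_EXC_BAD_INSTRUCTION = 2, /* illegal or undefined instruction */
    SKETCH_EXC_ARITHMETIC      = 3, /* trapping arithmetic error */
    SKETCH_EXC_BREAKPOINT      = 6  /* trap or breakpoint instruction */
};

/* Returns the signal a Mach exception is usually delivered as, or 0 when
 * this simplified table has no entry for the exception type. */
int signal_for_mach_exception(int exception_type)
{
    switch (exception_type) {
    case SKETCH_EXC_BAD_ACCESS:      return SIGSEGV; /* may also surface as SIGBUS */
    case SKETCH_EXC_BAD_INSTRUCTION: return SIGILL;
    case SKETCH_EXC_ARITHMETIC:      return SIGFPE;
    case SKETCH_EXC_BREAKPOINT:      return SIGTRAP;
    default:                         return 0;
    }
}
```

A table like this is mainly useful when writing the report, so that a crash caught at the Mach layer can also be labeled with the signal name developers are used to seeing.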

Monitoring Strategy

To achieve complete crash coverage we capture exceptions at three levels. Hardware faults need no separate handling because the kernel surfaces them as Mach exceptions:

System‑level: Mach exceptions and Unix signals

Runtime‑level: NSException and C++ terminate handling

Application‑level: proactive checks for deadlocks and zombie objects

Mach Exception Capture

Mach exceptions are intercepted by creating a dedicated exception port, registering it for all exception masks, and running two handler threads (primary and secondary) to guarantee reliability even if the primary thread crashes.

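// Create a new exception handling port with a receive right
kern_return_t kr = mach_port_allocate(mach_task_self(), MACH_PORT_RIGHT_RECEIVE, &g_exceptionPort);
// Insert a send right so exception messages can be delivered to the port
if (kr == KERN_SUCCESS) kr = mach_port_insert_right(mach_task_self(), g_exceptionPort, g_exceptionPort, MACH_MSG_TYPE_MAKE_SEND);
// Register the exception port for all exception types
if (kr == KERN_SUCCESS) kr = task_set_exception_ports(mach_task_self(), EXC_MASK_ALL, g_exceptionPort, EXCEPTION_DEFAULT, MACHINE_THREAD_STATE);
// Create primary and secondary handler threads (detached, since they are never joined)
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
pthread_create(&g_primaryPThread, &attr, handleExceptions, (void *)kThreadPrimary);
pthread_create(&g_secondaryPThread, &attr, handleExceptions, (void *)kThreadSecondary);
pthread_attr_destroy(&attr);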

The handler thread receives messages with mach_msg(), suspends all threads, records machine state, builds a crash context (exception type, registers, stack cursor, address info), generates a JSON report, and finally resumes the threads.

Unix Signal Capture

Signal handlers are installed to catch crashes that bypass Mach handling (e.g., abort()). The signal handler receives the signal number, a siginfo_t, and the CPU context, then forwards the information to the same processing pipeline used for Mach exceptions.

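// Install the same handler for every fatal signal
const int *fatal_signals = signal_fatal_signals();
struct sigaction action = {0};
action.sa_flags = SA_SIGINFO | SA_ONSTACK;  // SA_ONSTACK only takes effect after a sigaltstack() call
sigemptyset(&action.sa_mask);
action.sa_sigaction = &signal_handle_signals;
// Repeated for each fatal_signal in fatal_signals; the old action is saved so it can be restored later
sigaction(fatal_signal, &action, &previous_signal_handler);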
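The fragment above leaves out two details that matter in practice: SA_ONSTACK has no effect unless an alternate stack was registered with sigaltstack(), and the handler has to be installed once per signal. The following is a minimal, portable sketch of both steps. The names `install_signal_handlers`, `record_signal`, `last_recorded_signal`, and the 64 KB stack size are assumptions made for illustration; the handler here only records the signal number, whereas a real one would pass the siginfo_t and CPU context on to the crash pipeline.

```c
#define _XOPEN_SOURCE 700 /* expose sigaction/sigaltstack under strict -std=c11 */
#include <signal.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_ALT_STACK_SIZE (64 * 1024) /* assumed size, not a KSCrash value */

static char g_alt_stack[SKETCH_ALT_STACK_SIZE];
static volatile sig_atomic_t g_last_signal = 0;

/* Hypothetical handler: only records which signal arrived. */
static void record_signal(int signum, siginfo_t *info, void *user_context)
{
    (void)info;
    (void)user_context;
    g_last_signal = signum;
}

/* Registers an alternate stack, then installs record_signal for each entry
 * of signals. previous[i] receives the action that was replaced.
 * Returns the number of handlers installed, or -1 if sigaltstack() fails. */
int install_signal_handlers(const int *signals, size_t count,
                            struct sigaction *previous)
{
    stack_t stack;
    stack.ss_sp = g_alt_stack;
    stack.ss_size = sizeof(g_alt_stack);
    stack.ss_flags = 0;
    if (sigaltstack(&stack, NULL) != 0) {
        return -1;
    }

    struct sigaction action;
    memset(&action, 0, sizeof(action));
    action.sa_flags = SA_SIGINFO | SA_ONSTACK;
    sigemptyset(&action.sa_mask);
    action.sa_sigaction = record_signal;

    int installed = 0;
    for (size_t i = 0; i < count; i++) {
        if (sigaction(signals[i], &action, &previous[i]) == 0) {
            installed++;
        }
    }
    return installed;
}

/* Returns the most recently recorded signal number, or 0 if none. */
int last_recorded_signal(void)
{
    return (int)g_last_signal;
}
```

Running the handler on its own stack is what allows a stack overflow on the crashing thread to be reported at all, since the original stack has no room left for the handler's frame.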

Runtime Exception Capture

NSException handling is set up by saving the previous handler, installing a custom NSUncaughtExceptionHandler, and invoking the original handler after reporting.

// Save previous handler and set our own
NSUncaughtExceptionHandler *previous_uncaught_exceptionhandler = NSGetUncaughtExceptionHandler();
NSSetUncaughtExceptionHandler(&handle_uncaught_exception);

C++ uncaught exceptions are intercepted by replacing the global terminate handler.

// Save original terminate handler and install our own
std::terminate_handler original_terminate_handler = std::get_terminate();
std::set_terminate(cpp_exception_terminate_handler);

Application‑Level Monitoring

Deadlock Detection: A watchdog thread periodically posts a no‑op block to the main queue and measures how long the main thread takes to run it. If the block has not run within a configurable timeout, the main thread is considered stuck and a deadlock is reported.
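The ping and timeout logic behind this check can be sketched in portable C. This is an illustration under stated assumptions rather than KSCrash's implementation: `watchdog_t` and its functions are hypothetical names, the no‑op block is modeled as a call to `watchdog_acknowledge` that the main thread would make, and the current time is passed in as a parameter so the logic stays easy to test.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    atomic_bool awaiting_reply;    /* true between a ping and its acknowledgement */
    _Atomic uint64_t ping_sent_ms; /* time at which the outstanding ping was posted */
} watchdog_t;

void watchdog_init(watchdog_t *wd)
{
    atomic_init(&wd->awaiting_reply, false);
    atomic_init(&wd->ping_sent_ms, 0);
}

/* Watchdog thread: post a ping unless one is already outstanding. */
void watchdog_send_ping(watchdog_t *wd, uint64_t now_ms)
{
    if (!atomic_load(&wd->awaiting_reply)) {
        atomic_store(&wd->ping_sent_ms, now_ms);
        atomic_store(&wd->awaiting_reply, true);
    }
}

/* Main thread: stands in for the no-op block, which only acknowledges. */
void watchdog_acknowledge(watchdog_t *wd)
{
    atomic_store(&wd->awaiting_reply, false);
}

/* Watchdog thread: true if the outstanding ping has gone unanswered for
 * longer than timeout_ms. */
bool watchdog_main_thread_stalled(watchdog_t *wd, uint64_t now_ms,
                                  uint64_t timeout_ms)
{
    if (!atomic_load(&wd->awaiting_reply)) {
        return false;
    }
    return now_ms - atomic_load(&wd->ping_sent_ms) > timeout_ms;
}
```

On iOS the acknowledgement would typically be dispatched to the main queue, and the watchdog thread would sleep between calls to `watchdog_send_ping` and `watchdog_main_thread_stalled`.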

Zombie Object Detection: The dealloc methods of NSObject and NSProxy are hooked. When an object is deallocated, its address is hashed and stored together with its class information in a fixed‑size hash table (0x8000 entries). A later access to the same address is checked against this table to identify it as a zombie access.
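A fixed‑size table of this kind might look like the sketch below. Only the table size of 0x8000 comes from the text; the hash function (drop the low four alignment bits, then mask), the entry layout, and all names are assumptions made for illustration. Colliding entries simply overwrite each other, which keeps memory use constant at the cost of occasionally forgetting an older zombie.

```c
#include <stddef.h>
#include <stdint.h>

#define ZOMBIE_CACHE_SIZE 0x8000u /* table size quoted in the text */
#define ZOMBIE_CACHE_MASK (ZOMBIE_CACHE_SIZE - 1u)

typedef struct {
    const void *object;     /* address of the deallocated object */
    const char *class_name; /* class the object had when dealloc ran */
} zombie_entry_t;

static zombie_entry_t g_zombie_cache[ZOMBIE_CACHE_SIZE];

/* Assumed hash: drop the low 4 bits (allocation alignment), then mask the
 * result down to the table size. */
static size_t zombie_index(const void *object)
{
    return (size_t)(((uintptr_t)object >> 4) & ZOMBIE_CACHE_MASK);
}

/* Called from the hooked dealloc: remember the object and its class. */
void zombie_record_dealloc(const void *object, const char *class_name)
{
    zombie_entry_t *entry = &g_zombie_cache[zombie_index(object)];
    entry->object = object;
    entry->class_name = class_name;
}

/* Returns the recorded class name if this exact address was seen in
 * dealloc and has not been overwritten by a colliding entry, else NULL. */
const char *zombie_class_name(const void *object)
{
    const zombie_entry_t *entry = &g_zombie_cache[zombie_index(object)];
    return entry->object == object ? entry->class_name : NULL;
}
```

Because neither function allocates memory or takes a lock, the lookup can also be used from the crash handler when the faulting address is being examined.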

Symbolication

Collected stack addresses are converted to human‑readable symbols. Two approaches are used:

Runtime symbolication via dladdr() to obtain image base, image name, symbol address, and symbol name.

Full symbolication using dSYM files to map addresses to source file and line number.

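The stack stores return addresses, and a return address points to the instruction after the call rather than to the call itself. Each address is therefore adjusted so that it falls inside the calling instruction (e.g., for ARM64: (return_address & ~3UL) - 1) before it is passed to dladdr().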
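The adjustment can be written as two small helpers. The formula is the one quoted above; the function names are hypothetical. On ARM64 every instruction is 4 bytes long, so clearing the low two bits aligns the address to an instruction boundary, and subtracting 1 moves it into the last byte of the preceding call instruction.

```c
#include <stdint.h>

/* Clears the two low bits so the address is aligned to a 4-byte ARM64
 * instruction boundary. */
uintptr_t detag_instruction_address(uintptr_t address)
{
    return address & ~(uintptr_t)3;
}

/* Maps a return address to an address inside the call instruction that
 * precedes it, which is the address to look up when symbolicating. */
uintptr_t call_address_from_return_address(uintptr_t return_address)
{
    return detag_instruction_address(return_address) - 1;
}
```

For example, a return address of 0x4a3c follows a call instruction occupying bytes 0x4a38 through 0x4a3b, and the adjusted lookup address 0x4a3b lies inside that call. Without the adjustment, a call that is the last instruction of a function could be attributed to whatever function follows it in memory.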

Async Safety

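All code executed inside Mach or signal handlers must be async‑signal‑safe. Functions that may allocate memory, acquire locks, or perform buffered I/O (e.g., malloc(), free(), NSLog(), printf(), Objective‑C method calls) are prohibited: the crash may have interrupted a thread in the middle of one of these calls, so the process state may be inconsistent and calling them again can deadlock or corrupt memory.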
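In practice this means crash handlers format their output by hand into preallocated or stack buffers and emit it with write(), which POSIX lists as async‑signal‑safe. The sketch below shows one way to print an address without printf(); `format_hex` and `write_fully` are hypothetical helper names, not KSCrash functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Formats value as "0x<hex digits>" into buffer using only stack memory.
 * Returns the length written (excluding the terminating NUL), or 0 if the
 * buffer is too small. */
size_t format_hex(uintptr_t value, char *buffer, size_t size)
{
    static const char digits[] = "0123456789abcdef";
    char reversed[2 * sizeof(uintptr_t)];
    size_t count = 0;

    /* Collect digits from least to most significant. */
    do {
        reversed[count++] = digits[value & 0xf];
        value >>= 4;
    } while (value != 0);

    if (size < count + 3) { /* "0x" + digits + NUL */
        return 0;
    }
    size_t length = 0;
    buffer[length++] = '0';
    buffer[length++] = 'x';
    while (count > 0) {
        buffer[length++] = reversed[--count];
    }
    buffer[length] = '\0';
    return length;
}

/* Writes all bytes to fd with write(2), continuing after partial writes.
 * Returns 0 on success, -1 if write() fails or makes no progress. */
int write_fully(int fd, const char *bytes, size_t length)
{
    while (length > 0) {
        ssize_t written = write(fd, bytes, length);
        if (written <= 0) {
            return -1;
        }
        bytes += written;
        length -= (size_t)written;
    }
    return 0;
}
```

A handler would typically open the report file ahead of time, during normal startup, so that only write() calls remain to be made at crash time.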

Conclusion and Outlook

The article presents a complete iOS crash‑monitoring solution based on KSCrash, covering exception hierarchy, capture mechanisms, implementation details, and best practices such as async safety and symbolication. The solution is already used in Alibaba Cloud RUM iOS SDK and can be extended to support real‑time upload, log collection, and memory dump for deeper analysis.
