How to Build a Robust iOS Crash Monitoring System with KSCrash
This article explains the layered iOS exception architecture, common crash causes, and a comprehensive monitoring solution that captures Mach exceptions, Unix signals, runtime NSException and C++ exceptions, and application‑level issues like deadlocks and zombie objects, with detailed implementation steps and code examples.
Background
After an app is released, crashes that never appeared during pre‑release testing become a major concern. Understanding how crash logs are collected and why crashes happen is essential for reliable iOS development.
Common Crash Causes
Array out‑of‑bounds access
Multithreading issues (UI updates on background threads, data races)
Main‑thread unresponsiveness (Watchdog termination)
Dangling ("wild") pointers (accessing deallocated objects)
iOS Exception Architecture
The iOS exception system is divided into four layers, each responsible for capturing different types of failures.
1. Hardware Layer
CPU exceptions such as illegal instructions or memory‑access errors
2. System Layer
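Mach exceptions – the lowest‑level mechanism, raised by the kernel
Unix signals – unhandled Mach exceptions are translated into signals such as SIGSEGV or SIGBUS; other signals, such as the SIGABRT raised by abort(), are delivered without a Mach exception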
3. Runtime Layer
NSException (Objective‑C runtime errors like array out‑of‑bounds)
C++ exceptions (thrown by native code; an uncaught exception ends in a call to std::terminate)
4. Application Layer
Business‑logic errors and performance problems (deadlocks, memory leaks, zombie objects)
Monitoring Strategy
To achieve complete crash coverage we capture exceptions at three levels:
System‑level: Mach exceptions and Unix signals
Runtime‑level: NSException and C++ terminate handling
Application‑level: proactive checks for deadlocks and zombie objects
Mach Exception Capture
Mach exceptions are intercepted by creating a dedicated exception port, registering it for all exception masks, and running two handler threads (primary and secondary) so that a crash can still be reported if the primary handler thread itself crashes.
// Create a new exception handling port
mach_port_allocate(mach_task_self(), MACH_PORT_RIGHT_RECEIVE, &g_exceptionPort);
// Insert send right
mach_port_insert_right(mach_task_self(), g_exceptionPort, g_exceptionPort, MACH_MSG_TYPE_MAKE_SEND);
// Register the exception port for all exception types
task_set_exception_ports(mach_task_self(), EXC_MASK_ALL, g_exceptionPort, EXCEPTION_DEFAULT, MACHINE_THREAD_STATE);
// Create primary and secondary handler threads
pthread_create(&g_primaryPThread, &attr, handleExceptions, kThreadPrimary);
pthread_create(&g_secondaryPThread, &attr, handleExceptions, kThreadSecondary);
The handler thread receives messages with mach_msg(), suspends all threads, records machine state, builds a crash context (exception type, registers, stack cursor, address info), generates a JSON report, and finally resumes the threads.
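The suspend and resume steps described above could be sketched roughly as follows. This is an illustrative sketch rather than KSCrash's actual code: the function names suspend_other_threads and resume_and_release_threads are hypothetical, error handling is minimal, and the block is guarded with __APPLE__ because the Mach calls (task_threads, thread_suspend, thread_resume, mach_port_deallocate, vm_deallocate) exist only on Apple platforms.

```cpp
#ifdef __APPLE__
#include <mach/mach.h>

// Suspends every thread in this task except the calling (handler) thread.
// On success the caller owns the returned thread list and must hand it to
// resume_and_release_threads() afterwards.
bool suspend_other_threads(thread_act_array_t* out_threads,
                           mach_msg_type_number_t* out_count) {
    const task_t task = mach_task_self();
    const thread_t self = mach_thread_self();  // extra send right, released below
    const bool ok = task_threads(task, out_threads, out_count) == KERN_SUCCESS;
    if (ok) {
        for (mach_msg_type_number_t i = 0; i < *out_count; ++i) {
            if ((*out_threads)[i] != self) {
                thread_suspend((*out_threads)[i]);
            }
        }
    }
    mach_port_deallocate(task, self);
    return ok;
}

// Resumes the threads suspended above and frees what task_threads() returned.
void resume_and_release_threads(thread_act_array_t threads,
                                mach_msg_type_number_t count) {
    const task_t task = mach_task_self();
    const thread_t self = mach_thread_self();
    for (mach_msg_type_number_t i = 0; i < count; ++i) {
        if (threads[i] != self) {
            thread_resume(threads[i]);
        }
        mach_port_deallocate(task, threads[i]);  // drop the per-thread send right
    }
    mach_port_deallocate(task, self);
    vm_deallocate(task, reinterpret_cast<vm_address_t>(threads),
                  sizeof(thread_t) * count);
}
#endif  // __APPLE__
```

Between the two calls the handler can read registers and walk stacks without other threads mutating state. A production handler would likely also exempt its own helper threads from suspension.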
Unix Signal Capture
Signal handlers are installed to catch crashes that bypass Mach handling (e.g., abort()). The signal handler receives the signal number, siginfo_t, and CPU context, then forwards the information to the same processing pipeline used for Mach exceptions.
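As a rough illustration of the handler side, the sketch below registers an alternate stack (which SA_ONSTACK relies on, so the handler can still run after a stack overflow) and records the signal using only async‑signal‑safe calls. All names here (handle_fatal_signal, install_signal_handler, g_caught_signal) are hypothetical, and the "pipeline" is reduced to a flag and a write() to stderr.

```cpp
#include <csignal>
#include <cstddef>
#include <signal.h>
#include <unistd.h>

// Hypothetical names throughout; only the POSIX calls are real.
static volatile sig_atomic_t g_caught_signal = 0;
static struct sigaction g_previous_action;  // a real monitor keeps one per signal
static char g_alt_stack_memory[64 * 1024];

// Runs on the alternate stack. Only async-signal-safe calls are used here.
static void handle_fatal_signal(int signum, siginfo_t* info, void* ucontext) {
    (void)info;      // carries details such as the faulting address (si_addr)
    (void)ucontext;  // carries the interrupted thread's register state
    g_caught_signal = signum;
    static const char message[] = "fatal signal caught\n";
    const ssize_t ignored = write(STDERR_FILENO, message, sizeof(message) - 1);
    (void)ignored;
}

// Installs the handler for one signal; returns true on success.
bool install_signal_handler(int signum) {
    stack_t alt_stack = {};
    alt_stack.ss_sp = g_alt_stack_memory;
    alt_stack.ss_size = sizeof(g_alt_stack_memory);
    alt_stack.ss_flags = 0;
    if (sigaltstack(&alt_stack, nullptr) != 0) {
        return false;
    }

    struct sigaction action = {};
    action.sa_flags = SA_SIGINFO | SA_ONSTACK;
    sigemptyset(&action.sa_mask);
    action.sa_sigaction = &handle_fatal_signal;
    return sigaction(signum, &action, &g_previous_action) == 0;
}
```

For a genuinely fatal signal, a handler would normally finish by restoring the previous action and re‑raising the signal so the process still terminates. That step is omitted here to keep the sketch small.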
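// Install signal handlers for fatal signals
const int *fatal_signals = signal_fatal_signals();
// Assumed companion helper that returns the length of the array above
const int fatal_signal_count = signal_num_fatal_signals();
struct sigaction action = {0};
action.sa_flags = SA_SIGINFO | SA_ONSTACK;
sigemptyset(&action.sa_mask);
action.sa_sigaction = &signal_handle_signals;
for (int i = 0; i < fatal_signal_count; i++) {
    // previous_signal_handlers is assumed to be an array with one slot per signal
    sigaction(fatal_signals[i], &action, &previous_signal_handlers[i]);
}
Runtime Exception Capture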
NSException handling is set up by saving the previous handler, installing a custom NSUncaughtExceptionHandler, and invoking the original handler after reporting.
// Save previous handler and set our own
NSUncaughtExceptionHandler *previous_uncaught_exceptionhandler = NSGetUncaughtExceptionHandler();
NSSetUncaughtExceptionHandler(&handle_uncaught_exception);
C++ uncaught exceptions are intercepted by replacing the global terminate handler.
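A terminate handler of this kind might look like the sketch below. It is an assumption‑laden illustration, not the SDK's implementation: describe_current_exception, install_terminate_handler, and g_original_terminate_handler are hypothetical names, and the "report" is a single line on stderr. The handler re‑throws the active exception to learn its type, then chains to the handler that was installed before it.

```cpp
#include <cstdio>
#include <cstdlib>
#include <exception>
#include <string>

static std::terminate_handler g_original_terminate_handler = nullptr;

// Describes the exception currently being handled, if there is one.
std::string describe_current_exception() {
    std::exception_ptr current = std::current_exception();
    if (!current) {
        return "no active exception";
    }
    try {
        std::rethrow_exception(current);
    } catch (const std::exception& e) {
        return std::string("std::exception: ") + e.what();
    } catch (...) {
        return "unknown exception type";
    }
}

// Reports the exception, then hands control to the previous handler.
static void cpp_exception_terminate_handler() {
    const std::string description = describe_current_exception();
    std::fprintf(stderr, "uncaught C++ exception: %s\n", description.c_str());
    if (g_original_terminate_handler != nullptr) {
        g_original_terminate_handler();
    }
    std::abort();  // a terminate handler must not return
}

void install_terminate_handler() {
    if (std::get_terminate() == &cpp_exception_terminate_handler) {
        return;  // already installed; avoid chaining to ourselves
    }
    g_original_terminate_handler = std::set_terminate(cpp_exception_terminate_handler);
}
```

Unlike a signal handler, a terminate handler runs in normal thread context, which is why this sketch can afford std::string and fprintf.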
// Save original terminate handler and install our own
std::terminate_handler original_terminate_handler = std::get_terminate();
std::set_terminate(cpp_exception_terminate_handler);
Application‑Level Monitoring
Deadlock Detection: A watchdog thread periodically posts a no‑op block to the main queue and measures the response time. If the main thread does not run the block within a configurable timeout, a deadlock is reported.
Zombie Object Detection: The dealloc methods of NSObject and NSProxy are hooked. When an object is deallocated, a hash of its address is stored together with its class information. Subsequent accesses to the same address are checked against the hash table (size 0x8000) to identify zombie accesses.
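The zombie table can be pictured as a fixed‑size, allocation‑free cache keyed by object address. The sketch below is a portable stand‑in under stated assumptions: record_dealloc would be called from the hooked dealloc, the hash (drop the low alignment bits, mask to the table size) is a guess rather than the SDK's actual function, and all names are hypothetical. Only the table size 0x8000 comes from the text.

```cpp
#include <cstddef>
#include <cstdint>

// Table size quoted in the text; a power of two so masking works as a modulo.
static const std::size_t kZombieCacheSize = 0x8000;

struct ZombieEntry {
    const void* object;      // address of the deallocated object
    const char* class_name;  // class recorded at dealloc time
};

// Zero-initialized static storage: no allocation is needed at crash time.
static ZombieEntry g_zombie_cache[kZombieCacheSize];

// Hypothetical hash: discard low bits that are identical for aligned objects.
static std::size_t zombie_hash_index(const void* object) {
    const std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(object);
    return static_cast<std::size_t>(bits >> 4) & (kZombieCacheSize - 1);
}

// Would be called from the hooked dealloc. A colliding entry is overwritten.
void record_dealloc(const void* object, const char* class_name) {
    ZombieEntry& entry = g_zombie_cache[zombie_hash_index(object)];
    entry.object = object;
    entry.class_name = class_name;
}

// Returns the recorded class name if this exact address was deallocated
// and its slot has not been overwritten since; otherwise nullptr.
const char* zombie_class_name(const void* object) {
    const ZombieEntry& entry = g_zombie_cache[zombie_hash_index(object)];
    return entry.object == object ? entry.class_name : nullptr;
}
```

Because colliding entries overwrite each other, a lookup can miss a real zombie. Storing the full address alongside the class name avoids misattributing a slot that merely shares a hash, though an address reused by a new live object would still match.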
Symbolication
Collected stack addresses are converted to human‑readable symbols. Two approaches are used:
Runtime symbolication via dladdr() to obtain image base, image name, symbol address, and symbol name.
Full symbolication using dSYM files to map addresses to source file and line number.
Because the stack stores return addresses, which point to the instruction after the call, each address is adjusted to fall inside the calling instruction (e.g., for ARM64: (return_address & ~3UL) - 1) before being passed to dladdr().
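The runtime path can be sketched as below. dladdr() and Dl_info are the real dynamic‑loader API; SymbolInfo, symbolicate, and call_address_from_return_address are hypothetical names. The ARM64 branch uses the adjustment quoted above, while the plain "minus one" on other architectures is an assumption made for this sketch.

```cpp
#include <cstdint>
#include <dlfcn.h>

struct SymbolInfo {
    bool found;                     // false when no loaded image contains the address
    const char* image_name;         // path of the containing image
    std::uintptr_t image_base;      // load address of that image
    const char* symbol_name;        // nearest preceding symbol, may be null
    std::uintptr_t symbol_address;  // start address of that symbol
};

// Moves a return address back into the call instruction that produced it.
std::uintptr_t call_address_from_return_address(std::uintptr_t return_address) {
#if defined(__arm64__) || defined(__aarch64__)
    return (return_address & ~std::uintptr_t{3}) - 1;
#else
    return return_address - 1;  // assumption for non-ARM64 targets
#endif
}

// Looks up image and symbol information for one stack frame.
SymbolInfo symbolicate(std::uintptr_t return_address) {
    SymbolInfo result = {false, nullptr, 0, nullptr, 0};
    const std::uintptr_t lookup = call_address_from_return_address(return_address);
    Dl_info info;
    if (dladdr(reinterpret_cast<const void*>(lookup), &info) != 0) {
        result.found = true;
        result.image_name = info.dli_fname;
        result.image_base = reinterpret_cast<std::uintptr_t>(info.dli_fbase);
        result.symbol_name = info.dli_sname;  // null if no symbol matched
        result.symbol_address = reinterpret_cast<std::uintptr_t>(info.dli_saddr);
    }
    return result;
}
```

Subtracting image_base from the frame address gives an image‑relative offset, which is the form typically needed when a dSYM is later used to resolve file and line.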
Async Safety
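All code executed inside Mach or signal handlers must be async‑signal‑safe. Functions that may allocate memory, acquire locks, or perform buffered I/O (e.g., malloc(), free(), NSLog(), printf(), Objective‑C method calls) are prohibited: the crashed thread may have been interrupted while holding a lock or with the heap in an inconsistent state, so calling them can deadlock or corrupt the report. Low‑level system calls such as write() remain safe to use.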
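To make the constraint concrete, here is a minimal sketch of logging an address without printf() or heap allocation: digits are formatted into a stack buffer and emitted with write(), which POSIX lists as async‑signal‑safe. The function names (format_hex, write_fully, log_fault_address) are hypothetical and not part of any SDK.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <unistd.h>

// Formats value as lowercase hex into out (no terminator).
// Returns the number of characters written, or 0 if out is too small.
std::size_t format_hex(std::uint64_t value, char* out, std::size_t out_size) {
    static const char digits[] = "0123456789abcdef";
    char reversed[16];  // a 64-bit value has at most 16 hex digits
    std::size_t count = 0;
    do {
        reversed[count++] = digits[value & 0xF];
        value >>= 4;
    } while (value != 0);
    if (count > out_size) {
        return 0;
    }
    for (std::size_t i = 0; i < count; ++i) {
        out[i] = reversed[count - 1 - i];
    }
    return count;
}

// Writes the whole buffer, retrying on partial writes and EINTR.
bool write_fully(int fd, const char* data, std::size_t length) {
    while (length > 0) {
        const ssize_t written = write(fd, data, length);
        if (written < 0) {
            if (errno == EINTR) continue;
            return false;
        }
        data += written;
        length -= static_cast<std::size_t>(written);
    }
    return true;
}

// Emits "fault address: 0x<hex>\n" using only stack memory and write().
bool log_fault_address(int fd, std::uint64_t address) {
    static const char prefix[] = "fault address: 0x";
    char hex[16];
    const std::size_t count = format_hex(address, hex, sizeof(hex));
    return write_fully(fd, prefix, sizeof(prefix) - 1) &&
           write_fully(fd, hex, count) &&
           write_fully(fd, "\n", 1);
}
```

A handler that must preserve the interrupted code's view of errno would additionally save errno on entry and restore it before returning.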
Conclusion and Outlook
The article presents a complete iOS crash‑monitoring solution based on KSCrash, covering exception hierarchy, capture mechanisms, implementation details, and best practices such as async safety and symbolication. The solution is already used in Alibaba Cloud RUM iOS SDK and can be extended to support real‑time upload, log collection, and memory dump for deeper analysis.
Alibaba Cloud Observability
Driving continuous progress in observability technology!
