iOS App Startup Optimization: Process, Measurement, and Practical Solutions
This article explains how iOS app launch works, presents methods for measuring startup time using system calls and Xcode Instruments, shares a detailed case study of the Maimai app, and offers concrete pre‑main and post‑main optimization techniques—including binary reordering, dynamic‑library reduction, concurrency control, and unconventional tricks—to achieve sub‑second launch performance.
Introduction
App launch is the user's first impression of an app, and slower launches increase churn. We surveyed existing optimization methods, applied them to the Maimai iOS app, and derived specific, actionable recommendations, ultimately bringing launch time down to about 900 ms.
Understanding How an App Starts
Launch Process
The launch is split at the main() function into two phases: pre‑main and post‑main.
Pre‑main
Loading dylibs: dyld loads the app's dynamic libraries and, recursively, their dependencies; locating and loading each one adds time.
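Rebase & binding: Because ASLR loads each image at a random address, dyld rebases pointers inside the image by that offset (the slide). Binding resolves references to symbols in other images, such as NSLog; lazily bound symbols are resolved on first call through dyld_stub_binder.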
Objc setup: The Objective‑C runtime registers classes, attaches categories, and uniques selectors.
load & constructor & initialize: All +load methods run, followed by C/C++ static initializers and constructor functions. +initialize, by contrast, runs lazily, just before a class receives its first message.
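The C sketch below illustrates why constructor functions count toward pre‑main time: a function marked __attribute__((constructor)) is run by the loader before main() is entered, so whatever it does delays every launch. The names are hypothetical; it demonstrates the mechanism, not code from the Maimai app.

```c
#include <stdbool.h>

/* Hypothetical flag recording that pre-main work has already happened. */
static bool g_premain_work_done = false;

/* The loader calls this before main(). Anything placed here (like work in
 * +load) lengthens pre-main time on every launch, so keep it trivial or
 * move it to an explicit call after the first frame. */
__attribute__((constructor))
static void premain_work(void) {
    g_premain_work_done = true;
}

bool premain_work_done(void) {
    return g_premain_work_done;
}
```

Calling premain_work_done() from main() already returns true, confirming the work ran before main started.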
Post‑main
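main function: main() creates the autorelease pool and calls UIApplicationMain, which creates the application object and its delegate and starts the main run loop; the window is initialized and UI display begins.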
LifeCycle: The app delegate's launch callbacks set the root view controller and run business initialization code.
First Frame: Launch is considered complete when the first view controller's viewDidAppear fires, that is, once the initial frame has been rendered.
Measuring App Startup Time
Instrumentation
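Process creation time (via sysctl): the kernel's record of when the process started is the earliest timestamp available and serves as the launch start point.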
#import <Foundation/Foundation.h>
#import <sys/sysctl.h>
#import <mach/mach.h>
// Fetch the kernel's process record for the given PID.
+ (BOOL)processInfoForPID:(int)pid procInfo:(struct kinfo_proc *)procInfo {
    int cmd[4] = {CTL_KERN, KERN_PROC, KERN_PROC_PID, pid};
    size_t size = sizeof(*procInfo);
    return sysctl(cmd, sizeof(cmd) / sizeof(*cmd), procInfo, &size, NULL, 0) == 0;
}
// Process creation time in milliseconds since the Unix epoch (0 on failure).
+ (NSTimeInterval)processStartTime {
    struct kinfo_proc kProcInfo;
    if ([self processInfoForPID:[[NSProcessInfo processInfo] processIdentifier] procInfo:&kProcInfo]) {
        return kProcInfo.kp_proc.p_un.__p_starttime.tv_sec * 1000.0 + kProcInfo.kp_proc.p_un.__p_starttime.tv_usec / 1000.0;
    } else {
        return 0;
    }
}

+load timing: Give a Pod a name with an AAA prefix so that its +load is the first one executed; a timestamp recorded there marks the start of the +load phase.
dyld statistics: Setting the environment variable DYLD_PRINT_STATISTICS=1 (or DYLD_PRINT_STATISTICS_DETAILS=1 for a finer breakdown) in the Xcode scheme prints the time spent in each pre‑main step.
Tools
Time Profiler: This Instruments template samples call stacks at a fixed interval (1 ms) and aggregates the samples to estimate how long each method runs. For example, it revealed that a memory‑leak detection module added a noticeable delay at launch.
System Trace: Records thread scheduling and state transitions, system calls, and virtual‑memory activity; useful for spotting overloaded high‑priority threads and blocking events.
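To make the Time Profiler description concrete, here is a toy model in C of turning stack samples into time estimates; it is not how Instruments is implemented. It assumes each sample is a captured call stack with the executing frame first and a fixed interval between samples. Self time counts samples where a function is on top of the stack; inclusive time counts samples where it appears anywhere.

```c
#include <stddef.h>
#include <string.h>

/* One captured call stack; frames[0] is the frame that was executing. */
typedef struct {
    const char *const *frames;
    size_t depth;
} stack_sample;

static int frame_in_sample(const stack_sample *s, const char *fn) {
    for (size_t i = 0; i < s->depth; i++)
        if (strcmp(s->frames[i], fn) == 0) return 1;
    return 0;
}

/* Self time: samples in which fn itself was executing. */
double self_ms(const stack_sample *samples, size_t n, const char *fn, double interval_ms) {
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (samples[i].depth > 0 && strcmp(samples[i].frames[0], fn) == 0) hits++;
    return (double)hits * interval_ms;
}

/* Inclusive time: samples in which fn or one of its callees was executing. */
double inclusive_ms(const stack_sample *samples, size_t n, const char *fn, double interval_ms) {
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (frame_in_sample(&samples[i], fn)) hits++;
    return (double)hits * interval_ms;
}
```

With a 1 ms interval, two samples topped by a hypothetical leakCheck function estimate about 2 ms of self time, while the launch callback that calls it accumulates inclusive time from every sample it appears in. The estimate is statistical, so very short methods may not appear at all.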
Maimai iOS Case Study
When debugging over cellular data, the app launched about 2 s slower than on Wi‑Fi. System Trace revealed that the main thread was blocked for 2.01 s on a semaphore wait inside a debug‑only request to http://localhost:8081. Removing that code made launch times the same on both networks.
Further manual instrumentation showed that OpenUDID occasionally took 400 ms because of expensive UIPasteboard reads; these calls were later removed.
Optimization Plan
Overall Strategy
Delete obsolete startup items.
Delay non‑essential work until after launch or on first use.
Introduce concurrency where safe, while controlling thread count and QoS.
Accelerate frequently used code by caching results.
Pre‑main Optimizations
Dynamic libraries: Reduce the number of custom dylibs by merging them or converting them to static libraries, for example with the cocoapods-packager plugin (pod package xxxx.podspec --force), and avoid use_frameworks!, which builds each pod as a dynamic framework.
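Code size reduction: Compare the classes defined in the binary (__objc_classlist) with the classes actually referenced by code (__objc_classrefs); the set difference yields candidate unused classes, and the same comparison against __objc_selrefs finds candidate unused methods. Because classes reached only through reflection (e.g., NSClassFromString) never appear in the reference sections, confirm candidates against runtime data before deleting them; a custom decoupling analysis tool based on these set differences automates the process.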
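The set‑difference step can be sketched in C. This assumes the class names have already been extracted and symbolicated from __objc_classlist and __objc_classrefs (not shown); the function and class names are hypothetical, and the output is a list of candidates, not a verdict.

```c
#include <stddef.h>
#include <string.h>

static int name_in_set(const char *const *set, size_t n, const char *name) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(set[i], name) == 0) return 1;
    return 0;
}

/* Computes defined \ referenced: classes that exist in the binary but are
 * never referenced by code. out must hold at least n_defined entries.
 * Returns the number of candidates written to out. */
size_t unused_candidates(const char *const *defined, size_t n_defined,
                         const char *const *referenced, size_t n_referenced,
                         const char **out) {
    size_t count = 0;
    for (size_t i = 0; i < n_defined; i++)
        if (!name_in_set(referenced, n_referenced, defined[i]))
            out[count++] = defined[i];
    return count;
}
```

A quadratic scan is fine for a few thousand classes; a hash set would be the natural upgrade for larger binaries.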
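Binary reordering: Place startup‑critical symbols at the beginning of the Mach‑O file so they occupy as few pages as possible, reducing page faults. The linker applies the order from an order file (Xcode's Order File build setting, passed to ld as -order_file). Example: three startup methods spread over three pages trigger three page faults costing about 1.5 ms in total (about 0.5 ms each); reordering them onto one page leaves a single fault and saves about 1 ms.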
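The page‑fault arithmetic can be checked with a small C sketch. It assumes 16 KB pages (the page size on arm64 iOS devices) and the roughly 0.5 ms per fault implied by the example above; the addresses are made up. It counts the distinct pages touched by a set of code addresses, which equals the number of first‑touch faults when none of those pages is resident yet.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumptions: 16 KB pages and ~0.5 ms per page fault. */
#define PAGE_SIZE_BYTES 16384u
#define MS_PER_FAULT 0.5

/* Number of distinct pages touched by the given code addresses. */
size_t distinct_pages(const uint64_t *addrs, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t page = addrs[i] / PAGE_SIZE_BYTES;
        int seen = 0;
        for (size_t j = 0; j < i; j++) {
            if (addrs[j] / PAGE_SIZE_BYTES == page) { seen = 1; break; }
        }
        if (!seen) count++;
    }
    return count;
}

/* Estimated fault cost if every touched page faults exactly once. */
double estimated_fault_ms(const uint64_t *addrs, size_t n) {
    return (double)distinct_pages(addrs, n) * MS_PER_FAULT;
}
```

For methods at 0x0, 0x14000, and 0x24000 (pages 0, 5, and 9) the estimate is 1.5 ms; packed at 0x0, 0x100, and 0x200 they share one page and the estimate drops to 0.5 ms, the ~1 ms saving described in the text.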
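Hooking & static scanning: Building the order file requires the list of symbols invoked during launch. Hooking objc_msgSend captures Objective‑C message sends but misses plain C functions, blocks, and most Swift code; a static scan of the Mach‑O __TEXT segment is another option.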
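Clang instrumentation (SanitizerCoverage): Add -fsanitize-coverage=func,trace-pc-guard to Other C Flags and -sanitize-coverage=func to Other Swift Flags (swiftc accepts it only together with a -sanitize= option, commonly -sanitize=undefined). The func option instruments only function entries, so the callback fires once per function call rather than at every edge. Then implement the guard callbacks: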
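#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
#include <libkern/OSAtomic.h>
// One recorded call site; nodes form a lock-free queue.
typedef struct { void *pc; void *next; } SymbolNode;
static OSQueueHead symbolList = OS_ATOMIC_QUEUE_INIT;
// Called at startup: give every instrumented function's guard a unique non-zero ID.
void __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop) {
    static uint64_t N;
    if (start == stop || *start) return;  // already initialized
    printf("INIT: %p %p\n", start, stop);
    for (uint32_t *x = start; x < stop; x++) *x = (uint32_t)++N;
}
// Called on entry to every instrumented function (because of the func option).
void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
    void *PC = __builtin_return_address(0);  // an address inside the calling function
    SymbolNode *node = malloc(sizeof(SymbolNode));
    *node = (SymbolNode){PC, NULL};
    OSAtomicEnqueue(&symbolList, node, offsetof(SymbolNode, next));
}

Post‑main Optimizations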
Startup controller: Use a task‑scheduling framework to order startup tasks, cap the number of concurrent threads, and load non‑critical modules lazily.
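A startup controller can be reduced to a registry that assigns each task a phase. The C sketch below is hypothetical (task names, phases, and functions are invented for illustration, not taken from Maimai's framework) and single‑threaded: it only decides when each task runs, guarantees that no task runs twice, and lets on‑demand tasks run lazily on first use.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    PHASE_BEFORE_FIRST_FRAME,  /* required to show the first frame */
    PHASE_AFTER_FIRST_FRAME,   /* deferred until the first frame is on screen */
    PHASE_ON_DEMAND            /* runs only when a feature first needs it */
} startup_phase;

typedef struct {
    const char *name;
    startup_phase phase;
    void (*run)(void);   /* may be NULL for a placeholder task */
    unsigned run_count;  /* how many times the task was started */
} startup_task;

#define MAX_TASKS 32
static startup_task g_tasks[MAX_TASKS];
static size_t g_task_count;

bool register_task(const char *name, startup_phase phase, void (*run)(void)) {
    if (g_task_count == MAX_TASKS) return false;
    g_tasks[g_task_count++] = (startup_task){name, phase, run, 0};
    return true;
}

static startup_task *find_task(const char *name) {
    for (size_t i = 0; i < g_task_count; i++)
        if (strcmp(g_tasks[i].name, name) == 0) return &g_tasks[i];
    return NULL;
}

/* Marks the task as started before running it, so it never runs twice,
 * even if it re-enters the scheduler. */
static void run_once(startup_task *t) {
    if (t->run_count > 0) return;
    t->run_count++;
    if (t->run) t->run();
}

/* Runs every task registered for `phase`, in registration order. */
void run_phase(startup_phase phase) {
    for (size_t i = 0; i < g_task_count; i++)
        if (g_tasks[i].phase == phase) run_once(&g_tasks[i]);
}

/* Lazy path: runs the named task on first use; false if it is unknown. */
bool ensure_task(const char *name) {
    startup_task *t = find_task(name);
    if (t == NULL) return false;
    run_once(t);
    return true;
}

unsigned task_run_count(const char *name) {
    startup_task *t = find_task(name);
    return t ? t->run_count : 0;
}
```

At launch, call run_phase(PHASE_BEFORE_FIRST_FRAME) before showing the root view controller, run_phase(PHASE_AFTER_FIRST_FRAME) once the first frame is on screen, and ensure_task("share_sdk") only when a feature needs that SDK. A production controller would also dispatch independent tasks to a bounded set of threads.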
Third‑party SDKs: Defer or parallelize heavy SDK initialization.
Frequent methods: Cache the results of APIs that are cheap individually but called many times during launch, such as reading Info.plist keys.
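Caching a frequently read value can be done with lazy, thread‑safe initialization. This C sketch uses pthread_once; slow_read_app_version is a hypothetical stand‑in for an Info.plist read and returns a placeholder string. In Objective‑C, dispatch_once serves the same purpose.

```c
#include <pthread.h>

/* Hypothetical slow lookup standing in for an Info.plist read; the counter
 * only exists to show how often the slow path runs. */
static int g_slow_reads;

static const char *slow_read_app_version(void) {
    g_slow_reads++;
    return "1.0.0";  /* placeholder value */
}

static const char *g_cached_version;
static pthread_once_t g_version_once = PTHREAD_ONCE_INIT;

static void load_version(void) {
    g_cached_version = slow_read_app_version();
}

/* Lazily initialized, thread-safe cache: the slow read runs at most once,
 * on first use, and later calls return the cached pointer. */
const char *app_version(void) {
    pthread_once(&g_version_once, load_version);
    return g_cached_version;
}

int slow_read_count(void) {
    return g_slow_reads;
}
```

Only the first app_version() call pays for the slow read; slow_read_count() stays at 1 however many times the value is requested afterwards.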
Locks: Replace long‑waiting semaphores with lighter synchronization primitives.
Thread management: Keep the number of high‑QoS threads at or below the CPU core count.
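Capping concurrency at the core count can be sketched in C with a fixed number of pthreads pulling job indices from an atomic counter. This is a hypothetical illustration: the calling thread participates as one worker, sysconf(_SC_NPROCESSORS_ONLN) supplies the core count, and assigning a QoS class (for example with Apple's pthread_set_qos_class_self_np) is omitted to keep the code portable.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <unistd.h>

typedef void (*job_fn)(size_t index, void *context);

typedef struct {
    job_fn fn;
    void *context;
    size_t n;
    atomic_size_t next;  /* index of the next unclaimed job */
} job_queue;

/* Each worker repeatedly claims the next job index until none are left. */
static void *worker(void *arg) {
    job_queue *q = arg;
    for (;;) {
        size_t i = atomic_fetch_add(&q->next, 1);
        if (i >= q->n) return NULL;
        q->fn(i, q->context);
    }
}

#define MAX_WORKERS 64

/* Runs jobs 0..n-1 on at most min(n, online cores, MAX_WORKERS) threads,
 * counting the calling thread. Returns the number of threads used. */
size_t run_jobs_bounded(job_fn fn, void *context, size_t n) {
    if (n == 0) return 0;
    long cores = sysconf(_SC_NPROCESSORS_ONLN);
    size_t limit = cores > 0 ? (size_t)cores : 1;
    if (limit > MAX_WORKERS) limit = MAX_WORKERS;
    size_t threads = n < limit ? n : limit;

    job_queue q = {.fn = fn, .context = context, .n = n};
    atomic_init(&q.next, 0);

    pthread_t extra[MAX_WORKERS];
    size_t started = 0;
    while (started + 1 < threads &&
           pthread_create(&extra[started], NULL, worker, &q) == 0) {
        started++;
    }
    worker(&q);  /* the calling thread works too */
    for (size_t i = 0; i < started; i++) pthread_join(extra[i], NULL);
    return started + 1;
}

/* Example job: marks slot `index` of a byte array as processed. */
void mark_processed(size_t index, void *context) {
    ((unsigned char *)context)[index] = 1;
}
```

On a four‑core device, run_jobs_bounded(mark_processed, flags, 10) uses at most four threads however many jobs are queued; joining every worker before returning makes their writes visible to the caller.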
Image handling: Store launch images in an Asset Catalog; pre‑load assets on background threads.
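Fishhook: Avoid calling it on the main thread during launch. Its first rebind_symbols call registers a callback with dyld (_dyld_register_func_for_add_image) that immediately runs for every already‑loaded image and rewrites their symbol pointers, which is costly; if hooking is required, do it from a background thread.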
First‑frame rendering: Replace heavy GIF loading with static placeholders; defer Lottie animation until after the first frame.
Verification of Improvements
Binary reordering reduced the 90th‑percentile launch time by about 600 ms, and post‑main optimizations cut roughly another 500 ms. Native launch time fell from about 600 ms to about 270 ms (roughly 300 ms saved), and overall launch time dropped below one second (under 900 ms).
Feed‑to‑RN first‑frame latency also dropped ~200 ms, as confirmed by Xcode Organizer metrics.
Additional Unconventional Techniques
+load Migration
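Move heavy +load work to compile‑time registration: instead of registering services in +load, write each (class, protocol) pair into a custom Mach‑O section at build time and read that section at runtime: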
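typedef struct { const char *cls; const char *protocol; } _mm_pair;
// _MM_SEGMENT, _MM_SECTION, _TO_STRING, _MM_UNIQUE_VAR and _MM_VALID_METHOD are helper
// macros defined elsewhere (not shown); _MM_SEGMENT and _MM_SECTION expand to string literals.
#if DEBUG
// Debug builds also emit a function that references CLASS_NAME and PROTOCOL_NAME,
// so the compiler verifies that both are declared.
#define MM_SERVICE(PROTOCOL_NAME,CLASS_NAME) \
__used static Class<PROTOCOL_NAME> _MM_VALID_METHOD(void){\
return [CLASS_NAME class];\
} \
__attribute((used, section(_MM_SEGMENT "," _MM_SECTION))) static _mm_pair _MM_UNIQUE_VAR = {\
_TO_STRING(CLASS_NAME),\
_TO_STRING(PROTOCOL_NAME),\
};
#else
// Release builds only write the (class, protocol) name pair into the custom section.
#define MM_SERVICE(PROTOCOL_NAME,CLASS_NAME) \
__attribute((used, section(_MM_SEGMENT "," _MM_SECTION))) static _mm_pair _MM_UNIQUE_VAR = {\
_TO_STRING(CLASS_NAME),\
_TO_STRING(PROTOCOL_NAME),\
};
#endif

__TEXT Segment Renaming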
Because App Store builds encrypt the __TEXT segment, every page fault in __TEXT also pays for decryption. Moving code into a different segment with the linker's -rename_section option avoids this cost, but it complicates linking and breaks dSYM symbolication.
PGO (Profile‑Guided Optimization)
Apple’s PGO can be invoked via Xcode’s “Generate Optimization Profile” action, but it is unsuitable for Swift projects and requires frequent regeneration after code changes.
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.