
Mastering iOS App Startup: Real‑World Monitoring and Optimization Techniques

This article walks through practical iOS startup optimization, covering monitoring strategies, non‑intrusive instrumentation, development‑stage testing, essential tools like Time Profiler and System Trace, and a comprehensive set of best‑practice tactics to shrink launch time and maintain performance.

ByteDance SE Lab

Introduction

App launch is the first impression for users; a slow start increases churn. Because startup optimization touches many topics, the author splits the discussion into theory and practice, focusing here on the practical side.

How to Do Startup Optimization?

Before diving in, consider four questions: What is the current launch performance for users in production? Where can optimization points be found? How can improvements be kept from regressing? What mature practices already exist in the industry? The article structures the answers into three modules: monitoring, tools, and best practices.

Monitoring

Startup Metrics

The start point is the process creation time. The end point is the first frame rendered after the launch screen disappears. Douyin approximates it with the rootViewController's viewDidAppear on iOS 12 and earlier, and with applicationDidBecomeActive on iOS 13+. Apple's official metric ends at the first CA::Transaction::commit, but Douyin's approach is very close to that point.

Phased Monitoring

One point is insufficient; Douyin combines single‑point and phased monitoring. The phases include process creation, earliest +load, didFinishLaunching, and first‑frame rendering.
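As a rough illustration of combining single-point and phased monitoring, the sketch below records named milestones and derives both the per-phase durations and the overall launch time. The class name LaunchTimeline and the milestone labels are hypothetical, and timestamps are assumed to be seconds from a common clock.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: collects named launch milestones and reports how long
// each phase between two consecutive milestones took.
class LaunchTimeline {
public:
    // Record a milestone; milestones must be added in chronological order.
    void mark(const std::string& name, double timestamp) {
        assert(marks_.empty() || timestamp >= marks_.back().second);
        marks_.emplace_back(name, timestamp);
    }

    // Phased metric: duration between consecutive milestones, labelled "from -> to".
    std::vector<std::pair<std::string, double>> phases() const {
        std::vector<std::pair<std::string, double>> out;
        for (std::size_t i = 1; i < marks_.size(); ++i) {
            out.emplace_back(marks_[i - 1].first + " -> " + marks_[i].first,
                             marks_[i].second - marks_[i - 1].second);
        }
        return out;
    }

    // Single-point metric: last milestone minus first milestone.
    double total() const {
        return marks_.size() < 2 ? 0.0 : marks_.back().second - marks_.front().second;
    }

private:
    std::vector<std::pair<std::string, double>> marks_;
};
```

Reporting the phases alongside the total makes it possible to tell which stage a regression came from, not only that one happened.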

Non‑Intrusive Monitoring

The APM team provides a non‑intrusive solution that splits the launch into coarse, business‑independent stages: process creation, earliest +load, didFinishLaunching, and first‑frame render. The first three stages are easy to capture:

Process creation: use sysctl system call.

Earliest +load: give a Pod a name starting with AAA so its +load runs first.

didFinishLaunching: record the SDK initialization time.
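For the process creation time, a minimal sketch of the sysctl approach is shown here. It assumes the common Darwin pattern of querying KERN_PROC_PID for the current process; the function name processStartTime is hypothetical, and on non-Apple platforms the function simply reports failure.

```cpp
#include <cassert>

#if defined(__APPLE__)
#include <sys/sysctl.h>
#include <sys/types.h>
#include <unistd.h>
#endif

// Writes the process creation time (seconds since the Unix epoch) to
// *outSeconds and returns true, or returns false and leaves *outSeconds
// untouched if the value cannot be determined.
bool processStartTime(double* outSeconds) {
    assert(outSeconds != nullptr);
#if defined(__APPLE__)
    int mib[4] = {CTL_KERN, KERN_PROC, KERN_PROC_PID, getpid()};
    struct kinfo_proc info {};
    size_t size = sizeof(info);
    if (sysctl(mib, 4, &info, &size, nullptr, 0) != 0 || size == 0) {
        return false;
    }
    const struct timeval start = info.kp_proc.p_starttime;
    *outSeconds = static_cast<double>(start.tv_sec) + start.tv_usec / 1e6;
    return true;
#else
    return false;  // Not a Darwin platform: nothing to query.
#endif
}
```

Because the value comes from the kernel, it is available no matter how late the monitoring SDK is initialized.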

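For the first‑frame render, the goal is to align with MetricKit, that is, to capture when the first CA::Transaction::commit() is called. That call cannot be observed directly, so the launch end is approximated with timestamps taken close to it: one from a block queued with CFRunLoopPerformBlock and one from a kCFRunLoopBeforeTimers observer (code shown below).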

CFRunLoopRef mainRunloop = [[NSRunLoop mainRunLoop] getCFRunLoop];

// Probe 1: a block queued on the main run loop.
CFRunLoopPerformBlock(mainRunloop, kCFRunLoopDefaultMode, ^{
    NSTimeInterval stamp = [[NSDate date] timeIntervalSince1970];
    NSLog(@"runloop block launch end:%f", stamp);
});

// Probe 2: an observer that logs the first kCFRunLoopBeforeTimers and then removes itself.
CFRunLoopObserverRef observer = CFRunLoopObserverCreateWithHandler(kCFAllocatorDefault, kCFRunLoopAllActivities, true, 0, ^(CFRunLoopObserverRef observer, CFRunLoopActivity activity) {
    if (activity == kCFRunLoopBeforeTimers) {
        NSTimeInterval stamp = [[NSDate date] timeIntervalSince1970];
        NSLog(@"runloop beforetimers launch end:%f", stamp);
        CFRunLoopRemoveObserver(mainRunloop, observer, kCFRunLoopCommonModes);
    }
});
CFRunLoopAddObserver(mainRunloop, observer, kCFRunLoopCommonModes);
CFRelease(observer); // The run loop retains the observer; this balances the Create call.

Final choice:

iOS 13+ uses a RunLoop kCFRunLoopBeforeTimers callback.

iOS 12 and earlier use CFRunLoopPerformBlock.

Monitoring Periods

Development Stage

Automated offline monitoring builds the app in release mode, runs a launch test, and reports the results to a dashboard. If a degradation is detected, an alert is sent, the offending MR is located by bisection, and flame graphs or Instruments traces are generated for root‑cause analysis. To keep test results stable, control the variables: disable iCloud, use airplane mode, let the device cool down, restart between runs, average multiple measurements, mock AB variables, and so on.
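The bisection step can be sketched as a binary search over the ordered list of builds. This is only an illustration: firstSlowBuild and the isSlow callback are hypothetical names, and the sketch assumes that every build after the offending MR is also slow and that isSlow already wraps a stable measurement (for example, an average over several runs compared against a threshold).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Finds the first slow build in (good, bad], assuming isSlow(good) is false,
// isSlow(bad) is true, and every build after the first slow one is also slow.
std::size_t firstSlowBuild(std::size_t good, std::size_t bad,
                           const std::function<bool(std::size_t)>& isSlow) {
    assert(good < bad);
    while (bad - good > 1) {
        const std::size_t mid = good + (bad - good) / 2;
        if (isSlow(mid)) {
            bad = mid;   // The regression is at mid or earlier.
        } else {
            good = mid;  // The regression is after mid.
        }
    }
    return bad;
}
```

Each probe halves the remaining range, so the number of launch tests grows with the logarithm of the number of MRs since the last known good build.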

Gray / Online Stage

Production monitoring relies on aggregated metrics and alerts. Launch time can also be viewed in Xcode’s Organizer (Launch Time). Several statistical patterns are worth knowing: launches are slower right after a new version ships (the first launch has to create a launch closure), pct50 (the median) is slower on older versions (users with slow devices upgrade later), and the sampling rate affects the device distribution.
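For reference, a percentile such as pct50 can be computed from a batch of launch samples as sketched below. The nearest-rank definition is used here as an assumption; the source does not say which percentile definition its dashboards use, and the function name is illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest-rank percentile: the smallest sample such that at least pct percent
// of all samples are less than or equal to it. pct must be in (0, 100].
double percentile(std::vector<double> samples, double pct) {
    assert(!samples.empty() && pct > 0.0 && pct <= 100.0);
    std::sort(samples.begin(), samples.end());
    const std::size_t rank =
        static_cast<std::size_t>(std::ceil(pct / 100.0 * samples.size()));
    return samples[rank - 1];
}
```

Unlike an average, pct50 is not dragged up by a few very slow launches, which is why it shifts when the mix of devices in the sample changes.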

Tools

Time Profiler

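By default, samples the call stacks of running threads every 1 ms. The time it reports reflects how often a stack appeared in the samples, not the code's actual execution time, so it suits coarse‑grained analysis. For more accurate data, increase the sampling frequency, record kernel call stacks, or record waiting threads.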

System Trace

Provides fine‑grained analysis of virtual memory, thread states, and system load. Use Points of Interest to mark a short interval for detailed inspection.

os_signpost

iOS 12+ API for high‑performance instrumentation with negligible impact. Combine with method swizzling to mark custom phases (e.g., all +load calls, image loading).

Other Instruments

Static Initializer – analyze C++ static init.

App Launch – Xcode 11+ template (Time Profiler + System Trace).

Custom Instrument – use os_signpost as data source.

Flame Graph

Visualize where launch time goes by recording the start and end of each method, either by hooking objc_msgSend or by inserting instrumentation at compile time. Convert the data to Chrome’s JSON trace format to view it as a flame graph.
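A minimal sketch of the conversion step follows, assuming the recorded data is a list of spans with a start time and a duration in microseconds. It emits Trace Event Format "complete" events (ph set to "X"). The Span struct and the fixed pid are assumptions made for illustration, and names are written without JSON escaping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// One timed call, in microseconds relative to launch start (hypothetical layout).
struct Span {
    std::string name;  // Method or phase name; not escaped in this sketch.
    std::uint64_t startUs;
    std::uint64_t durationUs;
    int threadId;
};

// Serializes spans as Trace Event Format "complete" events.
std::string toChromeTrace(const std::vector<Span>& spans) {
    std::ostringstream out;
    out << "{\"traceEvents\":[";
    for (std::size_t i = 0; i < spans.size(); ++i) {
        const Span& s = spans[i];
        if (i > 0) out << ",";
        out << "{\"name\":\"" << s.name << "\",\"ph\":\"X\",\"ts\":" << s.startUs
            << ",\"dur\":" << s.durationUs << ",\"pid\":1,\"tid\":" << s.threadId
            << "}";
    }
    out << "]}";
    return out.str();
}
```

The resulting file can be opened in a trace viewer that understands this format, where nested spans on the same thread are drawn as a flame graph.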

Best Practices

Overall Idea

Four steps: remove launch items, defer work, parallelize, and speed up remaining code.

Before main

Load dyld.

Create launch closure (required after app update or device reboot).

Load dynamic libraries.

Bind, rebase, runtime init.

Execute +load and static initializers.

Dynamic Libraries

Keep the number of dynamic libs < 6. Prefer static linking or merging libs; avoid linking unused system libs.

Dead Code Removal

Static scanning (AppCode, or the Mach‑O sections __objc_selrefs, __objc_classrefs, and __objc_classlist) and online usage statistics (view‑controller penetration, class penetration, line‑level penetration) help identify unused code.
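One common way to use the Mach‑O sections is to compare the classes a binary defines with the classes it references. The sketch below assumes the two name sets have already been extracted from __objc_classlist and __objc_classrefs by some other tool; the function name is hypothetical, and the result is only a list of candidates, since classes reached through subclasses, NSClassFromString, or Interface Builder would be reported as well.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Classes that are defined (listed in __objc_classlist) but never referenced
// (absent from __objc_classrefs). Candidates for review, not proof of dead code.
std::vector<std::string> unreferencedClasses(const std::set<std::string>& defined,
                                             const std::set<std::string>& referenced) {
    std::vector<std::string> out;
    std::set_difference(defined.begin(), defined.end(),
                        referenced.begin(), referenced.end(),
                        std::back_inserter(out));
    return out;
}
```

Online penetration statistics can then be used to confirm that a candidate is really unused before it is deleted.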

+load Migration

Replace +load registration with compile‑time attributes; example macro stores class‑protocol pairs in a custom section and reads them at runtime.

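// _DI_SEGMENT and _DI_SECTION (string literals), _DI_VALID_METHOD and _DI_UNIQUE_VAR are helper macros defined elsewhere.
typedef struct {
    const char *cls;
    const char *protocol;
} _di_pair;

#if DEBUG
// Debug builds also emit a function so the compiler can check that CLASS_NAME conforms to PROTOCOL_NAME.
#define DI_SERVICE(PROTOCOL_NAME, CLASS_NAME) \
    __used static Class<PROTOCOL_NAME> _DI_VALID_METHOD(void) { return [CLASS_NAME class]; } \
    __attribute((used, section(_DI_SEGMENT "," _DI_SECTION))) static _di_pair _DI_UNIQUE_VAR = { #CLASS_NAME, #PROTOCOL_NAME };
#else
#define DI_SERVICE(PROTOCOL_NAME, CLASS_NAME) \
    __attribute((used, section(_DI_SEGMENT "," _DI_SECTION))) static _di_pair _DI_UNIQUE_VAR = { #CLASS_NAME, #PROTOCOL_NAME };
#endif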

Static Init Migration

Move static data out of global scope; lazily initialize inside functions. Example shows moving a global std::string array into a function‑local static.

// Bad: the array is built by a static initializer before main.
namespace {
static const std::string bucket[] = {"apples", "pears", "meerkats"};
}
const std::string GetBucketThing(int i) {
    return bucket[i];
}

// Good: the array is built on the first call.
const std::string GetBucketThing(int i) {
    static const std::string bucket[] = {"apples", "pears", "meerkats"};
    return bucket[i];
}

Startup Tasks (BootTask)

A lightweight central scheduler stores task order and thread. Tasks implement a BootTask protocol. The scheduler provides global concurrency, delayed execution, fine‑grained monitoring, and code‑review control.
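To make the idea concrete, here is a minimal sketch of such a scheduler. The source only states that tasks follow a BootTask protocol and that order and thread are configured centrally, so every name below (BootTask fields, BootScheduler, BackgroundExecutor) is hypothetical. Background execution is injected as a callback; on iOS it might wrap a dispatch onto a global queue, which is an assumption.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

enum class BootThread { Main, Background };

struct BootTask {
    std::string name;
    BootThread thread;
    std::function<void()> run;
};

// Runs a closure somewhere off the main thread.
using BackgroundExecutor = std::function<void(std::function<void()>)>;

// The scheduler must outlive any closure handed to the executor.
class BootScheduler {
public:
    explicit BootScheduler(BackgroundExecutor executor)
        : executor_(std::move(executor)) {
        assert(executor_);
    }

    void add(BootTask task) { tasks_.push_back(std::move(task)); }

    // Walks tasks in registration order: main-thread tasks run inline,
    // background tasks are handed to the executor. Every task is timed.
    void runAll() {
        for (const BootTask& task : tasks_) {
            auto timedRun = [this, task] { record(task.name, measure(task.run)); };
            if (task.thread == BootThread::Background) {
                executor_(timedRun);
            } else {
                timedRun();
            }
        }
    }

    // Snapshot of task name -> duration in milliseconds recorded so far.
    std::map<std::string, double> durations() {
        std::lock_guard<std::mutex> lock(mutex_);
        return durationsMs_;
    }

private:
    static double measure(const std::function<void()>& fn) {
        const auto start = std::chrono::steady_clock::now();
        fn();
        const std::chrono::duration<double, std::milli> elapsed =
            std::chrono::steady_clock::now() - start;
        return elapsed.count();
    }

    void record(const std::string& name, double ms) {
        std::lock_guard<std::mutex> lock(mutex_);
        durationsMs_[name] = ms;
    }

    BackgroundExecutor executor_;
    std::vector<BootTask> tasks_;
    std::mutex mutex_;
    std::map<std::string, double> durationsMs_;
};
```

Keeping the task list in one place is what enables the code‑review control: adding or reordering a launch task becomes a visible change to a single file.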

Third‑Party SDKs

Remove heavy SDKs (e.g., Fabric) or defer them (share, login). Evaluate impact before integration; paid SDKs are usually willing to cooperate.

High‑Frequency Methods

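Short methods that are called very often during launch add up. Cache their results in memory (e.g., a key read from Info.plist) to avoid repeated I/O. The method below is the uncached form, which looks up the key on every call:

+ (NSString *)plistChannel {
    return [[NSBundle mainBundle] infoDictionary][@"CHANNEL_NAME"];
}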
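One way to add the memory cache is sketched here in C++ rather than Objective‑C. The function loadChannelUncached is a stand‑in for the real lookup, its return value is made up, and the counter exists only to make the effect of caching visible.

```cpp
#include <cassert>
#include <string>

// Stand-in for the expensive lookup (in the article: reading a key from
// Info.plist). The counter only demonstrates how often the lookup runs.
static int g_lookupCount = 0;
static std::string loadChannelUncached() {
    ++g_lookupCount;
    return "app_store";  // Hypothetical value.
}

// The lookup runs once, on first use. Since C++11 the initialization of a
// function-local static is thread-safe.
const std::string& cachedChannel() {
    static const std::string channel = loadChannelUncached();
    return channel;
}
```

In Objective‑C the same effect is usually achieved with a static variable guarded by dispatch_once.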

Locks

Avoid holding global locks on background threads that block the main thread (e.g., UIImage imageNamed triggers dlopen which holds dyld’s global mutex).

Thread Count & QoS

Use appropriate QoS (User Interactive/Initiated) for tasks the main thread must wait for. Keep high‑priority thread count ≤ CPU core count; analyze context‑switch cost with System Trace.

Images

Prefer Asset Catalogs over raw bundle images; preload critical images on a background thread; avoid GIFs for loading animations.

Fishhook

Hooking C functions incurs heavy Page In. If unavoidable, call fishhook on a background thread and avoid invoking it from _dyld_register_func_for_add_image which holds dyld’s global lock.

First‑Frame Rendering

Show a static frame for Lottie animations, then start the animation after launch.

Avoid creating hidden views; lazy‑load instead.

Consider replacing heavy AutoLayout with frames where ROI is high.

Prefer sprite‑sheet or video over GIF for loading animations.

Other Tips

Never delete tmp/com.apple.dyld – it stores the launch closure on iOS 13+.

Use mmap for faster I/O.
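A minimal sketch of reading a file through mmap with POSIX calls follows. The MappedFile wrapper is a hypothetical name; it maps the whole file read‑only and unmaps it on destruction.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <string>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Read-only memory mapping of a whole file; unmapped on destruction.
class MappedFile {
public:
    explicit MappedFile(const std::string& path) : data_(nullptr), size_(0) {
        const int fd = open(path.c_str(), O_RDONLY);
        if (fd < 0) return;
        struct stat st;
        if (fstat(fd, &st) == 0 && st.st_size > 0) {
            void* p = mmap(nullptr, static_cast<std::size_t>(st.st_size),
                           PROT_READ, MAP_PRIVATE, fd, 0);
            if (p != MAP_FAILED) {
                data_ = static_cast<const char*>(p);
                size_ = static_cast<std::size_t>(st.st_size);
            }
        }
        close(fd);  // The mapping stays valid after the descriptor is closed.
    }

    ~MappedFile() {
        if (data_) munmap(const_cast<char*>(data_), size_);
    }

    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    bool valid() const { return data_ != nullptr; }
    const char* data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    const char* data_;
    std::size_t size_;
};
```

Pages are loaded only when they are first touched, so a launch path that reads a small part of a large file avoids copying the rest into memory.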

On iPhone 6, consider disabling WebView User Agent, Keychain, or VolumeView to reduce launch cost.

Page‑In Cost

Section Renaming

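The App Store encrypts the __TEXT segment of uploaded binaries, so pages in __TEXT must be decrypted when they are paged in. Moving read‑only data into another segment avoids that decryption cost. Douyin’s ld flags move several __TEXT sections into a __RODATA segment.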

-Wl,-rename_section,__TEXT,__cstring,__RODATA,__cstring
-Wl,-rename_section,__TEXT,__const,__RODATA,__const
-Wl,-rename_section,__TEXT,__gcc_except_tab,__RODATA,__gcc_except_tab
-Wl,-rename_section,__TEXT,__objc_methname,__RODATA,__objc_methname
-Wl,-rename_section,__TEXT,__objc_classname,__RODATA,__objc_classname
-Wl,-rename_section,__TEXT,__objc_methtype,__RODATA,__objc_methtype

Binary Reordering

Place the symbols used during launch next to each other with ld’s -order_file so that launch touches fewer pages. Two approaches: Douyin statically scans +load methods and C++ static initializers and hooks objc_msgSend to collect the rest; Facebook uses LLVM instrumentation to collect the symbols executed at runtime during gray release and generates an order file from them.
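Whichever collection method is used, the last step is turning the observed symbols into an order file. The sketch below assumes the collector yields symbol names in the order they were first executed and writes one symbol per line with duplicates dropped; the function name and the sample symbols in the usage are illustrative.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Builds the text of an order file: one symbol per line, in the order the
// symbols were first seen during launch, with duplicates and blanks dropped.
std::string buildOrderFile(const std::vector<std::string>& launchSymbols) {
    std::unordered_set<std::string> seen;
    std::string out;
    for (const std::string& symbol : launchSymbols) {
        if (symbol.empty() || !seen.insert(symbol).second) continue;
        out += symbol;
        out += '\n';
    }
    return out;
}
```

The generated file is then passed to the linker through -order_file, and the result is best validated by comparing page‑in counts before and after.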

Unconventional Solutions

Lazy‑Load Dynamic Libraries

Package rarely‑used code in dynamic frameworks that are not linked at build time. Load them on demand with [NSBundle load] (internally dlopen). Separate business UI libs (loaded via routing) from functional libs (wrapped with a thin loader that performs dlsym calls).

Background Fetch

Periodically launch the app in background to refresh data, turning a cold start into a hot start for the next user launch. Requires careful handling of AB tests, ads, and delayed tasks (e.g., move didFinishLaunching work to the first foreground transition).

Conclusion

White‑box optimization: know where the slowdown originates.

Online data is the only reliable compass; run A/B experiments to validate impact.

Build defenses against regression; rapid iteration can outpace optimization without safeguards.

Invest in long‑term architecture; a solid foundation sustains launch performance.
