How to Slash iOS App Startup Time: Strategies from the Watermelon App
This article details the Watermelon iOS app's comprehensive startup optimization, covering launch phase definitions, metric design, task scheduling, code instrumentation, thread management, rendering and network improvements, as well as monitoring and anti‑regression measures to consistently reduce launch latency.
Background
Startup is a product's first impression, and long launch times erode user patience. Shorter launches effectively lower the 0vv rate (users who play zero videos after launching), which makes launch latency a core quality metric for the Watermelon client.
Launch Definition
According to Apple's WWDC 2019 session on app launch, launch is divided into six stages (see image).
The work done in each stage is:
System Interface: dyld loads shared libraries and frameworks, initializes low‑level system components.
Runtime Init: initializes the language runtimes and runs static initializers, including Objective-C +load methods and C/C++ static constructors.
UIKit Init: initializes UIApplication and its delegate, starts event handling.
Application Init: calls application:willFinishLaunchingWithOptions: and application:didFinishLaunchingWithOptions:, then applicationDidBecomeActive:.
Initial Frame Render: creates, lays out and draws views, then renders the first screen.
Extended: work the app continues to do after the first frame (for example, loading data asynchronously) until it is fully interactive.
Metric Definition
Metrics must be defined before any optimization work. Because the app keeps requesting and rendering home-page data after the first frame, measuring only up to the first frame is insufficient, so Watermelon extends the "Extended" stage with custom event timestamps.
Initially the launch metric was defined as the time from the first +load to list rendering completion (see image).
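To capture the start point (the first +load) as early as possible, a dummy dynamic library whose name starts with "AAA" is added. The prefix is intended to place it first in link order, so that its +load, which records a timestamp, runs before the others.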
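The start-point trick relies on code that runs while images are being initialized, before main(). A minimal sketch of the same idea in plain C (also valid Objective-C), assuming a GCC/Clang constructor function stands in for the dylib's +load; the names record_launch_start and launch_elapsed_ms are hypothetical:

```c
#include <assert.h>
#include <time.h>

/* Wall-clock milliseconds since the Unix epoch (C11 timespec_get). Wall-clock
   time is used so the value is comparable with a kinfo_proc process start time. */
static double wall_clock_ms(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec * 1000.0 + (double)ts.tv_nsec / 1e6;
}

/* Hypothetical stand-in for the "AAA" library's +load: a constructor runs during
   image initialization, before main(), and records the launch start point. */
static double g_launch_start_ms;

__attribute__((constructor)) static void record_launch_start(void) {
    g_launch_start_ms = wall_clock_ms();
}

double launch_start_ms(void) { return g_launch_start_ms; }

/* Elapsed time since the recorded start point, e.g. when the list finishes rendering. */
double launch_elapsed_ms(void) { return wall_clock_ms() - g_launch_start_ms; }
```

Constructor order across translation units within one binary is unspecified, which is one reason to isolate the start point in its own early-loading library rather than in app code.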
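For the end point (list rendering finished), a CATransaction completion block could be used, but it fires only after every animation added in that transaction has finished, so an endlessly repeating animation can delay it indefinitely. The final solution instead hooks layoutSubviews and uses dispatch_async to signal completion.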
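// Option 1: CATransaction completion block
// (fires only after every animation in the transaction has finished)
[CATransaction begin];
[CATransaction setCompletionBlock:^{
    // list render finished
}];
[self.tableView reloadData];
[CATransaction commit];
// Final approach: Watermelon's xig_ wrapper built on the layoutSubviews hook
[self.tableView xig_reloadDataWithCompletion:^{
    // list render finished
}];
Later testing showed that this metric did not match what users actually perceive, so the start point was moved to the user's tap on the app icon and the end point to the moment the list becomes visible.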
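The process start time, which approximates the moment the user tapped the icon, can be read from the kernel with sysctl and kinfo_proc:
#import <sys/sysctl.h>
// Returns the process start time in milliseconds since the Unix epoch, or 0 on failure.
+ (NSTimeInterval)time {
    struct kinfo_proc kProcInfo;
    NSTimeInterval processStartTime = 0;
    if ([self processInfoForPID:[[NSProcessInfo processInfo] processIdentifier] procInfo:&kProcInfo]) {
        // p_starttime is a wall-clock timeval (seconds + microseconds)
        processStartTime = kProcInfo.kp_proc.p_un.__p_starttime.tv_sec + kProcInfo.kp_proc.p_un.__p_starttime.tv_usec / 1000000.0;
        processStartTime *= 1000; // seconds -> milliseconds
    }
    return processStartTime;
}
// Fills procInfo for the given PID via sysctl(CTL_KERN, KERN_PROC, KERN_PROC_PID, pid).
+ (BOOL)processInfoForPID:(int)pid procInfo:(struct kinfo_proc *)procInfo {
    int cmd[4] = {CTL_KERN, KERN_PROC, KERN_PROC_PID, pid};
    size_t size = sizeof(*procInfo);
    return sysctl(cmd, sizeof(cmd) / sizeof(*cmd), procInfo, &size, NULL, 0) == 0;
}
Because this value is wall-clock time, the end timestamp must also come from the wall clock (for example [[NSDate date] timeIntervalSince1970] * 1000) before the two are subtracted.
For the end point, the app hides a loading animation view once the list has rendered. The moment it is actually hidden is captured by observing the main run loop's kCFRunLoopBeforeWaiting activity and then dispatching asynchronously to the main queue: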
- (void)reload {
    [self.tableView xig_reloadDataWithCompletion:^{
        // list rendered: trigger the loading animation to hide
        [self.tableView xig_endUpdateData:NO];
        [self observeNextRenderWithBlock:^{
            // loading animation hidden: record the end timestamp here
        }];
    }];
}
// Runs `block` once, after the main run loop's next BeforeWaiting/Exit activity.
// Dispatching from the observer defers the block past Core Animation's commit for that pass.
- (void)observeNextRenderWithBlock:(dispatch_block_t)block {
    CFRunLoopRef mainRunloop = [[NSRunLoop mainRunLoop] getCFRunLoop];
    CFRunLoopActivity activities = kCFRunLoopBeforeWaiting | kCFRunLoopExit;
    CFRunLoopObserverRef observer = CFRunLoopObserverCreateWithHandler(kCFAllocatorDefault, activities, true, 0, ^(CFRunLoopObserverRef observer, CFRunLoopActivity activity) {
        // one-shot: unregister and release on the first callback
        CFRunLoopRemoveObserver(mainRunloop, observer, kCFRunLoopCommonModes);
        CFRelease(observer);
        if (block) {
            dispatch_async(dispatch_get_main_queue(), block);
        }
    });
    CFRunLoopAddObserver(mainRunloop, observer, kCFRunLoopCommonModes);
}
Beyond overall latency, the team also tracks stage-level times such as list creation, the feed request, and list rendering, with the ultimate goal of reducing the time from tapping the app icon to the first video frame.
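Stage-level tracking amounts to recording named checkpoints and reporting the gaps between them. A minimal sketch in plain C (also valid Objective-C), assuming every checkpoint carries a millisecond timestamp from the same clock; the stage_timeline type and the stage names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_STAGES 16

/* Hypothetical launch timeline: named checkpoints with millisecond timestamps
   taken from one clock (e.g. process start, list created, request done,
   list rendered, first video frame). */
typedef struct {
    const char *names[MAX_STAGES];
    double times_ms[MAX_STAGES];
    size_t count;
} stage_timeline;

/* Record a checkpoint once; repeats of the same name and overflow are ignored. */
void stage_mark(stage_timeline *t, const char *name, double time_ms) {
    for (size_t i = 0; i < t->count; ++i)
        if (strcmp(t->names[i], name) == 0) return;
    if (t->count == MAX_STAGES) return;
    t->names[t->count] = name;
    t->times_ms[t->count] = time_ms;
    t->count++;
}

/* Looks up a checkpoint; returns false if it was never recorded. */
static bool stage_time(const stage_timeline *t, const char *name, double *out) {
    for (size_t i = 0; i < t->count; ++i) {
        if (strcmp(t->names[i], name) == 0) {
            *out = t->times_ms[i];
            return true;
        }
    }
    return false;
}

/* Duration between two checkpoints in ms, or -1 if either is missing. */
double stage_duration(const stage_timeline *t, const char *from, const char *to) {
    double start, end;
    if (!stage_time(t, from, &start) || !stage_time(t, to, &end)) return -1.0;
    return end - start;
}
```

Ignoring repeated marks keeps each stage tied to its first occurrence, which matters when, for example, a list reloads several times during launch.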
Launch Architecture
The legacy Watermelon iOS launcher suffered from unclear dependencies, tangled logic, and crashes whenever tasks were moved or reordered. A refactor introduced a more stable and efficient startup framework.
Phased Launch
The launch is divided into four phases to make it easier to reason about: didFinishLaunch, launchCompletion (first screen finished), homeDidRendered (home page rendered), and AfterPlayerFirstFrame (after the player's first frame).
Core components are initialized in didFinishLaunch (APM SDK, network, analytics). Business components go into launchCompletion. Non‑critical or post‑first‑screen tasks are placed in homeDidRendered and AfterPlayerFirstFrame.
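The phase model can be pictured as one task list per phase, fired strictly in order. A minimal sketch in plain C (also valid Objective-C); the enum values mirror the four phases named above, while register_task, trigger_phase, and the example tasks are hypothetical and much simpler than XIGStartUp:

```c
#include <assert.h>
#include <stddef.h>

/* The four launch phases from the text, in firing order. */
typedef enum {
    PHASE_DID_FINISH_LAUNCH,
    PHASE_LAUNCH_COMPLETION,
    PHASE_HOME_DID_RENDERED,
    PHASE_AFTER_PLAYER_FIRST_FRAME,
    PHASE_COUNT
} launch_phase;

typedef void (*phase_task)(void);

#define MAX_TASKS_PER_PHASE 32

static phase_task g_tasks[PHASE_COUNT][MAX_TASKS_PER_PHASE];
static size_t g_task_counts[PHASE_COUNT];
static int g_next_phase = 0; /* index of the first phase that has not fired yet */

/* Register a task for a phase that has not fired yet.
   Returns 0 on success, -1 if the phase is invalid, already fired, or full. */
int register_task(launch_phase phase, phase_task task) {
    int p = (int)phase;
    if (p < g_next_phase || p >= PHASE_COUNT) return -1;
    if (g_task_counts[p] == MAX_TASKS_PER_PHASE) return -1;
    g_tasks[p][g_task_counts[p]++] = task;
    return 0;
}

/* Fire the next phase, running its tasks in registration order.
   Returns -1 if called out of order (skipping or repeating a phase). */
int trigger_phase(launch_phase phase) {
    int p = (int)phase;
    if (p != g_next_phase || p >= PHASE_COUNT) return -1;
    for (size_t i = 0; i < g_task_counts[p]; ++i) g_tasks[p][i]();
    g_next_phase++;
    return 0;
}

/* Hypothetical example tasks that log which one ran. */
static int g_log[8];
static size_t g_log_len;
static void init_core_sdks(void) { g_log[g_log_len++] = 1; }        /* e.g. APM, network */
static void init_business_modules(void) { g_log[g_log_len++] = 2; } /* business components */
```

Rejecting registrations for a phase that has already fired surfaces ordering mistakes early instead of silently dropping a task.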
Launcher Design
The launcher supports three queues: main, idle, and concurrent. Within each phase, the tasks form a directed acyclic graph (DAG), and a topological sort determines their execution order. The graph is stored as an adjacency list, which is also used to detect cycles; a cycle triggers an assertion.
StartupManager registers tasks that conform to StartupTask. Tasks are registered in application:willFinishLaunchingWithOptions:, and each task declares its dependencies and the queue it runs on.
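The ordering step described above is a standard topological sort. A minimal sketch in plain C (also valid Objective-C) using Kahn's algorithm over an adjacency list, where fewer placed tasks than registered tasks signals a cycle; the task_graph layout and function names are hypothetical, and queue assignment (main, idle, concurrent) is left out:

```c
#include <assert.h>

#define MAX_TASKS 64

/* Tasks are numbered 0..count-1. dependents[u] is the adjacency list of u:
   every task listed there depends on u, so u must run first. */
typedef struct {
    int count;
    int dependents[MAX_TASKS][MAX_TASKS];
    int dependent_counts[MAX_TASKS];
} task_graph;

void graph_init(task_graph *g, int count) {
    assert(count >= 0 && count <= MAX_TASKS);
    g->count = count;
    for (int i = 0; i < count; ++i) g->dependent_counts[i] = 0;
}

/* Record that `task` depends on `dependency`. Returns -1 on bad input. */
int graph_add_dependency(task_graph *g, int task, int dependency) {
    if (task < 0 || task >= g->count || dependency < 0 || dependency >= g->count) return -1;
    if (g->dependent_counts[dependency] == MAX_TASKS) return -1;
    g->dependents[dependency][g->dependent_counts[dependency]++] = task;
    return 0;
}

/* Kahn's algorithm. Writes an execution order into `order` (capacity >= count)
   and returns how many tasks were placed; fewer than g->count means the
   remaining tasks sit on a cycle. */
int topo_sort(const task_graph *g, int order[]) {
    int in_degree[MAX_TASKS] = {0};
    int queue[MAX_TASKS];
    int head = 0, tail = 0, placed = 0;

    for (int u = 0; u < g->count; ++u)
        for (int k = 0; k < g->dependent_counts[u]; ++k)
            in_degree[g->dependents[u][k]]++;

    for (int u = 0; u < g->count; ++u)
        if (in_degree[u] == 0) queue[tail++] = u; /* no unmet dependencies */

    while (head < tail) {
        int u = queue[head++];
        order[placed++] = u;
        for (int k = 0; k < g->dependent_counts[u]; ++k) {
            int v = g->dependents[u][k];
            if (--in_degree[v] == 0) queue[tail++] = v;
        }
    }
    return placed;
}

/* Launcher-style wrapper: treat a dependency cycle as a programming error. */
void launcher_order(const task_graph *g, int order[]) {
    int placed = topo_sort(g, order);
    assert(placed == g->count && "startup task dependency cycle");
    (void)placed;
}
```

Returning the placed count keeps topo_sort testable, while launcher_order mirrors the assert-on-cycle behavior described in the text.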
Task Registration
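Watermelon uses ByteDance's Gaia component for task registration. Each registration function is described by a static record placed in a custom Mach-O section via __attribute__((section(...))). At runtime, loaded images are scanned through a callback registered with _dyld_register_func_for_add_image, so the recorded functions can be found and invoked by key.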
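The section-based registration pattern can be tried off-device. A minimal sketch in plain C (also valid Objective-C) of the ELF/Linux analogue, assuming GNU ld or LLD, which synthesize __start_/__stop_ bounds for sections named like C identifiers; on Mach-O the records would instead be read with getsectiondata() from a _dyld_register_func_for_add_image callback. The gaia_entry layout, GAIA_REGISTER macro, and example functions are hypothetical, not Gaia's actual API:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical record, loosely modelled on GAIAData: a group key plus the
   function to run. aligned(16) pins sizeof to 16 so records stay contiguous
   in the section even if the compiler over-aligns individual objects. */
typedef struct {
    const char *key;
    void (*fn)(void);
} __attribute__((aligned(16))) gaia_entry;

/* Declares a registration function and drops a record pointing at it into the
   "gaia_funcs" section; `used` keeps the otherwise-unreferenced record alive. */
#define GAIA_REGISTER(key_, fn_)                              \
    static void fn_(void);                                    \
    __attribute__((used, section("gaia_funcs")))              \
    static const gaia_entry gaia_entry_##fn_ = { key_, fn_ }; \
    static void fn_(void)

/* Section bounds synthesized by the ELF linker. */
extern const gaia_entry __start_gaia_funcs[];
extern const gaia_entry __stop_gaia_funcs[];

/* Run every registered function whose key matches; returns how many ran. */
int gaia_start_tasks_for_key(const char *key) {
    int ran = 0;
    for (const gaia_entry *e = __start_gaia_funcs; e < __stop_gaia_funcs; ++e) {
        if (strcmp(e->key, key) == 0) {
            e->fn();
            ran++;
        }
    }
    return ran;
}

/* Example registrations (hypothetical): two startup hooks and one unrelated entry. */
static int g_demo_registered, g_other_registered, g_unrelated_ran;

GAIA_REGISTER("XIGRegisterStartUpTask", register_demo_task) {
    g_demo_registered++; /* stands in for [XIGStartUp registerTaskClass:...] */
}
GAIA_REGISTER("XIGRegisterStartUpTask", register_other_task) {
    g_other_registered++;
}
GAIA_REGISTER("SomeOtherKey", unrelated_hook) {
    g_unrelated_ran++;
}
```

The appeal of this design is that nothing executes at load time: the records are plain data, and the work runs only when the engine walks the section for a given key.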
// before expansion
XIGRegisterStartUpTaskFunction() {
    // register XIGDemoTask
    [XIGStartUp registerTaskClass:XIGDemoTask.class inStage:XIGStartUpLaunchCompletion];
}
// after expansion (simplified)
// 1. forward-declare the registration function
__attribute__((used)) static void __GAIA_ID__0(void);
// 2. record its address, file, and line
static const GAIAFunctionInfo __GAIA_F_I_ID__0 = { (void *)__GAIA_ID__0, __FILE_NAME__, __LINE__ };
// 3. place a keyed record in a custom __DATA section; `used` keeps it from being stripped
__attribute__((used, no_sanitize_address, section("__DATA,__GAIA__SECTION"))) static const GAIAData __GAIA_ID__1 = { GAIATypeFunctionInfo, false, "XIGRegisterStartUpTask", &__GAIA_F_I_ID__0 };
// 4. the function body written by the developer
static void __GAIA_ID__0() {
    [XIGStartUp registerTaskClass:XIGDemoTask.class inStage:XIGStartUpLaunchCompletion];
}
All registered functions for the key are then invoked with:
[GAIAEngine startTasksForKey:@XIGRegisterStartUpTaskGaiaKey];
Example task implementation:
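@implementation XIGDemoTask
XIGRegisterStartUpTaskFunction() {
    [XIGStartUp registerTaskClass:XIGDemoTask.class inStage:XIGStartUpLaunchCompletion];
}
- (void)execute {
    // execute task
}
@end
Problem Statement
| Problem | Affected Phase | Priority |
|---|---|---|
| Many +load methods | Runtime Init | P1 |
| Many static initializers | Runtime Init | P2 |
| Main-thread blocking and expensive operations | Runtime Init, Application Init | P0 |
| Slow first-screen rendering | Initial Frame Render | P0 |
| First feed request sent late | First feed refresh | P0 |
| List created late | First feed refresh | P1 |
| Slow list rendering | First feed refresh (rendering) | P0 |
| Many network requests competing with the first feed refresh | First feed refresh | P0 |
| Many background threads competing for CPU | First feed refresh | P2 |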
Optimization Ideas
Time‑Consuming Task Governance
+load and Static Initialization
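+load methods and static initializers are rarely slow in themselves, but running them touches code and data scattered across the binary, which triggers virtual-memory page faults. In one trace, Runtime Init took 736 ms, 579 ms of which was spent on page faults. The fix replaces +load methods and static constructors with startup tasks, except in third-party libraries.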
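The change is mostly about when the work runs. A minimal sketch in plain C (also valid Objective-C) contrasting the two, using a GCC/Clang constructor as the "before" case and a static task table as the "after" case; the names are hypothetical and the table is a much-simplified stand-in for XIGStartUp registration:

```c
#include <assert.h>
#include <stddef.h>

static int g_eager_runs = 0;
static int g_deferred_runs = 0;

/* Before: a static constructor. Its body executes during Runtime Init on every
   launch, before main() and before any launch phase has been reached. */
__attribute__((constructor)) static void eager_setup(void) {
    g_eager_runs++;
}

/* After: the same work becomes a plain function listed in a static table.
   Nothing executes at load time; the launcher calls it when the phase fires. */
static void deferred_setup(void) {
    g_deferred_runs++;
}

typedef void (*startup_task_fn)(void);
static const startup_task_fn k_did_finish_launch_tasks[] = { deferred_setup };

/* Hypothetical hook the launcher would call when didFinishLaunch fires. */
void run_did_finish_launch_tasks(void) {
    size_t n = sizeof k_did_finish_launch_tasks / sizeof k_did_finish_launch_tasks[0];
    for (size_t i = 0; i < n; ++i) k_did_finish_launch_tasks[i]();
}
```

Moving the body out of load time also lets the launcher place it in whichever phase and queue fits, instead of paying for it unconditionally before main().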
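// before: work runs at image load time
+ (void)load {}
__attribute__((constructor)) void demoFunc() {}
// after: the same work registered as a startup task in a chosen phase
XIGRegisterStartUpTaskFunction() {
    [XIGStartUp registerTaskInStage:XIGStartUpDidFinishLaunch usingBlock:^{ /* former +load / constructor work */ }];
}
Main-Thread Blocking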
There are three main causes: virtual-memory page faults (not addressed in this section), inter-process communication (IPC), and lock contention. For IPC such as keychain or IDFA reads, fetch the data on a background thread and cache it, or persist it to a local cache. For locks, avoid taking locks on the main thread where possible: refactor the logic, or move the work onto a serial queue.
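The background-cache approach for IPC means the main thread never waits on the slow call. A minimal sketch in plain C (also valid Objective-C) using POSIX threads and C11 atomics, assuming a fallback value is acceptable until the cache is warm; slow_ipc_read_device_id and the other names are hypothetical stand-ins for a keychain or IDFA read:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a slow IPC call such as a keychain or IDFA read. */
static const char *slow_ipc_read_device_id(void) {
    return "device-1234";
}

/* Written once by the background thread, read lock-free by the main thread.
   Static storage starts out as a null pointer. */
static _Atomic(const char *) g_cached_device_id;

static void *prefetch_worker(void *unused) {
    (void)unused;
    const char *value = slow_ipc_read_device_id();
    atomic_store_explicit(&g_cached_device_id, value, memory_order_release);
    return NULL;
}

/* Kick off the IPC read off the main thread early in launch. */
int start_device_id_prefetch(pthread_t *thread) {
    return pthread_create(thread, NULL, prefetch_worker, NULL);
}

/* Main-thread accessor: never blocks; returns `fallback` until the cache is warm. */
const char *cached_device_id(const char *fallback) {
    const char *value = atomic_load_explicit(&g_cached_device_id, memory_order_acquire);
    return value != NULL ? value : fallback;
}
```

Persisting the last value to local storage would let even the first read of a later launch return real data instead of the fallback.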
Expensive Operations
Tasks are classified as mandatory or non-mandatory. Non-mandatory tasks (e.g., WKWebView preloading) can be delayed, split up, moved to background threads, or lazily loaded. Mandatory tasks can be sped up, rescheduled, or made thread-safe so they can run off the main thread.
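Of the options for non-mandatory work, lazy loading is the simplest to illustrate: the cost moves from launch to first use. A minimal single-threaded sketch in plain C (also valid Objective-C); webview_pool and shared_webview_pool are hypothetical stand-ins for something like a preloaded web view, and on iOS dispatch_once would be the usual way to make the initialization thread-safe:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical non-mandatory component, e.g. a pool of prewarmed web views
   that is only needed when the user opens a web page. */
typedef struct {
    int warmed_instances;
} webview_pool;

static int g_pool_build_count = 0;

static webview_pool make_pool(void) {
    g_pool_build_count++; /* stands in for the expensive setup */
    webview_pool p = { .warmed_instances = 1 };
    return p;
}

/* Lazy accessor: nothing is built during launch; the first caller pays once.
   Single-threaded sketch; guard with dispatch_once (or pthread_once) if
   several threads may call it concurrently. */
webview_pool *shared_webview_pool(void) {
    static webview_pool pool;
    static bool initialized = false;
    if (!initialized) {
        pool = make_pool();
        initialized = true;
    }
    return &pool;
}
```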
Rendering Optimization
The home page renders two collection view sections (the Following channel and the Recommended channel). Because UICollectionView cannot skip rendering the first section, data for both channels was fetched and rendered at launch, adding latency. The fix checks the target channel in collectionView:cellForItemAtIndexPath: and skips rendering for the channel that will not be shown.
Network Optimization
Beyond protocol improvements (QUIC, request pre‑connect, field reduction), client‑side strategies include network scheduling, early first‑screen request, and image/video pre‑loading.
Network Scheduler
A scheduler delays non‑critical requests until after the first‑screen request completes, reducing contention.
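A minimal sketch of such a scheduler in plain C (also valid Objective-C), assuming requests can be modelled as callbacks and that the caller knows which one is the first-screen request; net_scheduler and the example requests are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*request_fn)(void);

#define MAX_DEFERRED 64

/* Hypothetical scheduler state: non-critical requests are parked until the
   first-screen (feed) request has completed. */
typedef struct {
    bool first_screen_done;
    request_fn deferred[MAX_DEFERRED];
    size_t deferred_count;
} net_scheduler;

/* Critical requests go out immediately; others wait for the first screen.
   If the parking queue is full, the request is sent rather than dropped. */
void scheduler_submit(net_scheduler *s, request_fn send, bool critical) {
    if (critical || s->first_screen_done || s->deferred_count == MAX_DEFERRED) {
        send();
        return;
    }
    s->deferred[s->deferred_count++] = send;
}

/* Call when the first-screen request finishes (or fails) to release the
   parked requests in submission order. */
void scheduler_first_screen_finished(net_scheduler *s) {
    s->first_screen_done = true;
    for (size_t i = 0; i < s->deferred_count; ++i) s->deferred[i]();
    s->deferred_count = 0;
}

/* Example requests (hypothetical) that log the order in which they are sent. */
static char g_sent[16];
static size_t g_sent_len;
static void send_feed_request(void) { g_sent[g_sent_len++] = 'F'; }
static void send_config_request(void) { g_sent[g_sent_len++] = 'C'; }
static void send_report_request(void) { g_sent[g_sent_len++] = 'R'; }
```

A production version would also need a timeout path, so that a slow or lost first-screen response cannot strand the parked requests.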
Early First‑Screen Request
Send the first-screen request immediately after the network library is initialized, and parse the response on a serial queue, so that the list can render as soon as the data arrives.
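Because the request goes out before the list necessarily exists, the parsed data can arrive on either side of list creation, and rendering should happen as soon as both are ready. A minimal sketch of that rendezvous in plain C (also valid Objective-C), assuming both callbacks are delivered on the main thread; the type and function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool list_ready;  /* home list view has been created */
    bool data_ready;  /* first-screen response has been parsed */
    int render_count; /* how many times the first screen was rendered */
} first_screen_state;

/* Render exactly once, as soon as both prerequisites are met. */
static void try_render(first_screen_state *s) {
    if (s->list_ready && s->data_ready && s->render_count == 0) {
        s->render_count++; /* stands in for reloading the list with parsed data */
    }
}

/* Called on the main thread after the serial parsing queue finishes. */
void on_feed_parsed(first_screen_state *s) { s->data_ready = true; try_render(s); }

/* Called on the main thread when the home list has been created. */
void on_list_created(first_screen_state *s) { s->list_ready = true; try_render(s); }
```

Whichever event arrives second triggers the render, so an early response is never wasted and a late one is never waited on longer than necessary.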
Image & Video Pre‑loading
After parsing first‑screen data, the app pre‑loads images and the first video while waiting for view rendering.
Degradation Prevention & Monitoring
Offline Degradation Guard
Twice a day, ByTest builds a guard package, runs launch performance tests on it, and uploads the launch telemetry to Slardar. An algorithm then filters out outliers and notifies the relevant owners of any regression.
Analysts can drill down using ByTrace or Instruments for deeper investigation.
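The article does not say which outlier filter is used. As a stand-in, a minimal sketch in plain C (also valid Objective-C) that drops samples outside Tukey's IQR fences and flags a regression when the remaining mean exceeds a baseline by a tolerance; the function names, the quartile approximation, and the tolerance parameter are all assumptions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SAMPLES 256

static int compare_doubles(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Mean of the launch samples that fall inside Tukey's fences
   [Q1 - 1.5*IQR, Q3 + 1.5*IQR]. Quartiles use a simple index approximation
   (sorted[n/4] and sorted[3n/4]). Returns -1 for empty or oversized input. */
double filtered_mean_ms(const double *samples, size_t n) {
    if (n == 0 || n > MAX_SAMPLES) return -1.0;
    double sorted[MAX_SAMPLES];
    memcpy(sorted, samples, n * sizeof *samples);
    qsort(sorted, n, sizeof *sorted, compare_doubles);

    double q1 = sorted[n / 4], q3 = sorted[(3 * n) / 4];
    double iqr = q3 - q1;
    double lo = q1 - 1.5 * iqr, hi = q3 + 1.5 * iqr;

    double sum = 0.0;
    size_t kept = 0; /* at least 1: Q1 itself always lies inside the fences */
    for (size_t i = 0; i < n; ++i) {
        if (sorted[i] >= lo && sorted[i] <= hi) {
            sum += sorted[i];
            kept++;
        }
    }
    return sum / (double)kept;
}

/* Regression if the filtered mean exceeds the baseline by more than
   `tolerance` (a fraction, e.g. 0.05 for 5%). */
bool is_launch_regression(const double *samples, size_t n,
                          double baseline_ms, double tolerance) {
    double mean = filtered_mean_ms(samples, n);
    return mean >= 0.0 && mean > baseline_ms * (1.0 + tolerance);
}
```

IQR fences are robust to the occasional very slow cold launch on a test device, which would otherwise dominate a plain mean.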
Online Metric Monitoring
Watermelon's dashboards in Tea track launch performance across multiple dimensions, so that regressions introduced by feature-gated changes can be caught online.
Binary Reordering Automation
Orderfile generation is now fully automated: a daily check triggers a cloud build, ByTest runs launch tests, generates a new orderfile, and opens a merge request.
Conclusion
Watermelon iterates quickly, and every week brings many changes that can affect launch. Alongside optimization, robust degradation guards and a solid launch architecture are therefore essential to maintain and further improve launch performance. Future work includes scenario-based launches and further first-screen rendering improvements.
ByteDance SE Lab
Official account of ByteDance SE Lab, sharing research and practical experience in software engineering. Our lab unites researchers and engineers from various domains to accelerate the fusion of software engineering and AI, driving technological progress in every phase of software development.