
How to Slash iOS App Startup Time: Strategies from the Watermelon App

This article details the Watermelon iOS app's comprehensive startup optimization, covering launch phase definitions, metric design, task scheduling, code instrumentation, thread management, rendering and network improvements, as well as monitoring and anti‑regression measures to consistently reduce launch latency.

ByteDance SE Lab

Dec 31, 2021


Background

Startup is a product's first impression, and long launch times erode users' patience. Reducing launch time effectively lowers 0vv (sessions in which the user plays no video after launching the app), which makes launch latency a core quality metric for the Watermelon client.

Launch Definition

In its WWDC 2019 session "Optimizing App Launch", Apple divides launch into six stages (see image).

The work done in each stage is:

System Interface: dyld loads shared libraries and frameworks, and libSystem initializes low-level system components.

Runtime Init: initializes the language runtime and invokes class +load methods and static initializers.

UIKit Init: initializes UIApplication and its delegate, starts event handling.

Application Init: calls application:willFinishLaunchingWithOptions: and application:didFinishLaunchingWithOptions:, then applicationDidBecomeActive:.

Initial Frame Render: creates, lays out and draws views, then renders the first screen.

Extended: the first frame is on screen and the app can respond to the user, while remaining work, such as asynchronously loaded content, continues.

Metric Definition

Before optimization, it is essential to define metrics. Since the app also requests and renders home‑page data after the first screen, focusing only on pre‑first‑screen data is insufficient. Watermelon therefore extends the "Extended" stage with custom event timestamps.

Initially the launch metric was defined as the time from the first +load to list rendering completion (see image).

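To capture the start point (+load) accurately, the team adds a dummy dynamic library whose name starts with "AAA". This makes it sort first in link order, so its +load runs before those of the other libraries, and that +load records a timestamp.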

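For the end point (list rendering), a CATransaction completion block could be used. However, it fires only after every animation in the transaction has finished, so an endlessly repeating animation can delay it indefinitely. The final solution hooks layoutSubviews and uses dispatch_async to signal completion.

// Option 1: CATransaction completion block. It fires only after all animations
// in the transaction finish, so an endlessly repeating animation can delay it.
[CATransaction begin];
[CATransaction setCompletionBlock:^{
    // list render finished
}];
[self.tableView reloadData];
[CATransaction commit];

// Option 2 (adopted): Watermelon's own helper, built on a layoutSubviews hook plus dispatch_async.
[self.tableView xig_reloadDataWithCompletion:^{
    // list render finished
}];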

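Later testing showed that this metric did not match the launch time users actually perceive. The start point was therefore moved to the user's tap on the app icon, and the end point to the moment the list is displayed.

The process start time can be read with sysctl, which fills in a struct kinfo_proc for the current process: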

#import <sys/sysctl.h>

// Returns the process start time in milliseconds since 1970 (wall-clock), or 0 on failure.
+ (NSTimeInterval)time {
    struct kinfo_proc kProcInfo;
    NSTimeInterval processStartTime = 0;
    if ([self processInfoForPID:[[NSProcessInfo processInfo] processIdentifier] procInfo:&kProcInfo]) {
        // p_starttime is a struct timeval: seconds plus microseconds.
        processStartTime = kProcInfo.kp_proc.p_un.__p_starttime.tv_sec + kProcInfo.kp_proc.p_un.__p_starttime.tv_usec / 1000000.0;
        processStartTime *= 1000; // seconds -> milliseconds
    }
    return processStartTime;
}

// Asks the kernel for the kinfo_proc record of the given PID.
+ (BOOL)processInfoForPID:(int)pid procInfo:(struct kinfo_proc *)procInfo {
    int cmd[4] = {CTL_KERN, KERN_PROC, KERN_PROC_PID, pid};
    size_t size = sizeof(*procInfo);
    return sysctl(cmd, sizeof(cmd) / sizeof(*cmd), procInfo, &size, NULL, 0) == 0;
}
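One detail is easy to get wrong. p_starttime is a wall-clock time (a struct timeval since 1970), so the end mark must come from the same clock, for example gettimeofday, and not from a monotonic clock such as mach_absolute_time. The C sketch below shows the conversion and the subtraction. The helper names are hypothetical.

```c
#include <stddef.h>
#include <sys/time.h>

/* struct timeval (seconds + microseconds since 1970) -> milliseconds,
   the same conversion the +time method performs on p_starttime. */
double timeval_to_ms(struct timeval tv) {
    return (double)tv.tv_sec * 1000.0 + (double)tv.tv_usec / 1000.0;
}

/* Current wall-clock time in milliseconds since 1970, on the same clock as p_starttime. */
double wall_clock_now_ms(void) {
    struct timeval now;
    gettimeofday(&now, NULL);
    return timeval_to_ms(now);
}

/* Launch duration: end mark (wall-clock ms) minus the process start time. */
double launch_duration_ms(struct timeval process_start, double end_ms) {
    return end_ms - timeval_to_ms(process_start);
}
```

Both values are wall-clock times, so a clock adjustment during launch (such as an NTP correction) can skew an individual sample.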

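For the end point, the app hides a loading animation view once the list has rendered. To capture the moment the hide takes effect, the code adds a one-shot observer for the main run loop's beforeWaiting (and exit) activity. It then dispatches the completion block asynchronously to the main queue, so the block runs on the next run-loop iteration, after the pending UI changes have been committed.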

- (void)reload {
    [self.tableView xig_reloadDataWithCompletion:^{
        // trigger loading animation hide
        [self.tableView xig_endUpdateData:NO];
        [self observeNextRenderWithBlock:^{
            // loading animation hidden
        }];
    }];
}

- (void)observeNextRenderWithBlock:(dispatch_block_t)block {
    CFRunLoopRef mainRunloop = [[NSRunLoop mainRunLoop] getCFRunLoop];
    // Fire when the main run loop is about to sleep (or exits), i.e. after this pass's UI work.
    CFRunLoopActivity activities = kCFRunLoopBeforeWaiting | kCFRunLoopExit;
    CFRunLoopObserverRef observer = CFRunLoopObserverCreateWithHandler(kCFAllocatorDefault, activities, true, 0, ^(CFRunLoopObserverRef observer, CFRunLoopActivity activity) {
        // One-shot: unregister on the first callback and balance the Create call.
        CFRunLoopRemoveObserver(mainRunloop, observer, kCFRunLoopCommonModes);
        CFRelease(observer);
        if (block) {
            // Defer to the next run-loop iteration, after pending UI changes are committed.
            dispatch_async(dispatch_get_main_queue(), block);
        }
    });
    CFRunLoopAddObserver(mainRunloop, observer, kCFRunLoopCommonModes);
}

Beyond overall latency, the team also tracks stage‑level times such as list creation, request, and render, ultimately aiming to reduce the time from app click to the first video frame.

Launch Architecture

The legacy Watermelon iOS launcher suffered from unclear dependencies, chaotic logic, and crashes when tasks were adjusted. A refactor introduced a more stable, efficient startup framework.

Phased Launch

The launch is divided into four phases to make it easier to reason about: didFinishLaunch, launchCompletion (first screen finished), homeDidRendered (home page rendered), and AfterPlayerFirstFrame.

Core components are initialized in didFinishLaunch (APM SDK, network, analytics). Business components go into launchCompletion. Non‑critical or post‑first‑screen tasks are placed in homeDidRendered and AfterPlayerFirstFrame.

Launcher Design

The launcher supports three queues: main, idle, and concurrent. Within each phase, the tasks and their dependencies form a directed acyclic graph (DAG), and a topological sort determines the execution order. Dependencies are stored as an adjacency list. If a cycle is found, no valid order exists and the launcher asserts.
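Per-phase ordering with cycle detection can be sketched as Kahn's algorithm over an adjacency list. This minimal C sketch rests on some assumptions. Tasks are numbered 0..n-1. Dependencies are given as parallel from/to arrays, where from[e] must run before to[e]. The name topo_sort is hypothetical. The article does not say which topological-sort variant the launcher actually uses.

```c
#include <stdlib.h>

/* Orders n tasks (numbered 0..n-1) so that every dependency runs first.
   Edge e says task from[e] must run before task to[e]. Writes the order into
   `order` (room for n entries) and returns n, or returns -1 if the graph has a
   cycle (or on allocation failure). */
int topo_sort(int n, int m, const int *from, const int *to, int *order) {
    int *indegree = calloc((size_t)n + 1, sizeof *indegree);
    int *head = malloc(((size_t)n + 1) * sizeof *head);    /* first edge of each task's list */
    int *next = malloc(((size_t)m + 1) * sizeof *next);    /* next edge in the same list */
    int *queue = malloc(((size_t)n + 1) * sizeof *queue);  /* tasks whose dependencies are done */
    int count = -1;

    if (indegree && head && next && queue) {
        for (int v = 0; v < n; v++) head[v] = -1;
        for (int e = 0; e < m; e++) {        /* build the adjacency list */
            next[e] = head[from[e]];
            head[from[e]] = e;
            indegree[to[e]]++;
        }
        int qh = 0, qt = 0;
        for (int v = 0; v < n; v++)
            if (indegree[v] == 0) queue[qt++] = v;

        count = 0;
        while (qh < qt) {                    /* Kahn's algorithm */
            int u = queue[qh++];
            order[count++] = u;
            for (int e = head[u]; e != -1; e = next[e])
                if (--indegree[to[e]] == 0) queue[qt++] = to[e];
        }
        if (count != n) count = -1;          /* tasks left over lie on a cycle */
    }
    free(indegree);
    free(head);
    free(next);
    free(queue);
    return count;
}
```

A launcher would then dispatch each task, in this order, to its declared queue. Where this sketch returns -1, the launcher described above asserts.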

StartupManager registers tasks that conform to the StartupTask protocol. Tasks are registered in application:willFinishLaunchingWithOptions:, and each one declares its dependencies and the queue it runs on.

Task Registration

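Watermelon uses ByteDance's Gaia component. Gaia writes a record for each function marked with __attribute__((section(...))) into a custom Mach-O section. At runtime, a callback registered with _dyld_register_func_for_add_image reads that section as each image is loaded, so the recorded functions can later be invoked by key.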

// before expansion
XIGRegisterStartUpTaskFunction() {
    // register XIGDemoTask
    [XIGStartUp registerTaskClass:XIGDemoTask.class inStage:XIGStartUpLaunchCompletion];
}

// after expansion (simplified)
__attribute__((used)) static void __GAIA_ID__0(void);
static const GAIAFunctionInfo __GAIA_F_I_ID__0 = { (void *)__GAIA_ID__0, __FILE_NAME__, __LINE__ };
__attribute__((used, no_sanitize_address, section("__DATA,__GAIA__SECTION"))) static const GAIAData __GAIA_ID__1 = { GAIATypeFunctionInfo, false, "XIGRegisterStartUpTask", &__GAIA_F_I_ID__0 };
static void __GAIA_ID__0() {
    [XIGStartUp registerTaskClass:XIGDemoTask.class inStage:XIGStartUpLaunchCompletion];
}

All tasks are started via:

[GAIAEngine startTasksForKey:@XIGRegisterStartUpTaskGaiaKey];

Example task implementation:

@implementation XIGDemoTask

XIGRegisterStartUpTaskFunction() {
    [XIGStartUp registerTaskClass:XIGDemoTask.class inStage:XIGStartUpLaunchCompletion];
}

- (void)execute {
    // execute task
}
@end
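The pattern behind XIGRegisterStartUpTaskFunction can be reduced to a small C sketch. It assumes an ELF toolchain (GCC or Clang on Linux), where the linker defines __start_<section> and __stop_<section> symbols, so the section can be walked without dyld. On Mach-O, a loader such as Gaia instead reads the section from each image as it is loaded, for example with getsectiondata(). The macro XIG_REGISTER, the section name xig_tasks and the counter are all hypothetical. The real macro also marks its entries no_sanitize_address, presumably so AddressSanitizer does not insert padding between them.

```c
#include <stddef.h>

typedef void (*xig_register_fn)(void);

/* Hypothetical macro: declares a registration function and stores a pointer
   to it in the "xig_tasks" section. The section name must be a valid C
   identifier so that the ELF linker defines __start_/__stop_ symbols for it. */
#define XIG_REGISTER(fn)                                \
    static void fn(void);                               \
    __attribute__((used, section("xig_tasks")))         \
    static const xig_register_fn xig_entry_##fn = fn;   \
    static void fn(void)

/* Synthesized by the linker: bounds of the xig_tasks section. */
extern const xig_register_fn __start_xig_tasks[];
extern const xig_register_fn __stop_xig_tasks[];

static int g_registered_count;

XIG_REGISTER(register_demo_task) {
    g_registered_count++;  /* real code would register a startup task here */
}

XIG_REGISTER(register_other_task) {
    g_registered_count++;
}

/* Walks the section and calls every recorded function; returns how many ran. */
int xig_run_registered(void) {
    int ran = 0;
    for (const xig_register_fn *p = __start_xig_tasks; p < __stop_xig_tasks; p++) {
        (*p)();
        ran++;
    }
    return ran;
}
```

Entries from different object files appear in link order. Code should therefore not rely on registration order, which is what the explicit task dependencies are for.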

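Problem Statement

Problem | Affected Phase
Many +load methods | Runtime Init
Many static initializers | Runtime Init
Main-thread blocking and expensive operations | Runtime Init, Application Init
Slow first-screen rendering | Initial Frame Render
First feed refresh request sent late | First feed refresh
List created late | First feed refresh
Slow list rendering | First feed refresh rendering
Many network requests competing with the first feed refresh | First feed refresh
Many background threads competing for CPU | First feed refresh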

Optimization Ideas

Time‑Consuming Task Governance

+load and Static Initialization

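+load methods and static initializers are not necessarily slow in themselves. However, running them touches code and data pages throughout the binary, which triggers virtual-memory page faults. In one trace, Runtime Init took 736 ms, of which 579 ms was spent on page faults. The fix replaces +load methods and static constructors with startup tasks, except in third-party libraries.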

// before
+ (void)load {}
__attribute__((constructor)) void demoFunc() {}

// after
XIGRegisterStartUpTaskFunction() {
    [XIGStartUp registerTaskInStage:XIGStartUpDidFinishLaunch usingBlock:^{ }];
}

Main‑Thread Blocking

Main-thread blocking has three main causes: virtual-memory page faults (not addressed here), inter-process communication (IPC), and lock contention. For IPC such as keychain or IDFA reads, fetch and cache the data on a background thread, or persist it to a local cache. For locks, avoid holding contended locks on the main thread: restructure the logic, or move the work onto serial queues.
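The background-caching idea can be sketched in C as follows. A worker thread performs the blocking fetch and publishes the result atomically, and the main thread reads it without ever waiting. slow_ipc_fetch is a hypothetical stand-in for a keychain or IDFA read. The other names are illustrative too, and persisting the value across launches is left out.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a slow IPC call such as a keychain or IDFA read. */
static const char *slow_ipc_fetch(void) {
    return "device-id-123";
}

/* Published once by the background thread; read lock-free by the main thread. */
static _Atomic(const char *) g_cached_value = NULL;

static void *prefetch_worker(void *arg) {
    (void)arg;
    const char *value = slow_ipc_fetch();  /* may block, but not the main thread */
    atomic_store_explicit(&g_cached_value, value, memory_order_release);
    return NULL;
}

/* Call early in launch to start the fetch off the main thread. */
int start_prefetch(pthread_t *thread) {
    return pthread_create(thread, NULL, prefetch_worker, NULL);
}

/* Main-thread accessor: never blocks, returns `fallback` until the value arrives. */
const char *cached_value_or(const char *fallback) {
    const char *value = atomic_load_explicit(&g_cached_value, memory_order_acquire);
    return value ? value : fallback;
}
```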

Expensive Operations

Classify tasks as mandatory or non‑mandatory. Non‑mandatory tasks (e.g., WKWebView preload) can be delayed, split, run on background threads, or lazily loaded. Mandatory tasks can be sped up, rescheduled, or made thread‑safe.
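For non-mandatory work that can be lazily loaded, a once-only initializer moves the cost from launch to first use. The sketch below uses POSIX pthread_once in C. In an iOS codebase, dispatch_once plays the same role. The resource and its names are hypothetical placeholders.

```c
#include <pthread.h>

/* Hypothetical expensive, non-mandatory resource; here just an int plus a
   counter that records how many times it was initialized. */
static int g_init_count;
static int g_resource;
static pthread_once_t g_resource_once = PTHREAD_ONCE_INIT;

static void init_resource(void) {
    g_init_count++;
    g_resource = 42;  /* stand-in for the real, costly setup */
}

/* The first caller pays the setup cost; later callers, on any thread, reuse it. */
int get_resource(void) {
    pthread_once(&g_resource_once, init_resource);
    return g_resource;
}

int resource_init_count(void) {
    return g_init_count;
}
```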

Rendering Optimization

The home page renders two collection view sections (followed channel and recommendation). Because UICollectionView cannot skip the first section, both data streams are fetched and rendered, adding latency. The fix checks the target channel during cellForItemAtIndexPath and skips unnecessary renders.

Network Optimization

Beyond protocol improvements (QUIC, request pre‑connect, field reduction), client‑side strategies include network scheduling, early first‑screen request, and image/video pre‑loading.

Network Scheduler

A scheduler delays non‑critical requests until after the first‑screen request completes, reducing contention.
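A scheduler of this kind can be sketched as a gate plus a FIFO buffer. The sketch makes several assumptions. request_t, MAX_DEFERRED and all function names are hypothetical. The caller performs the actual sending. The scheduler is used from a single thread or serial queue, since it has no locking. The article does not describe how Watermelon's scheduler classifies requests as critical.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEFERRED 64

/* Hypothetical request descriptor; a real client would carry URL, body, callbacks. */
typedef struct {
    int id;
} request_t;

typedef struct {
    bool gate_open;                    /* true once the first-screen request has completed */
    request_t deferred[MAX_DEFERRED];  /* non-critical requests waiting for the gate */
    size_t count;
} scheduler_t;

void scheduler_init(scheduler_t *s) {
    s->gate_open = false;
    s->count = 0;
}

/* Returns true if the caller should send `req` now, false if it was deferred.
   Critical requests, and anything submitted after the gate opens, go out
   immediately; when the buffer is full the request is sent rather than dropped. */
bool scheduler_submit(scheduler_t *s, request_t req, bool critical) {
    if (critical || s->gate_open || s->count == MAX_DEFERRED) {
        return true;
    }
    s->deferred[s->count++] = req;
    return false;
}

/* Call when the first-screen response arrives: opens the gate and copies the
   deferred requests, in submission order, into `out` for the caller to send.
   Returns how many were copied. */
size_t scheduler_open_gate(scheduler_t *s, request_t out[MAX_DEFERRED]) {
    size_t n = s->count;
    for (size_t i = 0; i < n; i++) {
        out[i] = s->deferred[i];
    }
    s->count = 0;
    s->gate_open = true;
    return n;
}
```

Returning a decision instead of sending inside the scheduler keeps the sketch independent of any particular network library.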

Early First‑Screen Request

Send the first‑screen request immediately after network library initialization, parsing data on a serial queue, so the list can render as soon as data arrives.
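When the request is sent this early, the data may arrive either before or after the list exists, so rendering has to wait for whichever event comes second. The following is a minimal C sketch of that rendezvous, with hypothetical names. Each call returns true only for the event that completes the pair, and only once, telling the caller to render (on iOS, on the main thread).

```c
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t lock;
    bool data_ready;   /* first-screen response parsed */
    bool list_ready;   /* list view created */
    bool rendered;     /* render already triggered */
} first_screen_t;

void fs_init(first_screen_t *fs) {
    pthread_mutex_init(&fs->lock, NULL);
    fs->data_ready = fs->list_ready = fs->rendered = false;
}

void fs_destroy(first_screen_t *fs) {
    pthread_mutex_destroy(&fs->lock);
}

/* Marks one event and reports whether the caller should render now:
   true only for the call that completes the pair, and only once. */
static bool fs_mark(first_screen_t *fs, bool *flag) {
    pthread_mutex_lock(&fs->lock);
    *flag = true;
    bool render_now = fs->data_ready && fs->list_ready && !fs->rendered;
    if (render_now) fs->rendered = true;
    pthread_mutex_unlock(&fs->lock);
    return render_now;
}

bool fs_data_parsed(first_screen_t *fs)  { return fs_mark(fs, &fs->data_ready); }
bool fs_list_created(first_screen_t *fs) { return fs_mark(fs, &fs->list_ready); }
```

If both events are already delivered on the same serial queue, the mutex is unnecessary. It is kept here so that the sketch remains safe when they arrive on different threads.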

Image & Video Pre‑loading

After parsing first‑screen data, the app pre‑loads images and the first video while waiting for view rendering.

Degradation Prevention & Monitoring

Offline Degradation Guard

Twice a day, ByTest builds a guard package, runs launch performance tests against it, and uploads the launch telemetry to Slardar. An algorithm filters out outliers and notifies stakeholders when it detects a regression.
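The article does not specify the outlier filter. As one common choice, offered here as an assumption, the sketch below applies Tukey's fences (discard samples more than 1.5 × IQR beyond the quartiles). It then compares the mean of the remaining runs against a baseline with a tolerance. The function name and tolerance parameter are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Percentile of a sorted array by linear interpolation, p in [0, 1]. */
static double percentile_sorted(const double *a, size_t n, double p) {
    double idx = p * (double)(n - 1);
    size_t lo = (size_t)idx;  /* floor, since idx >= 0 */
    size_t hi = (lo + 1 < n && idx > (double)lo) ? lo + 1 : lo;
    return a[lo] + (idx - (double)lo) * (a[hi] - a[lo]);
}

/* Discards samples outside Tukey's fences (1.5 x IQR beyond the quartiles),
   stores the mean of the remaining samples in *mean_out, and returns true if
   that mean exceeds baseline_ms * (1 + tolerance). Returns false without a
   verdict if there are no samples or allocation fails. */
bool launch_regressed(const double *samples, size_t n, double baseline_ms,
                      double tolerance, double *mean_out) {
    if (n == 0) return false;
    double *sorted = malloc(n * sizeof *sorted);
    if (!sorted) return false;
    memcpy(sorted, samples, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, cmp_double);

    double q1 = percentile_sorted(sorted, n, 0.25);
    double q3 = percentile_sorted(sorted, n, 0.75);
    double iqr = q3 - q1;
    double low = q1 - 1.5 * iqr, high = q3 + 1.5 * iqr;

    double sum = 0.0;
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (sorted[i] >= low && sorted[i] <= high) {
            sum += sorted[i];
            kept++;
        }
    }
    free(sorted);
    if (kept == 0) return false;
    *mean_out = sum / (double)kept;
    return *mean_out > baseline_ms * (1.0 + tolerance);
}
```

Consider the samples {1000, 1010, 1020, 1030, 1040, 5000} ms. The 5000 ms run falls outside the fences and the filtered mean is 1020 ms, so a 1000 ms baseline with a 5% tolerance reports no regression. Averaging all six runs (about 1683 ms) would have raised a false alarm.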

Analysts can drill down using ByTrace or Instruments for deeper investigation.

Online Metric Monitoring

Watermelon's dashboards in Tea track multiple launch dimensions, ensuring that feature-gated changes do not degrade performance.

Binary Reordering Automation

Orderfile generation is now fully automated: a daily check triggers a cloud build, ByTest runs launch tests, generates a new orderfile, and opens a merge request.
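The article does not say how the orderfile itself is produced. One common approach, assumed here, works in three steps. First, build an instrumented binary, for example with Clang's -fsanitize-coverage=func,trace-pc-guard. Second, record every function called during a launch. Third, keep each symbol once, in first-call order. ld64 then consumes the resulting list via -order_file (Xcode's Order File build setting). The C sketch covers only the deduplication step, and its name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Given the symbols observed during a launch run, in call order (duplicates
   allowed), copies each symbol's first occurrence into `out` (room for n
   entries) and returns how many were kept. Printed one per line, the result
   is the content of an orderfile. O(n^2) for clarity; a real tool would use
   a hash set. */
size_t orderfile_symbols(const char *const *trace, size_t n, const char **out) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        bool seen = false;
        for (size_t j = 0; j < count; j++) {
            if (strcmp(out[j], trace[i]) == 0) {
                seen = true;
                break;
            }
        }
        if (!seen) out[count++] = trace[i];
    }
    return count;
}
```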

Conclusion

Watermelon iterates rapidly and ships many launch-related changes every week. Optimization alone is therefore not enough. Robust degradation guards and a solid launch architecture are essential to keep launch performance from regressing and to improve it further. Future work includes scenario-based launches and first-screen rendering.
