
How Meituan Optimized iOS Cold Start: Strategies, Tools, and Code

This article details Meituan's systematic approach to reducing iOS cold-start time. It covers how the startup phases are defined, how performance bottlenecks were identified, and how phased launch, self-registration of startup tasks, code slimming, profiling tools, and continuous monitoring combine to give users a smoother launch.

21CTO

Background

Cold‑start time is a key performance metric for apps, acting as the first gate to user experience. The Meituan Waimai iOS client has evolved through dozens of versions, adding new business lines such as flash‑sale and errands, which increased the amount of work that must be completed during cold start and created performance challenges.

Cold Start Definition

iOS cold start is generally defined as the interval from the moment a user taps the app icon until the application:didFinishLaunchingWithOptions: method finishes. This article extends that definition to the moment the main UI becomes visible, which gives three stages:

T1: before main() – the OS loads the executable and performs linking.

T2: from main() to the end of didFinishLaunchingWithOptions.

T3: after didFinishLaunchingWithOptions until the main UI is visible.

The complete cold‑start process is therefore T1 + T2 + T3.
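The arithmetic behind the three stages can be sketched in a few lines. This is only an illustration: the class name, field names, and sample timestamps are hypothetical, and all values are assumed to be milliseconds on one shared clock.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaunchTimestamps:
    """Four launch milestones in milliseconds (hypothetical field names)."""
    process_start: float          # process created by the OS
    main_entered: float           # main() begins
    did_finish_launching: float   # didFinishLaunchingWithOptions returns
    main_ui_visible: float        # main UI is on screen


def cold_start_phases(ts):
    """Return T1, T2, T3 and their sum for one launch."""
    t1 = ts.main_entered - ts.process_start
    t2 = ts.did_finish_launching - ts.main_entered
    t3 = ts.main_ui_visible - ts.did_finish_launching
    if min(t1, t2, t3) < 0:
        raise ValueError("timestamps must be non-decreasing")
    return {"T1": t1, "T2": t2, "T3": t3, "total": t1 + t2 + t3}


# Made-up sample values, for illustration only.
sample = LaunchTimestamps(0.0, 400.0, 900.0, 1500.0)
phases = cold_start_phases(sample)
```

Splitting the total this way makes it clear which stage a regression belongs to before any optimization work starts.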

Current Issues

Performance debt

After many iterations, the app accumulated numerous startup items (SDK initializations, pre‑loads, etc.) that became bottlenecks.

Performance growth

Cold‑start time grows gradually as new features add more startup items; each version can add roughly 0.1 s, eventually becoming a serious problem.

Governance Strategy

Resolve existing bottlenecks by optimizing the startup flow.

Control incremental growth through a standardized, documented startup process.

Improve monitoring to detect performance regressions early.

Standardized Startup Process

The platformized architecture introduced a large number of startup items, which led to two main problems: the items piled up and slowed launch, and there was no clear convention for adding new ones, which made the startup flow risky to maintain.

Phased Launch

Startup items are classified by priority and assigned to distinct phases. Critical items (crash monitoring, statistics) run earliest; medium‑priority items (location, network) run in early phases; low‑priority items (payment SDK, map SDK) are delayed.

About 30% of startup items were identified as delayable. Moving them out of the launch path removes their cost from cold start.
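As a rough illustration of the classification step, the sketch below sorts startup items by phase and separates the ones that can be delayed. The phase names and the item list are assumptions based on the examples in the text, not Meituan's actual configuration.

```python
from enum import IntEnum


class Phase(IntEnum):
    """Hypothetical phase names; lower values run earlier."""
    CRITICAL = 0   # e.g. crash monitoring, statistics
    EARLY = 1      # e.g. location, network
    DEFERRED = 2   # e.g. payment SDK, map SDK


# Illustrative items, declared in arbitrary order.
STARTUP_ITEMS = [
    ("map_sdk", Phase.DEFERRED),
    ("crash_monitoring", Phase.CRITICAL),
    ("network", Phase.EARLY),
    ("payment_sdk", Phase.DEFERRED),
    ("statistics", Phase.CRITICAL),
    ("location", Phase.EARLY),
]


def launch_plan(items):
    """Split items into those run during launch (in phase order) and those delayed."""
    # sorted() is stable, so declaration order is kept within a phase.
    ordered = sorted(items, key=lambda item: item[1])
    during_launch = [name for name, phase in ordered if phase < Phase.DEFERRED]
    delayed = [name for name, phase in ordered if phase == Phase.DEFERRED]
    return during_launch, delayed


during_launch, delayed = launch_plan(STARTUP_ITEMS)
```

Only the items in `during_launch` still contribute to cold-start time; the `delayed` list can run once the main UI is up.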

Self‑Registration

After defining the phases, the team needed a way to execute items without maintaining a central list. Using a macro that writes into a custom section of the __DATA segment, each startup function registers itself under a key that represents its phase. At runtime, the manager reads the section and invokes the functions registered for the current phase.

__attribute__((used, section("__DATA,__kylin__")))
static const KLN_DATA __kylin__0 = (KLN_DATA){
    (KLN_DATA_HEADER){"Key", KLN_STRING, KLN_IS_ARRAY},
    "Value"
};

Example registration:

KLN_FUNCTIONS_EXPORT(STAGE_KEY_A)() {
    // startup code A
}

KLN_FUNCTIONS_EXPORT(STAGE_KEY_A)() {
    // startup code B
}

Execution at the appropriate moment:

- (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions {
    // other logic
    [[KLNKylin sharedInstance] executeArrayForKey:STAGE_KEY_A]; // trigger stage A
    // other logic
    return YES;
}
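The same register-by-key, execute-by-key idea can be shown with a Python decorator. This is an analogy only: it keeps the registry in a dictionary rather than in a Mach-O section, and every name in it (`startup_task`, `execute_for_key`, the stage key value) is hypothetical.

```python
from collections import defaultdict

# stage key -> registered functions, in registration order
_REGISTRY = defaultdict(list)


def startup_task(stage_key):
    """Decorator that registers a function under a stage key."""
    def decorator(func):
        _REGISTRY[stage_key].append(func)
        return func
    return decorator


def execute_for_key(stage_key):
    """Run every function registered for the stage and collect the results."""
    return [func() for func in _REGISTRY.get(stage_key, [])]


STAGE_KEY_A = "stage_a"


@startup_task(STAGE_KEY_A)
def startup_a():
    return "A done"


@startup_task(STAGE_KEY_A)
def startup_b():
    return "B done"


results = execute_for_key(STAGE_KEY_A)
```

The point carried over from the original design is that each task declares its own stage next to its code, so adding or removing a task never touches a shared list.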

Optimizing Before main()

Before main(), the OS loads the Mach‑O executable, loads dyld, and performs dynamic linking. Factors that increase T1 time include:

More dynamic libraries.

More Objective‑C classes and methods.

More +load methods.

More C constructors.

More C++ static objects.

Code Slimming

Unused selectors are identified by comparing __TEXT:__objc_methname (all method names) with __DATA:__objc_selrefs (referenced selectors). The following script extracts the referenced selectors; any method name outside this set is a candidate for removal:

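import re
import subprocess

def referenced_selectors(path):
    """Return the set of selectors listed in the binary's __objc_selrefs section."""
    re_sel = re.compile(r"__TEXT:__objc_methname:(.+)")
    refs = set()
    # Passing arguments as a list avoids shell quoting problems with unusual paths.
    output = subprocess.check_output(
        ["/usr/bin/otool", "-v", "-s", "__DATA", "__objc_selrefs", path], text=True)
    for line in output.splitlines():
        results = re_sel.findall(line)
        if results:
            refs.add(results[0])
    return refs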
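The comparison step itself is a set difference. The sketch below is a minimal version that works on text already captured from otool, so it can run without a binary. The sample output and selector names are made up, and how the set of implemented selectors is collected is left outside the sketch.

```python
import re

SELREF_RE = re.compile(r"__TEXT:__objc_methname:(.+)")


def parse_selrefs(otool_output):
    """Collect selector names from otool's __objc_selrefs listing."""
    refs = set()
    for line in otool_output.splitlines():
        match = SELREF_RE.search(line)
        if match:
            refs.add(match.group(1).strip())
    return refs


def unused_selectors(implemented, referenced):
    """Selectors that are implemented but never referenced, sorted for stable output."""
    return sorted(set(implemented) - set(referenced))


# Hypothetical otool output, shaped like the lines the regex expects.
SAMPLE_OUTPUT = """\
Contents of (__DATA,__objc_selrefs) section
0000000100008f50  __TEXT:__objc_methname:viewDidLoad
0000000100008f58  __TEXT:__objc_methname:setupBanner:
"""

# Hypothetical set of selectors the app implements.
implemented = {"viewDidLoad", "setupBanner:", "legacyPromoTapped:"}
unused = unused_selectors(implemented, parse_selrefs(SAMPLE_OUTPUT))
```

Treat the result as a list of candidates rather than a deletion list: selectors built from strings at runtime, or invoked only by system frameworks, do not show up in the app's own selector references.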

+load Optimization

Heavy +load methods are replaced by staged startup using the Kylin framework. Example:

// Replace +load with a custom macro
WMAPP_BUSINESS_INIT_AFTER_HOMELOADING() {
    // original +load code
}

Profiling Tools

Time Profiler (Xcode) and flame graphs (Caesium) are used to locate hotspots. Fixing the bottlenecks they revealed saved more than 0.3 s of cold-start time.

Parallelizing Serial Work

Using a splash screen as the root view controller allows the UI to be built while the splash screen is displayed. Caching location and pre‑fetching home data makes the location‑request‑render pipeline parallel, reducing home‑page load time by about 40 %.
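A minimal sketch of the overlap, assuming a cached location is available at launch: the data request starts on a worker thread while the calling thread builds the UI. The function names and return values are placeholders, not Meituan's code.

```python
from concurrent.futures import ThreadPoolExecutor


def build_home_ui():
    """Stand-in for constructing the home view hierarchy."""
    return "home UI skeleton"


def fetch_home_data(location):
    """Stand-in for the network request that depends on a location."""
    return f"feed for {location}"


def launch_home(cached_location):
    """Start the request with the cached location, build the UI meanwhile."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        data_future = pool.submit(fetch_home_data, cached_location)
        ui = build_home_ui()  # runs on the calling thread while the fetch is in flight
        return ui, data_future.result()


ui, data = launch_home("cached-location")
```

In the serial version the request cannot start until a fresh location fix arrives. Using the cached value removes that wait from the critical path, and a fresh fix can still refresh the page later.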

Data Monitoring

The internal Metrics system records cold‑start timestamps from process creation (the exec call) to the moment the main UI appears. Start time is obtained via sysctl on the process info.

#import <sys/sysctl.h>
#import <mach/mach.h>

// Fills procInfo with the kernel's record for the given PID; returns YES on success.
+ (BOOL)processInfoForPID:(int)pid procInfo:(struct kinfo_proc *)procInfo {
    int cmd[4] = {CTL_KERN, KERN_PROC, KERN_PROC_PID, pid};
    size_t size = sizeof(*procInfo);
    return sysctl(cmd, sizeof(cmd)/sizeof(*cmd), procInfo, &size, NULL, 0) == 0;
}

// Returns the process creation time in milliseconds (not seconds) since the Unix epoch.
+ (NSTimeInterval)processStartTime {
    struct kinfo_proc kProcInfo;
    if ([self processInfoForPID:[[NSProcessInfo processInfo] processIdentifier] procInfo:&kProcInfo]) {
        struct timeval startTime = kProcInfo.kp_proc.p_un.__p_starttime;
        return startTime.tv_sec * 1000.0 + startTime.tv_usec / 1000.0;
    } else {
        NSAssert(NO, @"Unable to obtain process info");
        return 0;
    }
}

Experiments show that process creation precedes the earliest +load call by 12 ms on a blank app and by 688 ms on production devices, so measuring from process creation gives a more accurate starting point for the metric.

Metrics aggregates percentile data (50th, 90th, 95th) for cold‑start duration, giving a macro view of performance distribution.
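For readers unfamiliar with percentile aggregation, here is a small sketch using the nearest-rank method. The sample durations are invented, and the source does not say which percentile definition the Metrics system uses, so nearest-rank is an assumption.

```python
def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it.

    p must be an integer from 1 to 100.
    """
    if not samples:
        raise ValueError("no samples")
    if not isinstance(p, int) or not 1 <= p <= 100:
        raise ValueError("p must be an integer between 1 and 100")
    ordered = sorted(samples)
    rank = (p * len(ordered) + 99) // 100  # integer ceiling of p/100 * n
    return ordered[rank - 1]


# Invented cold-start durations in seconds, for illustration only.
SAMPLES = [1.2, 0.9, 1.5, 2.8, 1.1, 1.0, 1.3, 4.0, 1.4, 1.6]

summary = {p: percentile(SAMPLES, p) for p in (50, 90, 95)}
```

Reporting the 90th and 95th percentiles next to the median keeps slow launches visible; an average alone would hide the long tail.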

Conclusion

For fast‑iterating apps, cold‑start time inevitably grows with business complexity. By systematically analyzing each phase, applying phased launch, self‑registration, code slimming, profiling, and continuous online monitoring, Meituan achieved a smoother startup experience and established a process to control future performance regressions.
