Optimizing Startup Performance of NetEase Cloud Music iOS App
To speed up the slow launch of the NetEase Cloud Music iOS app, the team trimmed dynamic libraries, replaced costly +load registrations with section‑based static registration, swapped SBJson for native parsing, reordered the binary, lazy‑loaded heavy services, tuned ad loading, and streamlined UI initialization. The result was a cold start more than 30% faster, along with a call for ongoing performance monitoring.
Background : The NetEase Cloud Music iOS app, with nearly ten years of development, suffers from slow launch times due to the continuous addition of business code to the launch chain. Users have reported poor startup speed, which can affect retention, prompting a dedicated startup‑performance optimization effort.
Analysis – Definition of Startup : With iOS 13, Apple switched third‑party apps from dyld2 to dyld3 and introduced the concept of a launch closure, so the definitions of cold and hot start differ before and after iOS 13. This article adopts the iOS 13+ cold‑start definition: the process is created after a device reboot, with no cached process information.
Cold‑Start Definition : Measured from the moment the user taps the app icon until the launch screen disappears and the first frame is rendered. It consists of two stages:
T1 – Pre‑main: system creates the process, loads the Mach‑O, creates the launch closure, and dyld performs loading, rebasing, binding, Obj‑C init, and +load execution.
T2 – Post‑main: execution of UI creation, delegate lifecycle, and first‑frame rendering.
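The two stages can be expressed as simple differences between three timestamps. The following is a minimal sketch, not the team's measurement code: the struct names, field names, and the choice of milliseconds are assumptions made for illustration, and how each timestamp is captured on a device is outside its scope.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical timestamps, all in milliseconds on the same monotonic clock. */
typedef struct {
    uint64_t process_start_ms; /* process creation: start of T1 */
    uint64_t main_entry_ms;    /* entry to main(): boundary between T1 and T2 */
    uint64_t first_frame_ms;   /* first frame rendered: end of T2 */
} LaunchTimestamps;

typedef struct {
    uint64_t t1_pre_main_ms;
    uint64_t t2_post_main_ms;
    uint64_t total_ms;
} LaunchBreakdown;

/* Split a cold start into the T1 (pre-main) and T2 (post-main) stages. */
static LaunchBreakdown launch_breakdown(LaunchTimestamps ts) {
    /* The three events must be ordered, otherwise the subtraction wraps. */
    assert(ts.process_start_ms <= ts.main_entry_ms);
    assert(ts.main_entry_ms <= ts.first_frame_ms);

    LaunchBreakdown b;
    b.t1_pre_main_ms = ts.main_entry_ms - ts.process_start_ms;
    b.t2_post_main_ms = ts.first_frame_ms - ts.main_entry_ms;
    b.total_ms = b.t1_pre_main_ms + b.t2_post_main_ms;
    return b;
}
```

Keeping T1 and T2 as separate numbers makes it easier to tell whether a regression comes from loading work (dynamic libraries, +load) or from code that runs after main.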
The analysis uses an iPhone 8 Plus (iOS 14.3) running a Debug build as the benchmark.
Current Situation : The app contains 16 dynamic libraries, many +load methods (≈800 calls, >550 ms total), and a complex startup chain that also includes a fake red‑screen caused by the ad module.
1. T1‑Stage Governance
Dynamic Library Management : Apple recommends ≤6 dynamic libraries. Strategies include converting to static libraries, merging libraries, or lazy‑loading. The team chose to convert most libraries to static, resolve OpenSSL symbol conflicts, and remove unused libraries, achieving ~200 ms gain.
+load Method Governance : +load runs very early on the main thread, adds latency, and can cause crashes if it fails. The team identified the most expensive +load methods (≥2 ms) and refactored them to use a centralized registration API via a custom __DATA section. Example registration macro:
#define _MODULE_DATA_SECT(sectname) __attribute__((used, section("__DATA," sectname)))
#define _ModuleEntrySectionName "_ModuleSection"
typedef struct { const char *className; } _ModuleRegisterEntry;
#define __ModuleRegisterInternal(className) \
    static _ModuleRegisterEntry _Module##className##Entry _MODULE_DATA_SECT(_ModuleEntrySectionName) = { #className };
They also deprecated the old macro‑based +load registration:
static inline __attribute__((deprecated("NEModuleHubExport is deprecated, please use 'ModuleRegister'"))) void func_loadDeprecated(void) {}
#define NEModuleHubExport \
+(void)load { \
    /* original registration */ \
    func_loadDeprecated(); \
}
Static Initializer Analysis : The team identified the static initializers that run after +load: C/C++ constructors, static global variables, and runtime‑initialized globals. They hooked the function pointers stored in the Mach‑O section __DATA,__mod_init_func to measure each initializer's cost, but found the impact to be limited.
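The section‑based registration described under +load Method Governance only stores class names in the binary, so something still has to read them back at launch. The following is a minimal sketch of that reading side, assuming all entries live in the main executable. The module names, `module_entries`, and `module_is_registered` are hypothetical, and the non‑Apple branch exists only so the sketch can be built on other platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { const char *className; } ModuleRegisterEntry;

#ifdef __APPLE__
#include <mach-o/getsect.h>
#include <mach-o/ldsyms.h>
#define MODULE_SECT __attribute__((used, section("__DATA,_ModuleSection")))
#else
/* Fallback for ELF toolchains: the linker synthesizes __start_/__stop_
   symbols for any section whose name is a valid C identifier. */
#define MODULE_SECT __attribute__((used, section("ModuleSection")))
extern ModuleRegisterEntry __start_ModuleSection[];
extern ModuleRegisterEntry __stop_ModuleSection[];
#endif

/* Same idea as the registration macro in the article: emit one entry into the section. */
#define MODULE_REGISTER(cls) \
    static ModuleRegisterEntry Module##cls##Entry MODULE_SECT = { #cls };

/* Hypothetical modules, registered without any +load method. */
MODULE_REGISTER(HypotheticalPlayerModule)
MODULE_REGISTER(HypotheticalAdModule)

/* Return the registered entries and their count (assumes the main executable). */
static const ModuleRegisterEntry *module_entries(size_t *count) {
#ifdef __APPLE__
    unsigned long size = 0;
    const uint8_t *data =
        getsectiondata(&_mh_execute_header, "__DATA", "_ModuleSection", &size);
    *count = data ? size / sizeof(ModuleRegisterEntry) : 0;
    return (const ModuleRegisterEntry *)(const void *)data;
#else
    *count = (size_t)(__stop_ModuleSection - __start_ModuleSection);
    return __start_ModuleSection;
#endif
}

/* Linear lookup by name; entry order within the section is not relied upon. */
static int module_is_registered(const char *name) {
    size_t count = 0;
    const ModuleRegisterEntry *entries = module_entries(&count);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(entries[i].className, name) == 0) return 1;
    }
    return 0;
}
```

In an app, the launch code would walk the entries once and resolve each name to a class at the point where modules are actually needed, so the cost moves out of the pre‑main stage and under the app's control.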
Page‑In Impact : Excessive page‑ins cause I/O and decryption overhead. Instruments System Trace was used to measure File‑Backed Page‑In; the impact was not a bottleneck on iOS 13+.
2. T2‑Stage Governance
High‑Frequency Objective‑C Methods : Flame graphs revealed hot spots such as the disableOptions method of NEHeimdall, which was called repeatedly from NSArray methods. The team replaced the Objective‑C wrapper with a C function to avoid the message‑dispatch overhead on this hot path.
JSON Parsing : The app used SBJson, which performed poorly compared to NSJSONSerialization. Replacing SBJson with the system parser saved significant time.
Runtime Traversal Optimizations : To reorder the binary, the team collected the symbols executed during launch by hooking objc_msgSend and by using Clang sanitizer coverage instrumentation, then wrote them to an order file. The order file was applied via Xcode’s “Order File” linker option. After reordering, a ~180 ms improvement was observed on iOS 12 devices.
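The sanitizer‑coverage half of that collection step can be sketched with the two callbacks that Clang calls when code is built with `-fsanitize-coverage=func,trace-pc-guard`. This is an illustration under assumptions rather than the team's collector: the buffer size and `recorded_pc_count` are hypothetical, and turning the recorded addresses into symbol names (for example with `dladdr`) and writing the order file are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RECORDED_PCS 65536 /* hypothetical capacity for one launch */

static void *recorded_pcs[MAX_RECORDED_PCS];
static atomic_size_t recorded_count;

/* Called once per instrumented module with its range of guard slots.
   Each guard gets a non-zero id so the first hit can be told apart. */
void __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop) {
    static uint32_t next_id = 0;
    if (start == stop || *start) return; /* empty range or already initialized */
    for (uint32_t *guard = start; guard < stop; guard++) {
        *guard = ++next_id;
    }
}

/* Called at every instrumented function entry (with the `func` option).
   Records the caller's address the first time each guard is hit. */
void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
    if (!*guard) return; /* already recorded */
    *guard = 0;          /* not atomic: a rare duplicate entry is possible */
    size_t index = atomic_fetch_add(&recorded_count, 1);
    if (index < MAX_RECORDED_PCS) {
        recorded_pcs[index] = __builtin_return_address(0);
    }
}

/* Number of addresses actually stored, clamped to the buffer capacity. */
size_t recorded_pc_count(void) {
    size_t n = atomic_load(&recorded_count);
    return n < MAX_RECORDED_PCS ? n : MAX_RECORDED_PCS;
}
```

Because addresses are appended in the order they are first hit, the recorded sequence already approximates launch order, which is the order the linker should lay the functions out in.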
Network‑Related Optimizations :
Lazy‑load WKWebView for cookie synchronization, moving its creation to the moment an H5 page is actually opened.
Cache the User‑Agent string generated via a temporary UIWebView, updating the cache only on app or system version changes.
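The User‑Agent caching rule above amounts to a cache keyed on the app and system versions. The following is a minimal sketch of that rule, assuming an in‑memory struct in place of whatever persistent storage the app uses. `UserAgentCache`, `cached_user_agent`, and `fake_build_user_agent` (a stand‑in for the temporary web view) are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    char app_version[32];
    char os_version[32];
    char user_agent[256];
    bool populated;
} UserAgentCache;

typedef void (*UserAgentBuilder)(char *out, size_t out_size);

static int build_calls = 0;

/* Stand-in for the expensive path that creates a temporary web view. */
static void fake_build_user_agent(char *out, size_t out_size) {
    build_calls++;
    snprintf(out, out_size, "%s", "HypotheticalUA/1.0");
}

/* Return the cached User-Agent, rebuilding it only when the app version
   or the system version differs from the one stored with the cache. */
static const char *cached_user_agent(UserAgentCache *cache,
                                     const char *app_version,
                                     const char *os_version,
                                     UserAgentBuilder build) {
    bool fresh = cache->populated &&
                 strcmp(cache->app_version, app_version) == 0 &&
                 strcmp(cache->os_version, os_version) == 0;
    if (!fresh) {
        build(cache->user_agent, sizeof cache->user_agent);
        snprintf(cache->app_version, sizeof cache->app_version, "%s", app_version);
        snprintf(cache->os_version, sizeof cache->os_version, "%s", os_version);
        cache->populated = true;
    }
    return cache->user_agent;
}
```

With this shape, the expensive builder runs on the first launch after an install or upgrade, and later launches pay only for two string comparisons.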
System Interface Optimizations :
Prefer NSBundle.mainBundle over bundleWithIdentifier: for faster bundle lookup.
Lazy‑load services that call UIApplication.beginReceivingRemoteControlEvents to defer the call.
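The lazy‑loading pattern used for these services, and for the WKWebView mentioned earlier, can be sketched as an accessor that performs the expensive setup on first use instead of at launch. This is an illustration only: `RemoteControlService` and its functions are hypothetical stand‑ins, and the sketch assumes the accessor is called from the main thread only.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool started;
    int start_count; /* how many times the expensive setup actually ran */
} RemoteControlService;

static RemoteControlService g_service;

/* Stand-in for the expensive setup, such as the system call that
   begins receiving remote control events. */
static void remote_control_service_start(RemoteControlService *service) {
    service->started = true;
    service->start_count++;
}

/* Launch path: deliberately does not touch the service. */
static void app_did_finish_launching(void) {
    /* Only work needed for the first frame belongs here. */
}

/* Accessor: starts the service the first time a feature asks for it.
   Assumes main-thread-only access, so no locking is shown. */
static RemoteControlService *remote_control_service(void) {
    if (!g_service.started) {
        remote_control_service_start(&g_service);
    }
    return &g_service;
}
```

The cost does not disappear; it moves to the first feature that needs the service, which is acceptable when that moment falls outside the cold‑start window.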
Ad Business Optimizations : Added a dynamic switch for member‑user ads and moved ad‑request timing earlier (after network stack init), reducing ad‑related startup latency by ~300‑400 ms.
Other Business‑Level Optimizations : Deferred one‑click login SDK calls to only when the user is not logged in, and cleaned up various non‑critical code paths.
Summary : After a series of T1 and T2 optimizations, the NetEase Cloud Music app achieved >30 % startup‑time reduction. The team emphasizes the need for continuous monitoring, automated detection of launch‑performance regressions, and user‑perceived latency tracking. Future work includes a full‑stack launch‑performance guard system.
References (selected): WWDC 2019 423, WWDC 2022 110362, A4LoadMeasure, HookZz, AppleTrace, AppOrderFiles, and various internal blog posts.
NetEase Cloud Music Tech Team