How Alipay Optimizes Android Cold Start: Measurement, Diagnosis, and APM Platform
This article explains how Alipay measures and improves Android cold‑start performance through video‑frame analysis, ActivityTaskManager, custom instrumentation, home‑page snapshot techniques, temperature control, patch‑APK data collection, AOP‑based diagnostics, and an integrated APM platform.
1. Introduction
The client experience covers both business‑level and basic performance aspects; we divide basic performance into time‑based and resource‑based experiences. Time‑based scenarios include cold start, first login, first jump, home rendering, scanning, wake‑up, mini‑programs, etc. Resource‑based dimensions include power consumption, ANR, jank, memory, storage, network, package size, and more.
Cold start matters most: 80% of users open Alipay via a cold start, and speed is essential when a user is paying while standing in a queue.
2. Scene Performance Measurement
2.1 Measurement Methods
Video Frame
Record the launch video, split it into frames, and calculate the time from the first frame to the frame where the page is fully displayed.
Advantages: Aligns with user perception; widely used in the industry.
Disadvantages: High resource consumption, cumbersome calculation, low precision (≈80‑200 ms).
Related technologies: OpenCV and FFmpeg; automating the pipeline is recommended.
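The frame arithmetic behind this method can be sketched as follows. This is a minimal illustration, not Alipay's pipeline: it assumes a constant-frame-rate recording and a per-frame change score computed elsewhere (for example with OpenCV). The class name, the method names, and the epsilon threshold are all hypothetical.

```java
/** Sketch: derive launch time from per-frame change scores of a recorded video. */
final class FrameTiming {
    /**
     * diffs[i] is an assumed change score between frame i and frame i+1,
     * for example a mean absolute pixel difference. Scores at or below
     * epsilon count as "no visible change".
     * Returns the index of the first frame after which nothing changes,
     * which this sketch treats as the fully displayed frame.
     */
    static int stableFrameIndex(double[] diffs, double epsilon) {
        int stable = 0;
        for (int i = 0; i < diffs.length; i++) {
            if (diffs[i] > epsilon) {
                stable = i + 1; // frame i+1 still differs from frame i
            }
        }
        return stable;
    }

    /** Converts a frame span into milliseconds for a constant frame rate. */
    static double elapsedMillis(int startFrame, int endFrame, double fps) {
        return (endFrame - startFrame) * 1000.0 / fps;
    }

    /** One frame interval: the finest resolution the recording can offer. */
    static double frameIntervalMillis(double fps) {
        return 1000.0 / fps;
    }
}
```

Because the result can never be finer than one frame interval, the recording frame rate sets a lower bound on the error of this method.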
ActivityTaskManager
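Since Android KitKat (4.4), every activity launch writes a "Displayed" line to logcat, for example:
ActivityTaskManager: Displayed com.android.samples.mytest/.MainActivity: +1s100ms
The log tag is ActivityTaskManager on Android 10 and later, and ActivityManager on earlier releases. The value is the time until the activity's first frame is drawn, so it suits activity‑level timing but not cold‑start measurement, where content that finishes rendering after the first frame also counts.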
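For offline analysis, the duration in such a log line can be extracted with a small parser. The sketch below is an assumption-laden illustration: the class and method names are hypothetical, and the pattern only covers the day/hour/minute/second/millisecond components that the duration suffix may contain.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: parse the duration of a logcat "Displayed" line into milliseconds. */
final class DisplayedLogParser {
    // Group 1 is the component name; groups 2..6 are d, h, m, s, ms.
    // The (?!s) lookahead keeps "100ms" from being read as 100 minutes.
    private static final Pattern DISPLAYED = Pattern.compile(
        "Displayed\\s+(\\S+):\\s+\\+(?:(\\d+)d)?(?:(\\d+)h)?(?:(\\d+)m(?!s))?(?:(\\d+)s)?(?:(\\d+)ms)?");

    private static final long[] UNIT_MILLIS = {86_400_000L, 3_600_000L, 60_000L, 1_000L, 1L};

    static OptionalLong displayedMillis(String logLine) {
        Matcher m = DISPLAYED.matcher(logLine);
        if (!m.find()) {
            return OptionalLong.empty();
        }
        long total = 0;
        boolean anyUnit = false;
        for (int i = 0; i < UNIT_MILLIS.length; i++) {
            String value = m.group(i + 2);
            if (value != null) {
                total += Long.parseLong(value) * UNIT_MILLIS[i];
                anyUnit = true;
            }
        }
        return anyUnit ? OptionalLong.of(total) : OptionalLong.empty();
    }
}
```

Returning an empty OptionalLong for lines that do not match lets a caller stream over a whole logcat dump and skip unrelated entries.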
Instrumentation Points
Instrumentation points are the most common method, usable both online and offline. Alipay uses full‑link instrumentation to measure each node’s latency, ensuring points align with page rendering completion.
Advantages: Unified online/offline measurement with high precision.
Disadvantages: Instrumentation points must be synchronized with rendering events. To achieve this, Alipay listens on android.view.ViewTreeObserver.OnGlobalLayoutListener and uses a home‑page snapshot technique to capture the exact moment rendering completes.
2.2 Improving Measurement Accuracy
Achieving ~10 ms precision required experiments on cold‑start timing, sample collection, device temperature control, and CPU frequency locking.
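Whether a set of repeated runs is tight enough to trust can be checked with simple statistics. The article does not say how Alipay aggregates its samples, so the median and range used below are assumptions chosen for illustration, and the class and method names are hypothetical.

```java
import java.util.Arrays;

/** Sketch: summarize repeated cold-start samples (milliseconds). */
final class SampleStats {
    /** Median of the samples; less sensitive to a single outlier than the mean. */
    static double median(long[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    /** Difference between the slowest and the fastest sample. */
    static long range(long[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long min = samples[0];
        long max = samples[0];
        for (long s : samples) {
            min = Math.min(min, s);
            max = Math.max(max, s);
        }
        return max - min;
    }

    /** True when all samples fall within the given tolerance of each other. */
    static boolean isStable(long[] samples, long toleranceMillis) {
        return range(samples) <= toleranceMillis;
    }
}
```

A run that fails the stability check is a hint that temperature or CPU frequency drifted during the test.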
Home‑Page Snapshot Technique
After a cold start, Alipay displays the previously captured home page image, allowing immediate interaction and reducing perceived load time.
The home page is divided into sections: the four main entry icons (the "four big golden" row), the nine‑grid, notifications, ads, and the feed. After each use, a screenshot of every section is saved; on the next cold start, any section with a saved screenshot is displayed instantly.
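The per-section decision can be sketched as a lookup against a snapshot store. This is an in-memory illustration only: the class, the enum, and the section names are hypothetical, and a real implementation would persist images to disk and decode them on the UI side.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: decide per section whether a saved snapshot can be shown on cold start. */
final class SnapshotCache {
    enum Source { SNAPSHOT, LIVE_RENDER }

    // Section name -> encoded image bytes captured after the previous session.
    private final Map<String, byte[]> images = new LinkedHashMap<>();

    void save(String section, byte[] encodedImage) {
        images.put(section, encodedImage.clone());
    }

    /** Keeps the given section order so the plan can be applied top to bottom. */
    Map<String, Source> planColdStart(List<String> sections) {
        Map<String, Source> plan = new LinkedHashMap<>();
        for (String section : sections) {
            plan.put(section, images.containsKey(section) ? Source.SNAPSHOT : Source.LIVE_RENDER);
        }
        return plan;
    }
}
```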
Temperature Control
Overheating causes CPU throttling; Alipay uses external cooling devices (e.g., gaming‑grade cooling pads) to maintain stable temperature during performance testing.
CPU Frequency Lock
CPU frequency can be locked on a rooted device; see the AndroidX Benchmark library for reference.
Stage Data Collection
Cold start is split into two major phases (pre_launch and time_startup) and over 90 sub‑stages, enabling fine‑grained identification of performance regressions.
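A stage recorder along these lines could collect per-stage durations and compare them against a baseline. It is a sketch under stated assumptions: only the phase names pre_launch and time_startup come from the article, while the class, its methods, and the regression threshold are hypothetical. Timestamps are passed in explicitly so the example is deterministic; real code would read a monotonic clock such as System.nanoTime().

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: record stage durations and flag stages slower than a baseline. */
final class StageRecorder {
    private final Map<String, Long> startNanos = new LinkedHashMap<>();
    private final Map<String, Long> durationMillis = new LinkedHashMap<>();

    void begin(String stage, long nowNanos) {
        startNanos.put(stage, nowNanos);
    }

    void end(String stage, long nowNanos) {
        Long start = startNanos.remove(stage);
        if (start == null) {
            throw new IllegalStateException("stage not started: " + stage);
        }
        durationMillis.put(stage, (nowNanos - start) / 1_000_000L);
    }

    /** Returns a copy of the recorded durations in recording order. */
    Map<String, Long> durations() {
        return new LinkedHashMap<>(durationMillis);
    }

    /** Stages present in both maps whose duration grew by more than the threshold. */
    static List<String> regressions(Map<String, Long> baseline,
                                    Map<String, Long> current,
                                    long thresholdMillis) {
        List<String> slower = new ArrayList<>();
        for (Map.Entry<String, Long> e : current.entrySet()) {
            Long base = baseline.get(e.getKey());
            if (base != null && e.getValue() - base > thresholdMillis) {
                slower.add(e.getKey());
            }
        }
        return slower;
    }
}
```

Keeping one duration per named stage is what makes a regression attributable to a specific sub-stage rather than to the launch as a whole.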
Performance Data Collection Scheme
Alipay uses a “patch APK” that is merged into the main APK at build time. The patch APK hooks application initialization, sets up Spider SDK, DexAOP, and performance diagnostics without affecting the online version.
3. Scene Performance Diagnosis
Regular Diagnosis Dimensions
Diagnostics cover threads, IO, dynamic bundle loading, services, and configuration switches, implemented via AOP hooks in the patch APK.
AOP Overview
AOP (Aspect‑Oriented Programming) allows hooks to be inserted at compile time or at runtime. Alipay uses DexAOP, which modifies dex files directly to add proxy methods for targeted hooks.
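The general idea of wrapping a call with timing logic can be shown with a plain JDK dynamic proxy. This is a different technique from DexAOP, which rewrites dex files; the sketch only illustrates the proxy-method concept at runtime, and the class and method names are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.List;

/** Sketch: runtime AOP with a JDK dynamic proxy that times every interface call. */
final class TimingAspect {
    /** Wraps target so each call through iface appends a timing entry to log. */
    static <T> T wrap(T target, Class<T> iface, List<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            long start = System.nanoTime();
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the target's own exception
            } finally {
                long micros = (System.nanoTime() - start) / 1_000L;
                log.add(method.getName() + " took " + micros + "us");
            }
        };
        Object proxy = Proxy.newProxyInstance(
            iface.getClassLoader(), new Class<?>[] {iface}, handler);
        return iface.cast(proxy);
    }
}
```

For example, `TimingAspect.wrap(task, Runnable.class, log).run()` executes the task and leaves one "run took ..." entry in the log. The target code stays unchanged, which is the property that makes hook-based diagnostics practical.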
Code‑Change Diagnosis
Alipay compares version baselines to identify changed classes, methods, and bundles, then instruments only the changed code using ASM bytecode manipulation via a custom Gradle Transform plugin.
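The comparison step can be sketched as a diff between two maps of method fingerprints. The article does not describe the baseline format, so the signature-to-hash map below is an assumption, and the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

/** Sketch: find methods that are new or modified between two version baselines. */
final class BaselineDiff {
    /**
     * Keys are method signatures; values are an assumed fingerprint of the
     * method body, such as a hash of its bytecode. Removed methods are
     * ignored because there is nothing left to instrument.
     */
    static Set<String> changedMethods(Map<String, String> oldBaseline,
                                      Map<String, String> newBaseline) {
        Set<String> changed = new TreeSet<>();
        for (Map.Entry<String, String> e : newBaseline.entrySet()) {
            String previous = oldBaseline.get(e.getKey());
            if (!Objects.equals(previous, e.getValue())) {
                changed.add(e.getKey()); // new method, or body differs
            }
        }
        return changed;
    }
}
```

Restricting instrumentation to this set keeps the overhead of method-level timing proportional to the size of the change rather than to the size of the app.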
Gradle Transform Mechanism
Each Transform is a Gradle task that processes compiled class files. Custom Transform plugins insert ASM advice adapters to record method entry/exit timestamps.
4. Spider SDK
Spider SDK abstracts scene splitting and data dumping capabilities, enabling any business scenario (cold start, scanning, home rendering) to inherit Alipay’s measurement and diagnosis features.
5. APM Performance Platform
The platform integrates continuous integration, code‑diff analysis, build systems, and real‑device task scheduling to provide end‑to‑end performance measurement, diagnosis, and optimization workflows.
6. Summary
Alipay’s APM platform has driven multiple performance campaigns across payment, wake‑up, thread, power, and low‑end device optimizations, ensuring zero regression and empowering developers with SDK‑based diagnostics.
7. References
https://developer.android.com/studio/build?hl=zh-cn
https://asm.ow2.io/
https://developer.android.com/reference/tools/gradle-api/7.0/com/android/build/api/transform/Transform
https://docs.gradle.org/current/userguide/artifact_transforms.html
https://docs.oracle.com/javase/7/docs/platform/jvmti/jvmti.html#whatIs
https://developer.android.google.cn/jetpack/androidx/releases/benchmark
https://developer.android.com/reference/android/view/ViewTreeObserver.OnGlobalLayoutListener
https://developer.android.google.cn/reference/android/app/Activity#reportFullyDrawn()
https://ffmpeg.org/
https://opencv.org/
