
How Alipay Optimizes Cold-Start Performance with Spider SDK and APM

Alipay's client engineering team details a comprehensive approach to monitoring, measuring, and improving the latency of key user scenarios, especially cold start. The approach combines video frame analysis, ActivityTaskManager timing, extensive instrumentation, home-page snapshots, temperature control, patch-APK injection, AOP-based diagnostics, and the Spider SDK within an APM platform.

Alipay Experience Technology

1. Introduction

Client experience covers both business-level and basic performance aspects. Alipay classifies performance into time-consuming scenarios (cold start, first login, first jump, home rendering, scanning, wake-up, mini-programs) and resource-consuming metrics (power, ANR, jank, memory, storage, network, package size). This article focuses on time-consuming performance, especially cold start.

2. Scene Performance Measurement

2.1 Measurement Methods

Video Frame Splitting: Record a video of the launch, split it into frames, and measure the time from the first frame to the frame in which the page is fully displayed. Advantage: user-centric. Disadvantages: high resource usage and low precision (80 to 200 ms).

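ActivityTaskManager: Since Android 4.4 (KitKat), the system logs a Displayed time for each Activity launch (e.g., "+1s100ms"); on recent Android versions the line appears under the ActivityTaskManager log tag. This is suitable for Activity timing but not for cold start.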
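To compare such values numerically, the displayed-time string has to be converted into a plain number. The sketch below parses strings like "+1s100ms" into milliseconds. The class name `DisplayedTime` is hypothetical, and the accepted field order (days, hours, minutes, seconds, milliseconds) is an assumption about the log format rather than something the article specifies.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Converts a logcat-style duration such as "+1s100ms" into milliseconds. */
final class DisplayedTime {
    // Optional day, hour, minute, second, and millisecond fields, in that order (assumed format).
    // The (?!s) lookahead keeps the minute field from swallowing the "m" of "ms".
    private static final Pattern DURATION = Pattern.compile(
            "\\+?(?:(\\d+)d)?(?:(\\d+)h)?(?:(\\d+)m(?!s))?(?:(\\d+)s)?(?:(\\d+)ms)?");
    private static final long[] UNIT_MILLIS = {86_400_000L, 3_600_000L, 60_000L, 1_000L, 1L};

    private DisplayedTime() {}

    static long parseMillis(String text) {
        Matcher m = DURATION.matcher(text.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized duration: " + text);
        }
        long total = 0;
        boolean sawField = false;
        for (int i = 0; i < UNIT_MILLIS.length; i++) {
            String digits = m.group(i + 1);
            if (digits != null) {
                total += Long.parseLong(digits) * UNIT_MILLIS[i];
                sawField = true;
            }
        }
        if (!sawField) {
            // Rejects inputs such as "" or "+" that contain no field at all.
            throw new IllegalArgumentException("Unrecognized duration: " + text);
        }
        return total;
    }
}
```

For example, `DisplayedTime.parseMillis("+1s100ms")` returns 1100.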

Instrumentation (Tracing): Tracing points are added along the full launch path to measure each node's latency, especially the point at which page rendering completes.

Implementation details include registering an android.view.ViewTreeObserver.OnGlobalLayoutListener to detect layout completion, and a home-page snapshot technique that instantly shows the previous home screen on every cold start after the first.
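The tracing idea can be illustrated without any Android dependency. The following is a minimal sketch, assuming a hypothetical `LaunchTrace` class with an injectable millisecond clock: each tracing point is marked once, and the latency of a node is the gap to the previous point. On a device, the last mark would plausibly be issued from the layout-completion callback, but that wiring is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Records named tracing points and derives per-node latencies (illustrative only). */
final class LaunchTrace {
    private final LongSupplier clockMillis;
    private final Map<String, Long> points = new LinkedHashMap<>();

    LaunchTrace(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
    }

    /** Marks a tracing point; only the first mark for a given name is kept. */
    void mark(String name) {
        points.putIfAbsent(name, clockMillis.getAsLong());
    }

    /** Latency of each point relative to the previous one, in marking order. */
    Map<String, Long> nodeLatencies() {
        Map<String, Long> latencies = new LinkedHashMap<>();
        Long previous = null;
        for (Map.Entry<String, Long> point : points.entrySet()) {
            if (previous != null) {
                latencies.put(point.getKey(), point.getValue() - previous);
            }
            previous = point.getValue();
        }
        return latencies;
    }
}
```

Injecting the clock keeps the recorder testable; in production code the supplier could simply be `System::currentTimeMillis`.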

2.2 Accuracy Improvement

To achieve ~10 ms precision, Alipay experimented with launch timing, sample collection, device temperature control, and CPU frequency locking. Key techniques:

Home-Page Snapshot: Capture screenshots of the major home sections (four-pillars, grid, notifications, ads, feed) before the app goes to the background, and display them instantly on the next cold start.

Temperature Control: Use cooling devices and avoid low-end phones to prevent CPU throttling.

CPU Frequency Lock: Lock the CPU clock frequency on rooted devices.

Stage-Based Data Collection: Split cold start into two large stages (pre_launch, time_startup) and 90+ sub-stages to pinpoint slow code.

Data is collected via a patch APK that hooks application initialization, performance diagnostics, and configuration loading.
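The stage-based split can be pictured as a two-level table of durations. The sketch below is an assumption-laden illustration: the stage names `pre_launch` and `time_startup` come from the article, while the class `ColdStartStages` and any sub-stage names passed to it are hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Groups sub-stage durations under their large stage (illustrative only). */
final class ColdStartStages {
    private final Map<String, Map<String, Long>> stages = new LinkedHashMap<>();

    /** Adds a duration to a sub-stage; repeated records for the same sub-stage accumulate. */
    void record(String stage, String subStage, long millis) {
        stages.computeIfAbsent(stage, key -> new LinkedHashMap<>())
              .merge(subStage, millis, Long::sum);
    }

    /** Sum of all sub-stage durations in one large stage, or 0 if the stage is unknown. */
    long stageTotal(String stage) {
        Map<String, Long> subStages = stages.getOrDefault(stage, Collections.emptyMap());
        long total = 0;
        for (long millis : subStages.values()) {
            total += millis;
        }
        return total;
    }

    /** "stage/subStage" key of the slowest sub-stage, or null when nothing was recorded. */
    String slowestSubStage() {
        String slowest = null;
        long max = -1;
        for (Map.Entry<String, Map<String, Long>> stage : stages.entrySet()) {
            for (Map.Entry<String, Long> sub : stage.getValue().entrySet()) {
                if (sub.getValue() > max) {
                    max = sub.getValue();
                    slowest = stage.getKey() + "/" + sub.getKey();
                }
            }
        }
        return slowest;
    }
}
```

With 90+ sub-stages, ranking by duration in this way is one simple means of narrowing attention to the slowest code first.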

3. Scene Performance Diagnosis

3.1 Conventional Diagnosis

Diagnostics cover thread usage, I/O, dynamic bundle loading, services, and configuration reads. Hooking techniques include:

Hook Runnable.run, AsyncTask, and thread creation to capture thread metrics.

Hook configuration services to monitor switch reads.

Hook BundleClassLoader for bundle loading on the main thread.

Hook service creation to record initialization time.

Hook read/write APIs to detect I/O issues.
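As a rough illustration of the first hook in the list, a `Runnable` can be wrapped so that every execution reports its thread and duration. This is a plain-Java sketch with a hypothetical `TimedRunnable` class; the article's hooks are injected automatically, whereas here the wrapping is done by hand.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Wraps a Runnable and records which thread ran it and for how long (illustrative only). */
final class TimedRunnable implements Runnable {
    /** One captured execution. */
    static final class Record {
        final String task;
        final String thread;
        final long nanos;

        Record(String task, String thread, long nanos) {
            this.task = task;
            this.thread = thread;
            this.nanos = nanos;
        }
    }

    // Thread-safe sink, since wrapped tasks may run on any thread.
    static final List<Record> RECORDS = new CopyOnWriteArrayList<>();

    private final String name;
    private final Runnable delegate;

    TimedRunnable(String name, Runnable delegate) {
        this.name = name;
        this.delegate = delegate;
    }

    @Override
    public void run() {
        long start = System.nanoTime();
        try {
            delegate.run();
        } finally {
            // Recorded even if the delegate throws, so failed tasks still show up.
            RECORDS.add(new Record(name, Thread.currentThread().getName(),
                    System.nanoTime() - start));
        }
    }
}
```

A task submitted as `executor.execute(new TimedRunnable("init_cache", task))` would then leave a record naming the worker thread that ran it.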

3.2 AOP Overview

Alipay uses DexAOP, which modifies dex files after compilation to insert proxy methods around method invocations, method bodies, and object creation. Other tools (AspectJ, ASM, etc.) are mentioned, but DexAOP is preferred for its build-time injection.
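DexAOP itself is not shown in the article, so the proxy-method idea is illustrated here with a different, standard mechanism: a `java.lang.reflect.Proxy` that intercepts calls at runtime. This is a swapped-in technique, not build-time dex rewriting, and the `TimingProxy` class is hypothetical. It only conveys the shape of an invocation proxy that adds timing around the real call.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Runtime stand-in for an AOP proxy: times every call made through an interface. */
final class TimingProxy {
    static final Map<String, Long> CALL_COUNTS = new ConcurrentHashMap<>();
    static final Map<String, Long> TOTAL_NANOS = new ConcurrentHashMap<>();

    private TimingProxy() {}

    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> iface, T target) {
        InvocationHandler handler = (proxy, method, args) -> {
            long start = System.nanoTime();
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                // Rethrow the exception thrown by the real method, not the reflective wrapper.
                throw e.getCause();
            } finally {
                String key = iface.getSimpleName() + "." + method.getName();
                CALL_COUNTS.merge(key, 1L, Long::sum);
                TOTAL_NANOS.merge(key, System.nanoTime() - start, Long::sum);
            }
        };
        return (T) Proxy.newProxyInstance(
                TimingProxy.class.getClassLoader(), new Class<?>[] {iface}, handler);
    }
}
```

A build-time tool avoids the reflection cost and the interface requirement of this approach, which is consistent with the preference for build-time injection stated above.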

3.3 Code‑Change Performance Diagnosis

Version diff is used to identify changed classes/methods. ASM bytecode instrumentation (via a custom Gradle Transform) inserts timing callbacks around changed code only, avoiding full‑app overhead. The process involves:

Detecting changed code between baseline and target versions.

Applying Gradle plugin + Transform + ASM to instrument those methods.

Generating two APKs (baseline‑instrumented and target‑instrumented) for side‑by‑side performance comparison.
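The first step, detecting changed code, can be sketched as a comparison of per-method fingerprints between the two versions. How Alipay computes its diff is not described in the article; the `MethodDiff` class, the signature strings, and the fingerprint values below are all hypothetical.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

/** Finds methods whose fingerprint differs between two builds (illustrative only). */
final class MethodDiff {
    private MethodDiff() {}

    /**
     * Each map goes from a method signature to a fingerprint of its body
     * (for instance a hash of the bytecode). Returns the methods that are
     * new in the target or whose fingerprint changed, in sorted order.
     */
    static Set<String> changedMethods(Map<String, String> baseline, Map<String, String> target) {
        Set<String> changed = new TreeSet<>();
        for (Map.Entry<String, String> method : target.entrySet()) {
            if (!Objects.equals(baseline.get(method.getKey()), method.getValue())) {
                changed.add(method.getKey());
            }
        }
        return changed;
    }
}
```

Only the methods in the returned set would then be handed to the instrumentation step, which keeps the timing overhead confined to changed code.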

Case study: version 10.1.92.4310 introduced a 60 ms cold-start regression, which was traced to an additional thread costing 486 ms, among 538 changed lines across 14 files.

4. Spider SDK

The Spider SDK abstracts scene splitting and data dumping, so that any scenario (cold start, scanning, home rendering) inherits the measurement and diagnosis capabilities. Each scene is divided into major phases (e.g., BizName 1‑3) and sub-phases delimited by start/end markers; a phase with a missing marker is ignored. Data is dumped via justDumpSpiderweb() and can include custom properties.
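The marker rule can be made concrete with a small recorder. This is not the Spider SDK API: apart from `justDumpSpiderweb()`, the article names no methods, so the `SceneRecorder` class and its `start`, `end`, and `dump` methods are hypothetical, and custom properties are left out for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Tracks phases by start/end markers and drops incomplete ones (illustrative only). */
final class SceneRecorder {
    private final LongSupplier clockMillis;
    private final Map<String, Long> starts = new LinkedHashMap<>();
    private final Map<String, Long> ends = new LinkedHashMap<>();

    SceneRecorder(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
    }

    void start(String phase) {
        starts.put(phase, clockMillis.getAsLong());
    }

    void end(String phase) {
        ends.put(phase, clockMillis.getAsLong());
    }

    /** Durations of phases that have both markers; phases missing either marker are skipped. */
    Map<String, Long> dump() {
        Map<String, Long> durations = new LinkedHashMap<>();
        for (Map.Entry<String, Long> started : starts.entrySet()) {
            Long end = ends.get(started.getKey());
            if (end != null && end >= started.getValue()) {
                durations.put(started.getKey(), end - started.getValue());
            }
        }
        return durations;
    }
}
```

Dropping incomplete phases rather than guessing an end time keeps a crashed or interrupted run from polluting the collected data.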

5. APM Performance Platform

The platform integrates measurement, diagnosis, and SDK capabilities, providing continuous integration, code‑diff services, build system hooks, and device‑farm task scheduling. It supports incremental performance regression detection, manual task triggering, and rapid onboarding of new scenes via configuration only (≈2 hours).
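One plausible form of regression detection is to compare repeated cold-start samples from a baseline build and a target build. The sketch below uses the median of each sample set and a fixed threshold; both choices, along with the `RegressionCheck` class, are assumptions made for illustration and are not taken from the platform.

```java
import java.util.Arrays;

/** Flags a regression when the median launch time grows beyond a threshold (illustrative only). */
final class RegressionCheck {
    private RegressionCheck() {}

    /** Median of the samples; for an even count, the mean of the two middle values. */
    static long median(long[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("No samples");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    /** Positive when the target build is slower than the baseline. */
    static long deltaMillis(long[] baseline, long[] target) {
        return median(target) - median(baseline);
    }

    static boolean isRegression(long[] baseline, long[] target, long thresholdMillis) {
        return deltaMillis(baseline, target) > thresholdMillis;
    }
}
```

The median is used here because a single outlier run, for example one affected by thermal throttling, shifts it far less than it would shift a mean.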

6. Summary

Alipay’s APM platform has driven multiple performance optimization campaigns across payment, wake‑up, thread, power, and low‑end device scenarios, ensuring zero regression and empowering developers with SDK‑based diagnostics and a robust performance platform.
