Practices for Reducing Crash Rate in Meituan Waimai Android App

The Meituan Waimai Android team cut crash rates from 0.3% to 0.02% by applying systematic crash governance, null‑pointer and OOM safeguards, AOP bytecode rewriting, dependency checks, modular architecture, and robust monitoring, offering a comprehensive blueprint for other Android teams to improve stability.

Meituan Technology Team

Jun 14, 2018

Crash rate is a key indicator of an Android app’s quality, and neglecting it can cost users and revenue. This article shares the practices the Meituan Waimai Android client team used to reduce the crash rate from 0.3% (3‰) to 0.02% (2‱), in the hope that they are useful to other teams.

Challenges and Achievements

High usage frequency, rapid business growth, and severe Android fragmentation made crash reduction extremely challenging. Through sustained effort, the average crash rate dropped from 3‰ to 2‱, with a best value of around 1‱ (crash rate = crash count / DAU).

Crash Governance Principles

From point to surface: solve a crash class‑wide, not just a single instance.

Do not swallow exceptions casually; understand the root cause and handle it appropriately.

Prevention > remediation: aim to eliminate crashes before they happen.

Common Crash Types and Fixes

NullPointerException

Caused by (1) operating on an uninitialized object, or (2) operating on an object that has been set to null after being reclaimed. Mitigations include:

Null‑check before use.

Use @NonNull/@Nullable annotations.

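Avoid keeping state in static fields, which are reset when the process is killed and recreated; persist the data (for example in SharedPreferences) if it must survive.

Consider Kotlin, whose type system distinguishes nullable from non‑null references.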

For the second case (object becomes null after Activity/Fragment destruction), mitigations include:

Check Activity/Fragment state before executing callbacks; add try‑catch.

Encapsulate LifecycleMessage/Runnable and add custom Lint checks.

Cancel all pending requests in BaseActivity / BaseFragment on destroy.
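The "check state before executing callbacks" idea can be sketched as a small wrapper. This is a minimal illustration, not the team's LifecycleRunnable: the class shape and the ownerAlive parameter are assumptions, and the liveness check is passed in as a plain BooleanSupplier so the sketch runs outside Android. In an Activity it might be something like `() -> !activity.isFinishing() && !activity.isDestroyed()`.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical wrapper: runs the callback only while its owner is still alive. */
final class LifecycleRunnable implements Runnable {
    private final BooleanSupplier ownerAlive; // assumption: supplied by the Activity/Fragment
    private final Runnable body;

    LifecycleRunnable(BooleanSupplier ownerAlive, Runnable body) {
        this.ownerAlive = ownerAlive;
        this.body = body;
    }

    @Override
    public void run() {
        if (!ownerAlive.getAsBoolean()) {
            return; // owner destroyed: its views and fields may already be null
        }
        body.run();
    }
}
```

Posting every delayed or asynchronous callback through one wrapper like this also gives a custom Lint rule a single thing to enforce: flag any raw Runnable posted from an Activity or Fragment.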

IndexOutOfBoundsException

Often occurs in ListView adapters whose data changes underneath them, or when a container is accessed from multiple threads.
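One common defence for the adapter case is to let the adapter own an immutable snapshot of its data, so that a caller mutating the original list (possibly on another thread) cannot change the item count halfway through a layout pass. The sketch below is an assumption about how such a guard could look, not code from the article; AdapterData and replaceAll are hypothetical names, and in a real adapter replaceAll would run on the UI thread and be followed by notifyDataSetChanged().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical holder for an adapter's backing data. */
final class AdapterData<T> {
    private List<T> items = Collections.emptyList();

    /** Copies the incoming list so later changes to it cannot affect the adapter. */
    void replaceAll(List<T> newItems) {
        items = Collections.unmodifiableList(new ArrayList<>(newItems));
    }

    int getCount() {
        return items.size();
    }

    T getItem(int position) {
        return items.get(position);
    }
}
```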

System‑level Crash Handling

Device fragmentation and OEM‑custom ROMs cause obscure crashes. Detection relies on cloud testing platforms and online monitoring. Typical solutions:

Identify suspicious code and modify or guard it.

Use Java or Native Hooking to replace problematic APIs (Java Hook via reflection/dynamic proxy, Native Hook by swapping method implementations).

If hooking fails, reverse‑engineer the ROM to locate the issue.

Example: a crash that occurred only on the Vivo V3Max was traced to a custom AbsListView$UpdateBottomFlagTask class. The team used reflection to replace the static SERIAL_EXECUTOR field of AsyncTask with a SafeSerialExecutor and added a try‑catch, solving the problem:

import android.os.AsyncTask;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Clears FINAL on a static field through the private artField.accessFlags, then overwrites the field.
// Depends on runtime internals, so it only works where java.lang.reflect.Field has an artField member.
public static void setFinalStatic(Field field, Object newValue) throws Exception {
    field.setAccessible(true);
    Field artField = Field.class.getDeclaredField("artField");
    artField.setAccessible(true);
    Object artFieldValue = artField.get(field);
    Field accessFlagsField = artFieldValue.getClass().getDeclaredField("accessFlags");
    accessFlagsField.setAccessible(true);
    accessFlagsField.setInt(artFieldValue, field.getModifiers() & ~Modifier.FINAL);
    field.set(null, newValue);
}

// Applies the workaround on the affected device only; failures are logged, not rethrown.
private void initVivoV3MaxCrashHandler() {
    if (!isVivoV3()) return;
    try {
        setFinalStatic(AsyncTask.class.getDeclaredField("SERIAL_EXECUTOR"), new SafeSerialExecutor());
        // Point the default executor at the replaced SERIAL_EXECUTOR as well.
        Field defaultField = AsyncTask.class.getDeclaredField("sDefaultExecutor");
        defaultField.setAccessible(true);
        defaultField.set(null, AsyncTask.SERIAL_EXECUTOR);
    } catch (Exception e) {
        L.e(e);
    }
}
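The article does not show the SafeSerialExecutor that replaces AsyncTask's SERIAL_EXECUTOR. The following is only a guess at its shape: a serial executor that runs one task at a time and wraps both task execution and hand-off to the underlying pool in a try-catch. The constructor parameters delegate and onError are assumptions added so the sketch runs on a plain JVM; the class in the article is constructed with no arguments and would presumably hand tasks to AsyncTask.THREAD_POOL_EXECUTOR.

```java
import java.util.ArrayDeque;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

/** Hypothetical serial executor that reports failures instead of letting them crash the process. */
final class SafeSerialExecutor implements Executor {
    private final ArrayDeque<Runnable> tasks = new ArrayDeque<>();
    private final Executor delegate;           // assumption: the pool that actually runs tasks
    private final Consumer<Throwable> onError; // assumption: crash reporter or logger
    private Runnable active;

    SafeSerialExecutor(Executor delegate, Consumer<Throwable> onError) {
        this.delegate = delegate;
        this.onError = onError;
    }

    @Override
    public synchronized void execute(Runnable task) {
        tasks.offer(() -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                onError.accept(e); // report the failure, keep the queue moving
            } finally {
                scheduleNext();
            }
        });
        if (active == null) {
            scheduleNext();
        }
    }

    private synchronized void scheduleNext() {
        while ((active = tasks.poll()) != null) {
            try {
                delegate.execute(active);
                return;
            } catch (RuntimeException e) {
                onError.accept(e); // e.g. the pool rejected the task; try the next one
            }
        }
    }
}
```

Reporting through onError rather than silently dropping the exception keeps the workaround in line with the principle of not swallowing exceptions casually: the device-specific failure still shows up in monitoring.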

OOM (OutOfMemoryError) Prevention

OOM is a top‑ranked crash type, mainly caused by memory leaks and large objects (especially Bitmaps). Mitigations include:

Detect and fix memory leaks (e.g., Activity leaks caused by anonymous Handlers, misuse of Context, View holding Activity context).

Use LeakCanary for automatic leak detection and StrictMode during debug.

Optimize large images: use mature image libraries (Glide), load images according to view size, enable server‑side resizing, and monitor Bitmap memory usage.
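To make "load images according to view size" concrete, here is a sketch of the usual power-of-two downsampling calculation, together with a rough memory estimate. BitmapSizing and its method names are hypothetical; on Android the result would typically be assigned to BitmapFactory.Options.inSampleSize, and the estimate assumes 4 bytes per pixel (ARGB_8888). Libraries such as Glide do this kind of sizing internally.

```java
/** Hypothetical helper for sizing decoded bitmaps to the view that displays them. */
final class BitmapSizing {
    private BitmapSizing() {}

    /** Largest power-of-two divisor that keeps the decoded image at least as big as the view. */
    static int sampleSizeFor(int srcWidth, int srcHeight, int reqWidth, int reqHeight) {
        if (reqWidth <= 0 || reqHeight <= 0) {
            return 1; // view not measured yet: do not guess
        }
        int sampleSize = 1;
        if (srcWidth > reqWidth || srcHeight > reqHeight) {
            int halfWidth = srcWidth / 2;
            int halfHeight = srcHeight / 2;
            while (halfWidth / sampleSize >= reqWidth && halfHeight / sampleSize >= reqHeight) {
                sampleSize *= 2;
            }
        }
        return sampleSize;
    }

    /** Approximate heap cost of a decoded bitmap at 4 bytes per pixel. */
    static long argb8888Bytes(int width, int height) {
        return 4L * width * height;
    }
}
```

For a 4000×3000 source shown in a 500×375 view, sampleSizeFor returns 8, and the estimated footprint falls from 48,000,000 bytes to 750,000 bytes.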

AOP‑Based Crash Prevention

The Transform API, available since Android Gradle plugin 1.5.0, allows bytecode to be rewritten at compile time. The team built a Gradle plugin that replaces unsafe Intent.getXXXExtra calls with safe utility methods, which catch exceptions in release builds and re‑throw them in debug builds. The configuration maps each original method signature to its replacement:

WaimaiBytecodeManipulator {
    replacements(
        "android/content/Intent.getIntExtra(Ljava/lang/String;I)I=com/waimai/IntentUtil.getInt(Landroid/content/Intent;Ljava/lang/String;I)I",
        "android/content/Intent.getStringExtra(Ljava/lang/String;)Ljava/lang/String;=com/waimai/IntentUtil.getString(Landroid/content/Intent;Ljava/lang/String;)Ljava/lang/String;",
        "android/content/Intent.getBooleanExtra(Ljava/lang/String;Z)Z=com/waimai/IntentUtil.getBoolean(Landroid/content/Intent;Ljava/lang/String;Z)Z",
        ...
    )
}
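The IntentUtil methods named in the replacement rules are not shown in the article. The pattern they are described as following (catch in release, re-throw in debug) could look roughly like the sketch below. SafeRead is a hypothetical name, and the read is passed in as a Supplier so the example runs without android.content.Intent; in the app the reader would be something like `() -> intent.getIntExtra(key, defaultValue)` and debugBuild would come from BuildConfig.DEBUG.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for the safe accessors the bytecode rewrite redirects to. */
final class SafeRead {
    private SafeRead() {}

    static <T> T read(Supplier<T> reader, T fallback, boolean debugBuild) {
        try {
            return reader.get();
        } catch (RuntimeException e) {
            if (debugBuild) {
                throw e; // fail loudly during development so the cause gets fixed
            }
            return fallback; // release build: degrade to the default instead of crashing
        }
    }
}
```

Because the redirection happens in bytecode, third-party libraries that call Intent.getXXXExtra directly are covered as well as the app's own source, which hand-written wrappers alone would not achieve.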

Dependency Management Issues

Multiple AAR versions can cause runtime NoClassDefFoundError, NoSuchFieldError, etc. The team introduced a custom Gradle plugin Defensor to check class/field/method existence at compile time and another plugin SVD to enforce strict version alignment.
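Defensor's implementation is not described beyond "check class/field/method existence at compile time". As a rough runtime analogue of the same question (does the class or method that calling code expects actually exist in the version on the classpath?), a reflection-based check might look like this. ApiPresence is a hypothetical helper, and it is explicitly not how Defensor works; it only illustrates the kind of mismatch being detected.

```java
/** Hypothetical runtime check for the presence of a class or public method. */
final class ApiPresence {
    private ApiPresence() {}

    static boolean hasClass(String className) {
        try {
            // initialize=false: only look the class up, do not run its static initializers
            Class.forName(className, false, ApiPresence.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    static boolean hasMethod(String className, String method, Class<?>... params) {
        try {
            Class<?> c = Class.forName(className, false, ApiPresence.class.getClassLoader());
            c.getMethod(method, params); // public methods only
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false;
        }
    }
}
```

A check like this only fires when it is executed, which is why moving the verification to compile time, as the team did, catches the mismatch before a release is ever built.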

Architectural Practices to Reduce Crashes

Modularize business components with unique package names and owners for clear responsibility.

Centralize page navigation via a scheme router, adding a single ActivityNotFoundException guard.

Refactor the network layer to validate API responses and filter out dirty data before it reaches the UI.
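The benefit of routing all navigation through one place is that the "target not found" case is handled once instead of at every call site. The sketch below illustrates that with plain Java: SchemeRouter, RouteNotFoundException, and the fallback callback are all hypothetical, and RouteNotFoundException stands in for Android's ActivityNotFoundException so the example runs on a plain JVM.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Stand-in for android.content.ActivityNotFoundException. */
final class RouteNotFoundException extends RuntimeException {
    RouteNotFoundException(String message) {
        super(message);
    }
}

/** Hypothetical router: every page open goes through open(), which holds the only guard. */
final class SchemeRouter {
    private final Map<String, Consumer<String>> routes = new HashMap<>();
    private final Consumer<String> onUnknown; // assumption: e.g. show an upgrade prompt and report

    SchemeRouter(Consumer<String> onUnknown) {
        this.onUnknown = onUnknown;
    }

    void register(String path, Consumer<String> opener) {
        routes.put(path, opener);
    }

    /** Returns true if the page was opened, false if the fallback handled it. */
    boolean open(String uri) {
        int query = uri.indexOf('?');
        String path = query < 0 ? uri : uri.substring(0, query);
        Consumer<String> opener = routes.get(path);
        if (opener == null) {
            onUnknown.accept(uri);
            return false;
        }
        try {
            opener.accept(uri);
            return true;
        } catch (RouteNotFoundException e) { // the single guard, narrow on purpose
            onUnknown.accept(uri);
            return false;
        }
    }
}
```

Catching only the not-found exception keeps the guard from hiding unrelated bugs inside a page's own start-up code.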

Monitoring and Damage Control

After implementing the above checks, the app is released with a monitoring pipeline: gray‑release monitoring, full‑release alerts, large‑image monitoring, and automated notifications (email, IM, reports). When crashes still occur, the team evaluates severity, applies business downgrade, hot‑fixes via the in‑house Robust framework, or forces an upgrade if necessary.

Future Outlook – Self‑Healing Crashes

The team envisions automatic self‑repair mechanisms: detecting hardware acceleration failures, falling back from JNI to Java implementations, switching network libraries, etc., based on runtime diagnostics.

