
Crash Convergence and Resilience Mechanisms in Android Applications

The team maintains an ‘opt’ branch for daily‑collected crash fixes, classifies unrecoverable system, device, SDK and framework errors, and employs a Handler‑based whitelist that catches and swallows known crashes while logging others, complemented by null‑safety, concurrency, database and OOM monitoring to keep Android app crash rates manageable.

Sohu Tech Products

In our project, after each version release we create an opt branch to fix online crashes and business‑logic bugs. Multiple feature branches may be developed in parallel, but the opt branch is guaranteed to be merged in the next release, and QA reserves time to test it before every version launch.

Every morning the team checks the DUMP backend, collects crash stacks from the previous day and assigns them to the responsible business owners. If a crash is easy to fix, it is patched immediately and merged into the opt branch; if it is hard to locate or time‑consuming, an expected fix time is given or the bug is turned into a technical optimisation task.

Crash Disaster‑Recovery Mechanism

Background: During crash convergence we discovered several crash categories that cannot be fixed at the application layer.

Examples of unrecoverable crashes:

java.lang.NullPointerException: Attempt to invoke virtual method 'boolean android.content.ClipDescription.hasMimeType(java.lang.String)' on a null object reference
    at android.widget.TextView.canPasteAsPlainText(TextView.java:15065)
    at android.widget.Editor$TextActionModeCallback.populateMenuWithItems(Editor.java:4692)
    at android.widget.Editor$TextActionModeCallback.onCreateActionMode(Editor.java:4627)
java.lang.NullPointerException: Attempt to invoke virtual method 'int android.text.Layout.getLineForOffset(int)' on a null object reference
java.lang.RuntimeException: Unable to start activity ComponentInfo{com.netease.popo/com.huawei.hms.activity.BridgeActivity}:
android.util.AndroidRuntimeException: requestFeature() must be called before adding content
android.view.WindowManager$BadTokenException: Unable to add window -- token android.os.BinderProxy

We classified these into four groups:

System exceptions that can only be guarded with try‑catch at the call site.

Device‑specific crashes that appear only on certain manufacturers.

Third‑party SDK bugs that require the SDK vendor to fix.

Android framework‑level crashes (e.g., BadTokenException) that need defensive checks in the UI layer.

To reduce user impact we propose a framework that intercepts these crashes, preventing them from crashing the app.

Technical Solution

The solution leverages the Android Handler mechanism. After app launch we initialise a crash whitelist (built‑in and server‑driven). We then post a message to the main thread using Handler#post(). Inside the posted Runnable we run an infinite loop in which Looper.loop() is wrapped in a try‑catch. As long as the process stays alive, the loop keeps processing subsequent messages. When a crash occurs, it is caught, checked against the whitelist, and either swallowed (if whitelisted) or re‑thrown.
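The whitelist check performed in the catch branch can be sketched in plain Java. This is a minimal illustration, not the production framework: the names CrashRule and CrashWhitelist and the matching criteria (exception class name plus an optional stack-frame class prefix) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical rule: matches an exception type and, optionally,
// a class that must appear somewhere in the stack trace.
final class CrashRule {
    final String exceptionClass;   // fully qualified exception class name
    final String frameClassPrefix; // null means "match on the type alone"

    CrashRule(String exceptionClass, String frameClassPrefix) {
        this.exceptionClass = exceptionClass;
        this.frameClassPrefix = frameClassPrefix;
    }

    boolean matches(Throwable t) {
        if (!t.getClass().getName().equals(exceptionClass)) {
            return false;
        }
        if (frameClassPrefix == null) {
            return true;
        }
        for (StackTraceElement frame : t.getStackTrace()) {
            if (frame.getClassName().startsWith(frameClassPrefix)) {
                return true;
            }
        }
        return false;
    }
}

// Hypothetical whitelist holding built-in rules plus rules delivered by the server.
final class CrashWhitelist {
    private final List<CrashRule> rules = new ArrayList<>();

    synchronized void add(CrashRule rule) {
        rules.add(rule);
    }

    // True when the throwable, or any exception in its cause chain, matches a rule.
    synchronized boolean shouldSwallow(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            for (CrashRule rule : rules) {
                if (rule.matches(cur)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

On Android, the posted Runnable would call shouldSwallow from the catch block around Looper.loop(), log and continue the loop when it returns true, and re‑throw the exception otherwise.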

The framework has been in production for several years and has collected 81 distinct crash types, including a recent TransferSplashScreenViewStateItem error:

java.lang.IllegalArgumentException: Activity client record must not be null to execute transaction item: android.app.servertransaction.TransferSplashScreenViewStateItem@de845fa
    at android.app.servertransaction.ActivityTransactionItem.getActivityClientRecord(ActivityTransactionItem.java:85)
    ...

Typical database‑related crashes are also handled:

com.tencent.wcdb.CursorWindowAllocationException: Cursor window allocation of 2048 kb failed. total:8159,active:49
    at com.tencent.wcdb.CursorWindow.<init>(SourceFile:127)
Caused by: com.tencent.wcdb.database.SQLiteFullException: database or disk is full (code 13,errno 0)
    at com.tencent.wcdb.database.SQLiteConnection.nativeExecute(Native Method)
    ...
com.tencent.wcdb.database.SQLiteDatabaseCorruptException: database disk image is malformed (code 11, errno 0)
    at com.tencent.wcdb.database.SQLiteConnection.nativePrepareStatement(Native Method)
    ...

We mitigated the CursorWindowAllocationException by monitoring SQL execution counts and optimising heavy queries, reducing occurrences by over 90%.
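The article does not show how SQL execution counts are collected. One possible shape is a small counter keyed by statement text, called from wherever queries are dispatched. The class name SqlExecutionMonitor and the threshold parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical counter: records how often each SQL statement is executed
// so that unusually hot statements can be reviewed and optimised.
final class SqlExecutionMonitor {
    private final Map<String, AtomicLong> counts = new ConcurrentHashMap<>();

    // Call once per executed statement; safe to call from multiple threads.
    void record(String sql) {
        counts.computeIfAbsent(sql, k -> new AtomicLong()).incrementAndGet();
    }

    long countOf(String sql) {
        AtomicLong c = counts.get(sql);
        return c == null ? 0 : c.get();
    }

    // Statements executed at least `threshold` times since the monitor was created.
    List<String> hotStatements(long threshold) {
        List<String> hot = new ArrayList<>();
        for (Map.Entry<String, AtomicLong> e : counts.entrySet()) {
            if (e.getValue().get() >= threshold) {
                hot.add(e.getKey());
            }
        }
        return hot;
    }
}
```

Keying by the parameterised statement text (with placeholders rather than bound values) keeps the map small and groups repeated executions of the same query.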

Other Crash Convergence Practices

Common crash categories and mitigation strategies include:

NullPointerException (NPE)

Use @NonNull / @Nullable annotations.

Prefer Kotlin, which enforces null‑safety.

Check objects retrieved from collections before use.

Validate Context arguments passed to third‑party libraries.

Declare fields as final or val to avoid accidental nullification.

Run static analysis plugins in Android Studio.
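A few of these habits combined in one small Java class, as an illustration only. UserCache is a hypothetical name, and returning an empty string for a missing entry is just one possible policy.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical cache illustrating three of the NPE habits listed above.
final class UserCache {
    // final field: the map reference can never be reassigned to null.
    private final Map<String, String> names = new HashMap<>();

    // Reject null arguments at the boundary instead of failing later at the point of use.
    void put(String id, String name) {
        names.put(Objects.requireNonNull(id, "id"), Objects.requireNonNull(name, "name"));
    }

    // Map.get returns null for a missing key, so check before using the value.
    String displayName(String id) {
        String name = names.get(id);
        return name != null ? name : "";
    }
}
```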

IndexOutOfBoundsException

Validate index bounds before accessing collections.

When using Spannable.setSpan, ensure start and end are within the text length and non‑negative.
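A bounds check of this kind can live in a small helper that runs before the span is applied. The sketch below is plain Java. SpanRange is a hypothetical name, and skipping empty or inverted ranges is a design choice made for the example rather than something the article prescribes.

```java
// Hypothetical helper: normalises a span range before it is passed to setSpan.
final class SpanRange {
    // Returns {start, end} clamped to [0, textLength],
    // or null when the clamped range is empty or inverted and should be skipped.
    static int[] clamp(int start, int end, int textLength) {
        int s = Math.max(0, Math.min(start, textLength));
        int e = Math.max(0, Math.min(end, textLength));
        return s < e ? new int[] {s, e} : null;
    }
}
```

A caller would apply the span only when clamp returns a non‑null range, using the returned values in place of the raw start and end.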

ConcurrentModificationException

Iterate over a copy of the collection when removal is needed.

Use thread‑safe collections such as CopyOnWriteArrayList or ConcurrentHashMap.
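Both approaches in a short Java sketch. The class and method names are hypothetical, and the "dead" prefix simply stands in for whatever condition marks an element for removal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

final class SafeRemoval {
    // Iterates over a snapshot, so removing from the original list
    // does not invalidate the iterator being used.
    static void removeEmpty(List<String> items) {
        for (String s : new ArrayList<>(items)) {
            if (s.isEmpty()) {
                items.remove(s);
            }
        }
    }

    // CopyOnWriteArrayList iterators work on the array as it was when iteration
    // started, so removal during iteration never throws ConcurrentModificationException.
    static int pruneDead(CopyOnWriteArrayList<String> listeners) {
        int alive = 0;
        for (String name : listeners) {
            if (name.startsWith("dead")) {
                listeners.remove(name);
            } else {
                alive++;
            }
        }
        return alive;
    }
}
```

CopyOnWriteArrayList copies its backing array on every write, so it suits lists that are read far more often than they are modified, such as listener registries.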

System Service (Framework API) Exceptions

Wrap all system‑service calls in try‑catch and log the exception.

Watch for ANRs caused by frequent system‑service calls.
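The try‑catch wrapping can be centralised in one helper so that call sites stay readable. The following is a minimal sketch in plain Java: SafeCall is a hypothetical name, and System.err stands in for the project's logger.

```java
import java.util.function.Supplier;

// Hypothetical helper: runs a call that may throw and returns a fallback on failure.
final class SafeCall {
    static <T> T orDefault(Supplier<T> call, T fallback) {
        try {
            T value = call.get();
            // Some service getters return null instead of throwing.
            return value != null ? value : fallback;
        } catch (RuntimeException e) {
            // Placeholder logging; a real app would report this to its crash backend.
            System.err.println("system-service call failed: " + e);
            return fallback;
        }
    }
}
```

A call site would pass the system‑service call as a lambda together with a safe default, so a failure degrades the feature instead of crashing the app.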

Database Issues

We maintain a dedicated quality checklist for each feature, covering database usage patterns.

OOM Convergence

Out‑of‑Memory (OOM) crashes are classified into heap allocation failures, thread‑creation failures, file‑descriptor exhaustion, and native memory OOM.

Memory Leak Monitoring

We use the open‑source KOOM library to capture leak reports and upload them for weekly triage.

Global Floating Window

A custom developer tool displays real‑time memory usage (via Debug.MemoryInfo#getTotalPss()) and highlights values that exceed a configurable threshold.

Thread Count Monitoring

Thread numbers are read from /proc/[pid]/status:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Returns the "Threads:" line of /proc/<pid>/status, or "" if it cannot be read.
public static String readThreadStatus(String pid) {
    // try-with-resources closes the reader on every path, including the early return.
    try (BufferedReader reader = new BufferedReader(new FileReader("/proc/" + pid + "/status"))) {
        String line;
        while ((line = reader.readLine()) != null) {
            // Match the field name exactly rather than any line containing "Threads".
            if (line.startsWith("Threads:")) {
                return line;
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return "";
}
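The method above returns the raw status line, so a monitor still has to extract the number from it. A small parsing helper might look like this. ProcStatus and the -1 sentinel are assumptions made for the example.

```java
// Hypothetical helper: extracts the numeric value from a line such as "Threads:\t42".
final class ProcStatus {
    // Returns the thread count, or -1 when the line is missing or malformed.
    static int parseThreadCount(String line) {
        if (line == null || !line.startsWith("Threads:")) {
            return -1;
        }
        try {
            // trim() drops the tab or spaces between the field name and the value.
            return Integer.parseInt(line.substring("Threads:".length()).trim());
        } catch (NumberFormatException e) {
            return -1;
        }
    }
}
```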

In practice we also use Thread.getAllStackTraces() to monitor Java threads.
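One way to make Thread.getAllStackTraces() useful for spotting leaks is to group live threads by name pattern, so that a runaway pool stands out. The sketch below is illustrative. ThreadStats and the digit‑masking rule are assumptions, not the team's actual tooling.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts live Java threads grouped by a normalised name.
final class ThreadStats {
    // Masks digit runs so "pool-1-thread-3" and "pool-2-thread-7" share one key.
    static String groupKey(String threadName) {
        return threadName.replaceAll("\\d+", "#");
    }

    // Covers Java threads only; threads created purely in native code are not listed.
    static Map<String, Integer> countByName() {
        Map<String, Integer> counts = new TreeMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(groupKey(t.getName()), 1, Integer::sum);
        }
        return counts;
    }
}
```

Comparing the sum of these counts with the Threads value from /proc/[pid]/status gives a rough idea of how many threads were created by native code.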

File‑Descriptor (FD) Monitoring

FD count is obtained via:

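import android.os.Build;
import android.system.Os;
import java.io.File;

// Counts the open file descriptors of the current process and logs each target.
// MLog is the project's own logging wrapper.
public static int getCurrentFdSize() {
    File[] fds = new File("/proc/self/fd").listFiles();
    if (fds == null) {
        return 0;
    }
    // Os.readlink is available from API 21.
    if (Build.VERSION.SDK_INT >= 21) {
        for (File fd : fds) {
            try {
                MLog.d("message", Os.readlink(fd.getAbsolutePath()));
            } catch (Exception e) {
                // The descriptor may have been closed since the listing; skip it.
                e.printStackTrace();
            }
        }
    }
    return fds.length;
}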

We observed a rare FD‑exhaustion crash caused by a front‑end page that repeatedly created new Socket objects.
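A leak like this is easier to spot when the readlink targets are grouped by type, because a growing socket count stands out immediately. The following is a sketch in plain Java. FdStats and its category names are assumptions, and the prefixes follow the usual Linux /proc/self/fd link formats such as socket:[12345] and pipe:[678].

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: buckets /proc/self/fd link targets by descriptor type.
final class FdStats {
    static String classify(String target) {
        if (target.startsWith("socket:")) return "socket";
        if (target.startsWith("pipe:")) return "pipe";
        if (target.startsWith("anon_inode:")) return "anon_inode";
        if (target.startsWith("/dev/")) return "device";
        if (target.startsWith("/")) return "file";
        return "other";
    }

    // Builds a type -> count histogram from a list of link targets.
    static Map<String, Integer> histogram(Iterable<String> targets) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String target : targets) {
            counts.merge(classify(target), 1, Integer::sum);
        }
        return counts;
    }
}
```

Feeding it the strings returned by Os.readlink for each descriptor would give a per‑type breakdown that can be logged or shown in the developer tool.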

Summary

Through systematic crash collection, classification, and a custom Handler‑based interception framework, we have reduced crash rates to a manageable level. Continuous code review, static‑analysis tools, and emerging AI‑driven scanning are being explored to further lower the probability of runtime failures.
