
How We Tamed Android Memory Leaks in WeChat: Tools, Practices, and Lessons

This article shares WeChat's engineering experience in detecting and preventing various Android memory leaks—including Activity, Bitmap, native, and thread leaks—by building custom tools, refining monitoring pipelines, and applying fallback protection to keep the app stable at scale.

WeChat Client Technology Team

Introduction

Memory problems are a classic yet elusive class of software defect: they surface without warning and are hard to reproduce, diagnose, and fix. Over many versions, WeChat has encountered numerous memory leaks (Activity leaks, unclosed Cursors, thread overuse, unchecked caches, and silent native-library leaks), some even forcing architectural changes, such as the move to a multi-process model after a WebView leak.

Activity Leak Detection

An Activity leak occurs when a longer-lived object retains an Activity through a strong reference, preventing garbage collection. Because Activities hold a Context and the UI view hierarchy, they are easy to retain unintentionally, and the leak often goes unnoticed until the app crashes with an OOM. Early on we relied on manual Hprof dumps and MAT analysis, but that became unsustainable as the codebase grew.

We adopted LeakCanary, then built ResourceCanary on top of it and integrated it into our Matrix quality platform. ResourceCanary runs fully automated in daily tests; detected leaks are reported to Matrix, which creates tickets, assigns owners, and tracks fixes. Two design choices made it practical at scale:

- Detection logic is separated from analysis logic.

- Hprof files are trimmed to roughly one tenth of their original size, reducing storage and transmission cost.

Typical leak scenarios discovered include anonymous inner‑class references, missing unregister calls, system‑component leaks (e.g., SensorManager, InputMethodManager), and long‑running Runnables that hold Activities.
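The core of this detection step can be sketched in a few lines of plain Java. The class and method names below are illustrative, not ResourceCanary's actual API: watch each destroyed Activity through a WeakReference, force a best-effort GC, and flag anything still reachable as a leak suspect.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the detection idea: a destroyed Activity should
// become unreachable; if a GC cannot collect it, something still holds it.
class LeakWatcher {
    private final Map<String, WeakReference<Object>> watched = new HashMap<>();

    // Call from Activity.onDestroy(): the object is expected to die soon.
    void watch(String key, Object destroyed) {
        watched.put(key, new WeakReference<>(destroyed));
    }

    // Keys whose objects survived a GC are leak suspects. A real detector
    // retries a few times before dumping an Hprof for offline analysis.
    List<String> findSuspects() {
        System.gc(); // best effort only; System.gc() is a hint, not a guarantee
        List<String> suspects = new ArrayList<>();
        for (Map.Entry<String, WeakReference<Object>> e : watched.entrySet()) {
            if (e.getValue().get() != null) suspects.add(e.getKey());
        }
        return suspects;
    }
}
```

An object that is still strongly reachable is guaranteed to survive the GC, so a suspect report here is reliable; the reverse (an object being collected promptly) is not guaranteed, which is why real detectors retry.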

Bitmap Allocation and Tracking

Bitmaps are a major memory consumer. We route all Bitmap creation (Bitmap.createBitmap, BitmapFactory decode calls) through a unified interface and record each instance in a WeakHashMap together with its creation time and stack trace. Two monitoring modes are used:

Aggressive mode (testing): checks the map every few seconds; if Java heap exceeds 200 MB, it dumps live Bitmap info and an Hprof file.

Conservative mode (release): logs info only for Bitmaps larger than 1 MB, and only when an OOM occurs.
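The tracking described above reduces to a small wrapper. This is a minimal sketch, not WeChat's actual implementation: a plain Object stands in for android.graphics.Bitmap so the idea runs off-device, and the class name is hypothetical.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Sketch of the unified Bitmap-creation wrapper: every created bitmap is
// registered with its creation time and stack trace for later triage.
class BitmapTracker {
    static final class Meta {
        final long createdAt = System.currentTimeMillis();
        final StackTraceElement[] stack = Thread.currentThread().getStackTrace();
    }

    // WeakHashMap keys do not keep bitmaps alive: an entry disappears
    // automatically once its bitmap is garbage-collected, so the map
    // always reflects the set of live bitmaps.
    private final Map<Object, Meta> live = new WeakHashMap<>();

    Object track(Object bitmap) {
        live.put(bitmap, new Meta());
        return bitmap;
    }

    int liveCount() { return live.size(); }
}
```

The WeakHashMap is the key design choice: the tracker observes bitmap lifetimes without extending them.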

We also identified misuse of a static LruCache that never cleared, causing permanent Bitmap retention.

Native Memory Leak Detection

Native allocations are not garbage-collected, so we intercept the allocation and free functions ourselves. Existing tools (Valgrind, ASan) were too heavy for large-scale Android testing, so we built two lightweight solutions:

For non‑re‑compilable libraries: PLT hook the allocation functions, record address, size, and library path, and periodically check for mismatched frees.

For re‑compilable libraries: compile with -finstrument-functions and use --wrap to redirect allocation/free calls, capturing call stacks.

Both add less than 10 ns per allocation and helped uncover dozens of long‑standing native leaks.
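Both approaches share the same bookkeeping: a ledger keyed by allocation address. The real hooks are native code; the Java sketch below (hypothetical names) only models that ledger logic: every malloc records address, size, and library, every free removes its entry, and whatever remains after a soak period is a leak candidate.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the hooked allocator's bookkeeping. Concurrent map because
// allocations and frees arrive from many threads.
class AllocLedger {
    static final class Record {
        final long size;
        final String library;
        Record(long size, String library) { this.size = size; this.library = library; }
    }

    private final Map<Long, Record> live = new ConcurrentHashMap<>();

    void onMalloc(long addr, long size, String lib) { live.put(addr, new Record(size, lib)); }

    void onFree(long addr) { live.remove(addr); }

    // Bytes allocated but never freed: the periodic leak check sums these
    // per library to point at the offending .so.
    long leakedBytes() {
        return live.values().stream().mapToLong(r -> r.size).sum();
    }
}
```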

Thread Monitoring

Thread‑related OOMs can arise when pthread_create fails due to insufficient stack memory. We monitor thread creation failures and track total thread count. When the count exceeds a threshold, we dump thread info and raise alerts. This revealed occasional thread leaks and rapid thread spikes (500+ threads) that contributed to OOMs.

Memory Monitoring

We collect both physical and virtual memory metrics via ActivityManager.getProcessMemoryInfo, Debug.MemoryInfo, and the Runtime APIs. Regular monitoring samples at intervals that grow along a Fibonacci sequence, capped at 30 minutes, while low-memory monitoring listens to onLowMemory and onTrimMemory callbacks. When low memory is detected, we log detailed breakdowns of Java heap, native heap, graphics, stack, and code usage.
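The sampling schedule can be sketched as follows; the exact sequence beyond "Fibonacci, capped at 30 minutes" is an assumption here, and the class name is illustrative.

```java
// Sketch of the Fibonacci sampling schedule: intervals grow
// 1, 1, 2, 3, 5, 8, ... minutes and are capped at 30 minutes, so a
// long-lived process samples memory less and less often.
class FibonacciSchedule {
    private static final long CAP_MINUTES = 30;
    private long a = 1, b = 1; // next two intervals, in minutes

    long nextIntervalMinutes() {
        long next = a;
        long sum = Math.min(a + b, CAP_MINUTES); // cap instead of growing forever
        a = b;
        b = sum;
        return Math.min(next, CAP_MINUTES);
    }
}
```

A back-off like this keeps monitoring overhead negligible for long sessions while still sampling frequently right after startup, when memory behavior changes fastest.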

(Figure: memory monitoring UI)

Fallback Protection

For long‑running users (over a day) we apply a safety strategy: if the app is in background >30 minutes, during off‑peak hours, without foreground services, and memory usage exceeds defined thresholds (Java heap >85% of max, native >800 MB, or vmsize >85% of 4 GB), we proactively kill the process and restart it via push, minimizing user‑visible OOM crashes.

(Figure: fallback protection flow)

Conclusion

We presented a suite of engineering practices—custom detection tools, automated pipelines, bitmap tracking, native leak hooks, thread monitoring, and fallback protection—that together reduce OOM incidents in WeChat. While memory issues cannot be eliminated entirely, continuous tooling and automation improve detection efficiency and overall app stability.

Written by

WeChat Client Technology Team

Official account of the WeChat mobile client development team, sharing development experience, cutting‑edge tech, and little‑known stories across Android, iOS, macOS, Windows Phone, and Windows.
