How We Built a Low‑Overhead iOS Memory Monitor to Tame FOOM
This article details the design and evolution of WeChat's iOS memory‑monitoring system that detects Foreground Out‑of‑Memory (FOOM) events, covering detection methods, data collection, storage optimizations, reporting strategies, false‑positive reduction, and the measurable impact on app stability.
1. Implementation Principle
FOOM (Foreground Out‑of‑Memory) occurs when an app in the foreground consumes excessive memory and is killed by the system, appearing to the user like a crash. Facebook introduced a detection method in 2015, and WeChat began reporting FOOM data at the end of 2015, initially seeing a ~3% FOOM‑to‑active‑user ratio.
The first version used Facebook's FBAllocationTracker to track Objective‑C object allocations and fishhook to hook malloc / free for heap monitoring, logging object counts, the top 200 objects by heap size, and their stack traces every second. Even this simple approach quickly identified a contact‑module FOOM caused by a massive database load.
Drawbacks of the first version:
Insufficient granularity – spikes made up of many small allocations were missed, and fishhook only rebinds symbols in the images it patches, so allocations made inside system libraries were invisible.
Log interval trade‑off – long intervals miss peaks, short intervals increase CPU, I/O and power consumption.
Raw logs required manual analysis; no visual tooling for categorisation.
Version two took inspiration from Instruments' Allocations, focusing on four optimisation areas: data collection, storage, reporting and visualisation.
1.1 Data Collection
In late September 2016, while investigating a nano‑malloc‑zone crash on iOS 10, we examined the libmalloc source and discovered the malloc_logger and __syscall_logger function pointers. When these are non‑NULL, they are invoked on every malloc / free and vm_allocate / vm_deallocate, letting us record each live object's allocation size and stack trace. Each stack address must be adjusted by its image's slide (ASLR offset) before symbols can be resolved.
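The pattern is simple: a global function pointer that, when installed, receives a notification for every allocation event. The following portable sketch mirrors that idea with hypothetical names (`alloc_logger_t`, `traced_malloc`, `g_logged_bytes` are illustrative, not Apple's actual libmalloc interface):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical mirror of the malloc_logger pattern: when the pointer
 * is non-NULL, every allocation event is reported to it. The real
 * signature lives in Apple's libmalloc sources; this sketch only
 * demonstrates the hook shape portably. */
typedef void (*alloc_logger_t)(uint32_t type, size_t size, uintptr_t result);

static alloc_logger_t alloc_logger = NULL;  /* NULL => logging disabled */
static size_t g_logged_bytes = 0;

static void record_allocation(uint32_t type, size_t size, uintptr_t result) {
    (void)type; (void)result;
    g_logged_bytes += size;  /* a real monitor also captures a backtrace here */
}

/* Allocation path the monitor routes through: the hook fires only
 * when the logger pointer has been installed. */
static void *traced_malloc(size_t size) {
    void *p = malloc(size);
    if (p != NULL && alloc_logger != NULL)
        alloc_logger(1 /* malloc event */, size, (uintptr_t)p);
    return p;
}
```

Installing the hook is one assignment (`alloc_logger = record_allocation;`), which is why enabling and disabling monitoring at runtime is cheap.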
Each memory object is assigned a Category for easier analysis:
Heap objects: "Malloc" plus the allocation size (e.g., "Malloc 48.00KiB").
Virtual memory objects: category derived from the flags argument of vm_allocate (see <mach/vm_statistics.h>).
Objective‑C objects: the class name, obtained by hooking +[NSObject alloc].
We later discovered that NSData objects created via class factory methods are allocated through NSAllocateObject rather than +[NSObject alloc], so the hook misses their class name. By checking the private CoreFoundation flag __CFOASafe and installing the function pointer __CFObjectAllocSetLastAllocEventNameFunction, we can still retrieve the object type after creation.
If private APIs are unavailable, an alternative is to replace the function pointers in the malloc_zone_t returned by malloc_default_zone, achieving the same effect as malloc_logger. Virtual‑memory allocation still relies on fishhook.
2. Data Storage
Live Object Management
During the first 10 seconds of WeChat launch, ~800 k objects are created and ~500 k are released. To minimise allocation overhead, we replaced SQLite with a lightweight balanced binary tree.
We adopted a splay tree – a self‑adjusting binary search tree with amortized O(log N) operations. It uses less memory than a red‑black tree because nodes carry no extra balancing information, and it exploits locality of reference: recently accessed nodes are rotated toward the root, which suits the large population of short‑lived objects whose addresses are looked up again when they are freed.
Traditional pointer‑based binary trees allocate a separate node per insertion. To further reduce allocations, we back the tree with an array: child links are integer indices into the array, and freed slots go on a free list so a new node reuses the most recently freed index.
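The index‑based layout can be sketched as below. This is a minimal illustration, not WeChat's actual code: the splay rotations are omitted for brevity (a plain BST insert/find stands in), and the pool capacity, field names, and free‑list threading through the `left` field are all assumptions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_CAP 1024
#define NIL (-1)

/* Array-backed tree node: children are int32 indices into the pool
 * instead of pointers, so inserting a node allocates nothing. */
typedef struct {
    uintptr_t key;   /* e.g. the live object's address  */
    size_t    size;  /* e.g. the allocation size        */
    int32_t   left, right;
} Node;

static Node    pool[POOL_CAP];
static int32_t pool_used = 0;    /* high-water mark of the pool      */
static int32_t free_head = NIL;  /* head of the freed-slot free list */
static int32_t root = NIL;

static int32_t node_alloc(uintptr_t key, size_t size) {
    int32_t idx;
    if (free_head != NIL) {           /* reuse most recently freed slot */
        idx = free_head;
        free_head = pool[idx].left;   /* free list threaded through .left */
    } else {
        idx = pool_used++;
    }
    pool[idx] = (Node){ key, size, NIL, NIL };
    return idx;
}

static void node_free(int32_t idx) {  /* push slot onto the free list */
    pool[idx].left = free_head;
    free_head = idx;
}

/* Plain BST insert by key; the real structure would additionally
 * splay the inserted node to the root. */
static void tree_insert(uintptr_t key, size_t size) {
    int32_t idx = node_alloc(key, size);
    if (root == NIL) { root = idx; return; }
    int32_t cur = root;
    for (;;) {
        int32_t *next = (key < pool[cur].key) ? &pool[cur].left
                                              : &pool[cur].right;
        if (*next == NIL) { *next = idx; return; }
        cur = *next;
    }
}

static int32_t tree_find(uintptr_t key) {
    int32_t cur = root;
    while (cur != NIL && pool[cur].key != key)
        cur = (key < pool[cur].key) ? pool[cur].left : pool[cur].right;
    return cur;
}
```

Because nodes live in one contiguous array, the whole structure can also sit in an mmap'd region, which matches the mmap‑backed memory usage described below.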
Stack Storage
We observed millions of backtraces during a typical WeChat session, averaging 35 frames per stack (max 64). Storing each address in 36 bits would require about 157.5 bytes per stack. Many stacks, however, share a common suffix (their outermost frames). By sharing these suffixes through a hash‑table‑backed linked structure, we cut the average stored length to under 5 frames per stack – roughly 66.7 bytes, about 42 % of the original size.
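Suffix sharing can be implemented by interning frames into a table where each entry records one frame plus the index of its caller's entry; two stacks that share their outer frames then share those table entries, and a whole stack is identified by the index of its innermost frame. The sketch below uses a linear scan for lookup where a real implementation would hash (frame, parent); all names are illustrative:

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_CAP 4096
#define NO_PARENT (-1)

/* One interned frame: the return address plus the entry index of the
 * calling frame, so common suffixes are stored exactly once. */
typedef struct {
    uintptr_t frame;   /* return address of this frame            */
    int32_t   parent;  /* entry index of the caller, or NO_PARENT */
} FrameEntry;

static FrameEntry frame_table[TABLE_CAP];
static int32_t table_len = 0;

/* Find or add (frame, parent); returns the entry index.
 * A production version would use a hash table instead of a scan. */
static int32_t intern_frame(uintptr_t frame, int32_t parent) {
    for (int32_t i = 0; i < table_len; i++)
        if (frame_table[i].frame == frame && frame_table[i].parent == parent)
            return i;
    frame_table[table_len] = (FrameEntry){ frame, parent };
    return table_len++;
}

/* Intern a backtrace given outermost frame first; the returned id of
 * the innermost frame identifies the whole stack. */
static int32_t intern_stack(const uintptr_t *frames, int n) {
    int32_t id = NO_PARENT;
    for (int i = 0; i < n; i++)
        id = intern_frame(frames[i], id);
    return id;
}
```

A live object then stores a single 32‑bit stack id instead of up to 64 raw addresses, which is where the compression comes from.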
Performance Data
On an iPhone 6 Plus the monitoring tool consumes <13 % CPU and about 20 MB of RAM (mapped via mmap). Heavy users (many groups, frequent messages) may see slightly higher usage.
3. Data Reporting
Because the monitor records every live object, full upload on FOOM is impractical. We therefore report selectively:
All objects are first grouped by Category; the count and total size per Category are uploaded (small payload).
Within each Category, identical stack traces are merged; only the top‑N categories by allocation size and UI‑related categories (e.g., UIViewController, UIView) have their top‑M stacks reported.
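The report-side grouping described above can be sketched as follows. This is a simplified illustration under assumed types and names (`CategoryStat`, `add_object`, `rank_categories` are not WeChat's actual API): objects are bucketed by Category, per‑Category totals are computed (a small payload to upload), and Categories are sorted by total size so that only the top‑N need their merged stacks attached:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Per-Category aggregate: count and total bytes are always uploaded;
 * stacks are attached only for the top categories after ranking. */
typedef struct {
    const char *category;  /* e.g. "Malloc 48.00KiB", "UIViewController" */
    size_t count;
    size_t total_bytes;
} CategoryStat;

/* Add one live object of `size` bytes to its Category bucket. */
static void add_object(CategoryStat *stats, int *nstats,
                       const char *category, size_t size) {
    for (int i = 0; i < *nstats; i++) {
        if (strcmp(stats[i].category, category) == 0) {
            stats[i].count++;
            stats[i].total_bytes += size;
            return;
        }
    }
    stats[(*nstats)++] = (CategoryStat){ category, 1, size };
}

static int by_size_desc(const void *a, const void *b) {
    size_t sa = ((const CategoryStat *)a)->total_bytes;
    size_t sb = ((const CategoryStat *)b)->total_bytes;
    return (sa < sb) - (sa > sb);  /* larger totals sort first */
}

/* After ranking, stats[0..N) are the categories whose stacks are uploaded. */
static void rank_categories(CategoryStat *stats, int nstats) {
    qsort(stats, (size_t)nstats, sizeof *stats, by_size_desc);
}
```

UI‑related categories (UIViewController, UIView, …) would be whitelisted into the upload set regardless of their rank, since they are the most useful for diagnosing leaks.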
4. Page Display
The UI mirrors Instruments' Allocations view, showing each Category with its total size, object count, and (when available) the allocating stack.
5. Operational Strategy
Monitoring adds runtime overhead and each upload is roughly 300 KB. To limit backend pressure we enable the feature only for a sampled subset of users (gray release plus an internal whitelist), while each device keeps its last three reports locally.
6. Reducing False Positives
We refined Facebook's elimination‑based FOOM criteria: the previous launch is classified as a FOOM only if the app was not upgraded, did not exit explicitly, did not crash, was not force‑quit by the user, the system did not reboot, and the app was not in the background. In practice, several sources of misclassification appeared:
Unreliable ApplicationState: brief background wake‑ups can report an "active" state; we now wait one second after the application becomes active before treating the launch as a foreground launch.
Bot‑controlled devices : remote‑control apps trigger rapid launches and quits, leading to mis‑detections.
CrashReport callback failures : memory‑corruption crashes prevented the crash callback, causing FOOM to be reported instead.
Watchdog‑induced kills (0x8badf00d): main‑thread deadlocks or sustained high CPU cause system‑initiated termination; we now treat captured main‑thread stalls as a separate "foreground freeze" restart reason.
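The elimination logic behind this classification can be sketched as a simple predicate. The struct fields are illustrative names, not the actual WeChat or Facebook implementation; the point is that FOOM is diagnosed only when every benign explanation for an unclean exit has been ruled out:

```c
#include <assert.h>
#include <stdbool.h>

/* State recorded about the previous session, checked on next launch. */
typedef struct {
    bool app_upgraded;       /* app version changed since last launch */
    bool called_exit;        /* app terminated itself explicitly      */
    bool crashed;            /* a crash report was captured           */
    bool user_force_quit;    /* user swiped the app away              */
    bool system_rebooted;    /* OS rebooted between launches          */
    bool was_in_background;  /* app was backgrounded when it died     */
} LastSessionState;

/* FOOM by elimination: an unclean exit with no other explanation,
 * while the app was in the foreground. */
static bool is_foom(const LastSessionState *s) {
    return !s->app_upgraded
        && !s->called_exit
        && !s->crashed
        && !s->user_force_quit
        && !s->system_rebooted
        && !s->was_in_background;  /* a background OOM is not a FOOM */
}
```

Each false‑positive source listed above corresponds to one of these flags being recorded unreliably, which is why the fixes focus on making the individual signals trustworthy rather than changing the predicate itself.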
7. Results
Since March 2017, the monitor has helped resolve over 30 memory‑related issues across chat, search, Moments and other features. FOOM rate dropped from 3 % to 0.67 % and foreground‑freeze rate fell from 0.6 % to 0.3 %.
8. Common Pitfalls
UIGraphicsBeginImageContext / UIGraphicsEndImageContext: every Begin must be matched by an End, otherwise the graphics context leaks.
UIWebView vs WKWebView : WKWebView runs in a separate process and uses far less memory.
autoreleasepool : large loops creating many autoreleased objects should wrap an @autoreleasepool to limit peak memory.
retain cycles: avoid strong references from blocks to self and from timers and animations to their targets and delegates; WeChat uses custom MMNoRetainTimer and MMDelegateCenter to break these cycles.
Large image handling : drawing a high‑resolution image with -[UIImage drawInRect:] decodes the image and creates a full‑size bitmap; use lower‑level ImageIO APIs to avoid the intermediate bitmap.
Huge views : rendering extremely long text in a single view consumes massive memory; split content into multiple reusable cells (e.g., UITableView).
9. Further Reading
Memory Usage Performance Guidelines
No pressure, Mon!
WeChat Client Technology Team
Official account of the WeChat mobile client development team, sharing development experience, cutting‑edge tech, and little‑known stories across Android, iOS, macOS, Windows Phone, and Windows.
