How We Built a Scalable iOS Memory Monitoring Tool to Tackle FOOM
This article describes how we evolved a simple FBAllocationTracker-based logger into a high-performance monitoring system to combat Foreground Out-Of-Memory (FOOM) crashes on iOS. The system collects, stores, compresses, reports, and visualizes allocation data using custom hooks, splay trees, hash-based stack compression, and selective reporting.
1. Implementation Principle
FOOM (Foreground Out Of Memory) occurs when an app in the foreground consumes excessive memory and is killed by the system, appearing to the user as a crash. Facebook introduced a detection method in August 2015, and WeChat began reporting FOOM at the end of 2015.
WeChat's first version used Facebook's FBAllocationTracker to monitor Objective‑C object allocation and fishhook to intercept malloc/free for heap allocation. Every second it logged the number of OC objects, the top 200 heap allocations and their call stacks to a local text file.
Drawbacks of this approach:
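Insufficient granularity – spikes made up of many small allocations were missed, and fishhook only rebinds calls made from the app's own code, so allocations made inside system libraries went unrecorded.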
Log interval trade‑off – long intervals miss peaks, short intervals increase CPU, I/O and power consumption.
Raw logs required manual analysis; no visual tool for categorisation.
In the second phase we took inspiration from Instruments' Allocations and optimised four aspects: data collection, storage, reporting and visualisation.
2. Data Collection
In September 2016 we examined the libmalloc source and discovered low-level allocation hooks (malloc_logger and __syscall_logger). When these pointers are non-null, every malloc/free or vm_allocate/vm_deallocate call notifies the upper layer, allowing us to record the size and stack trace of each live allocation.
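The "notify when the pointer is non-null" mechanism can be illustrated with a small portable sketch. It does not use the real libmalloc symbols: demo_logger, demo_malloc, demo_free, record_event and the event constants are all hypothetical stand-ins, and the real hooks carry different arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical event kinds; the real libmalloc flags differ. */
enum { DEMO_EVENT_ALLOC = 1, DEMO_EVENT_FREE = 2 };

/* Hypothetical logger signature: event kind, block address, block size. */
typedef void (*demo_logger_t)(int event, uintptr_t address, size_t size);

/* Stand-in for the global pointer that the allocator checks on every call. */
static demo_logger_t demo_logger = NULL;

/* Running totals kept by the monitoring layer. */
static size_t live_bytes = 0;
static size_t live_blocks = 0;

/* Allocator wrapper: notifies the logger only when one is installed. */
static void *demo_malloc(size_t size) {
    void *p = malloc(size);
    if (p != NULL && demo_logger != NULL) {
        demo_logger(DEMO_EVENT_ALLOC, (uintptr_t)p, size);
    }
    return p;
}

/* The size is passed in only to keep the sketch short; a real monitor
 * would look the address up in its live-object table instead. */
static void demo_free(void *p, size_t size) {
    if (p != NULL && demo_logger != NULL) {
        demo_logger(DEMO_EVENT_FREE, (uintptr_t)p, size);
    }
    free(p);
}

/* The monitor's callback: keeps the live counters up to date. */
static void record_event(int event, uintptr_t address, size_t size) {
    (void)address; /* a real monitor would key its live-object table on this */
    if (event == DEMO_EVENT_ALLOC) {
        live_bytes += size;
        live_blocks++;
    } else if (event == DEMO_EVENT_FREE) {
        assert(live_blocks > 0 && live_bytes >= size);
        live_bytes -= size;
        live_blocks--;
    }
}
```

Installing the monitor is then a single assignment (demo_logger = record_event), and clearing the pointer switches the monitoring off again with no other code changes.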
Because backtrace addresses are runtime virtual addresses, we also record each image's slide offset so that the symbol address can be recovered later as address - slide.
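The address arithmetic can be sketched as follows. The demo_image_t layout and the demo_unslide helper are hypothetical names for illustration, and the sketch assumes the load range of every image has already been recorded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one loaded image. */
typedef struct {
    uintptr_t load_start; /* first runtime address of the image (slide included) */
    uintptr_t load_end;   /* one past the last runtime address */
    uintptr_t slide;      /* offset added to the image at load time */
} demo_image_t;

/* Maps a runtime backtrace address back to the address found in the
 * image's symbol table. Returns 0 when no recorded image contains it. */
static uintptr_t demo_unslide(const demo_image_t *images, size_t count,
                              uintptr_t runtime_address) {
    assert(images != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (runtime_address >= images[i].load_start &&
            runtime_address < images[i].load_end) {
            return runtime_address - images[i].slide;
        }
    }
    return 0;
}
```

On iOS the slide of each image is available from dyld, for example through _dyld_get_image_vmaddr_slide in <mach-o/dyld.h>.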
Each allocation is assigned a Category for easier aggregation: heap objects are labelled "Malloc ", virtual memory objects use the flag name from <mach/vm_statistics.h>, and OC objects use the class name (hooked via +[NSObject alloc]).
We later discovered that NSData objects created via NSAllocateObject bypass +[NSObject alloc]. Using the private CoreFoundation hooks __CFOASafe and __CFObjectAllocSetLastAllocEventNameFunction, we can still obtain the class name.
3. Data Storage
We switched from SQLite to a lightweight self-adjusting binary search tree, the splay tree, to store live objects. A splay tree is not strictly balanced; instead it moves each accessed node to the root, which keeps frequently accessed nodes near the top and gives amortized O(log N) lookups while using less memory than red-black trees.
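A minimal sketch of such a tree, keyed by allocation address, is shown below. It uses the classic top-down splay routine; the demo_ names and the node layout are assumptions made for illustration, not the production data structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct demo_node {
    uintptr_t key;                  /* allocation address */
    size_t size;                    /* allocation size */
    struct demo_node *left, *right;
} demo_node;

/* Top-down splay: moves `key`, or the last node on its search path, to the root. */
static demo_node *demo_splay(demo_node *t, uintptr_t key) {
    demo_node header = {0, 0, NULL, NULL};
    demo_node *l = &header, *r = &header, *y;
    if (t == NULL) return NULL;
    for (;;) {
        if (key < t->key) {
            if (t->left == NULL) break;
            if (key < t->left->key) {            /* rotate right */
                y = t->left; t->left = y->right; y->right = t; t = y;
                if (t->left == NULL) break;
            }
            r->left = t; r = t; t = t->left;     /* link into the right tree */
        } else if (key > t->key) {
            if (t->right == NULL) break;
            if (key > t->right->key) {           /* rotate left */
                y = t->right; t->right = y->left; y->left = t; t = y;
                if (t->right == NULL) break;
            }
            l->right = t; l = t; t = t->right;   /* link into the left tree */
        } else {
            break;
        }
    }
    l->right = t->left;                          /* reassemble the three parts */
    r->left = t->right;
    t->left = header.right;
    t->right = header.left;
    return t;
}

/* Records a live allocation; the new node becomes the root. */
static demo_node *demo_insert(demo_node *root, uintptr_t key, size_t size) {
    demo_node *n;
    root = demo_splay(root, key);
    if (root != NULL && root->key == key) { root->size = size; return root; }
    n = malloc(sizeof *n);
    if (n == NULL) return root;                  /* out of memory: tree unchanged */
    n->key = key; n->size = size; n->left = n->right = NULL;
    if (root != NULL) {
        if (key < root->key) { n->left = root->left; n->right = root; root->left = NULL; }
        else { n->right = root->right; n->left = root; root->right = NULL; }
    }
    return n;
}

/* Forgets an allocation once it has been freed. */
static demo_node *demo_remove(demo_node *root, uintptr_t key) {
    demo_node *rest;
    root = demo_splay(root, key);
    if (root == NULL || root->key != key) return root;
    if (root->left == NULL) {
        rest = root->right;
    } else {
        rest = demo_splay(root->left, key);      /* largest key of the left subtree */
        assert(rest->right == NULL);
        rest->right = root->right;
    }
    free(root);
    return rest;
}
```

Because a freshly inserted allocation becomes the root, a malloc that is quickly followed by its free finds the node at or near the top of the tree.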
For stack traces we observed that many stacks share a common suffix (the outermost frames). We therefore stored stacks in a hash table where each node holds the current address and the index of the previous address node, so a shared suffix is stored only once. Collisions are resolved by probing for the next free slot. This suffix compression reduces the average stored stack length from 35 frames to fewer than 5, cutting storage per stack from ~157 bytes to ~67 bytes (the compressed size is ≈42 % of the original).
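The node layout described above can be sketched as a small open-addressing table. The table size, the hash function and every demo_ name are assumptions chosen for illustration; frames[0] is taken to be the innermost frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SLOTS 1024u             /* sketch-sized table, must be a power of two */
#define DEMO_NO_PARENT 0xFFFFFFFFu   /* marks the outermost frame, or a failure */

typedef struct {
    uintptr_t address;  /* frame address */
    uint32_t parent;    /* slot index of the caller's node */
    int used;
} demo_frame_node;

static demo_frame_node demo_table[DEMO_SLOTS];
static size_t demo_used_slots = 0;

/* Illustrative hash over the (address, parent) pair. */
static uint32_t demo_hash(uintptr_t address, uint32_t parent) {
    uint64_t h = (uint64_t)address * 0x9E3779B97F4A7C15ull
               ^ ((uint64_t)parent * 0xC2B2AE3D27D4EB4Full);
    return (uint32_t)(h >> 32) & (DEMO_SLOTS - 1u);
}

/* Finds or creates the node for (address, parent) using linear probing.
 * Returns its slot index, or DEMO_NO_PARENT when the table is full. */
static uint32_t demo_intern_frame(uintptr_t address, uint32_t parent) {
    uint32_t slot = demo_hash(address, parent);
    for (uint32_t probes = 0; probes < DEMO_SLOTS; probes++) {
        demo_frame_node *n = &demo_table[slot];
        if (!n->used) {
            n->address = address; n->parent = parent; n->used = 1;
            demo_used_slots++;
            return slot;
        }
        if (n->address == address && n->parent == parent) return slot;
        slot = (slot + 1u) & (DEMO_SLOTS - 1u);
    }
    return DEMO_NO_PARENT;
}

/* Interns a whole stack, walking from the outermost frame inwards so that
 * stacks with a common suffix reuse the same nodes. Returns the slot of the
 * innermost frame, which identifies the stack. */
static uint32_t demo_intern_stack(const uintptr_t *frames, size_t depth) {
    uint32_t node = DEMO_NO_PARENT;
    for (size_t i = depth; i > 0; i--) {
        node = demo_intern_frame(frames[i - 1], node);
        if (node == DEMO_NO_PARENT) return DEMO_NO_PARENT;
    }
    return node;
}

/* Rebuilds a stack (innermost frame first) from its id; returns its depth. */
static size_t demo_expand_stack(uint32_t id, uintptr_t *out, size_t max) {
    size_t n = 0;
    while (id != DEMO_NO_PARENT && n < max) {
        assert(id < DEMO_SLOTS);
        out[n++] = demo_table[id].address;
        id = demo_table[id].parent;
    }
    return n;
}
```

With this layout each live object only needs to keep the single slot index returned by demo_intern_stack, and two stacks that differ only in their innermost frame cost one extra node.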
4. Data Reporting
Because the live-object dataset is huge, we report selectively, and only when a FOOM is detected. First we aggregate objects by Category and report this small summary in full. Then, within each Category, we merge identical stacks. Stacks are reported only for the top-N Categories by allocation size and for UI-related Categories (e.g., UIViewController, UIView), and for each of those only the top-M stacks are included.
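The selection step can be sketched as an aggregate, sort and filter pass. The structures, the linear category lookup and the hard-coded UI category list are simplifying assumptions for illustration; all demo_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct { const char *category; size_t size; } demo_object;
typedef struct { const char *category; size_t total_size; size_t count; } demo_summary;

/* Aggregates live objects by Category. Returns the number of distinct
 * categories written to `out`; categories beyond `max_out` are dropped. */
static size_t demo_aggregate(const demo_object *objs, size_t n,
                             demo_summary *out, size_t max_out) {
    size_t used = 0;
    assert(out != NULL || max_out == 0);
    for (size_t i = 0; i < n; i++) {
        size_t j = 0;
        while (j < used && strcmp(out[j].category, objs[i].category) != 0) j++;
        if (j == used) {
            if (used == max_out) continue;  /* summary buffer full */
            out[used].category = objs[i].category;
            out[used].total_size = 0;
            out[used].count = 0;
            used++;
        }
        out[j].total_size += objs[i].size;
        out[j].count++;
    }
    return used;
}

/* qsort comparator: largest total size first, ties broken by name. */
static int demo_by_size_desc(const void *a, const void *b) {
    const demo_summary *x = a, *y = b;
    if (x->total_size != y->total_size) return x->total_size < y->total_size ? 1 : -1;
    return strcmp(x->category, y->category);
}

/* Assumed list of UI-related categories, taken from the examples in the text. */
static int demo_is_ui_category(const char *category) {
    return strcmp(category, "UIViewController") == 0 || strcmp(category, "UIView") == 0;
}

/* After sorting: stacks are reported for the top-N categories and for UI ones. */
static int demo_should_report_stacks(const demo_summary *sorted, size_t rank, size_t top_n) {
    return rank < top_n || demo_is_ui_category(sorted[rank].category);
}
```

A caller would run demo_aggregate, sort the summaries with qsort and demo_by_size_desc, and then attach the top-M stacks only to the ranks for which demo_should_report_stacks returns non-zero.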
5. Page Presentation
The UI mirrors Instruments' Allocations view: it lists Categories, their total allocation size, object count, and, when available, the associated stack trace.
6. Operation Strategy
To limit performance impact we enable the monitor for a sampled subset of users (gray‑release, internal whitelist) and keep only the three most recent data snapshots locally. Full reports are about 300 KB each.
7. Reducing False Positives
We refined the FOOM detection logic based on Facebook's criteria, which attribute a restart to FOOM only after every other explanation has been ruled out: no app upgrade, no explicit exit, no crash, no user-forced quit, no system reboot, and the app was not running in the background. Mis-detections arose from inaccurate ApplicationState, remote-control malware, missing crash callbacks, and watchdog-induced kills (0x8badf00d). Solutions include delaying the active-state check, improving crash-log handling, and treating watchdog kills as a separate restart reason.
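The elimination logic can be sketched as a small classifier that runs at the next launch. The demo_last_run fields and the reason names are hypothetical, and the sketch assumes each flag was persisted reliably during the previous run, which is exactly where the mis-detections listed above come from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical facts recorded about the previous run of the app. */
typedef struct {
    bool app_upgraded;
    bool exited_explicitly;   /* e.g. exit() or abort() called by the app */
    bool crashed;
    bool user_force_quit;
    bool system_rebooted;
    bool was_in_background;
    bool watchdog_killed;     /* 0x8badf00d */
} demo_last_run;

typedef enum {
    DEMO_REASON_EXPLAINED,    /* upgrade, exit, crash, force quit or reboot */
    DEMO_REASON_WATCHDOG,     /* kept separate so it is not counted as FOOM */
    DEMO_REASON_BACKGROUND,   /* killed in the background: not a foreground OOM */
    DEMO_REASON_FOOM
} demo_restart_reason;

/* Classifies the last termination by ruling out every other explanation. */
static demo_restart_reason demo_classify(const demo_last_run *r) {
    assert(r != NULL);
    if (r->app_upgraded || r->exited_explicitly || r->crashed ||
        r->user_force_quit || r->system_rebooted) {
        return DEMO_REASON_EXPLAINED;
    }
    if (r->watchdog_killed) return DEMO_REASON_WATCHDOG;
    if (r->was_in_background) return DEMO_REASON_BACKGROUND;
    return DEMO_REASON_FOOM;
}
```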
8. Results
Since launching the memory monitor in March 2017, WeChat has resolved over 30 memory‑related issues across chat, search, and Moments. FOOM rate dropped from ~3 % in early 2017 to 0.67 % and foreground‑kill rate fell from 0.6 % to 0.3 %.
9. Common Issues & Tips
UIGraphicsBeginImageContext / UIGraphicsEndImageContext: must be paired to avoid context leaks; Xcode's Analyze can detect mismatches.
UIWebView vs WKWebView: WKWebView runs in a separate process, so it adds far less memory to the app's own footprint.
autoreleasepool: wrap heavy loops in an explicit pool to release temporary objects promptly.
Retain cycles: avoid capturing self strongly in blocks and use weak references instead; wrap NSTimer targets and CAAnimation delegates in non-retaining wrappers.
Large image handling: avoid -[UIImage drawInRect:], which decodes the full-resolution bitmap; use ImageIO APIs to downsample directly.
Large views: split long text into multiple reusable cells instead of rendering a single massive view.
Additional iOS memory resources:
Memory Usage Performance Guidelines
No pressure, Mon!
WeChat Client Technology Team
Official account of the WeChat mobile client development team, sharing development experience, cutting‑edge tech, and little‑known stories across Android, iOS, macOS, Windows Phone, and Windows.
