Online Java OOM Attribution Solution Based on Hprof Memory Snapshots
This article introduces a comprehensive solution for diagnosing and attributing Java Out‑Of‑Memory (OOM) issues in Android apps by capturing Hprof memory snapshots, automatically analyzing heap data, identifying leaks, large objects, and class‑wide memory consumption, and providing privacy‑preserving, automated reporting and remediation workflows.
Locating and fixing Android app crashes caused by Java Out‑Of‑Memory (OOM) errors has long been a challenge. Conventional crash logs lack detailed allocation information, which makes it hard to pinpoint the root cause.
In collaboration with Client Infra and business units such as Toutiao and Douyin, we developed an Hprof‑based online Java OOM attribution solution. It has been widely adopted internally, helping Helo cut Java OOM incidents by 80% within two months and raise next‑day retention by over 2%.
After the solution was released on the Volcano Engine MARS‑APMPlus monitoring platform, early customer Meipian also achieved an 80% reduction in Java OOM over a two‑month period.
1. Java Memory Basics
1.1 Importance of Java Memory Optimization
Memory is a scarce resource; excessive heap usage leads to frequent GC pauses, UI jank, and ultimately OOM crashes that affect app usability.
1.2 Why Java OOM Occurs
Java OOM (Out Of Memory) occurs when the JVM cannot allocate an object because the heap is exhausted. The corresponding class is java.lang.OutOfMemoryError (an Error, not an Exception), whose documentation defines it as:
"Thrown when the Java Virtual Machine cannot allocate an object because it is out of memory, and no more memory could be made available by the garbage collector."
Key points to understand:
JVM memory regions (PC Register, JVM Stack, Native Method Stack, Heap, Method Area, Runtime Constant Pool)
The garbage collector decides which objects are live through reachability analysis: any object that cannot be reached from a GC Root can be reclaimed.
Object size metrics: Shallow Size (object header + fields) and Retained Size (total memory freed when the object is collected).
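The sketch below makes the two size metrics concrete. It runs on a hypothetical toy heap: the RetainedSizeDemo class and its string ids are illustrative and are not part of any real tool. It computes an object's Retained Size by brute force. First it marks everything reachable from the GC Roots. Then it marks again while treating the target as already collected. Finally, it sums the Shallow Sizes of the objects that dropped out. Production analyzers such as MAT derive the same number far more efficiently from a dominator tree.

```java
import java.util.*;

/** Toy heap model (hypothetical, not an Hprof parser): shallow sizes plus strong references. */
public class RetainedSizeDemo {
    private final Map<String, Long> shallow = new HashMap<>();
    private final Map<String, List<String>> refs = new HashMap<>();

    /** Registers an object with its Shallow Size and the ids it references. */
    public RetainedSizeDemo obj(String id, long shallowBytes, String... references) {
        shallow.put(id, shallowBytes);
        refs.put(id, Arrays.asList(references));
        return this;
    }

    /** Ids reachable from the GC roots, treating `excluded` (may be null) as already collected. */
    private Set<String> reachable(Collection<String> roots, String excluded) {
        Set<String> seen = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        for (String r : roots) {
            if (!r.equals(excluded)) stack.push(r);
        }
        while (!stack.isEmpty()) {
            String cur = stack.pop();
            if (!seen.add(cur)) continue; // already visited
            for (String next : refs.getOrDefault(cur, Collections.emptyList())) {
                if (!next.equals(excluded) && !seen.contains(next)) stack.push(next);
            }
        }
        return seen;
    }

    /** Retained Size: total Shallow Size of everything that becomes unreachable once `target` is gone. */
    public long retainedSize(Collection<String> roots, String target) {
        Set<String> before = reachable(roots, null);
        Set<String> after = reachable(roots, target);
        long freed = 0;
        for (String id : before) {
            if (!after.contains(id)) freed += shallow.getOrDefault(id, 0L);
        }
        return freed;
    }
}
```

Consider a graph where root references A and C, A references B and C, and the Shallow Sizes of A, B, and C are 24, 1000, and 500 bytes. Here retainedSize(roots, "A") returns 1024: A and B are freed, but C stays alive through root's direct reference.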
1.3 How OOM Happens
When the free heap space is insufficient for a requested allocation, the JVM throws OutOfMemoryError. Example log:
java.lang.OutOfMemoryError: Failed to allocate a 65552 byte allocation with 23992 free bytes and 23KB until OOM, max allowed footprint 536870912, growth limit 536870912
Here a 65,552‑byte request fails because only about 23 KB remain before the heap reaches its 512 MB (536,870,912‑byte) limit.
Android provides runtime memory APIs such as Runtime.getRuntime().maxMemory(), totalMemory(), and freeMemory() for monitoring heap status.
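The small sketch below shows how those Runtime calls combine into the quantities a log like this reports. The HeapUsage class is hypothetical. Its headroomBytes() value is only conceptually similar to ART's "until OOM" figure, which the runtime computes internally.

```java
/** Heap headroom helpers built on java.lang.Runtime (also available on Android). */
public class HeapUsage {
    /** Bytes currently occupied within the heap the runtime has reserved so far. */
    public static long usedBytes(Runtime rt) {
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Approximate bytes that can still be allocated before the heap reaches maxMemory(). */
    public static long headroomBytes(Runtime rt) {
        return rt.maxMemory() - usedBytes(rt);
    }

    /** Fraction of the maximum heap in use; maxMemory() is Long.MAX_VALUE if the JVM has no limit. */
    public static double usedRatio(Runtime rt) {
        return (double) usedBytes(rt) / rt.maxMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.printf("max=%d MB, used=%.1f%%, headroom=%d KB%n",
                rt.maxMemory() / (1024 * 1024), usedRatio(rt) * 100, headroomBytes(rt) / 1024);
    }
}
```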
2. Java Memory‑Related Tools
| Tool | Description | Pros | Cons |
| --- | --- | --- | --- |
| MAT | The Eclipse Memory Analyzer helps find memory leaks and reduce memory consumption. | Powerful analysis | Offline; requires Hprof collection |
| LeakCanary | Memory‑leak detection library for Android. | Integrates into the app automatically | Focuses on leaks; offline analysis |
| Android Studio Memory Profiler | Identifies memory leaks, large objects, and memory spikes. | Dynamic monitoring and static analysis | Requires a debug build |
These tools are insufficient for online OOM attribution because they are offline, low‑automation, and cannot aggregate root‑cause data.
3. Java OOM Attribution Solution
3.1 Overview
The solution consists of three parts:
Client SDK: Captures, trims, compresses, and uploads Hprof snapshots.
Server: Stores Hprof files, restores them, performs automated analysis, aggregates issues, and assigns owners.
Frontend: Visualizes memory leaks, large objects, and class‑wide large objects.
3.2 Technical Details
3.2.1 Dumping Memory Snapshots
The SDK registers an UncaughtExceptionHandler. When a Java OOM reaches it, the handler calls Debug.dumpHprofData() to write a snapshot. To avoid blocking the UI, the SDK can also dump proactively from a forked subprocess once heap usage exceeds a configurable threshold (80% by default).
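The capture path might look roughly like the following sketch. It uses a hypothetical HeapDumper interface in place of Debug.dumpHprofData() so that the code runs on a plain JVM. The handler dumps only when the crash is (or wraps) an OutOfMemoryError. It then chains to the previously installed handler so that normal crash reporting still happens. The forked‑subprocess dump requires native code and is not shown. shouldDumpProactively() only illustrates the threshold check.

```java
import java.io.IOException;

/** Sketch of an OOM-triggered heap dump hook; HeapDumper is a hypothetical seam over the platform call. */
public class OomDumpHandler implements Thread.UncaughtExceptionHandler {

    /** On Android this could be android.os.Debug::dumpHprofData; kept abstract so the sketch runs anywhere. */
    public interface HeapDumper {
        void dump(String path) throws IOException;
    }

    private final HeapDumper dumper;
    private final String dumpPath;
    private final Thread.UncaughtExceptionHandler previous; // may be null

    public OomDumpHandler(HeapDumper dumper, String dumpPath, Thread.UncaughtExceptionHandler previous) {
        this.dumper = dumper;
        this.dumpPath = dumpPath;
        this.previous = previous;
    }

    /** Installs the handler process-wide and chains to whatever handler was there before. */
    public static OomDumpHandler install(HeapDumper dumper, String dumpPath) {
        OomDumpHandler handler =
                new OomDumpHandler(dumper, dumpPath, Thread.getDefaultUncaughtExceptionHandler());
        Thread.setDefaultUncaughtExceptionHandler(handler);
        return handler;
    }

    /** True if the throwable or one of its causes is an OutOfMemoryError (depth-capped against cause cycles). */
    static boolean isOom(Throwable t) {
        for (int depth = 0; t != null && depth < 32; t = t.getCause(), depth++) {
            if (t instanceof OutOfMemoryError) return true;
        }
        return false;
    }

    @Override
    public void uncaughtException(Thread thread, Throwable error) {
        if (isOom(error)) {
            try {
                dumper.dump(dumpPath);
            } catch (Throwable ignored) {
                // A failed dump (even a second OOM) must not stop the original crash from being reported.
            }
        }
        if (previous != null) previous.uncaughtException(thread, error);
    }

    /** Proactive trigger: true once used/max reaches the threshold (the article's default is 0.8). */
    public static boolean shouldDumpProactively(long usedBytes, long maxBytes, double threshold) {
        return maxBytes > 0 && (double) usedBytes / maxBytes >= threshold;
    }
}
```

On Android, installation could be OomDumpHandler.install(android.os.Debug::dumpHprofData, path). This works because Debug.dumpHprofData(String) takes a file name and throws IOException, which matches the interface.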
3.2.2 Trimming and Restoring Hprof Files
To protect privacy and reduce size, the SDK removes sensitive data such as string arrays and bitmap pixel buffers, shrinking typical files from ~300 MB to ~40 MB. The server later pads the trimmed sections with empty data to restore the original Hprof format for downstream tools.
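The trim‑and‑restore idea can be sketched on a simplified record model. This is not a binary Hprof parser. PrimitiveArray stands in for a parsed PRIMITIVE ARRAY DUMP sub‑record, and the sketch assumes that string contents and bitmap pixels appear as char[] and byte[] arrays. Trimming keeps each array's id, element type, and length but drops its contents. Restoring zero‑fills the contents back to the original size.

```java
/** Simplified model of trimming/restoring Hprof primitive-array payloads (not a real binary parser). */
public class HprofTrimSketch {
    // Hprof basic-type codes used by primitive array records.
    static final int BOOLEAN = 4, CHAR = 5, FLOAT = 6, DOUBLE = 7, BYTE = 8, SHORT = 9, INT = 10, LONG = 11;

    /** One primitive array record: object id, element type, element count, raw element bytes. */
    public static final class PrimitiveArray {
        public final long objectId;
        public final int elementType;
        public final int length;
        public final byte[] elements;

        public PrimitiveArray(long objectId, int elementType, int length, byte[] elements) {
            this.objectId = objectId;
            this.elementType = elementType;
            this.length = length;
            this.elements = elements;
        }
    }

    static int elementSize(int type) {
        switch (type) {
            case BOOLEAN: case BYTE: return 1;
            case CHAR: case SHORT: return 2;
            case FLOAT: case INT: return 4;
            case DOUBLE: case LONG: return 8;
            default: throw new IllegalArgumentException("not a primitive type: " + type);
        }
    }

    /** Client side: drop char[]/byte[] contents (assumed string text and bitmap pixels), keep id/type/length. */
    public static PrimitiveArray trim(PrimitiveArray a) {
        if (a.elementType == CHAR || a.elementType == BYTE) {
            return new PrimitiveArray(a.objectId, a.elementType, a.length, new byte[0]);
        }
        return a;
    }

    /** Server side: zero-fill trimmed payloads so the record is well-formed again for MAT and similar tools. */
    public static PrimitiveArray restore(PrimitiveArray a) {
        int expected = a.length * elementSize(a.elementType);
        if (a.elements.length == expected) return a;
        return new PrimitiveArray(a.objectId, a.elementType, a.length, new byte[expected]);
    }
}
```

Because each array's length survives trimming, the Shallow and Retained Sizes computed on the server stay the same. Only the private contents are lost.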
3.2.3 Automated Parsing
The server parses Hprof records, builds an object‑reference graph, and extracts three key insights:
Memory leaks (objects still reachable from GC Roots after their lifecycle has ended).
Large objects (Retained Size > 1 MB).
Class‑wide large objects (instance count > 10 and total Retained Size > 20 MB).
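The two size rules translate directly into filters over analyzed instances. In this sketch the Instance type is hypothetical: it holds a class name plus an already computed Retained Size. The sketch also assumes that 1 MB means 2^20 bytes.

```java
import java.util.*;

/** Applies the large-object thresholds to analyzed instances; Instance is a hypothetical post-analysis record. */
public class LargeObjectRules {
    static final long MB = 1024L * 1024L;          // assumes 1 MB = 2^20 bytes
    static final long LARGE_OBJECT_BYTES = MB;     // Retained Size > 1 MB
    static final int CLASS_MIN_INSTANCES = 10;     // instance count > 10
    static final long CLASS_TOTAL_BYTES = 20 * MB; // total Retained Size > 20 MB

    public static final class Instance {
        public final String className;
        public final long retainedBytes;

        public Instance(String className, long retainedBytes) {
            this.className = className;
            this.retainedBytes = retainedBytes;
        }
    }

    /** Single objects whose Retained Size exceeds 1 MB. */
    public static List<Instance> largeObjects(List<Instance> heap) {
        List<Instance> result = new ArrayList<>();
        for (Instance i : heap) {
            if (i.retainedBytes > LARGE_OBJECT_BYTES) result.add(i);
        }
        return result;
    }

    /** Classes with more than 10 instances whose summed Retained Size exceeds 20 MB. */
    public static Set<String> classWideLargeObjects(List<Instance> heap) {
        Map<String, Integer> counts = new HashMap<>();
        Map<String, Long> totals = new HashMap<>();
        for (Instance i : heap) {
            counts.merge(i.className, 1, Integer::sum);
            totals.merge(i.className, i.retainedBytes, Long::sum);
        }
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > CLASS_MIN_INSTANCES && totals.get(e.getKey()) > CLASS_TOTAL_BYTES) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

Summing Retained Sizes per class can double‑count memory when instances of the same class retain one another. A real implementation may need to deduplicate; this sketch does not.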
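Leak detection relies on lifecycle flags set by the framework. For example, android.app.Activity marks itself as destroyed in performDestroy(). An Activity instance whose mDestroyed field is true but that is still reachable from GC Roots is therefore reported as leaked:
// Simplified excerpt from android.app.Activity
private boolean mDestroyed;   // read from the heap dump during analysis

final void performDestroy() {
    mDestroyed = true;        // the Activity's lifecycle has ended
    // other cleanup
}
After parsing, the system computes strong reference chains from each leaked object to GC Roots and reports its Retained Size.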
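The leak check and the reference‑chain step can be sketched together on a toy heap. Node ids, class names, and field names here are illustrative, and only strong references are modeled. A node counts as leaked when its lifecycle flag (such as mDestroyed) is true but it is still reachable. A breadth‑first search from the GC Roots then yields a shortest strong reference chain, similar in spirit to the leak traces LeakCanary reports.

```java
import java.util.*;

/** Toy leak finder: ids, classes and fields are illustrative; only strong references are modeled. */
public class LeakChainSketch {
    public static final class Node {
        public final String id;
        public final String className;
        public final boolean destroyed;                                  // e.g. the value of Activity.mDestroyed
        public final Map<String, String> fields = new LinkedHashMap<>(); // field name -> referenced node id

        Node(String id, String className, boolean destroyed) {
            this.id = id;
            this.className = className;
            this.destroyed = destroyed;
        }
    }

    public final Map<String, Node> nodes = new LinkedHashMap<>();
    public final List<String> gcRoots = new ArrayList<>();

    public Node add(String id, String className, boolean destroyed) {
        Node n = new Node(id, className, destroyed);
        nodes.put(id, n);
        return n;
    }

    /** BFS from GC roots; maps each reachable id to {parentId, fieldName} ({null, null} for roots). */
    private Map<String, String[]> shortestPathTree() {
        Map<String, String[]> parent = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        for (String root : gcRoots) {
            if (parent.putIfAbsent(root, new String[] {null, null}) == null) queue.add(root);
        }
        while (!queue.isEmpty()) {
            Node n = nodes.get(queue.poll());
            if (n == null) continue;
            for (Map.Entry<String, String> f : n.fields.entrySet()) {
                if (!parent.containsKey(f.getValue())) {
                    parent.put(f.getValue(), new String[] {n.id, f.getKey()});
                    queue.add(f.getValue());
                }
            }
        }
        return parent;
    }

    /** A leak is a destroyed object that is still reachable; returns one shortest chain per leak. */
    public List<String> findLeakChains() {
        Map<String, String[]> parent = shortestPathTree();
        List<String> chains = new ArrayList<>();
        for (Node leaked : nodes.values()) {
            if (!leaked.destroyed || !parent.containsKey(leaked.id)) continue;
            Deque<String> steps = new ArrayDeque<>();
            for (String cur = leaked.id; cur != null; cur = parent.get(cur)[0]) {
                String[] p = parent.get(cur);
                String cls = nodes.get(cur).className;
                steps.addFirst(p[1] == null ? cls : "." + p[1] + " -> " + cls);
            }
            chains.add(String.join("", steps));
        }
        Collections.sort(chains);
        return chains;
    }
}
```

Consider a heap where a GC Root Registry references a Listener through its listener field, and the Listener holds a destroyed MainActivity through activity. Here findLeakChains() reports Registry.listener -> Listener.activity -> MainActivity. A destroyed MainActivity with no path from a root is not reported, because the GC can reclaim it.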
3.2.4 Aggregation and Retrace
Issues are aggregated by leaked class, large‑object class, or class‑wide object name to surface high‑frequency problems. Release builds are obfuscated, so the class names and reference chains in their Hprof files are obfuscated too. A retrace tool uses the build's symbol tables to restore them.
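Aggregation with class‑name retrace can be sketched under the assumption that the symbol tables are ProGuard/R8 mapping files. In these files, class lines have the form original.ClassName -> obfuscated.Name: and member lines are indented. The RetraceAggregator class is hypothetical. It restores class names only, since retracing fields in reference chains would also need the member lines. It then counts occurrences so that the most frequent issues come first.

```java
import java.util.*;

/** Hypothetical aggregator: de-obfuscates class names via a ProGuard/R8 mapping, then counts them. */
public class RetraceAggregator {
    /** Parses class lines ("original.Name -> obf.Name:"); comments and indented member lines are skipped. */
    public static Map<String, String> parseClassMapping(List<String> mappingLines) {
        Map<String, String> obfToOriginal = new HashMap<>();
        for (String line : mappingLines) {
            if (line.isEmpty() || line.startsWith("#") || Character.isWhitespace(line.charAt(0))) continue;
            int arrow = line.indexOf(" -> ");
            if (arrow < 0 || !line.endsWith(":")) continue;
            obfToOriginal.put(line.substring(arrow + 4, line.length() - 1), line.substring(0, arrow));
        }
        return obfToOriginal;
    }

    /** Retraces each class name (unknown names pass through) and sorts groups by count, descending. */
    public static List<Map.Entry<String, Integer>> aggregate(List<String> obfuscatedClasses, Map<String, String> mapping) {
        Map<String, Integer> counts = new HashMap<>();
        for (String c : obfuscatedClasses) counts.merge(mapping.getOrDefault(c, c), 1, Integer::sum);
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
        sorted.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return sorted;
    }
}
```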
3.2.5 Automatic Assignment
Aggregated issues are matched to code owners (when available) and assigned via Lark notifications, closing the loop between detection and remediation.
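One possible way to match an issue to a code owner is a longest‑prefix lookup over package names. In this sketch the rule table, team names, and OwnerRouter class are all hypothetical, and the Lark notification step is omitted.

```java
import java.util.*;

/** Hypothetical owner table: package prefix -> owner; the most specific (longest) prefix wins. */
public class OwnerRouter {
    private final Map<String, String> ownersByPrefix = new HashMap<>();

    public OwnerRouter addRule(String packagePrefix, String owner) {
        ownersByPrefix.put(packagePrefix, owner);
        return this;
    }

    /** Owner of the most specific rule matching the (de-obfuscated) class name, if any. */
    public Optional<String> ownerOf(String className) {
        String best = null;
        for (String prefix : ownersByPrefix.keySet()) {
            // Match whole package segments only, so "com.example" does not match "com.examplex".
            boolean matches = className.equals(prefix) || className.startsWith(prefix + ".");
            if (matches && (best == null || prefix.length() > best.length())) best = prefix;
        }
        return best == null ? Optional.empty() : Optional.of(ownersByPrefix.get(best));
    }
}
```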
3.3 Summary
The Hprof‑based online Java OOM attribution solution provides high‑fidelity crash reconstruction, automated heap analysis, privacy‑preserving data handling, and actionable insights that dramatically reduce OOM occurrences.
4. Optimization Results
4.1 Internal Impact
Deployed across dozens of ByteDance apps, the solution cut Java OOM incidents by 80% within two months for apps such as Helo and raised Helo's next‑day retention by over 2%.
4.2 External Impact
Early adopters of MARS‑APMPlus, such as Meipian, reported an 80% reduction in OOM crashes and a similar drop in UI jank.
5. Getting Started
Interested teams can register for a free trial of MARS‑APMPlus, integrate the SDK, and start receiving automated OOM diagnostics and remediation guidance.
ByteDance Terminal Technology
Official account of ByteDance Terminal Technology, sharing technical insights and team updates.