Online Java OOM Attribution Solution Based on Hprof Memory Snapshots

This article introduces a comprehensive solution for diagnosing and attributing Java Out‑Of‑Memory (OOM) issues in Android apps by capturing Hprof memory snapshots, automatically analyzing heap data, identifying leaks, large objects, and class‑wide memory consumption, and providing privacy‑preserving, automated reporting and remediation workflows.

ByteDance Terminal Technology

Locating and fixing Android app crashes caused by Java Out‑Of‑Memory (OOM) errors has long been a challenge. Conventional crash logs record only the stack of the allocation that happened to fail, which is often not the code that actually filled the heap. As a result, they rarely pinpoint the root cause.

In collaboration with Client Infra and business units such as Toutiao and Douyin, we developed an Hprof‑based online Java OOM attribution solution. It has been widely adopted internally. For example, it helped Helo cut Java OOM incidents by 80% within two months and raise next‑day retention by over 2%.

After the solution was released on the Volcano Engine MARS‑APMPlus monitoring platform, early customer Meipian also achieved an 80% reduction in Java OOM over a two‑month period.

1. Java Memory Basics

1.1 Importance of Java Memory Optimization

Memory is a scarce resource; excessive heap usage leads to frequent GC pauses, UI jank, and ultimately OOM crashes that affect app usability.

1.2 Why Java OOM Occurs

A Java OOM (Out Of Memory) error occurs when the JVM (ART on Android) cannot allocate an object because the heap is exhausted. The runtime then throws java.lang.OutOfMemoryError, which its documentation defines as:

Thrown when the Java Virtual Machine cannot allocate an object because it is out of memory, and no more memory could be made available by the garbage collector.

Key points to understand:

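JVM memory regions: PC Register, JVM Stacks, Native Method Stacks, Heap, Method Area, and Runtime Constant Pool. The Java OOM discussed here concerns the Heap.

Garbage collection works by reachability analysis: an object is live only if it can be reached from a GC Root, and unreachable objects are eligible for collection.

Object size metrics: Shallow Size is the memory used by the object itself (object header plus fields). Retained Size is the total memory that would be freed if the object were collected, meaning the object plus everything reachable only through it.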
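To make reachability, Shallow Size, and Retained Size concrete, here is a toy heap model. It is not an Hprof parser, and the ToyHeap class, object IDs, and sizes are hypothetical. Retained Size is computed straight from its definition: objects reachable from GC Roots, minus those still reachable once the object is removed. Production analyzers such as MAT instead compute retained sizes for all objects at once via a dominator tree, which is far cheaper than this per‑object recomputation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy heap: object IDs with shallow sizes, strong references, and GC Roots (hypothetical model). */
final class ToyHeap {
    private final Map<Integer, Long> shallow = new HashMap<>();
    private final Map<Integer, List<Integer>> refs = new HashMap<>();
    private final Set<Integer> gcRoots = new HashSet<>();

    void addObject(int id, long shallowSize) { shallow.put(id, shallowSize); }
    void addReference(int from, int to) { refs.computeIfAbsent(from, k -> new ArrayList<>()).add(to); }
    void addGcRoot(int id) { gcRoots.add(id); }
    long shallowSize(int id) { return shallow.get(id); }

    /** Reachability analysis from GC Roots, treating {@code removed} (may be null) as if it were gone. */
    Set<Integer> reachable(Integer removed) {
        Set<Integer> seen = new HashSet<>();
        Deque<Integer> stack = new ArrayDeque<>();
        for (Integer root : gcRoots) {
            if (!root.equals(removed)) stack.push(root);
        }
        while (!stack.isEmpty()) {
            Integer id = stack.pop();
            if (!seen.add(id)) continue; // already visited
            for (Integer next : refs.getOrDefault(id, Collections.emptyList())) {
                if (!next.equals(removed) && !seen.contains(next)) stack.push(next);
            }
        }
        return seen;
    }

    /** Retained size: total shallow size of everything that becomes unreachable without {@code id}. */
    long retainedSize(int id) {
        Set<Integer> before = reachable(null);
        if (!before.contains(id)) return 0L; // already garbage
        Set<Integer> after = reachable(id);
        long total = 0L;
        for (Integer obj : before) {
            if (!after.contains(obj)) total += shallow.getOrDefault(obj, 0L);
        }
        return total;
    }
}
```

Consider an Activity (shallow size 100) that references a 1000‑byte bitmap, where a cache also holds that same bitmap. The Activity's Retained Size is then only 100. Removing the Activity would not free the bitmap, because the cache still reaches it.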

1.3 How OOM Happens

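When the free heap space cannot satisfy an allocation request and the heap cannot grow any further, the runtime throws OutOfMemoryError. In the log below, a 65,552‑byte request fails with only 23,992 bytes free. The heap footprint has also already reached its growth limit of 536,870,912 bytes (512 MB):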

java.lang.OutOfMemoryError: Failed to allocate a 65552 byte allocation with 23992 free bytes and 23KB until OOM, max allowed footprint 536870912, growth limit 536870912
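The fields in this message are enough to tell why the allocation failed. The sketch below pulls them out with a regular expression. The pattern is an assumption derived only from the log line above. ART's wording differs across Android versions, and other OOM kinds (for example, thread‑creation failures) use entirely different messages. The OomMessage class is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Parses the "Failed to allocate" OOM message shown above (format assumed; it varies by Android version). */
final class OomMessage {
    private static final Pattern FAILED_ALLOC = Pattern.compile(
            "Failed to allocate a (\\d+) byte allocation with (\\d+) free bytes and (\\S+) until OOM"
                    + "(?:, max allowed footprint (\\d+), growth limit (\\d+))?");

    final long requestedBytes;
    final long freeBytes;
    final long footprintBytes;   // -1 if the message omits it
    final long growthLimitBytes; // -1 if the message omits it

    private OomMessage(long requestedBytes, long freeBytes, long footprintBytes, long growthLimitBytes) {
        this.requestedBytes = requestedBytes;
        this.freeBytes = freeBytes;
        this.footprintBytes = footprintBytes;
        this.growthLimitBytes = growthLimitBytes;
    }

    /** Returns null when the message is not a "Failed to allocate" OOM. */
    static OomMessage parse(String message) {
        Matcher m = FAILED_ALLOC.matcher(message);
        if (!m.find()) return null;
        return new OomMessage(
                Long.parseLong(m.group(1)),
                Long.parseLong(m.group(2)),
                m.group(4) == null ? -1L : Long.parseLong(m.group(4)),
                m.group(5) == null ? -1L : Long.parseLong(m.group(5)));
    }

    /** The request does not fit into the bytes that are currently free. */
    boolean requestExceedsFree() { return requestedBytes > freeBytes; }

    /** The heap footprint has reached its growth limit, so the heap cannot expand to make room. */
    boolean atGrowthLimit() { return footprintBytes >= 0 && footprintBytes >= growthLimitBytes; }
}
```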

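Android apps can monitor heap status through java.lang.Runtime. Runtime.getRuntime().maxMemory() returns the maximum heap size the runtime will attempt to use. totalMemory() returns the heap currently reserved, and freeMemory() returns the unused part of that reserved heap.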
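Combining these three calls gives the heap usage ratio that a threshold‑based monitor would watch. In this sketch (HeapUsage is a hypothetical helper), used bytes are totalMemory() minus freeMemory(), and the ratio divides that by maxMemory(). The arithmetic is split out from the Runtime sampling so it can be checked with fixed numbers.

```java
/** Heap usage derived from java.lang.Runtime; the same calls are available on Android. */
final class HeapUsage {
    private HeapUsage() {}

    /** Bytes in use = heap currently reserved (totalMemory) minus its unused part (freeMemory). */
    static long usedBytes(long totalMemory, long freeMemory) {
        return totalMemory - freeMemory;
    }

    /** Fraction of the maximum heap (maxMemory) that is currently in use. */
    static double usageRatio(long totalMemory, long freeMemory, long maxMemory) {
        return (double) usedBytes(totalMemory, freeMemory) / maxMemory;
    }

    /** Samples the current process. */
    static double usageRatio(Runtime runtime) {
        return usageRatio(runtime.totalMemory(), runtime.freeMemory(), runtime.maxMemory());
    }
}
```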

2. Java Memory‑Related Tools

| Tool | Description | Pros | Cons |
|---|---|---|---|
| MAT | The Eclipse Memory Analyzer helps find memory leaks and reduce memory consumption. | Powerful analysis | Offline; requires Hprof collection |
| LeakCanary | Memory‑leak detection library for Android. | Integrates into the app automatically | Focuses on leaks only; offline analysis |
| Android Studio Memory Profiler | Identifies memory leaks, large objects, and memory spikes. | Dynamic monitoring and static analysis | Requires a debug build |

These tools fall short for online OOM attribution. They work offline on individual devices and require substantial manual effort. They also cannot aggregate root causes across the many users who hit OOM in production.

3. Java OOM Attribution Solution

3.1 Overview

The solution consists of three parts:

Client SDK: Captures, trims, compresses, and uploads Hprof snapshots.

Server: Stores Hprof files, restores them, performs automated analysis, aggregates issues, and assigns owners.

Frontend: Visualizes memory leaks, large objects, and class‑wide large objects.
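The article does not say which compression format the client SDK uses. As a sketch of the "compresses" step only, the code below assumes GZIP from java.util.zip and streams the file through a fixed buffer so the whole dump never has to sit in memory. HprofCompressor is a hypothetical name. Trimmed and zero‑padded regions compress especially well.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPOutputStream;

/** Hypothetical client-side compression step, assuming GZIP. */
final class HprofCompressor {
    private HprofCompressor() {}

    /** GZIP-compresses {@code source} into {@code target} in fixed-size chunks; returns the compressed size. */
    static long gzip(Path source, Path target) throws IOException {
        byte[] buffer = new byte[64 * 1024];
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = new GZIPOutputStream(Files.newOutputStream(target), buffer.length)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } // closing the GZIP stream writes the trailer and closes the file
        return Files.size(target);
    }
}
```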

3.2 Technical Details

3.2.1 Dumping Memory Snapshots

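The SDK registers an UncaughtExceptionHandler at startup. When a Java OOM crash reaches this handler, it calls Debug.dumpHprofData() to write a snapshot before the process dies. The SDK can also dump proactively when heap usage exceeds a configurable threshold (80% by default). In that case, a forked subprocess performs the dump so that the app's UI is not blocked while the heap is written out.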
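The crash‑time trigger can be sketched as a chaining handler. OomDumpHandler is a hypothetical class, and the dump action is injected so the sketch runs on any JVM. On Android that action would wrap android.os.Debug.dumpHprofData(String fileName), which throws IOException. The handler searches the cause chain for an OutOfMemoryError, dumps, and then forwards to the previously installed handler so normal crash reporting still happens. The fork‑based proactive dump is native work and is not shown. Only its threshold check, using the 80% default from the text, appears here.

```java
/**
 * Hypothetical crash-time trigger. The dump action is injected so the sketch runs on any JVM;
 * on Android it would wrap android.os.Debug.dumpHprofData(path).
 */
final class OomDumpHandler implements Thread.UncaughtExceptionHandler {
    private final Thread.UncaughtExceptionHandler previous;
    private final Runnable dumpAction;

    OomDumpHandler(Thread.UncaughtExceptionHandler previous, Runnable dumpAction) {
        this.previous = previous;
        this.dumpAction = dumpAction;
    }

    /** Registers the handler ahead of whatever default handler is already installed. */
    static OomDumpHandler install(Runnable dumpAction) {
        OomDumpHandler handler =
                new OomDumpHandler(Thread.getDefaultUncaughtExceptionHandler(), dumpAction);
        Thread.setDefaultUncaughtExceptionHandler(handler);
        return handler;
    }

    /** True if the throwable or one of its causes is an OutOfMemoryError (depth-limited walk). */
    static boolean isOom(Throwable error) {
        Throwable cur = error;
        for (int depth = 0; cur != null && depth < 16; depth++, cur = cur.getCause()) {
            if (cur instanceof OutOfMemoryError) return true;
        }
        return false;
    }

    /** Proactive trigger: usage strictly above the threshold (0.8 mirrors the 80% default). */
    static boolean exceedsThreshold(long usedBytes, long maxBytes, double threshold) {
        return maxBytes > 0 && (double) usedBytes / maxBytes > threshold;
    }

    @Override
    public void uncaughtException(Thread thread, Throwable error) {
        if (isOom(error)) {
            try {
                dumpAction.run();
            } catch (Throwable dumpFailure) {
                // A failed dump must not mask the original crash; continue to the previous handler.
            }
        }
        if (previous != null) {
            previous.uncaughtException(thread, error);
        }
    }
}
```

On a device, a call such as OomDumpHandler.install(...) with a lambda that invokes Debug.dumpHprofData on an app‑private path would be made once in Application.onCreate(). Forwarding to the previous handler keeps the existing crash reporter in the loop.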

3.2.2 Trimming and Restoring Hprof Files

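To protect privacy and reduce file size, the SDK strips the contents of arrays that may hold sensitive or bulky data, such as the character data behind strings and bitmap pixel buffers. The object structure itself is kept. This shrinks a typical file from about 300 MB to about 40 MB. The server later pads the stripped sections with empty (zero) data, which restores a well‑formed Hprof file that downstream tools can read.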
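The trim‑and‑restore idea can be sketched on a single PRIMITIVE ARRAY DUMP sub‑record (tag 0x23) of the Hprof format. The client keeps the header, which holds the array ID, stack serial, element count, and element type, and drops the element bytes. The server re‑creates those bytes as zeros of the right length. This is a simplification under stated assumptions. It assumes 4‑byte object IDs (a real tool reads the ID size from the file header). It also ignores that the enclosing HEAP_DUMP_SEGMENT length fields must be rewritten when bytes are removed. PrimitiveArrayTrimmer is a hypothetical name.

```java
import java.nio.ByteBuffer;

/**
 * Simplified trim/restore for one PRIMITIVE ARRAY DUMP sub-record, assuming 4-byte IDs.
 * Layout: tag u1 | array ID u4 | stack serial u4 | element count u4 | element type u1 | elements.
 */
final class PrimitiveArrayTrimmer {
    static final byte PRIMITIVE_ARRAY_DUMP = 0x23;
    static final int ID_SIZE = 4;
    static final int HEADER_SIZE = 1 + ID_SIZE + 4 + 4 + 1;
    private static final int COUNT_OFFSET = 1 + ID_SIZE + 4;

    /** Element width in bytes for Hprof basic type codes. */
    static int elementSize(byte type) {
        switch (type) {
            case 4: case 8: return 1;   // boolean, byte
            case 5: case 9: return 2;   // char, short
            case 6: case 10: return 4;  // float, int
            case 7: case 11: return 8;  // double, long
            default: throw new IllegalArgumentException("not a primitive type: " + type);
        }
    }

    /** Client side: keep the header and drop the element bytes (e.g. string or pixel data). */
    static byte[] trim(byte[] record) {
        if (record.length < HEADER_SIZE || record[0] != PRIMITIVE_ARRAY_DUMP) {
            throw new IllegalArgumentException("not a primitive array dump sub-record");
        }
        byte[] header = new byte[HEADER_SIZE];
        System.arraycopy(record, 0, header, 0, HEADER_SIZE);
        return header;
    }

    /** Server side: re-create the element bytes as zeros so standard Hprof parsers accept the record. */
    static byte[] restore(byte[] header) {
        int count = ByteBuffer.wrap(header).getInt(COUNT_OFFSET); // Hprof and ByteBuffer are both big-endian
        int payload = Math.multiplyExact(count, elementSize(header[HEADER_SIZE - 1]));
        byte[] record = new byte[HEADER_SIZE + payload]; // new Java arrays are zero-filled
        System.arraycopy(header, 0, record, 0, HEADER_SIZE);
        return record;
    }
}
```

Because the element count and type survive, the restored record has the original size, so shallow and retained sizes computed on the server stay correct even though the contents are gone.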

3.2.3 Automated Parsing

The server parses Hprof records, builds an object‑reference graph, and extracts three key insights:

Memory leaks (objects still reachable from GC Roots after their lifecycle has ended).

Large objects (Retained Size > 1 MB).

Class‑wide large objects (instance count > 10 and total Retained Size > 20 MB).
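The two size‑based rules reduce to simple filters once each instance's retained size is known. This sketch uses the thresholds from the text and assumes "MB" means 1024 × 1024 bytes. The HeapFinding and Instance types are hypothetical. Note that simply summing retained sizes per class double‑counts memory when instances of the same class retain one another, and a production analyzer would need to account for that.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical classification of parsed heap instances using the thresholds described above. */
final class HeapFinding {
    static final long MB = 1024L * 1024L;                 // assumption: binary megabytes
    static final long LARGE_OBJECT_BYTES = MB;            // Retained Size > 1 MB
    static final int CLASS_WIDE_MIN_INSTANCES = 10;       // instance count > 10
    static final long CLASS_WIDE_MIN_BYTES = 20 * MB;     // total Retained Size > 20 MB

    /** One instance from the parsed heap: class name plus retained size. */
    static final class Instance {
        final String className;
        final long retainedBytes;

        Instance(String className, long retainedBytes) {
            this.className = className;
            this.retainedBytes = retainedBytes;
        }
    }

    /** Instances whose own Retained Size exceeds 1 MB. */
    static List<Instance> largeObjects(List<Instance> heap) {
        List<Instance> result = new ArrayList<>();
        for (Instance i : heap) {
            if (i.retainedBytes > LARGE_OBJECT_BYTES) result.add(i);
        }
        return result;
    }

    /** Classes with more than 10 instances whose retained sizes sum to more than 20 MB (naive sum). */
    static Map<String, Long> classWideLargeObjects(List<Instance> heap) {
        Map<String, Integer> counts = new HashMap<>();
        Map<String, Long> totals = new HashMap<>();
        for (Instance i : heap) {
            counts.merge(i.className, 1, Integer::sum);
            totals.merge(i.className, i.retainedBytes, Long::sum);
        }
        Map<String, Long> result = new TreeMap<>();
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            if (counts.get(e.getKey()) > CLASS_WIDE_MIN_INSTANCES && e.getValue() > CLASS_WIDE_MIN_BYTES) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }
}
```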

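Leak detection relies on lifecycle flags that the Android framework sets when a component is torn down. For example, android.app.Activity records its destruction in the mDestroyed field. An Activity instance whose mDestroyed is true but that is still reachable from a GC Root is reported as a leak:

// Simplified excerpt from android.app.Activity
private boolean mDestroyed;

final void performDestroy() {
    mDestroyed = true; // the analyzer reads this flag from the heap snapshot
    // other cleanup
}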

For each leaked object, the system then computes the strong reference chain from a GC Root to the object, which shows what is keeping it alive. It also reports the object's Retained Size.
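A minimal version of this step is sketched below. It treats objects flagged as destroyed (the mDestroyed check above) as leak candidates, then runs a breadth‑first search from the GC Roots over strong references only. Breadth‑first search yields the shortest chain, which is usually the most readable one to report. The LeakFinder class and the string object IDs are hypothetical. Real analyzers also have to skip weak, soft, and phantom references when building the graph.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy leak finder: destroyed objects that are still strongly reachable from a GC Root. */
final class LeakFinder {
    private final Map<String, List<String>> strongRefs = new HashMap<>();
    private final Set<String> gcRoots = new LinkedHashSet<>();
    private final Set<String> destroyed = new HashSet<>();

    void addRoot(String id) { gcRoots.add(id); }
    void addStrongRef(String from, String to) { strongRefs.computeIfAbsent(from, k -> new ArrayList<>()).add(to); }
    void markDestroyed(String id) { destroyed.add(id); } // e.g. an Activity with mDestroyed == true

    /** Shortest chain GC Root -> ... -> target via BFS, or an empty list if the target is unreachable. */
    List<String> referenceChain(String target) {
        Map<String, String> parent = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        for (String root : gcRoots) {
            parent.put(root, null);
            queue.add(root);
        }
        while (!queue.isEmpty()) {
            String cur = queue.poll();
            if (cur.equals(target)) {
                LinkedList<String> chain = new LinkedList<>();
                for (String n = cur; n != null; n = parent.get(n)) chain.addFirst(n);
                return chain;
            }
            for (String next : strongRefs.getOrDefault(cur, Collections.emptyList())) {
                if (!parent.containsKey(next)) {
                    parent.put(next, cur);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }

    /** Leak = lifecycle has ended but a strong chain from a GC Root still exists. */
    Map<String, List<String>> findLeaks() {
        Map<String, List<String>> leaks = new TreeMap<>();
        for (String id : destroyed) {
            List<String> chain = referenceChain(id);
            if (!chain.isEmpty()) leaks.put(id, chain);
        }
        return leaks;
    }
}
```

A destroyed Activity that nothing references any more is simply garbage awaiting collection and is not reported. Only destroyed objects with a surviving chain count as leaks.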

3.2.4 Aggregation and Retrace

Issues are aggregated by leak class, large‑object class, or class‑wide object name to surface high‑frequency problems. Release builds are obfuscated, so the class and field names recorded in the Hprof are obfuscated too. A retrace tool uses the build's symbol (mapping) table to restore the original names in class names and reference chains.
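A minimal retrace for class and field names can be built on the ProGuard/R8 mapping format. In that format, class lines look like "com.example.Foo -> a.b:" and indented member lines look like "    java.util.List items -> a". This sketch assumes that format, skips method lines (those containing "("), and only retraces the class and field names that appear in a reference chain. MiniRetrace is a hypothetical class and not the production retrace tool.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical minimal retrace of class and field names from a ProGuard/R8 mapping file. */
final class MiniRetrace {
    private final Map<String, String> classes = new HashMap<>();             // obfuscated -> original
    private final Map<String, Map<String, String>> fields = new HashMap<>(); // obf class -> (obf field -> original)

    MiniRetrace(Iterable<String> mappingLines) {
        String currentObfClass = null;
        for (String line : mappingLines) {
            if (line.trim().isEmpty() || line.trim().startsWith("#")) continue; // comments and metadata
            int arrow = line.indexOf(" -> ");
            if (arrow < 0) continue;
            if (!Character.isWhitespace(line.charAt(0))) {
                // Class line: "com.example.Foo -> a.b:"
                String obf = line.substring(arrow + 4).trim();
                if (obf.endsWith(":")) obf = obf.substring(0, obf.length() - 1);
                classes.put(obf, line.substring(0, arrow).trim());
                currentObfClass = obf;
            } else if (currentObfClass != null && line.indexOf('(') < 0) {
                // Field line: "    java.util.List items -> a" (method lines contain '(' and are skipped)
                String[] typeAndName = line.substring(0, arrow).trim().split("\\s+");
                String originalField = typeAndName[typeAndName.length - 1];
                String obfField = line.substring(arrow + 4).trim();
                fields.computeIfAbsent(currentObfClass, k -> new HashMap<>()).put(obfField, originalField);
            }
        }
    }

    /** Original class name, or the input unchanged if it is not in the mapping. */
    String retraceClass(String obfClass) {
        return classes.getOrDefault(obfClass, obfClass);
    }

    /** Retraces one "Class.field" step of a reference chain. */
    String retraceFieldRef(String obfClass, String obfField) {
        String field = fields.getOrDefault(obfClass, Collections.emptyMap()).getOrDefault(obfField, obfField);
        return retraceClass(obfClass) + "." + field;
    }
}
```

On the server, the mapping lines would come from the mapping file uploaded for the exact build that produced the Hprof, for example via Files.readAllLines. A mismatched mapping silently produces wrong names.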

3.2.5 Automatic Assignment

Aggregated issues are matched to code owners (when available) and assigned via Lark notifications, closing the loop between detection and remediation.
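One plausible way to match a retraced class to a code owner is a longest‑prefix lookup over package names. The article does not describe its matching rules, so the OwnerResolver class, the prefix table, and the fallback owner below are all hypothetical. Matching on a "." or "$" boundary keeps a rule for com.example.feed from accidentally claiming com.example.feedback.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical ownership table: package or class prefix -> owner; the longest matching prefix wins. */
final class OwnerResolver {
    private final Map<String, String> ownersByPrefix = new TreeMap<>();

    OwnerResolver register(String prefix, String owner) {
        ownersByPrefix.put(prefix, owner);
        return this;
    }

    /** Owner of a retraced class name, or {@code fallback} when no rule matches. */
    String ownerOf(String className, String fallback) {
        String best = null;
        for (String prefix : ownersByPrefix.keySet()) {
            boolean matches = className.equals(prefix)
                    || className.startsWith(prefix + ".")   // package boundary
                    || className.startsWith(prefix + "$");  // inner class of a registered class
            if (matches && (best == null || prefix.length() > best.length())) best = prefix;
        }
        return best == null ? fallback : ownersByPrefix.get(best);
    }
}
```

Issues with no matching rule fall back to a triage owner instead of being dropped, so every aggregated issue still gets assigned.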

3.3 Summary

The Hprof‑based online Java OOM attribution solution provides high‑fidelity crash reconstruction, automated heap analysis, privacy‑preserving data handling, and actionable insights that dramatically reduce OOM occurrences.

4. Optimization Results

4.1 Internal Impact

Deployed across dozens of ByteDance apps, the solution helped Helo cut Java OOM incidents by 80% within two months, with next‑day retention up by over 2%.

4.2 External Impact

Early adopters of MARS‑APMPlus, such as Meipian, reported an 80% reduction in OOM crashes and a similar drop in UI jank.

5. Getting Started

Interested teams can register for a free trial of MARS‑APMPlus, integrate the SDK, and start receiving automated OOM diagnostics and remediation guidance.
