
Tailor: An Open‑Source Android Memory Snapshot Trimming and Compression Tool for OOM Governance

Tailor is an open‑source Android memory‑snapshot trimming and compression tool developed by the Xigua Video team. It trims snapshots efficiently, compresses them with zlib, and integrates seamlessly with the dump process while preserving essential debugging information. On‑demand snapshot dumping with Tailor helped cut OOM occurrences by over 95%.

Watermelon Video Tech Team

Background

Stability governance has long relied on stack traces and source code, which are often insufficient for complex, data‑dependent crashes. The Xigua Video Android team built a comprehensive exception‑data collection system based on Java heap snapshots. It can dump a relatively complete memory snapshot when an anomaly occurs and, when needed, retrieve it through a cloud‑control system for detailed analysis.

Purpose of Memory Snapshots

OOM Governance – Memory snapshots provide the data needed to resolve OOM and other memory‑related issues. A heap snapshot alone is sufficient to analyze a heap OOM, and snapshots are the foundation for leak‑detection tools.

Crash Governance – Snapshots also capture the state of Activities, Fragments, and Views, along with framework objects and third‑party data. This reduces the need for targeted instrumentation and covers many scenarios that would otherwise be invisible.

Why Trimming Is Needed

To make snapshots usable, problems such as storage space, transmission bandwidth, and privacy must be addressed.

Storage – Snapshots taken at OOM in a large‑heap process can reach ~512 MB. Without trimming they must be stored on external storage or an SD card, which may be unavailable or permission‑restricted.

Transmission – Smaller snapshots consume less network traffic and transmit faster, improving success rates.

Privacy – Snapshots contain sensitive data (accounts, tokens, keys, images, strings) that must be removed.

Trimming Schemes

Two known schemes exist: the open‑source Matrix approach (dump the full hprof, then remove duplicate Bitmap and String objects) and the author's 2018 hprof‑stream trimming method. The Matrix method suffers from large raw dumps, heavy I/O, and incomplete trimming.

The hprof‑stream method trims during file writing, avoiding large I/O and improving performance. It focuses on OOM‑relevant data (object size and references) and primarily removes large byte[]/char[] arrays belonging to Bitmap/String objects.
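To make the trimming policy concrete, here is a minimal Java sketch of how a stream trimmer might decide whether to drop an array's contents. It uses only the fields in a PRIMITIVE_ARRAY_DUMP header: the element type code and the element count. The class name `TrimPolicy`, the method `shouldStrip`, and the 1 KiB threshold are hypothetical and are not taken from Tailor. A streaming trimmer sees one array at a time, so this sketch keys only on element type and payload size. It does not check whether the array belongs to a Bitmap or a String.

```java
/**
 * Hypothetical trimming policy (not Tailor's code): decide from a primitive
 * array's hprof element type and element count whether to drop its payload.
 */
public final class TrimPolicy {
    // hprof basic-type codes for the two array kinds this sketch strips
    static final int TYPE_CHAR = 5;
    static final int TYPE_BYTE = 8;

    // Hypothetical threshold: only strip payloads of at least this many bytes.
    static final long MIN_PAYLOAD_BYTES = 1024;

    /** Size in bytes of one element of an hprof primitive type, or -1 if not primitive. */
    static int elementSize(int type) {
        switch (type) {
            case 4: case 8: return 1;   // boolean, byte
            case 5: case 9: return 2;   // char, short
            case 6: case 10: return 4;  // float, int
            case 7: case 11: return 8;  // double, long
            default: return -1;
        }
    }

    /** True if the array is a byte[] or char[] whose payload meets the threshold. */
    static boolean shouldStrip(int type, long count) {
        if (type != TYPE_BYTE && type != TYPE_CHAR) return false;
        return count * elementSize(type) >= MIN_PAYLOAD_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(shouldStrip(TYPE_BYTE, 4_000_000)); // true: large byte[] payload
        System.out.println(shouldStrip(TYPE_CHAR, 16));        // false: only 32 bytes
        System.out.println(shouldStrip(10, 4_000_000));        // false: int[] is always kept
    }
}
```

A size threshold keeps small arrays, which are cheap to store and occasionally useful when debugging, while removing the large buffers that dominate snapshot size.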

hprof Format

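An hprof file consists of a header followed by a sequence of records. The header is the NUL‑terminated string "JAVA PROFILE 1.0.2", then a u4 identifier size, then a timestamp written as two u4 words. Each record is a u1 tag, a u4 time offset, a u4 body length, and the body.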
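Because hprof values are big‑endian, this layout can be walked directly with a `DataInputStream`. The following minimal Java sketch is not Tailor's code; Tailor works in native code on the write stream. The sketch reads the header, then lists each top‑level record's tag and body length, skipping the bodies. The class name `HprofWalker` and the synthetic sample file are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public final class HprofWalker {
    /** One top-level record: its tag and body length (the body itself is skipped). */
    static final class Rec {
        final int tag;
        final long length;
        Rec(int tag, long length) { this.tag = tag; this.length = length; }
    }

    /** Reads the NUL-terminated magic, u4 identifier size and u4+u4 timestamp; returns the identifier size. */
    static int readHeader(DataInputStream in) throws IOException {
        StringBuilder magic = new StringBuilder();
        for (int b = in.readUnsignedByte(); b != 0; b = in.readUnsignedByte()) magic.append((char) b);
        if (!magic.toString().startsWith("JAVA PROFILE")) throw new IOException("not an hprof file: " + magic);
        int idSize = in.readInt(); // size of object and string IDs, in bytes
        in.readLong();             // timestamp: high u4 then low u4, i.e. one big-endian u8
        return idSize;
    }

    /** Reads top-level records until EOF: u1 tag, u4 time offset, u4 body length, body. */
    static List<Rec> readRecords(DataInputStream in) throws IOException {
        List<Rec> out = new ArrayList<>();
        while (true) {
            int tag = in.read();
            if (tag < 0) return out;                  // clean end of stream
            in.readInt();                             // time offset relative to the header timestamp
            long length = in.readInt() & 0xFFFFFFFFL; // u4 body length, treated as unsigned
            skipFully(in, length);                    // a trimmer would inspect HEAP_DUMP_SEGMENT bodies here
            out.add(new Rec(tag, length));
        }
    }

    static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped <= 0) {                       // skip() may make no progress; fall back to read()
                if (in.read() < 0) throw new EOFException("truncated record body");
                skipped = 1;
            }
            n -= skipped;
        }
    }

    /** Builds a tiny synthetic hprof: one STRING record (0x01) and one HEAP_DUMP_END (0x2C). */
    static byte[] sample() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.write("JAVA PROFILE 1.0.2".getBytes(StandardCharsets.US_ASCII));
        out.writeByte(0);    // NUL terminator
        out.writeInt(4);     // identifier size
        out.writeLong(0L);   // timestamp
        out.writeByte(0x01); // STRING record
        out.writeInt(0);     // time offset
        out.writeInt(4 + 2); // body = 4-byte string ID + "hi"
        out.writeInt(1);
        out.write("hi".getBytes(StandardCharsets.UTF_8));
        out.writeByte(0x2C); // HEAP_DUMP_END with an empty body
        out.writeInt(0);
        out.writeInt(0);
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(sample()));
        System.out.println("identifier size = " + readHeader(in));
        for (Rec r : readRecords(in)) {
            System.out.printf("tag=0x%02X length=%d%n", r.tag, r.length);
        }
    }
}
```

Reading the length before the body is what makes streaming possible: a consumer always knows how many bytes belong to the current record without buffering the file.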

Android dumps follow the hprof format but use only a limited set of top‑level tags (STRING, LOAD_CLASS, STACK_TRACE, HEAP_DUMP_SEGMENT, HEAP_DUMP_END). The sub‑record tag that matters for trimming is PRIMITIVE_ARRAY_DUMP, which stores the contents of primitive arrays such as byte[], char[], and int[].
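The following minimal Java sketch shows what trimming one PRIMITIVE_ARRAY_DUMP sub‑record (tag 0x23) could look like. It keeps the header, so the array object and its ID still exist and references to it still resolve. It drops the payload of every byte[] and char[] and rewrites the element count to 0. This is an assumed strategy, not necessarily Tailor's, and it discards the array's original size. A trimmer that wants to keep sizes would need another approach, for example zero‑filling the payload and letting compression shrink it. Trimming also shrinks the sub‑record, so the u4 length of the enclosing HEAP_DUMP_SEGMENT must be reduced by the same number of bytes; this sketch leaves that to the caller. `PrimitiveArrayTrimmer` and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;

public final class PrimitiveArrayTrimmer {
    static final int PRIMITIVE_ARRAY_DUMP = 0x23; // heap-dump sub-record tag
    static final int TYPE_CHAR = 5;               // hprof basic-type codes
    static final int TYPE_BYTE = 8;

    /** Bytes per element for hprof primitive types; -1 for anything else. */
    static int elementSize(int type) {
        switch (type) {
            case 4: case 8: return 1;   // boolean, byte
            case 5: case 9: return 2;   // char, short
            case 6: case 10: return 4;  // float, int
            case 7: case 11: return 8;  // double, long
            default: return -1;
        }
    }

    /**
     * Reads one PRIMITIVE_ARRAY_DUMP at the buffer's position and returns its (possibly trimmed)
     * bytes, leaving the position just past the original sub-record. Layout: u1 tag, ID array id,
     * u4 stack-trace serial, u4 element count, u1 element type, then count * elementSize bytes.
     */
    static byte[] trim(ByteBuffer in, int idSize) {
        int start = in.position();
        if ((in.get() & 0xFF) != PRIMITIVE_ARRAY_DUMP) {
            throw new IllegalArgumentException("not a PRIMITIVE_ARRAY_DUMP sub-record");
        }
        in.position(in.position() + idSize + 4); // skip array ID and stack-trace serial
        int countOffset = in.position() - start; // offset of the u4 element count
        long count = in.getInt() & 0xFFFFFFFFL;
        int type = in.get() & 0xFF;
        int size = elementSize(type);
        if (size < 0) throw new IllegalArgumentException("unknown element type " + type);
        int headerLength = in.position() - start;
        int fullLength = Math.toIntExact(headerLength + count * size);
        boolean strip = type == TYPE_BYTE || type == TYPE_CHAR; // hypothetical policy: drop every byte[]/char[]

        byte[] out = new byte[strip ? headerLength : fullLength];
        in.position(start);
        in.get(out);                                  // copy header, plus payload if kept
        if (strip) {
            ByteBuffer.wrap(out).putInt(countOffset, 0); // trimmed array now reports 0 elements
            in.position(start + fullLength);             // skip the dropped payload
        }
        return out;
    }

    /** Builds a sample sub-record with 4-byte IDs and a zero-filled payload. */
    static byte[] sample(int type, int count, int elemSize) {
        return ByteBuffer.allocate(1 + 4 + 4 + 4 + 1 + count * elemSize)
                .put((byte) PRIMITIVE_ARRAY_DUMP)
                .putInt(0x1234)   // array ID
                .putInt(0)        // stack-trace serial
                .putInt(count)    // element count
                .put((byte) type) // element type
                .array();
    }

    public static void main(String[] args) {
        byte[] bytes = sample(TYPE_BYTE, 5, 1);
        byte[] ints = sample(10, 2, 4);
        System.out.println(bytes.length + " -> " + trim(ByteBuffer.wrap(bytes), 4).length); // 19 -> 14
        System.out.println(ints.length + " -> " + trim(ByteBuffer.wrap(ints), 4).length);   // 22 -> 22
    }
}
```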

Tailor Implementation

Tailor hooks the native open and write functions using xHook, intercepts the hprof stream, trims unwanted arrays, and finally applies zlib compression.

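import android.os.Debug;
import java.io.IOException;
// isGzip: whether to gzip-compress the hprof after trimming.
// nOpen/nClose are Tailor's native methods; trimming and compression happen on the native side.
public static synchronized void dumpHprofData(String fileName, boolean isGzip) throws IOException {
    nOpen(fileName, isGzip);           // native: arm the open()/write() interception for fileName
    try {
        Debug.dumpHprofData(fileName); // ART writes the hprof; intercepted writes are trimmed in-stream
    } finally {
        nClose();                      // always clean up the native side, even if the dump throws
    }
}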
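Tailor compresses in native code with zlib. The Java sketch below is a rough analogue, not Tailor's implementation. It shows the property that matters for an in‑stream design: each intercepted write() can be compressed as it arrives, so the full dump never has to exist uncompressed on disk. It uses `java.util.zip.GZIPOutputStream`, matching the `isGzip` flag above; gzip wraps the same deflate algorithm that zlib implements. `GzipSink`, `onWrite`, and the zero‑filled test data are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public final class StreamingGzipSketch {
    /** Hypothetical stand-in for the hooked write(): each chunk is compressed as it arrives. */
    static final class GzipSink implements AutoCloseable {
        private final GZIPOutputStream gzip;

        GzipSink(OutputStream target) throws IOException {
            gzip = new GZIPOutputStream(target, 64 * 1024);
        }

        void onWrite(byte[] buf, int off, int len) throws IOException {
            gzip.write(buf, off, len); // deflate incrementally; no full-file buffer needed
        }

        @Override
        public void close() throws IOException {
            gzip.close();              // writes the gzip trailer (CRC-32 and length), then closes target
        }
    }

    /** Simulates `times` write() calls of `chunk` and returns the gzip bytes. */
    static byte[] compressChunks(byte[] chunk, int times) throws IOException {
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GzipSink sink = new GzipSink(compressed)) {
            for (int i = 0; i < times; i++) sink.onWrite(chunk, 0, chunk.length);
        }
        return compressed.toByteArray();
    }

    /** Decompresses gzip bytes and returns how many bytes were recovered. */
    static long inflatedLength(byte[] gz) throws IOException {
        long total = 0;
        byte[] buf = new byte[8192];
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            for (int n; (n = in.read(buf)) != -1; ) total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] chunk = new byte[8192];          // zero-filled: a highly compressible stand-in for hprof data
        byte[] gz = compressChunks(chunk, 128); // 1 MiB written as 128 chunks of 8 KiB
        System.out.println("raw=" + 128L * chunk.length + " gzip=" + gz.length
                + " roundTrip=" + inflatedLength(gz));
    }
}
```

Compressing after trimming also means the remaining record headers and small arrays, which are highly repetitive, shrink well without any format‑specific logic.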

Effectiveness

Size – OOM snapshots can be reduced to ≈10 MB (≈50% reduction); non‑OOM snapshots can be reduced to ≈5 MB (≈60% reduction) or ≈10 MB (≈90% reduction).

Time – Trimming adds negligible overhead compared with the native dump alone.

Stability – The open‑source version has run for over six months without crashes.

Practical Results at Xigua Video

Users spend more than 100 minutes a day in the app on average, so OOM problems were severe. After abandoning online leak detection in favor of on‑demand snapshot dumping with Tailor, the OOM rate dropped from 3.5‱ to 0.03‱, a two‑order‑of‑magnitude reduction. The snapshots also helped investigate many non‑OOM crashes.
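As a quick check on these figures (both rates are per ten thousand, so the unit cancels):

$$
\frac{3.5 - 0.03}{3.5} \approx 0.991, \qquad \frac{3.5}{0.03} \approx 117
$$

That is roughly a 99% drop, consistent with both the "over 95%" figure in the summary and a reduction of about two orders of magnitude.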

Typical crash investigations that previously required extensive instrumentation now rely on snapshot analysis (e.g., locating recycled Bitmaps, tracing null TextureLayer references, or diagnosing native OOM caused by excessive player instances).

Future Optimizations

Further trimming of hprof data beyond current byte[]/char[] removal.

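Speeding up trimming and compression by eliminating the first ProcessHeap call, the pass in which ART walks the heap only to measure the dump size before a second pass writes it.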

Supporting dumps when native memory is scarce, by hooking the Record caching interfaces so records are trimmed on the fly.

Conclusion

Memory‑snapshot‑based stability governance complements traditional tools, offering objective, repeatable analysis with minimal instrumentation. Tailor demonstrates a practical, open‑source step toward a universal exception‑data collection system, and future work will continue to refine trimming, compression, and dump efficiency.

Upcoming open‑source releases will include other core monitoring tools such as Raphael (native leak detection) and Sliver (high‑performance tracing).
