
Comprehensive Memory Monitoring and Optimization Strategy for the K歌 Android Application

To eliminate the K歌 Android app's frequent OOM crashes, the team implemented a four-stage framework: development self-checks, automated testing baselines, real-time monitoring during gray release, and continuous sampling in production. The framework collects virtual memory, Java heap, file-descriptor, thread, and native-heap metrics and feeds them to a backend platform that visualizes leaks and guides fixes. It has already resolved over 120 leaks and cut crashes by more than 25%.

Tencent Music Tech Team

Background: In 2020 the K歌 Android app suffered a growing number of white-screen crashes and top-crash events, both traced to exhaustion of memory, threads, and file descriptors (fd). Manual performance testing could not keep up with rapid feature iteration: test coverage was limited, scenario simulation was insufficient, and user-reported OOMs carried little diagnostic information.

Goal: Establish a systematic monitoring and remediation framework to solve existing memory-related performance issues and prevent recurrence.

Solution Overview

The solution covers the entire development lifecycle and consists of four stages:

Development stage: Developers run a self-check checklist to catch performance problems early. Integrated tools provide real-time views of memory, thread, fd, and image usage inside the app. Leak issues must be resolved before code merge.

Testing stage: Automated test cases for core scenarios record memory, thread, and fd metrics. Baselines are defined and deviations across versions are detected.

Gray-release stage: Real-time monitoring of gray users captures memory spikes; when thresholds are exceeded, dump files and user paths are uploaded for backend clustering and bug creation.

Production stage: Continuous monitoring of virtual memory, Java heap, fd, thread count, PSS, and native heap. Sampling is used for low-impact metrics; detailed data is collected on demand for crash or white-screen investigations.
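The baseline comparison described for the testing stage could look roughly like the sketch below. It is an illustration under assumptions, not the team's implementation: the class name BaselineCheck, the metric names, and the 10% tolerance are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: flags metrics that grew past a baseline by more than a tolerance. */
final class BaselineCheck {
    /**
     * @param baseline  metric name -> value recorded for the reference version
     * @param current   metric name -> value recorded for the version under test
     * @param tolerance allowed relative growth, e.g. 0.10 for 10% (assumed value)
     * @return names of metrics whose growth exceeds the tolerance, sorted by name
     */
    static List<String> regressions(Map<String, Long> baseline,
                                    Map<String, Long> current,
                                    double tolerance) {
        List<String> flagged = new ArrayList<>();
        // Copy into a TreeMap so the report order is stable and sorted.
        for (Map.Entry<String, Long> e : new TreeMap<>(baseline).entrySet()) {
            Long now = current.get(e.getKey());
            if (now == null) {
                continue; // metric was not collected in this run
            }
            double limit = e.getValue() * (1.0 + tolerance);
            if (now > limit) {
                flagged.add(e.getKey());
            }
        }
        return flagged;
    }
}
```

A per-metric tolerance would likely be needed in practice, since thread and fd counts tend to be far more stable across versions than PSS.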

Monitoring Implementation Details

The monitoring focuses on five key resources:

Virtual Memory

Read /proc/pid/status for VmSize and /proc/pid/smaps for the detailed mappings. Typical address-space limits are 4 GB for 32-bit processes and 512 GB on arm64.
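As a minimal sketch of reading VmSize, the helper below parses the "Key: value kB" rows of a status file. The class name ProcStatus is hypothetical, and the code reads /proc/self/status (the current process) rather than a specific pid. It assumes a Linux or Android runtime where java.nio.file is available.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

/** Hypothetical helper for reading numeric fields from /proc/<pid>/status. */
final class ProcStatus {
    /** Returns the numeric value of a "Key:   123 kB" style row, or -1 if the key is absent. */
    static long field(List<String> statusLines, String key) {
        String prefix = key + ":";
        for (String line : statusLines) {
            if (line.startsWith(prefix)) {
                // e.g. "VmSize:\t 5123456 kB" -> ["5123456", "kB"]
                String[] parts = line.substring(prefix.length()).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1;
    }

    /** Reads the status file of the current process; only works where /proc exists. */
    static List<String> readSelf() throws IOException {
        // ISO-8859-1 never fails on odd bytes that may appear in the process name.
        return Files.readAllLines(Paths.get("/proc/self/status"), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        System.out.println("VmSize (kB): " + field(readSelf(), "VmSize"));
    }
}
```

The same field() call works for other rows of the status file, such as Threads.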

Java Heap

Use Runtime.getRuntime().maxMemory() for limits, Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory() for current usage, and Debug.dumpHprofData(String fileName) for heap dumps.
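The Runtime calls above can be wrapped in a small snapshot type, sketched below. The class name JavaHeapSnapshot and the idea of comparing against a ratio threshold are assumptions for illustration; the article does not state which threshold the team uses.

```java
/** Hypothetical snapshot of Java heap usage built from the Runtime calls named above. */
final class JavaHeapSnapshot {
    final long maxBytes;   // upper bound the heap may grow to (Runtime.maxMemory)
    final long usedBytes;  // totalMemory - freeMemory at capture time

    JavaHeapSnapshot(long maxBytes, long usedBytes) {
        this.maxBytes = maxBytes;
        this.usedBytes = usedBytes;
    }

    /** Captures the current heap state of this process. */
    static JavaHeapSnapshot capture() {
        Runtime rt = Runtime.getRuntime();
        return new JavaHeapSnapshot(rt.maxMemory(), rt.totalMemory() - rt.freeMemory());
    }

    /** Fraction of the maximum heap that is currently in use. */
    double usedRatio() {
        return (double) usedBytes / maxBytes;
    }

    /** True when usage has reached the given ratio, e.g. 0.85 (an assumed example value). */
    boolean exceeds(double threshold) {
        return usedRatio() >= threshold;
    }
}
```

On Android, a caller could react to exceeds(...) returning true by calling Debug.dumpHprofData(fileName). That call is left out here so the sketch runs on any JVM.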

File Descriptors (FD)

Read /proc/pid/limits to get "Max open files", enumerate /proc/pid/fd for the current count, and resolve links via Os.readlink.
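A minimal sketch of both reads follows. The class name FdStats is hypothetical, and the code uses /proc/self for the current process. The parser takes the soft limit, which is the first numeric column of the "Max open files" row.

```java
import java.io.File;
import java.util.List;

/** Hypothetical helper for fd limit and usage of the current process. */
final class FdStats {
    private static final String ROW = "Max open files";

    /** Parses the soft limit from the "Max open files" row of a limits file; -1 if absent. */
    static long maxOpenFiles(List<String> limitsLines) {
        for (String line : limitsLines) {
            if (line.startsWith(ROW)) {
                // Row layout: "Max open files   <soft>   <hard>   files"
                String[] cols = line.substring(ROW.length()).trim().split("\\s+");
                return "unlimited".equals(cols[0]) ? Long.MAX_VALUE : Long.parseLong(cols[0]);
            }
        }
        return -1;
    }

    /** Counts entries under /proc/self/fd; returns -1 where that directory is unavailable. */
    static int openFdCount() {
        String[] names = new File("/proc/self/fd").list();
        return names == null ? -1 : names.length;
    }
}
```

On Android, each entry under /proc/self/fd can then be passed to Os.readlink to see whether it is a socket, a pipe, or a file path. That makes it possible to group open descriptors by type.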

Thread Count

Inspect /proc/pid/status for the Threads field and obtain stack traces via Thread.getAllStackTraces().
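Grouping the result of Thread.getAllStackTraces() by thread name is one simple way to spot a leak, since leaked threads often share a name. The sketch below is an assumption about how such a census could be written; the class name ThreadCensus is hypothetical. Note that it only sees threads known to the Java runtime.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: counts live Java threads per thread name. */
final class ThreadCensus {
    /** Returns thread name -> number of live threads carrying that name, sorted by name. */
    static Map<String, Integer> countByName() {
        Map<String, Integer> counts = new TreeMap<>();
        // The key set of getAllStackTraces() holds every live thread at the time of the call.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(t.getName(), 1, Integer::sum);
        }
        return counts;
    }
}
```

A name whose count keeps growing between samples is a candidate for closer inspection of its stack traces.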

Native Memory

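Read nativePss (for example from Debug.MemoryInfo) to get the native heap's proportional set size, which covers private pages plus a proportional share of shared pages. Hook malloc and free to record allocation sites and stack traces for analysis.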
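The hook itself lives in native code and the article does not describe its mechanism, so the sketch below only models the bookkeeping such a hook would feed. Everything in it is an assumption for illustration: the class name AllocationLedger, the method names, and the use of a plain string as the allocation site.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java model of malloc/free bookkeeping; it does not hook anything itself. */
final class AllocationLedger {
    private static final class Record {
        final long size;
        final String site;
        Record(long size, String site) {
            this.size = size;
            this.site = site;
        }
    }

    // address -> allocation that has not been freed yet
    private final Map<Long, Record> live = new HashMap<>();

    /** Would be called from a malloc hook with the returned address, size, and call site. */
    synchronized void onMalloc(long address, long size, String site) {
        live.put(address, new Record(size, site));
    }

    /** Would be called from a free hook; unknown addresses are ignored. */
    synchronized void onFree(long address) {
        live.remove(address);
    }

    /** Bytes still allocated, grouped by call site; fully freed sites are absent. */
    synchronized Map<String, Long> outstandingBySite() {
        Map<String, Long> totals = new TreeMap<>();
        for (Record r : live.values()) {
            totals.merge(r.site, r.size, Long::sum);
        }
        return totals;
    }
}
```

Sampling outstandingBySite() at intervals and comparing the totals shows which sites keep growing, which is the signal a native leak produces.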

Platform Architecture

The client‑side monitor collects data and uploads it to a backend performance platform where it is parsed, aggregated, visualized, and linked to bug‑tracking systems. The backend also supports clustering of leaks, per‑scenario memory distribution, and image‑size analysis.

Typical Case Analyses

Page Leak Example: The class BaseAnimationResStrategy retained strong references after an asynchronous resource download, causing a leak. Switching to weak references resolved the issue.
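The weak-reference fix follows a common pattern, sketched below. This is not the code of BaseAnimationResStrategy, which the article does not show. The names ResourceListener and WeakDownloadCallback are hypothetical stand-ins for a page and the callback a downloader holds on to.

```java
import java.lang.ref.WeakReference;

/** Hypothetical stand-in for a page that wants to know when a download finishes. */
interface ResourceListener {
    void onResourceReady(String path);
}

/** Download callback that holds its listener weakly, so a pending download cannot keep a closed page alive. */
final class WeakDownloadCallback {
    private final WeakReference<ResourceListener> listenerRef;

    WeakDownloadCallback(ResourceListener listener) {
        this.listenerRef = new WeakReference<>(listener);
    }

    /** Called when the download completes; returns false if the listener is already gone. */
    boolean deliver(String path) {
        ResourceListener listener = listenerRef.get();
        if (listener == null) {
            return false; // the page was collected, so the result is dropped
        }
        listener.onResourceReady(path);
        return true;
    }
}
```

The trade-off is that the result is silently dropped once the page is gone, which is usually the desired behavior for UI resources.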

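Thread Leak Example: Over 200 threads named android.media.AudioRecordingMonitor.RecordingCallback were created because audio recording callbacks were registered but never unregistered. The offending code was: mARecorder.registerAudioRecordingCallback(Executors.newSingleThreadExecutor(), mAudioRecordingCallback); Each such call also creates a new single-thread executor, and no reference is kept that would allow it to be shut down.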
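One way to avoid this kind of leak is to own a single executor and shut it down when the callback is unregistered. The sketch below shows only that ownership pattern in plain Java. The class name RecordingCallbackExecutor is hypothetical, and the article does not say how the team actually fixed the issue.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical owner of the one executor used for recording callbacks. */
final class RecordingCallbackExecutor {
    private ExecutorService executor;

    /** Returns the shared executor, creating it on first use or after a release. */
    synchronized ExecutorService acquire() {
        if (executor == null || executor.isShutdown()) {
            executor = Executors.newSingleThreadExecutor();
        }
        return executor;
    }

    /** Shuts the executor down so its worker thread can exit. */
    synchronized void release() {
        if (executor != null) {
            executor.shutdown();
            executor = null;
        }
    }
}
```

In the Android code, acquire() would be passed to registerAudioRecordingCallback in place of Executors.newSingleThreadExecutor(). A call to release() would follow unregisterAudioRecordingCallback(mAudioRecordingCallback) when recording ends.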

SO Library Leak Example: libwnscloudsdk.so showed a steady native-memory increase (≈3 MB per 20 min). Repeated calls to xputf162utf8 allocated 267 bytes each time without freeing them.

Image Memory Issues: Problems identified include oversized original images, duplicate bitmap creation, delayed recycle, Hippy-framework oversized decoding, Glide configuration mismatches, and lingering bitmaps on invisible pages.

Self‑Check Tools

The integrated toolbox provides:

Memory Increment Tracker: Mark start/end points to compute per-scenario memory deltas (e.g., 60 MB PSS increase during live-stream).

SO Library Analyzer: Records native allocations per SO, supports segment-wise capture, and outputs CSV for PC-side analysis.

Image Detector: Shows bitmap size, memory footprint, and allocation stack, helping developers spot oversized or leaked images.
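The start/end marking used by the Memory Increment Tracker can be sketched as follows. The class and method names are hypothetical, and the metric source is injected as a LongSupplier so the same code works for PSS, Java heap, or any other counter. How the real tool samples PSS is not described in the article.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical tracker that computes the change of one metric across a named scenario. */
final class MemoryIncrementTracker {
    private final LongSupplier sampler;                       // returns the current metric, e.g. PSS in KB
    private final Map<String, Long> starts = new HashMap<>(); // scenario -> value at start mark

    MemoryIncrementTracker(LongSupplier sampler) {
        this.sampler = sampler;
    }

    /** Records the metric value at the beginning of a scenario. */
    void markStart(String scenario) {
        starts.put(scenario, sampler.getAsLong());
    }

    /** Returns end minus start for the scenario; throws if markStart was never called for it. */
    long markEnd(String scenario) {
        Long start = starts.remove(scenario);
        if (start == null) {
            throw new IllegalStateException("no start mark for " + scenario);
        }
        return sampler.getAsLong() - start;
    }
}
```

With a sampler that reports PSS in KB, a result of 61440 for a live-stream scenario would correspond to the 60 MB increase mentioned above.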

Results and Future Plans

Before version 7.16, the monitoring system uncovered more than 120 page and thread leaks, reduced crashes by more than 25%, and substantially lowered white-screen reports. Ongoing work includes dump-file trimming, page-level monitoring, proactive leak detection, additional metrics (CPU, disk, storage), alerting on configuration-driven memory shifts, and performance-optimized native analysis.
