Xianyu Android Log Governance and Performance Monitoring Solution

The Xianyu Android log governance solution unifies local console logs, TLog, and online logs; adds comprehensive logcat capture with retrieval through AUS and OSS; replaces BlockCanary with low‑overhead, frame‑callback‑based lag/ANR detection; and provides dashboards and a batch‑query platform. Together these cut the technical‑opinion ratio from 10.5 % to 4.7 % and improved the log upload success rate.

Xianyu Technology

Current situation: Xianyu relies on Alibaba infrastructure for log collection, which provides crash aggregation, local TLog, online event logs, and user‑behavior logs. Many problems remain, however: crash and ANR reports often arrive without the accompanying logs, business issues are hard to locate, log content is incomplete, local logging capability is limited, the success rate of log‑pull commands pushed to devices is low, and few user feedback reports come with usable logs.

Overall governance design: a new system integrates local console logs, TLog, online logs, and a retrieval strategy to close the gaps.

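Enhancing local log capability: capture the Android logcat buffers (LOG_ID_MAIN, LOG_ID_EVENTS, LOG_ID_CRASH) with logcat commands, then package the output and upload it to OSS through AUS. In the commands below, -d dumps the buffer and exits, -v threadtime selects the output format, and -t limits the dump to the most recent N lines. Example commands: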

adb logcat -d -v threadtime -t 20000
adb logcat -d -b events -v threadtime -t 6666
adb logcat -d -b crash -v threadtime -t 6666
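
The same three dumps can be assembled programmatically. The following is a minimal sketch, not the team's actual implementation: the class name LogcatCommands and the method dump are hypothetical, and it assumes that inside the app the on‑device logcat binary is invoked directly rather than through adb.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds the argument list for one logcat dump.
final class LogcatCommands {
    private LogcatCommands() {}

    // buffer: "events", "crash", or null to use logcat's default buffers
    // (as in the first command above). maxLines maps to the -t option.
    static List<String> dump(String buffer, int maxLines) {
        List<String> args = new ArrayList<>();
        args.add("logcat");
        args.add("-d");                      // dump and exit instead of streaming
        if (buffer != null) {
            args.add("-b");
            args.add(buffer);
        }
        args.add("-v");
        args.add("threadtime");              // date, time, PID, TID, level, tag
        args.add("-t");
        args.add(String.valueOf(maxLines));  // most recent N lines only
        return args;
    }
}
```

The resulting list can be passed to new ProcessBuilder(args) and the process output written to a file before packaging. Running the dump on a background thread is advisable, since reading 20000 lines may take noticeable time.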

Local log retrieval: upload via AUS/TLog, improve success rate, and provide a batch retrieval platform for difficult‑to‑obtain logs.

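Online lag/ANR detection: the existing options, BlockCanary and adb bugreport, both have limitations in production. In addition, on Android 6 and later, permission restrictions prevent an app from reading /data/anr/traces.txt, so trace lookups of the following kind no longer work. Sample code: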

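File mSystemTraceFile;
// Default location of the system ANR trace file.
this.mSystemTraceFilePath = "/data/anr/traces.txt";
this.mSystemTraceFile = new File(this.mSystemTraceFilePath);
if (!this.mSystemTraceFile.exists()) {
    // Fall back to the path configured in the system property, if one is set.
    String propSystemTraceFilePath = SystemPropertiesUtils.get("dalvik.vm.stack-trace-file");
    if (propSystemTraceFilePath != null && !propSystemTraceFilePath.isEmpty()) {
        this.mSystemTraceFile = new File(propSystemTraceFilePath);
    }
}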

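BlockCanary principle: it installs a Printer on the main Looper (stored in Looper.mLogging, set through setMessageLogging). The Looper calls this Printer before and after dispatching each message, so the time between the two calls is the duration of one UI‑thread task. Sample: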

public void start() {
    if (!mMonitorStarted) {
        mMonitorStarted = true;
        Looper.getMainLooper().setMessageLogging(mBlockCanaryCore.monitor);
    }
}
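
To illustrate the timing logic behind that hook, here is a minimal plain‑Java sketch. It is not BlockCanary's source: the class DispatchTimer, its fields, and the injected clock are hypothetical, and the threshold is an arbitrary parameter. The println method has the same shape as android.util.Printer#println, which the Looper calls once with a ">>>>> Dispatching to ..." line before a message and once with a "<<<<< Finished to ..." line after it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical sketch of Printer-based message timing.
final class DispatchTimer {
    private final long thresholdMs;
    private final LongSupplier clockMs;  // injected so the logic can be tested
    private boolean dispatching;         // true between the "before" and "after" calls
    private long startMs;

    // Durations (ms) of messages that ran longer than the threshold.
    final List<Long> blocks = new ArrayList<>();

    DispatchTimer(long thresholdMs, LongSupplier clockMs) {
        this.thresholdMs = thresholdMs;
        this.clockMs = clockMs;
    }

    // Calls alternate: first call marks dispatch start, second marks dispatch end.
    void println(String line) {
        if (!dispatching) {
            startMs = clockMs.getAsLong();
            dispatching = true;
        } else {
            dispatching = false;
            long elapsedMs = clockMs.getAsLong() - startMs;
            if (elapsedMs > thresholdMs) {
                blocks.add(elapsedMs);
            }
        }
    }
}
```

Because the hook only fires when a message finishes, a real monitor also needs a separate sampling thread to capture the main‑thread stack while the message is still running. That extra scheduling is part of the overhead the frame‑callback approach aims to avoid.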

New solution: replace Looper task monitoring with Android frame callbacks. A timestamp is recorded on every frame. If the interval between frames exceeds 500 ms, it is treated as a lag; if the lag continues for more than 5 s and the main‑thread stack stays unchanged, it is treated as an ANR. No extra delayed tasks are posted, so the overhead is minimal.
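
The two thresholds can be sketched as a small classifier. This is an illustration under stated assumptions, not the team's implementation: the class FrameLagClassifier and its methods are hypothetical, onFrame is assumed to be driven by a frame callback such as Choreographer.FrameCallback#doFrame, and check is assumed to be called periodically from an observer thread that supplies a fresh main‑thread stack sample. The source does not describe how the stack is sampled.

```java
// Hypothetical sketch of the 500 ms lag / 5 s ANR classification.
final class FrameLagClassifier {
    enum Verdict { SMOOTH, LAG, ANR }

    static final long LAG_THRESHOLD_MS = 500;
    static final long ANR_THRESHOLD_MS = 5_000;

    private long lastFrameMs;
    private String firstStallStack;  // stack seen when the current stall was first noticed
    private boolean stackChanged;    // true once a later sample differs from it

    FrameLagClassifier(long startMs) {
        this.lastFrameMs = startMs;
    }

    // Called on every rendered frame: record the timestamp and end any stall.
    void onFrame(long nowMs) {
        lastFrameMs = nowMs;
        firstStallStack = null;
        stackChanged = false;
    }

    // Called by the observer with the current time and a main-thread stack sample.
    Verdict check(long nowMs, String mainThreadStack) {
        long stalledMs = nowMs - lastFrameMs;
        if (stalledMs <= LAG_THRESHOLD_MS) {
            return Verdict.SMOOTH;
        }
        if (firstStallStack == null) {
            firstStallStack = mainThreadStack;
        } else if (!firstStallStack.equals(mainThreadStack)) {
            stackChanged = true;
        }
        // ANR only when the stall exceeds 5 s and every sample showed the same stack.
        if (stalledMs > ANR_THRESHOLD_MS && !stackChanged) {
            return Verdict.ANR;
        }
        return Verdict.LAG;
    }
}
```

Requiring an unchanged stack separates a main thread that is stuck in one place from one that is slow but still making progress through different work.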

Detection effect: injecting 500 ms and 5 s sleeps in a CardView click handler produces the expected lag reports, confirming the approach works.

Active problem discovery: build monitoring dashboards for key metrics (5 s lag, request failures, error toasts, etc.) and a self‑built opinion‑tracking platform for real‑time logs.

Log retrieval platform: supports user‑ID query by issue name, batch retrieval, and integrates with TLog and OSS for comprehensive log access.

Summary and outlook: technical opinion ratio dropped from 10.5 % to 4.7 %, upload success improved, and future work includes log visualization, semantic description, and intelligent parsing.
