Xianyu Android Log Governance and Performance Monitoring Solution
The Xianyu Android Log Governance solution unifies local console logs, TLog and online logs, adds comprehensive logcat capture and AUS‑OSS retrieval, replaces BlockCanary with low‑overhead frame‑callback lag/ANR detection, and provides dashboards and a batch‑query platform, cutting technical‑opinion ratio from 10.5 % to 4.7 % and boosting upload success.
Current situation: Xianyu relies on Alibaba infrastructure for log collection, which provides crash aggregation, local TLog, online event logs, and user‑behavior logs. Many issues remain, however: logs are often missing for crashes and ANRs, business issues are hard to locate, log content is incomplete, local log capability is limited, the success rate of pushed retrieval commands is low, and feedback coverage is low.
Overall governance design: a new system integrates local console logs, TLog, online logs, and a retrieval strategy to close the gaps.
Enhancing local log capability: capture Android logcat logs (LOG_ID_MAIN, LOG_ID_EVENTS, LOG_ID_CRASH) via adb commands and package them with AUS to OSS. Example commands:
adb logcat -d -v threadtime -t 20000
adb logcat -d -b events -v threadtime -t 6666
adb logcat -d -b crash -v threadtime -t 6666
Local log retrieval: upload via AUS/TLog, improve success rate, and provide a batch retrieval platform for difficult‑to‑obtain logs.
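The three logcat dump commands listed above can also be assembled programmatically. The following is a minimal sketch, not the Xianyu implementation: the class name LogcatCommands and its methods are hypothetical, and it assumes that inside the app process the logcat binary is invoked directly (without the adb prefix), where an app can generally read only its own log lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that builds the logcat dump commands from the article. */
final class LogcatCommands {
    private LogcatCommands() {}

    /**
     * Builds a one-shot dump command.
     * buffer: "events", "crash", ... or null to use logcat's default buffers.
     * maxLines: value passed to -t (only the most recent lines are printed).
     */
    static List<String> dumpCommand(String buffer, int maxLines) {
        List<String> cmd = new ArrayList<>();
        cmd.add("logcat"); // assumption: run on-device, so no "adb" prefix
        cmd.add("-d");     // dump the current contents and exit
        if (buffer != null) {
            cmd.add("-b");
            cmd.add(buffer);
        }
        cmd.add("-v");
        cmd.add("threadtime");
        cmd.add("-t");
        cmd.add(String.valueOf(maxLines));
        return cmd;
    }

    /** The three dumps from the article, with the same line limits. */
    static List<List<String>> allDumpCommands() {
        return Arrays.asList(
                dumpCommand(null, 20000),
                dumpCommand("events", 6666),
                dumpCommand("crash", 6666));
    }
}
```

Each returned list can be handed to java.lang.ProcessBuilder to run the command and read its output before packaging; how the output is then passed to AUS and OSS is not shown in the source and is left out here.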
Online lag/ANR detection: existing BlockCanary and adb bugreport have limitations in production. Permission issues prevent reading /data/anr/traces.txt on Android 6+. Sample code:
File mSystemTraceFile;
// Try the conventional ANR trace location first.
this.mSystemTraceFilePath = "/data/anr/traces.txt";
this.mSystemTraceFile = new File(this.mSystemTraceFilePath);
if (!this.mSystemTraceFile.exists()) {
    // Fall back to the path named by the dalvik.vm.stack-trace-file system property.
    String propSystemTraceFilePath = SystemPropertiesUtils.get("dalvik.vm.stack-trace-file");
    this.mSystemTraceFile = new File(propSystemTraceFilePath);
}
BlockCanary principle: it sets the main Looper's mLogging printer (via setMessageLogging) to monitor how long each UI‑thread task takes. Sample:
public void start() {
    if (!mMonitorStarted) {
        mMonitorStarted = true;
        // Install the monitor as the main Looper's message-logging printer.
        Looper.getMainLooper().setMessageLogging(mBlockCanaryCore.monitor);
    }
}
New solution: replace Looper task monitoring with Android frame callbacks. A timestamp is recorded on each frame; if the interval between frames exceeds 500 ms, it is treated as a lag, and if the lag continues for more than 5 s while the stack stays unchanged, it is treated as an ANR. No extra delayed tasks are created, so the overhead is minimal.
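The thresholds described above can be expressed as a small classifier. This is a sketch under stated assumptions rather than the article's actual code: FrameLagDetector, onFrame, and check are hypothetical names, and the source does not say how a stall is observed while the main thread is blocked, so the check method assumes some watcher outside the main thread that also reports whether the sampled stack has changed. On Android, onFrame would plausibly be driven from Choreographer.FrameCallback.doFrame(long frameTimeNanos), with nanoseconds converted to milliseconds.

```java
/** Hypothetical sketch of frame-interval based lag/ANR classification. */
final class FrameLagDetector {
    enum Verdict { SMOOTH, LAG, ANR }

    static final long LAG_THRESHOLD_MS = 500;   // from the article: frame gap > 500 ms is a lag
    static final long ANR_THRESHOLD_MS = 5_000; // from the article: continuous lag > 5 s

    // Written by the frame callback, read by the (assumed) watcher thread.
    private volatile long lastFrameMs = -1;

    /** Call once per frame; returns LAG if the gap since the previous frame was too long. */
    Verdict onFrame(long frameTimeMs) {
        long previous = lastFrameMs;
        lastFrameMs = frameTimeMs;
        if (previous < 0) {
            return Verdict.SMOOTH; // first frame, nothing to compare against
        }
        return (frameTimeMs - previous) > LAG_THRESHOLD_MS ? Verdict.LAG : Verdict.SMOOTH;
    }

    /**
     * Assumed watcher-side check while no new frame has arrived.
     * stackUnchanged: whether the sampled stack is the same as at the previous sample.
     */
    Verdict check(long nowMs, boolean stackUnchanged) {
        long last = lastFrameMs;
        if (last < 0) {
            return Verdict.SMOOTH; // no frame seen yet
        }
        long stalledMs = nowMs - last;
        if (stalledMs > ANR_THRESHOLD_MS && stackUnchanged) {
            return Verdict.ANR;
        }
        return stalledMs > LAG_THRESHOLD_MS ? Verdict.LAG : Verdict.SMOOTH;
    }
}
```

Keeping the classification free of Android types makes the thresholds easy to unit test; the only per-frame work in this sketch is storing one timestamp.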
Detection effect: injecting 500 ms and 5 s sleeps in a CardView click handler produces the expected lag reports, confirming the approach works.
Active problem discovery: build monitoring dashboards for key metrics (5 s lag, request failures, error toasts, etc.) and a self‑built opinion‑tracking platform for real‑time logs.
Log retrieval platform: supports querying user IDs by issue name and batch log retrieval, and integrates with TLog and OSS for comprehensive log access.
Summary and outlook: the technical opinion ratio dropped from 10.5 % to 4.7 % (a relative reduction of roughly 55 %) and the log upload success rate improved. Future work includes log visualization, semantic description, and intelligent parsing.
Xianyu Technology
Official account of the Xianyu technology team