Design of the DeWu App ANR Monitoring Platform
The DeWu ANR Monitoring Platform replaces Bugly with an in-house system that collects ProcessErrorStateInfo, tombstone files, main-thread stack samples, Looper message histories, and app state; parses and aggregates this data; visualizes trends and root-cause details; and guides developers in mitigating ANRs.
DeWu previously relied on the Bugly platform for ANR collection, but neither the granularity of the information it captured nor its log aggregation met DeWu's needs. This article describes the design and implementation of an internal ANR monitoring platform.
ANR data collection includes ProcessErrorStateInfo, tombstone files, main‑thread stack samples, Looper message histories, and app state information (foreground/background, uptime, etc.).
ProcessErrorStateInfo parsing extracts fields such as processName, pid, uid, tag, shortMsg, and longMsg. Regular expressions applied to longMsg then identify the component in which the ANR occurred and the reason it was triggered.
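The longMsg step can be sketched as below. On Android the ProcessErrorStateInfo objects would come from ActivityManager.getProcessesInErrorState(); to keep the sketch runnable on a plain JVM it takes the longMsg string directly. It assumes longMsg follows the commonly seen layout of an "ANR in <process> (<component>)" header plus a "Reason: ..." line, and the reason prefixes used for classification are likewise assumptions. The class, enum, and field names are hypothetical, not DeWu's code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AnrLongMsgParser {
    // Hypothetical trigger categories derived from the "Reason:" text.
    enum Trigger { INPUT, BROADCAST, SERVICE, PROVIDER, UNKNOWN }

    // Hypothetical result type; field names are illustrative only.
    static final class AnrInfo {
        final String processName;
        final String component; // null when longMsg has no "(...)" component part
        final String reason;    // null when no "Reason:" line is present
        final Trigger trigger;

        AnrInfo(String processName, String component, String reason, Trigger trigger) {
            this.processName = processName;
            this.component = component;
            this.reason = reason;
            this.trigger = trigger;
        }
    }

    // Header line, e.g. "ANR in com.example.app (com.example.app/.MainActivity)"
    private static final Pattern HEADER =
            Pattern.compile("^ANR in (\\S+)(?: \\(([^)]+)\\))?", Pattern.MULTILINE);
    // Reason line, e.g. "Reason: Input dispatching timed out (...)"
    private static final Pattern REASON =
            Pattern.compile("^Reason: (.+)$", Pattern.MULTILINE);

    // Returns null when the text does not look like an ANR message.
    static AnrInfo parse(String longMsg) {
        Matcher header = HEADER.matcher(longMsg);
        if (!header.find()) {
            return null;
        }
        Matcher reasonMatcher = REASON.matcher(longMsg);
        String reason = reasonMatcher.find() ? reasonMatcher.group(1).trim() : null;
        return new AnrInfo(header.group(1), header.group(2), reason, classify(reason));
    }

    // Prefix matching on commonly seen reason texts (assumed, not exhaustive).
    static Trigger classify(String reason) {
        if (reason == null) return Trigger.UNKNOWN;
        if (reason.startsWith("Input dispatching timed out")) return Trigger.INPUT;
        if (reason.startsWith("Broadcast of Intent")) return Trigger.BROADCAST;
        if (reason.startsWith("executing service")) return Trigger.SERVICE;
        if (reason.startsWith("ContentProvider not responding")) return Trigger.PROVIDER;
        return Trigger.UNKNOWN;
    }
}
```

Matching on the reason prefix rather than the full text keeps variable details (timeouts, intent extras) from splitting one trigger type into many.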
Tombstone parsing leverages the open‑source xcrash library. Parsed data covers crash metadata (type, timestamps, version, ABI, pid/tid), VM GC info, thread snapshots (stack, priority, state, locks), logcat output, open files, and memory usage.
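xcrash ships its own parser (TombstoneParser), which is the natural choice in practice. Purely to illustrate what "crash metadata" extraction involves, the sketch below reads a tombstone header on the assumption that it consists of lines of the form Key: 'value' (for example Crash type: 'anr') followed by a "pid: <n>, tid: <n>" line. The class name and the exact header layout are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TombstoneHeaderParser {
    // Assumed header line shape, e.g. "Crash type: 'anr'" or "ABI: 'arm64'".
    private static final Pattern KV =
            Pattern.compile("^([A-Za-z][A-Za-z ]*): '(.*)'$", Pattern.MULTILINE);
    // Assumed process/thread line, e.g. "pid: 1234, tid: 1234, name: main  >>> com.x <<<".
    private static final Pattern PID_TID =
            Pattern.compile("^pid: (\\d+), tid: (\\d+)", Pattern.MULTILINE);

    static Map<String, String> parse(String tombstone) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher kv = KV.matcher(tombstone);
        while (kv.find()) {
            // Keep the first occurrence so later sections cannot overwrite header values.
            fields.putIfAbsent(kv.group(1), kv.group(2));
        }
        Matcher ids = PID_TID.matcher(tombstone);
        if (ids.find()) {
            fields.put("pid", ids.group(1));
            fields.put("tid", ids.group(2));
        }
        return fields;
    }
}
```

A LinkedHashMap preserves header order, which keeps the parsed metadata easy to display in the same order as the raw file.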
Looper message tracing classifies messages into three groups: processed, currently processing, and pending. For processed messages, both wall time and CPU time are recorded; a large gap between the two means the main thread spent much of the message off the CPU, which may indicate thread-scheduling contention. Pending messages help assess scheduling delays and message backlog.
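The two checks described above can be sketched as follows. On Android, the wall and CPU durations would typically be measured around each dispatch with SystemClock.uptimeMillis() and SystemClock.currentThreadTimeMillis(); here they are plain inputs. The record type, the minimum-wall-time cutoff, and the wall/CPU ratio are hypothetical tuning knobs, not values from the article.

```java
import java.util.ArrayList;
import java.util.List;

final class LooperTraceSketch {
    enum State { PROCESSED, PROCESSING, PENDING }

    // Hypothetical per-message record; field names are illustrative.
    static final class MessageRecord {
        final String target;  // e.g. handler class plus callback or "what"
        final State state;
        final long wallMs;    // wall-clock dispatch duration (PROCESSED only)
        final long cpuMs;     // main-thread CPU time during dispatch (PROCESSED only)
        final long delayMs;   // PENDING only: how far past its due time the message is

        MessageRecord(String target, State state, long wallMs, long cpuMs, long delayMs) {
            this.target = target;
            this.state = state;
            this.wallMs = wallMs;
            this.cpuMs = cpuMs;
            this.delayMs = delayMs;
        }
    }

    // Processed messages whose wall time is at least `ratio` times their CPU time spent
    // most of their duration off-CPU: waiting to be scheduled, or blocked on a lock or I/O.
    static List<MessageRecord> offCpuSuspects(List<MessageRecord> msgs, long minWallMs, double ratio) {
        List<MessageRecord> out = new ArrayList<>();
        for (MessageRecord m : msgs) {
            if (m.state != State.PROCESSED || m.wallMs < minWallMs) continue;
            if (m.wallMs >= ratio * Math.max(m.cpuMs, 1)) out.add(m);
        }
        return out;
    }

    // Backlog summary for pending messages: {count, worst delay in ms}.
    static long[] pendingBacklog(List<MessageRecord> msgs) {
        long count = 0;
        long maxDelay = 0;
        for (MessageRecord m : msgs) {
            if (m.state == State.PENDING) {
                count++;
                maxDelay = Math.max(maxDelay, m.delayMs);
            }
        }
        return new long[] {count, maxDelay};
    }
}
```

The minimum wall time keeps short messages, where a tiny CPU time makes the ratio noisy, from being flagged.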
Method tracing samples the main-thread stack every 50 ms and retains the most recent 10–20 s of data (200–400 samples). The samples are merged into a MethodNode tree that aggregates execution time per method; known whitelisted methods are filtered out, and methods whose aggregated time exceeds a configurable threshold are flagged.
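The retention window follows from simple arithmetic: at one sample every 50 ms, 10 s holds 10,000 / 50 = 200 samples and 20 s holds 400. A bounded buffer of that size, evicting the oldest sample first, is one way to keep it; the sketch below assumes that design. On Android the target thread would be Looper.getMainLooper().getThread(), and a background scheduler (for example ScheduledExecutorService.scheduleAtFixedRate) would call capture every 50 ms. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class StackSampleBuffer {
    static final class Sample {
        final long timeMs;
        final List<String> frames; // outermost frame first, e.g. "android.os.Looper.loop"

        Sample(long timeMs, List<String> frames) {
            this.timeMs = timeMs;
            this.frames = frames;
        }
    }

    private final int capacity;
    private final Deque<Sample> samples = new ArrayDeque<>();

    // capacity = window / interval, e.g. 20_000 / 50 = 400 samples
    StackSampleBuffer(long intervalMs, long windowMs) {
        this.capacity = (int) Math.max(1, windowMs / intervalMs);
    }

    // Record one sample of the target thread's stack.
    void capture(Thread target, long nowMs) {
        StackTraceElement[] trace = target.getStackTrace(); // innermost frame first
        List<String> frames = new ArrayList<>(trace.length);
        for (int i = trace.length - 1; i >= 0; i--) {
            frames.add(trace[i].getClassName() + "." + trace[i].getMethodName());
        }
        add(new Sample(nowMs, frames));
    }

    synchronized void add(Sample s) {
        if (samples.size() == capacity) {
            samples.removeFirst(); // drop the oldest sample once the window is full
        }
        samples.addLast(s);
    }

    // Copy taken when an ANR is detected, so sampling can continue independently.
    synchronized List<Sample> snapshot() {
        return new ArrayList<>(samples);
    }

    int capacity() {
        return capacity;
    }
}
```

Storing frames outermost-first makes each sample a root-to-leaf path, which is the order a call tree is built in.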
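Example code used in the trace analysis (JavaStackFrameSnapshot is a project type not shown here):

    import java.util.ArrayList;
    import java.util.List;

    public class MethodNode {
        private List<MethodNode> children = new ArrayList<>();
        // method cost time
        private int cost;
        private String fullMethodName;
        private JavaStackFrameSnapshot javaStackFrameSnapshot;
    }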
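How samples might be folded into such a tree is sketched below, using a simplified stand-in for MethodNode (no JavaStackFrameSnapshot). The assumption is that each sample adds one interval (for example 50 ms) to every method on its root-to-leaf path, so a node's cost approximates the time that method was on the stack. Whitelisting is assumed to target methods such as the Looper loop that sit at the base of every main-thread stack and would otherwise always exceed the threshold. Names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class MethodTreeBuilder {
    // Simplified stand-in for MethodNode.
    static final class Node {
        final String fullMethodName;
        int cost; // aggregated cost in ms
        final Map<String, Node> children = new LinkedHashMap<>();

        Node(String fullMethodName) {
            this.fullMethodName = fullMethodName;
        }
    }

    // Each stack is ordered outermost frame first; every sample adds intervalMs along its path.
    static Node build(List<List<String>> stacks, int intervalMs) {
        Node root = new Node("<root>");
        for (List<String> stack : stacks) {
            root.cost += intervalMs;
            Node cur = root;
            for (String frame : stack) {
                cur = cur.children.computeIfAbsent(frame, Node::new);
                cur.cost += intervalMs;
            }
        }
        return root;
    }

    // Methods whose aggregated cost reaches the threshold, excluding whitelisted ones.
    static List<Node> slowMethods(Node root, int thresholdMs, Set<String> whitelist) {
        List<Node> out = new ArrayList<>();
        collect(root, thresholdMs, whitelist, out);
        return out;
    }

    private static void collect(Node node, int thresholdMs, Set<String> whitelist, List<Node> out) {
        for (Node child : node.children.values()) {
            // A child's cost never exceeds its parent's, so a cheap subtree can be pruned.
            if (child.cost < thresholdMs) continue;
            if (!whitelist.contains(child.fullMethodName)) out.add(child);
            collect(child, thresholdMs, whitelist, out);
        }
    }
}
```

For example, four 50 ms samples where three pass through A.onClick and two of those end in B.heavy give A.onClick 150 ms and B.heavy 100 ms; with a 100 ms threshold and Looper.loop whitelisted, both are flagged.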
ANR log aggregation groups individual ANR reports into issues based on component and cause, enabling trend analysis and status tracking.
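A minimal sketch of grouping by component and cause follows. The normalization step, replacing digit runs in the cause with "#" so that reports differing only in timings (for example "Waited 5005ms" vs "Waited 5012ms") land in the same issue, is an assumption rather than the platform's documented rule, as are the type and method names. The per-day counts show one way to feed an issue's trend chart.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class AnrAggregator {
    static final long DAY_MS = 24L * 60 * 60 * 1000;

    // Hypothetical minimal report; real reports carry far more context.
    static final class AnrReport {
        final String component;
        final String cause;
        final long timestampMs;

        AnrReport(String component, String cause, long timestampMs) {
            this.component = component;
            this.cause = cause;
            this.timestampMs = timestampMs;
        }
    }

    // Issue key = component + normalized cause (digit runs collapsed to "#").
    static String issueKey(AnrReport r) {
        return r.component + "|" + r.cause.replaceAll("\\d+", "#");
    }

    static Map<String, List<AnrReport>> groupIntoIssues(List<AnrReport> reports) {
        return reports.stream()
                .collect(Collectors.groupingBy(AnrAggregator::issueKey, TreeMap::new, Collectors.toList()));
    }

    // UTC day index since the epoch.
    static Long dayOf(AnrReport r) {
        return r.timestampMs / DAY_MS;
    }

    // Per-day report counts, e.g. for one issue's trend line.
    static Map<Long, Long> dailyCounts(List<AnrReport> reports) {
        return reports.stream()
                .collect(Collectors.groupingBy(AnrAggregator::dayOf, TreeMap::new, Collectors.counting()));
    }
}
```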
The platform UI presents an issue list with trend charts and a detail page for each issue showing aggregated logs, flame charts, Looper message traces, CPU usage breakdowns, and logcat snippets. These visualizations help developers pinpoint root causes.
Case studies include slow toString calls on Kotlin functions, View.getDrawingCache blocking the main thread, and SharedPreferences (SP) write waits leading to ANRs. Mitigations discussed include avoiding heavy work on the main thread, moving I/O off the UI thread, and replacing SP with MMKV.
Future directions aim to enrich context collection (e.g., native file‑IO monitoring), reduce method‑trace overhead, integrate message‑trace with method‑trace snapshots, and clearly separate pre‑ANR and post‑ANR data.
References to related research and open‑source projects are listed at the end.
DeWu Technology