From Crash Logs to Data-Driven Debugging: Mobile Client Development Practices and Performance Monitoring
The article shares practical experiences from large‑scale mobile client development, covering user crash scenarios, non‑intrusive data collection, AOP‑based interaction and network monitoring, performance instrumentation, and data aggregation techniques to improve debugging efficiency and operational reliability.
The author, Zhang Zitian, a client technology director at Qunar, presents a talk originally delivered at Ctrip's technical salon on mobile development engineering practice and performance optimization.
In large‑scale client development, engineers often encounter obscure issues that are hard to detect and resolve; the article emphasizes the need to free engineers from repetitive debugging by improving observability.
1. User Scenario – A common situation is a user experiencing a crash with minimal information (crash time, page, and reason). Traditional approaches rely on crash logs and manual coordination, which are inefficient.
2. Gaining New Life – Interaction logs, performance monitoring, and exception monitoring come from three independent systems. Combining them to reconstruct the user's interaction trace lets one or two engineers pinpoint the root cause.
Technical Details
The solution draws on three data sources: codeless ("no‑tag") interaction statistics collected without manual instrumentation, performance monitoring, and exception monitoring.
Key challenges addressed are:
Collecting interaction logs
Network monitoring
Correlating multi‑dimensional data
QAV – No‑Tag Interaction Statistics Platform – To avoid interfering with business code, the team uses AOP to inject data collection. They moved from view‑id based identifiers to coordinate‑based identifiers, and finally to XPath, which provides a robust cross‑platform unique control identifier.
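The XPath idea can be illustrated with a small, language-agnostic sketch in Python. It is not QAV's implementation: the `View` class, the `view_path` function, and the path format (type name plus index among same-type siblings) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class View:
    """Hypothetical stand-in for a platform view (not an Android or iOS class)."""
    type_name: str
    children: List["View"] = field(default_factory=list)
    parent: Optional["View"] = field(default=None, repr=False)

    def add(self, child: "View") -> "View":
        child.parent = self
        self.children.append(child)
        return child


def view_path(view: View) -> str:
    """Build an XPath-like identifier from the root down to `view`."""
    segments = []
    node: Optional[View] = view
    while node is not None:
        if node.parent is None:
            index = 0
        else:
            # Index among siblings of the same type, so adding or removing
            # siblings of other types does not change the path.
            same_type = [c for c in node.parent.children
                         if c.type_name == node.type_name]
            index = next(i for i, c in enumerate(same_type) if c is node)
        segments.append(f"{node.type_name}[{index}]")
        node = node.parent
    return "/" + "/".join(reversed(segments))


# Example tree: a window holding a layout with one label and two buttons.
root = View("Window")
page = root.add(View("LinearLayout"))
page.add(View("TextView"))
first = page.add(View("Button"))
second = page.add(View("Button"))
print(view_path(second))  # /Window[0]/LinearLayout[0]/Button[1]
```

Unlike a coordinate-based identifier, a path built this way does not depend on screen size, and unlike a view id, it does not require the control to have been given an id in the first place.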
For Android, they employ compile‑time AOP using a Gradle plugin and Java‑agent techniques to hook the JVM, inserting custom tasks before and after the dex transformation to load and unload the agent.
3. Performance Monitoring – Similar AOP techniques are applied to capture network request data. On iOS, runtime hooking is straightforward; on Android, compile‑time AOP is preferred due to dex limitations.
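As a rough analogy for runtime hooking, the Python sketch below replaces a method on a class with a wrapper that records timing and errors, leaving the business code untouched. The `HttpClient` class, the `install_hook` helper, and the log fields are hypothetical; the talk's iOS and Android mechanisms are different, and this only shows the general shape of the technique.

```python
import functools
import time
from typing import Any, Dict, List

# In-memory buffer; a real client would batch, compress, and upload these.
network_log: List[Dict[str, Any]] = []


class HttpClient:
    """Hypothetical business-side client standing in for the real network layer."""

    def get(self, url: str) -> str:
        if url.startswith("bad://"):
            raise ValueError("unsupported scheme")
        return f"response for {url}"


def install_hook(cls: type, method_name: str) -> None:
    """Replace `cls.method_name` with a wrapper that logs every call."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self: Any, url: str, *args: Any, **kwargs: Any) -> Any:
        start = time.monotonic()
        error = None
        try:
            return original(self, url, *args, **kwargs)
        except Exception as exc:
            error = type(exc).__name__
            raise  # monitoring must not swallow the caller's exception
        finally:
            network_log.append({
                "url": url,
                "duration_ms": (time.monotonic() - start) * 1000.0,
                "error": error,
            })

    setattr(cls, method_name, wrapper)


# The hook is installed once; call sites are not modified.
install_hook(HttpClient, "get")
client = HttpClient()
client.get("https://example.com/flights")
try:
    client.get("bad://host")
except ValueError:
    pass
print(len(network_log))  # 2
```

The wrapper re-raises any exception it observes, so the monitored code behaves the same with or without the hook installed.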
4. Data Aggregation – Interaction and network logs are compressed and uploaded with timestamps; exception logs are uploaded in real time. A requestId links user actions to network requests, and UUIDs ensure end‑to‑end traceability. Time synchronization aligns client timestamps with server time for consistent ordering.
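The aggregation step might look roughly like the sketch below. It assumes every record carries a client timestamp in milliseconds (`client_ts`) and, where applicable, a `request_id`; these field names, the sample records, and the single round-trip offset estimate are illustrative assumptions, not the system's actual schema or synchronization protocol.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def clock_offset_ms(client_send_ms: int, client_recv_ms: int, server_ms: int) -> float:
    """Estimate (server clock - client clock) from one round trip.

    Assumes the server read its clock about halfway through the round trip.
    """
    midpoint = (client_send_ms + client_recv_ms) / 2
    return server_ms - midpoint


def build_timeline(interactions: List[Record], requests: List[Record],
                   exceptions: List[Record], offset_ms: float) -> List[Record]:
    """Merge the three log sources into one list ordered by server-aligned time."""
    events: List[Record] = []
    for source, records in (("interaction", interactions),
                            ("network", requests),
                            ("exception", exceptions)):
        for record in records:
            events.append(dict(record, source=source,
                               ts=record["client_ts"] + offset_ms))
    events.sort(key=lambda event: event["ts"])
    return events


def events_for_request(timeline: List[Record], request_id: str) -> List[Record]:
    """Return the user action and network events that share one request id."""
    return [e for e in timeline if e.get("request_id") == request_id]


# Hypothetical sample data from the three systems.
offset = clock_offset_ms(client_send_ms=1000, client_recv_ms=1200, server_ms=51100)
interactions = [{"client_ts": 2000, "request_id": "r-1",
                 "path": "/Window[0]/LinearLayout[0]/Button[0]"}]
requests = [{"client_ts": 2050, "request_id": "r-1",
             "url": "https://example.com/order"}]
exceptions = [{"client_ts": 2300, "reason": "NullPointerException"}]

timeline = build_timeline(interactions, requests, exceptions, offset)
for event in timeline:
    print(event["ts"], event["source"])
```

Read top to bottom, the merged list shows the tap, the request it triggered, and the exception that followed, which is the kind of reconstructed timeline the debugging workflow relies on.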
By reconstructing the user's timeline, the approach dramatically reduces debugging time and communication overhead, improving overall efficiency.
The article concludes by noting that this debugging system is part of a larger “Jindou Cloud” platform that supports the entire app lifecycle from development to operation, emphasizing component‑level reuse and system integration.
Ctrip Technology
Official Ctrip Technology account, sharing and discussing growth.