Evolution and Design of the Lego Logging System for Mobile Applications
This article describes the four-stage evolution of the Lego client‑side logging system: its initial zero‑to‑one architecture, the separation of business and technical logs, real‑time reporting improvements, and the latest architecture redesign, which improves performance, reduces overhead, and provides a safe migration path.
Introduction
Logging (event tracking) is critical for mobile apps, both to drive business growth and to guide technical optimization. The self‑developed Lego logging system at Zhuanzhuan has evolved from a single‑function architecture in 2015 to a composite architecture that supports automated collection, real‑time reporting, and separate handling of business and technical logs.
Lego: From Zero to One
Background
In 2015, mobile devices had limited CPU, memory, and battery, and networks were unstable. Industry solutions therefore emphasized low‑power, low‑overhead data collection, and the app itself had few business events to track.
Architecture Design
To cope with unstable networks, multiple events are merged and written to a local file, reducing request frequency; files are compressed before upload to save bandwidth. Because the main process has limited memory, a separate subprocess handles formatting, file I/O, compression, and network upload, isolating resources from the UI process.
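The merge-then-compress step described above can be sketched roughly as follows. This is a minimal illustration, not the actual Lego code: the class name LogBatcher, the newline-delimited payload, and the choice of gzip are all assumptions, since the article does not name the file format or compression algorithm. An in-memory list stands in for the local file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPOutputStream;

/** Hypothetical batching buffer; names and format are illustrative assumptions. */
final class LogBatcher {
    // Stand-in for the local log file that merged events are appended to.
    private final List<String> pending = new ArrayList<>();

    /** Queue one already-formatted event line. */
    synchronized void add(String eventLine) {
        pending.add(eventLine);
    }

    synchronized int pendingCount() {
        return pending.size();
    }

    /**
     * Merge all pending events into one newline-delimited payload, gzip it,
     * and clear the buffer. Returns an empty array when nothing is pending.
     */
    synchronized byte[] drainCompressed() {
        if (pending.isEmpty()) {
            return new byte[0];
        }
        byte[] raw = String.join("\n", pending).getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        pending.clear();
        return out.toByteArray();
    }
}
```

A timer that calls drainCompressed() at a fixed interval and hands the result to the uploader would turn many small requests into one compressed request per interval.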
Architecture Characteristics
Independent memory space via subprocess prevents main‑process resource contention.
Process isolation improves stability; a crash in the subprocess does not affect the app.
Low performance overhead: logs are batched every two minutes, reducing request bursts.
Traffic saving: compressed log files are uploaded, then decompressed server‑side for storage and analysis.
Lego4APM: Business Log and Technical Log Separation
Background
Growing user base demands fine‑grained, real‑time data for automated decisions.
Mixed business and APM logs caused latency pressure on the big‑data pipeline.
The client needed a unified, standardized performance‑related logging dimension.
Implementation
A new component, Lego4APM, re‑uses the original Lego architecture but routes technical logs to a dedicated interface, separating them from business events while preserving the existing log‑level aggregation.
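One way to picture this separation is a single pipeline that keeps one queue and one upload endpoint per log type. The sketch below is an assumption-laden illustration: LogRouter, LogType, and the endpoint strings passed in are hypothetical names, and the real Lego4APM component may be structured quite differently.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Hypothetical router: same pipeline code, separate queue and endpoint per log type. */
final class LogRouter {
    enum LogType { BUSINESS, TECHNICAL }

    private final Map<LogType, List<String>> queues = new EnumMap<>(LogType.class);
    private final Map<LogType, String> endpoints = new EnumMap<>(LogType.class);

    LogRouter(String businessEndpoint, String technicalEndpoint) {
        endpoints.put(LogType.BUSINESS, businessEndpoint);
        endpoints.put(LogType.TECHNICAL, technicalEndpoint);
        for (LogType type : LogType.values()) {
            queues.put(type, new ArrayList<>());
        }
    }

    /** Append an event to the queue that belongs to its log type. */
    void log(LogType type, String event) {
        queues.get(type).add(event);
    }

    /** Endpoint that the given log type is uploaded to. */
    String endpointFor(LogType type) {
        return endpoints.get(type);
    }

    /** Snapshot of events waiting for upload for the given type. */
    List<String> pending(LogType type) {
        return new ArrayList<>(queues.get(type));
    }
}
```

Because the two log types never share a queue, a backlog of APM data does not delay business events on their way into the big‑data pipeline.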
LegoRealtime: Improving Real‑time Capability
Background
After business and technical logs were separated, the original Lego still merged events and uploaded them every two minutes. To support automated operations, core user‑behavior logs had to be reported within 200 ms.
Design
The solution leverages mobile‑specific features: data backup for weak or disconnected networks, exception retransmission, simple JSON formatting without extra compression, and gray‑scale control switches for stable rollout.
Key Implementation Points
Backup and disaster recovery: distinguish real‑time logs from exception backups, preferring database backup over file backup.
Exception retransmission: batch resend when network recovers or the app restarts.
Lightweight formatting: upload raw JSON over HTTPS, avoiding additional compression.
Stable controllability: gray‑scale switches and validation interfaces manage migration and data quality.
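The backup and retransmission points above might look roughly like this. It is a sketch under stated assumptions: RealtimeReporter is a hypothetical name, the uploader is modeled as a function that returns true on success, and an in-memory list stands in for the database backup the article prefers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical real-time reporter with backup and batch retransmission. */
final class RealtimeReporter {
    // Returns true when the batch was accepted by the server (assumed contract).
    private final Predicate<List<String>> uploader;
    // Stand-in for the database table that holds events whose upload failed.
    private final List<String> backup = new ArrayList<>();

    RealtimeReporter(Predicate<List<String>> uploader) {
        this.uploader = uploader;
    }

    /** Send one raw JSON event immediately; back it up if the upload fails. */
    void report(String json) {
        if (!uploader.test(Collections.singletonList(json))) {
            backup.add(json);
        }
    }

    /**
     * Resend all backed-up events as one batch. Intended to be called when
     * the network recovers or the app restarts. Returns the number resent.
     */
    int retransmit() {
        if (backup.isEmpty()) {
            return 0;
        }
        List<String> batch = new ArrayList<>(backup);
        if (!uploader.test(batch)) {
            return 0; // still failing; keep the backup untouched
        }
        backup.clear();
        return batch.size();
    }

    int backedUp() {
        return backup.size();
    }
}
```

Keeping the backup until the batch is confirmed favors at-least-once delivery, which would fit the small increase in reported data noted in the validation results, though the article does not state this as the cause.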
Migration Plan
Non‑intrusive migration by intercepting the original Lego upload API and routing core events through a whitelist to the real‑time path.
Gradual rollout using AB‑test switches (10 % → 50 % → 100 %) while monitoring data integrity and backend stability.
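The whitelist check and percentage switch could be combined in a small gate like the one below. This is only a sketch: RealtimeGate, the hash-based bucketing, and any event names used with it are assumptions, and a production AB‑test platform would normally assign buckets itself.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical gate deciding whether an event takes the real-time path. */
final class RealtimeGate {
    private final Set<String> whitelist;
    private final int rolloutPercent; // 0..100

    RealtimeGate(Set<String> whitelist, int rolloutPercent) {
        this.whitelist = new HashSet<>(whitelist);
        this.rolloutPercent = rolloutPercent;
    }

    /** Stable bucket in [0, 100) derived from the user id. */
    static int bucket(String userId) {
        return Math.floorMod(userId.hashCode(), 100);
    }

    /**
     * True only for whitelisted core events of users inside the rollout.
     * Everything else stays on the original batched upload path.
     */
    boolean useRealtime(String userId, String eventName) {
        return whitelist.contains(eventName) && bucket(userId) < rolloutPercent;
    }
}
```

Because the bucket depends only on the user id, a user included at 10 % stays included at 50 % and 100 %, so widening the rollout never moves anyone back to the old path.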
Validation Results
Average round‑trip latency of the real‑time endpoint is 160 ms (≈80 ms upload latency), well below the 200 ms target.
Real‑time uploads produce ~1 % more data than the previous Lego4APM path, an acceptable increase.
Lego New Architecture: Performance Boost
Background
Issues with the old architecture included privacy concerns from background services, duplicate reporting, missing events, IPC‑induced UI jank, complex multi‑process configuration, and code duplication between Lego and Lego4APM.
New Architecture Design
The redesigned system replaces the subprocess with a HandlerThread (sub‑thread) within the same process. The main process creates a Lego instance for configuration, while the HandlerThread handles file writing, compression, and upload.
Advantages
No IPC overhead – thread‑level memory sharing eliminates binder‑related jank.
Eliminates privacy compliance issues by removing background service IPC.
Supports multiple instances, simplifying maintenance and reducing duplicated code.
Serial execution of log write and upload prevents duplicate or missing events.
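The serial-execution idea can be illustrated off-device with a single-thread executor, used here as a plain-JVM stand-in for Android's HandlerThread. SerialLogWorker and its in-memory "file" and "server" lists are hypothetical; the point is only that writes and uploads queued on one worker thread run strictly in order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical worker: one background thread runs writes and uploads in order. */
final class SerialLogWorker {
    // Single worker thread; plays the role HandlerThread has in the article.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final List<String> file = new ArrayList<>();     // stand-in for the log file
    private final List<String> uploaded = new ArrayList<>(); // stand-in for the server

    /** Called from any thread; the actual write happens on the worker. */
    void write(String event) {
        worker.execute(() -> file.add(event));
    }

    /** Upload everything written so far, then clear the file. */
    void upload() {
        worker.execute(() -> {
            uploaded.addAll(file);
            file.clear();
        });
    }

    /** Wait for queued tasks to finish and return what reached the "server". */
    List<String> shutdownAndGetUploaded() {
        worker.shutdown();
        try {
            if (!worker.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("worker did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return uploaded;
    }
}
```

Since only the worker thread touches the file, a write can never land between the upload and the clear, which is the kind of race that produces duplicate or missing events.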
Version Migration Strategy
Migration is split into two phases: (1) verification by comparing selected event counts and parameters between old and new versions; (2) staged rollout by user percentage (10 %, 50 %, 100 %) with one‑week gray periods, monitoring core business metrics.
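The verification phase amounts to comparing per-event counts from the old and new versions. A minimal sketch follows, assuming counts are already aggregated into maps keyed by event name; MigrationCheck and the tolerance parameter are illustrative, since the article does not state an acceptance threshold.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for comparing event counts between two client versions. */
final class MigrationCheck {
    /**
     * Relative difference (new - old) / old for each event seen by the old
     * version. Events missing from the new version count as 0.
     */
    static Map<String, Double> relativeDiff(Map<String, Long> oldCounts,
                                            Map<String, Long> newCounts) {
        Map<String, Double> diff = new TreeMap<>();
        for (Map.Entry<String, Long> entry : oldCounts.entrySet()) {
            long oldN = entry.getValue();
            long newN = newCounts.getOrDefault(entry.getKey(), 0L);
            if (oldN > 0) {
                diff.put(entry.getKey(), (newN - oldN) / (double) oldN);
            }
        }
        return diff;
    }

    /** True when every event's relative difference is within the tolerance. */
    static boolean withinTolerance(Map<String, Double> diff, double tolerance) {
        for (double d : diff.values()) {
            if (Math.abs(d) > tolerance) {
                return false;
            }
        }
        return true;
    }
}
```

A negative difference on any core event would point to lost data and should block the next rollout stage, while a small positive one may simply reflect events the old path dropped.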
Post‑migration validation shows no data loss and a 1 % increase in event volume.
Conclusion and Outlook
The current Lego system consists of ZPM‑driven automatic business‑event collection reported via LegoRealtime, manual business events via Lego4Buz, and technical APM logs via Lego4APM. The recent optimizations improve reliability, real‑time capability, and stability while reducing maintenance cost. Future work includes exploring lower‑power transmission methods, refining large‑file upload strategies, and enhancing exception handling to avoid cascading failures.
Zhuanzhuan Tech
A platform for Zhuanzhuan R&D and industry peers to learn and exchange technology, regularly sharing frontline experience and cutting‑edge topics. We welcome practical discussions and sharing; contact waterystone with any questions.