Bilibili's Event Tracking Standardization: Practices, Challenges, and Future Directions
This article details Bilibili's comprehensive approach to standardizing event tracking (埋点), covering its definition, data pipeline, common business issues, metadata‑driven management strategies, efficiency gains, and future prospects for unified real‑time and batch processing.
The article opens by noting how tightly user behavior data management and analysis are coupled, and argues that standardized event tracking (埋点) is a prerequisite for reliable product‑level insights.
1. Definition and Purpose of Event Tracking – An event is recorded when a user interacts with an app or web page, encapsulating who, when, where, what, and how. Event data supports daily active user metrics, recommendation algorithm tuning, and other analytical scenarios.
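The "who, when, where, what, and how" framing can be sketched as a simple event record. This is a minimal illustration, not Bilibili's actual schema; the field names (`user_id`, `page`, `extra`, etc.) are assumptions chosen for clarity:

```python
from dataclasses import dataclass, field
import time

@dataclass
class TrackingEvent:
    """One tracked user interaction: who, when, where, what, and how."""
    user_id: str      # who performed the interaction
    timestamp_ms: int  # when it happened
    page: str          # where in the app or site
    event_id: str      # what was interacted with
    action: str        # how (click, exposure, ...)
    extra: dict = field(default_factory=dict)  # free-form business parameters

event = TrackingEvent(
    user_id="u_10086",
    timestamp_ms=int(time.time() * 1000),
    page="video-detail",
    event_id="player.play-button",
    action="click",
)
```

Downstream metrics such as daily active users then reduce to aggregations over records like this one.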
2. Event Data Pipeline – Describes Bilibili's end‑to‑end flow: SDK integration on iOS, Android, Web, and server side; real‑time streams via Kafka; offline processing through ETL to ODS/DWD/ADS layers; storage on HDFS/Parquet; query engines such as ClickHouse, Presto, Hive; and visualization tools for product managers and analysts.
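The ODS-to-DWD step in a pipeline like this typically parses raw log lines, drops malformed records, and flattens the rest into typed rows. A minimal sketch, with hypothetical field names (`mid`, `ts`, `platform`) standing in for whatever the real layer schema defines:

```python
import json
from typing import Optional

def ods_to_dwd(raw_line: str) -> Optional[dict]:
    """Clean one raw ODS log line into a flat DWD row; drop malformed input."""
    try:
        record = json.loads(raw_line)
    except json.JSONDecodeError:
        return None  # unparseable lines never reach the warehouse layers
    if "event_id" not in record or "mid" not in record:
        return None  # required keys missing: reject rather than guess
    return {
        "event_id": record["event_id"],
        "mid": record["mid"],
        "ts": int(record.get("ts", 0)),
        "platform": record.get("platform", "unknown"),
    }

rows = [ods_to_dwd(line) for line in [
    '{"event_id": "main.homepage.feed.0.show", "mid": 123, '
    '"ts": 1700000000, "platform": "android"}',
    'not-json',
]]
```

In production this transform would run inside the ETL jobs feeding the DWD tables, with the cleaned rows landing on HDFS in Parquet for the query engines.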
3. Common Business Problems
Inconsistent naming conventions across teams.
Parameter mapping inconsistencies.
Misaligned information among product, development, testing, and operations.
Scalability issues when managing event definitions with spreadsheets.
Difficulty locating event IDs, selecting tables, and handling permissions during analysis.
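The parameter-mapping problem above is easy to reproduce: two teams logging the same concept under different keys cannot be joined or aggregated directly. A hypothetical example with an alias table as one mitigation (the key names here are invented for illustration):

```python
# Two teams log the same video identifier under different parameter names.
team_a_event = {"event": "play", "video_id": 170001}
team_b_event = {"event": "play", "avid": 170001}

# A curated alias map is one stopgap before full standardization.
ALIASES = {"avid": "video_id", "vid": "video_id"}

def normalize(event: dict) -> dict:
    """Rewrite known aliases to the canonical parameter name."""
    return {ALIASES.get(key, key): value for key, value in event.items()}
```

Maintaining such maps in spreadsheets is exactly the scalability problem the article describes, which motivates the metadata-driven approach that follows.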
4. Standardization Practices – Bilibili adopts a lifecycle‑wide management model centered on metadata. It iterates through three stages: custom per‑business events, event‑model abstraction, and unified SP‑MID (super‑model) naming with five components (business, page, module, position, type). The strategy includes naming standards, attribute management (global, type‑specific, private), tool support (the “North Star” management platform), and defined processes involving product, data, development, and testing stakeholders.
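The five-component naming scheme can be sketched as a small builder and parser. The dot-separated serialization and the sample values are assumptions; only the five component names (business, page, module, position, type) come from the article:

```python
from typing import NamedTuple

class SpmId(NamedTuple):
    """Five-part event name: business.page.module.position.type (illustrative)."""
    business: str
    page: str
    module: str
    position: str
    type: str

    def __str__(self) -> str:
        return ".".join(self)

def parse_spm(event_id: str) -> SpmId:
    """Split a dot-separated event ID back into its five components."""
    parts = event_id.split(".")
    if len(parts) != 5:
        raise ValueError(f"expected 5 dot-separated parts, got {event_id!r}")
    return SpmId(*parts)
```

Because every ID decomposes the same way, a management platform like "North Star" can validate names at registration time instead of discovering collisions during analysis.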
5. Efficiency Gains from Metadata‑Driven Standardization
More accurate reporting through automated testing and DQC rules.
Reduced storage costs via partitioned tables, tiered retention, and sampling based on event importance.
Simplified querying with front‑end UI, pre‑partitioned SQL, and BI visualizations.
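Importance-based sampling of the kind mentioned above is often implemented as a deterministic hash so the same user and event always get the same keep/drop decision. A sketch under assumed tier names and rates (the article does not specify either):

```python
import hashlib

# Hypothetical retention config: fraction of events kept per importance tier.
SAMPLE_RATE = {"core": 1.0, "normal": 0.5, "debug": 0.01}

def keep(event_id: str, user_id: str, tier: str) -> bool:
    """Deterministically sample: same user+event always gets the same decision."""
    rate = SAMPLE_RATE[tier]
    if rate >= 1.0:
        return True  # core events are never dropped
    digest = hashlib.md5(f"{event_id}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # uniform in [0, 1]
    return bucket < rate
```

Determinism matters here: per-user funnels stay internally consistent even when low-importance events are heavily sampled.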
6. Future Outlook – Explores unified stream‑batch processing using standardized metadata, enabling one‑click routing for both real‑time and offline consumption, and facilitating downstream services such as recommendation algorithms.
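The "one-click routing" idea amounts to letting event metadata declare its consumption targets, so a single dispatcher serves both real-time and offline sinks. A minimal sketch; the registry contents and sink names are invented for illustration:

```python
# Hypothetical metadata registry: each event declares where it is consumed.
EVENT_ROUTES = {
    "main.homepage.feed.0.show": {"realtime", "offline"},
    "main.player.debug.0.report": {"offline"},
}

def route(event_id: str, sinks: dict) -> list:
    """Dispatch one event to every sink its metadata declares.

    Unregistered events fall back to offline-only, the cheaper path.
    """
    delivered = []
    for target in EVENT_ROUTES.get(event_id, {"offline"}):
        sinks[target].append(event_id)
        delivered.append(target)
    return sorted(delivered)

sinks = {"realtime": [], "offline": []}
targets = route("main.homepage.feed.0.show", sinks)
```

With routing driven by the same metadata that governs naming and quality checks, downstream consumers such as recommendation services subscribe to events rather than to pipeline internals.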
The article concludes with a Q&A addressing compatibility, ToB commercialization, sampling calculations, release synchronization, and tooling ergonomics.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.