Bilibili's Event Tracking Standardization: Practices, Challenges, and Future Directions
This article details Bilibili's comprehensive approach to standardizing event tracking (埋点), covering its definition, data pipeline, common business issues, metadata‑driven management strategies, efficiency gains, and future prospects for unified real‑time and batch processing.
The article opens by noting how tightly user behavior data management and analysis are coupled, and argues that standardized event tracking (埋点) is a prerequisite for reliable product‑level insights.
1. Definition and Purpose of Event Tracking – An event is recorded when a user interacts with an app or web page, encapsulating who, when, where, what, and how. Event data supports daily active user metrics, recommendation algorithm tuning, and other analytical scenarios.
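The "who, when, where, what, and how" framing can be sketched as a simple event record. This is a minimal illustration, not Bilibili's actual schema; the field names (`user_id`, `page`, `extra`, etc.) are assumptions chosen for clarity:

```python
from dataclasses import dataclass, field
import time

@dataclass
class TrackingEvent:
    """One tracked user interaction: who, when, where, what, and how."""
    user_id: str      # who performed the interaction
    timestamp_ms: int  # when it happened
    page: str          # where in the app or site
    event_id: str      # what was interacted with
    action: str        # how (click, exposure, ...)
    extra: dict = field(default_factory=dict)  # free-form business parameters

event = TrackingEvent(
    user_id="u_10086",
    timestamp_ms=int(time.time() * 1000),
    page="video-detail",
    event_id="player.play-button",
    action="click",
)
```

Downstream metrics such as daily active users then reduce to aggregations over records like this one.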
2. Event Data Pipeline – Describes Bilibili's end‑to‑end flow: SDK integration on iOS, Android, Web, and server side; real‑time streams via Kafka; offline processing through ETL to ODS/DWD/ADS layers; storage on HDFS/Parquet; query engines such as ClickHouse, Presto, Hive; and visualization tools for product managers and analysts.
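The ODS-to-DWD step in a pipeline like this typically parses raw log lines, drops malformed records, and flattens the rest into typed rows. A minimal sketch, with hypothetical field names (`mid`, `ts`, `platform`) standing in for whatever the real layer schema defines:

```python
import json
from typing import Optional

def ods_to_dwd(raw_line: str) -> Optional[dict]:
    """Clean one raw ODS log line into a flat DWD row; drop malformed input."""
    try:
        record = json.loads(raw_line)
    except json.JSONDecodeError:
        return None  # unparseable lines never reach the warehouse layers
    if "event_id" not in record or "mid" not in record:
        return None  # required keys missing: reject rather than guess
    return {
        "event_id": record["event_id"],
        "mid": record["mid"],
        "ts": int(record.get("ts", 0)),
        "platform": record.get("platform", "unknown"),
    }

rows = [ods_to_dwd(line) for line in [
    '{"event_id": "main.homepage.feed.0.show", "mid": 123, '
    '"ts": 1700000000, "platform": "android"}',
    'not-json',
]]
```

In production this transform would run inside the ETL jobs feeding the DWD tables, with the cleaned rows landing on HDFS in Parquet for the query engines.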
3. Common Business Problems
Inconsistent naming conventions across teams.
Parameter mapping inconsistencies.
Misaligned information among product, development, testing, and operations.
Scalability issues when managing event definitions with spreadsheets.
Difficulty locating event IDs, selecting tables, and handling permissions during analysis.
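The parameter-mapping problem above is easy to reproduce: two teams logging the same concept under different keys cannot be joined or aggregated directly. A hypothetical example with an alias table as one mitigation (the key names here are invented for illustration):

```python
# Two teams log the same video identifier under different parameter names.
team_a_event = {"event": "play", "video_id": 170001}
team_b_event = {"event": "play", "avid": 170001}

# A curated alias map is one stopgap before full standardization.
ALIASES = {"avid": "video_id", "vid": "video_id"}

def normalize(event: dict) -> dict:
    """Rewrite known aliases to the canonical parameter name."""
    return {ALIASES.get(key, key): value for key, value in event.items()}
```

Maintaining such maps in spreadsheets is exactly the scalability problem the article describes, which motivates the metadata-driven approach that follows.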
4. Standardization Practices – Bilibili adopts a lifecycle‑wide management model centered on metadata. It iterates through three stages: custom per‑business events, event‑model abstraction, and unified SP‑MID (super‑model) naming with five components (business, page, module, position, type). The strategy includes naming standards, attribute management (global, type‑specific, private), tool support (the “North Star” management platform), and defined processes involving product, data, development, and testing stakeholders.
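The five-component naming scheme can be sketched as a small builder and parser. The dot-separated serialization and the sample values are assumptions; only the five component names (business, page, module, position, type) come from the article:

```python
from typing import NamedTuple

class SpmId(NamedTuple):
    """Five-part event name: business.page.module.position.type (illustrative)."""
    business: str
    page: str
    module: str
    position: str
    type: str

    def __str__(self) -> str:
        return ".".join(self)

def parse_spm(event_id: str) -> SpmId:
    """Split a dot-separated event ID back into its five components."""
    parts = event_id.split(".")
    if len(parts) != 5:
        raise ValueError(f"expected 5 dot-separated parts, got {event_id!r}")
    return SpmId(*parts)
```

Because every ID decomposes the same way, a management platform like "North Star" can validate names at registration time instead of discovering collisions during analysis.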
5. Efficiency Gains from Metadata‑Driven Standardization
More accurate reporting through automated testing and DQC rules.
Reduced storage costs via partitioned tables, tiered retention, and sampling based on event importance.
Simplified querying with front‑end UI, pre‑partitioned SQL, and BI visualizations.
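Importance-based sampling of the kind mentioned above is often implemented as a deterministic hash so the same user and event always get the same keep/drop decision. A sketch under assumed tier names and rates (the article does not specify either):

```python
import hashlib

# Hypothetical retention config: fraction of events kept per importance tier.
SAMPLE_RATE = {"core": 1.0, "normal": 0.5, "debug": 0.01}

def keep(event_id: str, user_id: str, tier: str) -> bool:
    """Deterministically sample: same user+event always gets the same decision."""
    rate = SAMPLE_RATE[tier]
    if rate >= 1.0:
        return True  # core events are never dropped
    digest = hashlib.md5(f"{event_id}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # uniform in [0, 1]
    return bucket < rate
```

Determinism matters here: per-user funnels stay internally consistent even when low-importance events are heavily sampled.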
6. Future Outlook – Explores unified stream‑batch processing using standardized metadata, enabling one‑click routing for both real‑time and offline consumption, and facilitating downstream services such as recommendation algorithms.
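The "one-click routing" idea amounts to letting event metadata declare its consumption targets, so a single dispatcher serves both real-time and offline sinks. A minimal sketch; the registry contents and sink names are invented for illustration:

```python
# Hypothetical metadata registry: each event declares where it is consumed.
EVENT_ROUTES = {
    "main.homepage.feed.0.show": {"realtime", "offline"},
    "main.player.debug.0.report": {"offline"},
}

def route(event_id: str, sinks: dict) -> list:
    """Dispatch one event to every sink its metadata declares.

    Unregistered events fall back to offline-only, the cheaper path.
    """
    delivered = []
    for target in EVENT_ROUTES.get(event_id, {"offline"}):
        sinks[target].append(event_id)
        delivered.append(target)
    return sorted(delivered)

sinks = {"realtime": [], "offline": []}
targets = route("main.homepage.feed.0.show", sinks)
```

With routing driven by the same metadata that governs naming and quality checks, downstream consumers such as recommendation services subscribe to events rather than to pipeline internals.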
The article concludes with a Q&A addressing compatibility, ToB commercialization, sampling calculations, release synchronization, and tooling ergonomics.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.