
Bilibili's Event Tracking Standardization: Practices, Challenges, and Future Directions

This article details Bilibili's comprehensive approach to standardizing event tracking (埋点), covering its definition, data pipeline, common business issues, metadata‑driven management strategies, efficiency gains, and future prospects for unified real‑time and batch processing.


The article opens by describing the close relationship between user behavior data and product analysis, emphasizing that standardized event tracking (埋点) is a precondition for reliable product‑level insights.

1. Definition and Purpose of Event Tracking – An event is recorded when a user interacts with an app or web page, encapsulating who, when, where, what, and how. Event data supports daily active user metrics, recommendation algorithm tuning, and other analytical scenarios.
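The "who, when, where, what, how" framing can be made concrete with a minimal sketch of an event payload. The field names and ID format below are illustrative assumptions, not Bilibili's actual schema:

```python
# Hypothetical tracking-event payload; field names are assumptions for illustration.
event = {
    "mid": "user_12345",           # who: (anonymized) user identifier
    "ctime": 1700000000123,        # when: client timestamp in milliseconds
    "platform": "android",         # where: client platform
    "event_id": "main.homepage.video-card.1.click",  # what: the interaction
    "extra": {"video_id": "BV1xx411c7mD"},           # how/context: private parameters
}

def is_click(evt: dict) -> bool:
    """Classify an event by the trailing type segment of its ID."""
    return evt["event_id"].rsplit(".", 1)[-1] == "click"

print(is_click(event))  # True
```

A payload like this is what downstream jobs aggregate into daily-active-user metrics or feed into recommendation features.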

2. Event Data Pipeline – Describes Bilibili's end‑to‑end flow: SDK integration on iOS, Android, Web, and server side; real‑time streams via Kafka; offline processing through ETL to ODS/DWD/ADS layers; storage on HDFS/Parquet; query engines such as ClickHouse, Presto, Hive; and visualization tools for product managers and analysts.
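The hop from client SDK through Kafka and ETL into the warehouse layers can be sketched as two pure functions, with standard-library JSON standing in for the real transport. Function names and the column whitelist are assumptions, not Bilibili's implementation:

```python
import json

def sdk_report(event: dict) -> str:
    """Client SDK serializes the event for transport (stand-in for the Kafka producer)."""
    return json.dumps(event)

def etl_to_dwd(raw: str) -> dict:
    """ETL step: parse the raw ODS record and keep only the modeled DWD columns."""
    rec = json.loads(raw)
    return {k: rec[k] for k in ("event_id", "mid", "ctime") if k in rec}

raw = sdk_report({"event_id": "main.home.card.0.show", "mid": "u1",
                  "ctime": 1, "debug": True})
dwd_row = etl_to_dwd(raw)
print(dwd_row)  # {'event_id': 'main.home.card.0.show', 'mid': 'u1', 'ctime': 1}
```

In the real pipeline the DWD rows would land as Parquet files on HDFS, partitioned for query engines such as ClickHouse or Presto.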

3. Common Business Problems

Inconsistent naming conventions across teams.

Parameter mapping inconsistencies.

Misaligned information among product, development, testing, and operations.

Scalability issues when managing event definitions with spreadsheets.

Difficulty locating event IDs, selecting tables, and handling permissions during analysis.

4. Standardization Practices – Bilibili adopts a lifecycle‑wide management model centered on metadata. It iterates through three stages: custom per‑business events, event‑model abstraction, and unified SP‑MID (super‑model) naming with five components (business, page, module, position, type). The strategy includes naming standards, attribute management (global, type‑specific, private), tool support (the “North Star” management platform), and defined processes involving product, data, development, and testing stakeholders.
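The five-component SP-MID naming scheme lends itself to programmatic construction and validation, which is presumably what a platform like "North Star" enforces at registration time. The dot separator and the per-segment character rule below are assumptions for illustration:

```python
import re

# Assumed per-segment rule: lowercase alphanumerics and hyphens only.
SEGMENT = re.compile(r"^[a-z0-9-]+$")

def make_event_id(business: str, page: str, module: str,
                  position: str, etype: str) -> str:
    """Assemble a five-segment SP-MID event ID, rejecting malformed segments."""
    parts = [business, page, module, position, etype]
    for p in parts:
        if not SEGMENT.match(p):
            raise ValueError(f"invalid segment: {p!r}")
    return ".".join(parts)

eid = make_event_id("main", "homepage", "video-card", "1", "click")
print(eid)  # main.homepage.video-card.1.click
```

Centralizing this check in one place is what turns a naming convention into a naming standard: every team goes through the same gate, so inconsistent IDs never reach the warehouse.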

5. Efficiency Gains from Metadata‑Driven Standardization

More accurate reporting through automated testing and DQC rules.

Reduced storage costs via partitioned tables, tiered retention, and sampling based on event importance.

Simplified querying with front‑end UI, pre‑partitioned SQL, and BI visualizations.
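Sampling by event importance, mentioned above as a storage lever, is commonly done with a deterministic hash so the same user's events are consistently kept or dropped. The tier names and rates here are assumptions, not Bilibili's actual policy:

```python
import hashlib

# Assumed retention tiers: core events keep everything, debug events keep 1%.
SAMPLE_RATE = {"core": 1.0, "normal": 0.1, "debug": 0.01}

def keep(event_id: str, mid: str, tier: str) -> bool:
    """Deterministically decide whether to retain an event, per user and event."""
    digest = hashlib.md5(f"{event_id}:{mid}".encode()).hexdigest()
    bucket = int(digest, 16) % 10000
    return bucket < SAMPLE_RATE[tier] * 10000

print(keep("main.home.card.0.show", "u1", "core"))  # True (core events always kept)
```

Because the decision is a pure function of the event ID and user ID, analysts can scale sampled counts back up by the known rate without bias from per-request randomness.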

6. Future Outlook – Explores unified stream‑batch processing using standardized metadata, enabling one‑click routing for both real‑time and offline consumption, and facilitating downstream services such as recommendation algorithms.
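The "one-click routing" idea can be sketched as a metadata registry that derives both the real-time and offline destinations from a single registration. All names and naming conventions below are hypothetical:

```python
# Hypothetical metadata registry driving unified stream/batch routing.
REGISTRY: dict[str, dict] = {}

def register(event_id: str, realtime: bool, offline: bool) -> None:
    """Register an event once; both consumption paths are derived from metadata."""
    REGISTRY[event_id] = {
        "kafka_topic": f"rt.{event_id}" if realtime else None,
        "hive_table": (f"dwd.evt_{event_id.replace('.', '_')}"
                       if offline else None),
    }

def route(event_id: str) -> dict:
    """Look up where an event should flow for real-time and offline consumers."""
    return REGISTRY[event_id]

register("main.home.card.0.click", realtime=True, offline=True)
print(route("main.home.card.0.click"))
```

With routing derived from metadata rather than hand-wired jobs, enabling a new downstream consumer (e.g., a recommendation feature pipeline) becomes a registry change instead of a code change.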

The article concludes with a Q&A addressing compatibility, ToB commercialization, sampling calculations, release synchronization, and tooling ergonomics.

Tags: analytics, Big Data, event tracking, ETL, metadata management, data standardization, Bilibili
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
