
Standardizing Event Tracking (埋点) at Bilibili: Practices, Challenges, and Applications

This article explains Bilibili's comprehensive approach to event‑tracking (埋点) standardization, covering its definition, the data pipeline, common business problems, metadata‑driven standardization strategies, the resulting gains in reporting accuracy, storage efficiency, and query usability, and future directions for automated data flow.

DataFunTalk

The article introduces the close relationship between user behavior data management and analysis, emphasizing that standardized event‑tracking data (埋点) is a prerequisite for product‑level applications. It outlines the background of event‑tracking, defining it as logging user actions such as button clicks, and describes its structure (who, when, where, what, how) across client and server sides.
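The who/when/where/what/how structure described above can be sketched as a single event record. This is an illustrative sketch only; the field names and the `make_click_event` helper are assumptions for this example, not Bilibili's actual schema.

```python
import time
from dataclasses import dataclass, field

@dataclass
class TrackingEvent:
    """One event-tracking (埋点) record: who, when, where, what, how."""
    user_id: str       # who  - the acting user (or device) identifier
    timestamp_ms: int  # when - event time in milliseconds
    page_id: str       # where - the page or screen the action occurred on
    event_id: str      # what - e.g. a button-click identifier
    extra: dict = field(default_factory=dict)  # how - context such as client type

def make_click_event(user_id: str, page_id: str, button: str) -> TrackingEvent:
    """Build a click event, stamping the current time in milliseconds."""
    return TrackingEvent(
        user_id=user_id,
        timestamp_ms=int(time.time() * 1000),
        page_id=page_id,
        event_id=f"{button}.click",
        extra={"source": "client"},
    )
```

A client-side SDK would serialize such a record and report it; server-side events fill the same shape from request logs.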

A detailed data pipeline is presented, showing how Bilibili collects, processes, and consumes event data through SDKs for iOS, Android, Web, and back‑load mechanisms, followed by ETL, real‑time streams, offline warehouses (HDFS, Parquet), and query engines like ClickHouse, Presto, and Hive, with visual tools for analysts and operators.

The article then lists common business problems in event‑tracking design and consumption, including inconsistent naming conventions, mismatched parameter mappings, cross‑team coordination, and scalability issues when using spreadsheets for metadata.

To address these challenges, Bilibili proposes a lifecycle‑wide standardization strategy focused on metadata management. It introduces the SPMID model for naming (business_id.page_id.module.position.type), a three‑layer attribute scheme (global fields, type‑common fields, private fields), and tooling (the “North Star” management platform) that automates event creation, attribute definition, sampling configuration, and testing.
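The SPMID naming model can be made concrete with a small parser/validator. The segment character rules and the set of event types (`click`, `show`, `pv`) below are assumptions for illustration; only the five-segment `business_id.page_id.module.position.type` layout comes from the article.

```python
import re
from typing import NamedTuple

# Assumed segment syntax: lowercase alphanumerics and hyphens.
_SEGMENT = r"[a-z0-9][a-z0-9-]*"
_SPMID_RE = re.compile(
    rf"^({_SEGMENT})\.({_SEGMENT})\.({_SEGMENT})\.({_SEGMENT})\.(click|show|pv)$"
)

class Spmid(NamedTuple):
    business_id: str
    page_id: str
    module: str
    position: str
    type: str

def parse_spmid(raw: str) -> Spmid:
    """Split a dotted SPMID into its five named segments, rejecting malformed IDs."""
    m = _SPMID_RE.match(raw)
    if m is None:
        raise ValueError(f"not a valid SPMID: {raw!r}")
    return Spmid(*m.groups())
```

A management platform like "North Star" can run a check of this kind at event-creation time, so malformed names never enter the metadata store.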

Process and governance are organized into six stages involving business, data product, development, testing, data collection, and validation, with clear roles for each participant.

Based on standardized metadata, Bilibili achieves three efficiency improvements: more accurate reporting through automated testing and sampling rules; reduced storage cost by partitioning tables and applying tiered retention based on event importance; and easier querying via front‑end UI that generates SQL against the metadata‑driven warehouse.
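The metadata-driven querying idea can be sketched as a lookup-then-generate step: the front-end UI selects an event, the platform resolves its table and retention tier from metadata, and emits SQL. Every table name, column name, and metadata entry below is invented for illustration, not Bilibili's real warehouse schema.

```python
# Hypothetical metadata store: event_id -> warehouse location and retention tier.
EVENT_METADATA = {
    "main.video-detail.player.fullscreen.click": {
        "table": "dwd_event_click",
        "retention_days": 180,  # tiered retention: important events kept longer
    },
    "main.home.banner.0.show": {
        "table": "dwd_event_show",
        "retention_days": 30,
    },
}

def build_count_sql(event_id: str, log_date: str) -> str:
    """Generate a daily-count query against the event's partitioned table."""
    meta = EVENT_METADATA[event_id]
    return (
        f"SELECT count(*) AS pv\n"
        f"FROM {meta['table']}\n"
        f"WHERE log_date = '{log_date}'\n"
        f"  AND event_id = '{event_id}'"
    )
```

Because the date partition (`log_date`) and target table come from metadata rather than analyst-written SQL, every query automatically hits only the relevant partition, which is what keeps scan cost down in engines like ClickHouse or Presto.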

Future work aims to unify real‑time and batch streams using the standardized metadata, enabling automated routing, view‑level consumption, and consistent metrics for recommendation algorithms.

The article concludes with a Q&A covering compatibility of old and new data, commercialization plans, sampling calculations, synchronization of tracking with product releases, and the practicality of SPMID design, highlighting the tools that simplify the process.

Tags: analytics, big data, data pipeline, standardization, event tracking, metadata management, Bilibili
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
