
Design and Implementation of Bilibili's Big Data Development Governance Platform

This article details Bilibili's five‑year development of a comprehensive big‑data governance platform, covering its usage scenarios, product positioning, data map and governance solutions, abstract configuration approach, operational mechanisms, and future plans, highlighting significant improvements in data efficiency and value assessment.

DataFunTalk

Bilibili, a data‑driven company, has built a big‑data development governance platform over five years, encompassing data integration, development, governance, security, and analysis modules that serve all business departments.

The platform serves 60% of employees, mainly technical developers, product managers, operations staff, algorithm engineers, analysts, and data developers. Users are segmented into three tiers: high‑level developers, mid‑level users, and data novices.

Product positioning focuses on four principles: professional (meeting advanced data development and analysis needs), low threshold (supporting easy data creation, usage, and retrieval), standardized (providing flexible yet generic functionality across business lines), and closed‑loop (covering the entire data lifecycle from ingestion to governance).

The data map product acts as a metadata portal offering search, detail views, data preview, lineage, and management features. Its capabilities are organized into an eight‑part matrix covering data discovery, usage, understanding, governance, and promotion.
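As an illustration of the lineage feature, a metadata portal typically answers questions such as "which tables are affected if this one changes?". The sketch below is not Bilibili's implementation: the table names, the `lineage` dictionary layout, and the `downstream` function are all hypothetical, and it assumes lineage is stored as a simple mapping from each table to its direct consumers.

```python
from collections import deque

# Hypothetical lineage graph: each table maps to the tables built directly from it.
lineage = {
    "ods.play_log": ["dwd.play_detail"],
    "dwd.play_detail": ["dws.play_daily", "ads.creator_report"],
    "dws.play_daily": ["ads.creator_report"],
}

def downstream(graph, start):
    """Return every table reachable from `start`, in breadth-first order."""
    seen = set()
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            # Skip tables already visited so shared consumers and cycles are safe.
            if child not in seen and child != start:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order

affected = downstream(lineage, "ods.play_log")
print(affected)
```

A breadth-first walk lists the nearest consumers first, which is usually the order in which owners need to be notified about an upstream change.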

Data operations are organized at three levels: point (one‑on‑one standardization), line (periodic training and interviews), and surface (systematic recording of issues and feedback). Together these form a comprehensive data operations framework.

To assess data value, the team introduced an ROI‑based evaluation system for data models. It scores each model on query heat, ETL/API usage, BI report popularity, and other factors, and the scores feed data recommendation, assessment of development value, and governance strategies.
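The scoring idea can be sketched as a weighted sum of usage signals divided by cost. The article does not publish the actual formula, so everything here is an assumption made for illustration: the weights, the normalization of each signal to the range 0 to 1, the `monthly_cost` field, and the table names.

```python
from dataclasses import dataclass

# Hypothetical weights; the source does not state how the factors are combined.
WEIGHTS = {"query_heat": 0.4, "etl_api_usage": 0.3, "bi_popularity": 0.3}

@dataclass
class ModelUsage:
    name: str
    query_heat: float      # assumed normalized to [0, 1]
    etl_api_usage: float   # assumed normalized to [0, 1]
    bi_popularity: float   # assumed normalized to [0, 1]
    monthly_cost: float    # storage plus compute, in arbitrary cost units

def value_score(model):
    """Weighted sum of the usage signals."""
    return sum(weight * getattr(model, signal) for signal, weight in WEIGHTS.items())

def roi(model):
    """Value delivered per unit of cost."""
    if model.monthly_cost <= 0:
        raise ValueError("monthly_cost must be positive")
    return value_score(model) / model.monthly_cost

def rank_by_roi(models):
    """Highest ROI first; the tail of the list is a candidate set for governance."""
    return sorted(models, key=roi, reverse=True)

models = [
    ModelUsage("tmp_backup", 0.1, 0.0, 0.0, 5.0),
    ModelUsage("dwd_play_log", 0.9, 0.8, 0.7, 10.0),
]
for m in rank_by_roi(models):
    print(m.name, round(value_score(m), 3), round(roi(m), 4))
```

Under these assumptions, models near the top of the ranking are natural candidates for recommendation, while those at the bottom are candidates for archiving or deletion.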

For data governance, the platform takes an abstract configuration approach: assets are modeled as metadata objects with configurable properties and operations. This allows governance tasks to be created quickly, issues to be generated automatically, and processing workflows to be streamlined.
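One way to picture this configuration-driven design is a small rule engine in which a strategy is pure data and issues fall out of matching strategies against assets. This is a minimal sketch, not the platform's actual schema: `Asset`, `Strategy`, `Issue`, the property names, and the `archive` action are all hypothetical.

```python
import operator
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

# Comparison operators a strategy may reference by name in its configuration.
OPS = {"==": operator.eq, "!=": operator.ne, ">": operator.gt,
       ">=": operator.ge, "<": operator.lt, "<=": operator.le}

@dataclass
class Asset:
    asset_type: str             # e.g. "table", "task", "report"
    name: str
    properties: Dict[str, Any]  # configurable metadata properties

@dataclass
class Strategy:
    name: str
    asset_type: str
    conditions: List[Tuple[str, str, Any]]  # (property, operator, threshold)
    action: str                             # operation proposed to the owner

@dataclass
class Issue:
    strategy: str
    asset: str
    action: str

def matches(asset, strategy):
    """True when the asset has the right type and satisfies every condition."""
    if asset.asset_type != strategy.asset_type:
        return False
    for prop, op, threshold in strategy.conditions:
        if prop not in asset.properties:
            return False
        if not OPS[op](asset.properties[prop], threshold):
            return False
    return True

def generate_issues(assets, strategies):
    """Produce one issue per (strategy, matching asset) pair."""
    return [Issue(s.name, a.name, s.action)
            for s in strategies for a in assets if matches(a, s)]

assets = [
    Asset("table", "tmp_a", {"days_since_last_access": 120}),
    Asset("table", "dwd_b", {"days_since_last_access": 2}),
]
cold_tables = Strategy("cold_table", "table",
                       [("days_since_last_access", ">=", 90)], "archive")
issues = generate_issues(assets, [cold_tables])
print(issues)
```

Because a new strategy in this sketch is only a new `Strategy` record rather than new code, adding one is a configuration change, which is the kind of design that would make short launch times plausible.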

The governance tools have deployed 62 strategies, each taking an average of 2‑3 hours to develop and launch. These strategies have generated over 80,000 issues, of which more than 20,000 have been processed, saving over 5 million (500万) in governance costs and more than 100 person‑days.

Future work emphasizes process‑driven management, online SOPs, and automation to reduce governance latency and improve observability, aiming to further enhance development efficiency and data value realization.

Tags: product design, data governance, data value, Bilibili, big data platform, data operations
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
