Design Insights of Bilibili's Big Data Development Governance Platform
This article outlines Bilibili's five‑year journey building a comprehensive big‑data development and governance platform, detailing its user segmentation, product positioning, data map and governance product design, abstract configuration approach, operational mechanisms, value assessment, and measurable impact on data efficiency and business outcomes.
Introduction: Bilibili is a data‑driven company where 60% of employees use data daily; the data platform directly impacts work efficiency. This article shares design insights of Bilibili’s big‑data development governance platform.
01. Bilibili’s data usage scenarios and platform overview
The platform, built over five years, includes modules for data integration, development, governance, security, and analysis, and serves all business units. Its users make up 60% of staff and fall into three groups: advanced developers, intermediate users, and data novices.
Product positioning based on user segmentation: professional, low‑threshold, standardized, and closed‑loop.
Professional: meet professional data development and analysis needs, improve data supply efficiency.
Low‑threshold: enable easy data creation, usage, and retrieval for production and operations users.
Standardized: provide flexible yet generic functions to satisfy diverse business demands.
Closed‑loop: support full lifecycle from data ingestion to production, operation, and governance.
02. Data map product built on a value system
The data map is a metadata‑driven portal offering search, detail, preview, lineage, and management features, reducing the effort needed to discover data.
Eight product modules (insight recommendation, full‑text search, category system, data portrait, UGC & API, data album, lineage, and impact analysis) cover five scenarios: finding data, using data, understanding data, governing data, and promoting data.
Key pain points of data operation include a rapidly growing base of data models, difficulty retrieving data, and issues that surface only in stages.
Solutions: enhance product functions, build a data operation system, and develop model evaluation capabilities.
Data value evaluation incorporates query heat, ETL usage, API calls, BI report popularity, and other factors, assigning weighted scores to guide recommendations and governance.
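The weighted scoring described above can be sketched roughly as follows. This is a minimal illustration, not the platform's actual formula: the weight values, the assumption that each signal is already normalized to [0, 1], the function names, and the table names are all hypothetical.

```python
# Hypothetical weights: the article names the signals but not their weights.
WEIGHTS = {
    "query_heat": 0.4,
    "etl_usage": 0.3,
    "api_calls": 0.2,
    "bi_popularity": 0.1,
}


def value_score(signals, weights=WEIGHTS):
    """Weighted average of usage signals, each assumed normalized to [0, 1]."""
    total = sum(weights.values())
    weighted = sum(
        # Clamp each signal to [0, 1]; a missing signal counts as 0.
        w * min(max(signals.get(name, 0.0), 0.0), 1.0)
        for name, w in weights.items()
    )
    return weighted / total


def rank_tables(tables):
    """Return table names ordered from highest to lowest value score."""
    return sorted(tables, key=lambda name: value_score(tables[name]), reverse=True)


# Made-up example data for illustration only.
tables = {
    "dwd_video_play": {"query_heat": 0.9, "etl_usage": 0.8, "api_calls": 0.5, "bi_popularity": 0.7},
    "tmp_backfill_2019": {"query_heat": 0.05, "etl_usage": 0.0},
}
ranking = rank_tables(tables)
```

High‑scoring tables would be candidates for recommendation, while tables near zero would be candidates for governance review such as archiving.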
Product effects: data map penetration rose from 30% to 60%, the heat of recommended tables rose 40%, user satisfaction rose 33%, and top value scores increased 20%.
03. Abstract‑configuration based data governance product
Rapid growth in the number of tables and tasks called for a more efficient governance approach. The solution provides B‑side productized tools with abstract configuration to address five problems: governance is hard to get started with, operations are unsustainable, there is no visualization, responsibilities are unclear, and costs are high.
Governance framework covers data cost, standards, quality, and security, built on metadata with configurable attributes and dynamic parameters (e.g., {jobid}).
Abstract operations enable reuse of governance actions via configurable URLs and parameters, accelerating deployment.
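One way to picture a configurable action with dynamic parameters such as {jobid} is a URL template filled in from an asset's metadata. The sketch below is an assumption about how this could work, not the platform's implementation: the host name, the action name, the {owner} parameter, and the function render_action are hypothetical.

```python
import re
from urllib.parse import quote

# Matches placeholders such as {jobid}.
PLACEHOLDER = re.compile(r"\{(\w+)\}")

# Hypothetical action registry: one template can serve many governance strategies.
ACTIONS = {
    "stop_job": "https://governance.example.internal/jobs/{jobid}/stop?operator={owner}",
}


def render_action(template, metadata):
    """Fill each {name} placeholder from metadata, URL-encoding the value."""

    def substitute(match):
        key = match.group(1)
        if key not in metadata:
            raise KeyError(f"metadata has no value for placeholder {{{key}}}")
        return quote(str(metadata[key]), safe="")

    return PLACEHOLDER.sub(substitute, template)


url = render_action(ACTIONS["stop_job"], {"jobid": 1234, "owner": "alice"})
```

Because the action is data rather than code, a new strategy can reuse an existing template and only supply different metadata.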
A unified issue generation and handling workflow creates pending tasks automatically, scans documents daily, and pushes governance tasks to users.
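The scan, generate, and push loop can be sketched as below. This is a simplified illustration under stated assumptions: the rule names, the metadata fields (days_since_access, has_description), and the in‑memory set of open issues are hypothetical stand‑ins for whatever the platform actually stores.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    rule: str
    asset: str
    owner: str


# Hypothetical rules: each maps a rule name to a predicate over asset metadata.
RULES = {
    "unused_90d": lambda a: a["days_since_access"] > 90,
    "missing_description": lambda a: not a["has_description"],
}


def scan(assets, rules, open_issues):
    """Run every rule on every asset; return only issues not already open."""
    new_issues = []
    for asset in assets:
        for rule_name, violates in rules.items():
            if not violates(asset):
                continue
            issue = Issue(rule_name, asset["name"], asset["owner"])
            if issue not in open_issues:  # avoid re-creating the same pending task
                open_issues.add(issue)
                new_issues.append(issue)
    return new_issues


def group_by_owner(issues):
    """Group issues per owner so each user receives one to-do list."""
    todo = defaultdict(list)
    for issue in issues:
        todo[issue.owner].append(issue)
    return dict(todo)


# Made-up assets for illustration only.
assets = [
    {"name": "ods_log_raw", "owner": "alice", "days_since_access": 120, "has_description": True},
    {"name": "dim_user", "owner": "bob", "days_since_access": 3, "has_description": False},
    {"name": "ads_report", "owner": "alice", "days_since_access": 200, "has_description": False},
]
open_issues = set()
first_run = scan(assets, RULES, open_issues)   # simulates day 1
second_run = scan(assets, RULES, open_issues)  # simulates day 2, nothing changed
todo = group_by_owner(first_run)
```

Deduplicating against open issues keeps a daily scan from flooding owners with the same task until the underlying problem is resolved.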
Results: 62 governance strategies launched, each taking 2‑3 hours to develop and deploy; over 80,000 issues generated, of which 20,000 were resolved, saving more than 5 million (500w, where w stands for 万, i.e. 10,000) in cost and over 100 person‑days.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.