How Bilibili Turned Big Data Governance from Reactive to Proactive
This article traces how Bilibili moved from a big‑data platform that started late and was governed reactively to a mature, proactive governance system. That system combines asset metadata, metric‑driven strategies, cost‑aware billing, and automated tooling to deliver large storage savings and operational efficiency across the organization.
Background and Motivation
Bilibili launched its internet business in 2009 but did not form a big‑data team until 2017, and large‑scale platform construction began only in 2019. Rapid business growth then drove explosive data growth: volume reached exabyte scale by 2023, which made continued server procurement unsustainable and highlighted the need for systematic data governance.
Key Challenges
Multiple business lines managing data without a unified framework.
Mix of business and technical data.
Lack of asset ownership leading to orphaned data.
No unified data reporting standards.
To address these, the "Wanglou" project (named after an ancient watchtower) was launched in late 2021.
Project Goals and Initiation
The project aimed to answer two core questions: (1) How to start governance? (2) How to get users to change habits and participate?
Governance Framework
The approach begins by building an asset metadata model, followed by a governance metric system. Initially, a bottom‑up method identifies key assets (Hive tables, scheduling jobs) and maps their lifecycle processes. Once a baseline is established, a top‑down method uses the metric system to drive strategy execution.
The metric system links governance goals, strategies, and evaluation:
Governance Goal: Define target metrics (e.g., a North Star metric).
Governance Strategy: Break down goals into actionable directions and tasks.
Strategy Evaluation: Measure whether tasks hit targets and assess effectiveness.
The operational flow is a closed loop: Metric → Issue → Standard → Implementation → Metric.
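The goal, strategy, and evaluation link can be pictured as a small data model. The following is a minimal sketch, not Bilibili's implementation: the class names, fields, and the "reduction" unit are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Strategy:
    """One actionable direction derived from a governance goal (hypothetical model)."""
    name: str
    target_reduction: float        # planned saving, in any consistent unit
    actual_reduction: float = 0.0  # measured saving after implementation

    def hit_target(self) -> bool:
        return self.actual_reduction >= self.target_reduction


@dataclass
class GovernanceGoal:
    """A goal expressed as a target metric plus the strategies that serve it."""
    north_star: str
    strategies: list = field(default_factory=list)

    def evaluate(self) -> dict:
        # Strategy evaluation: did each strategy reach its own target?
        return {s.name: s.hit_target() for s in self.strategies}


goal = GovernanceGoal(
    north_star="total storage usage",
    strategies=[
        Strategy("shorten TTL", target_reduction=10.0, actual_reduction=12.0),
        Strategy("enforce compression", target_reduction=8.0, actual_reduction=5.0),
    ],
)
print(goal.evaluate())
```

Strategies that miss their target feed back into the next iteration of the loop as new issues to diagnose.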
Choosing the First Focus: Storage Governance
Storage was selected because storage watermarks regularly exceeded 90%, causing emergency clean‑ups, and storage cost dominated the big‑data budget.
Stage‑1 target: Reduce overall storage usage by 50% within 2022 while keeping growth under control. Strategies were prioritized by low implementation cost and high impact.
Typical issues identified from top‑storage consumers:
Downstream unused data.
Excessively long TTL.
Uncompressed data.
Corresponding standards and actions were defined (e.g., decommission unused data, shorten TTL, enforce compression).
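The three issue types above lend themselves to rule‑based detection over table metadata. Below is a rough sketch under stated assumptions: the `TableMeta` fields and both thresholds (`MAX_TTL_DAYS`, `UNUSED_AFTER_DAYS`) are hypothetical and do not come from the source.

```python
from dataclasses import dataclass

# Assumed thresholds for illustration only; real standards would be set per business line.
MAX_TTL_DAYS = 365
UNUSED_AFTER_DAYS = 90


@dataclass
class TableMeta:
    """Hypothetical subset of asset metadata for one Hive table."""
    name: str
    size_tb: float
    ttl_days: int
    days_since_last_read: int
    compressed: bool


def detect_issues(table: TableMeta) -> list:
    """Return the governance issues that apply to a table, in a fixed order."""
    issues = []
    if table.days_since_last_read > UNUSED_AFTER_DAYS:
        issues.append("unused_downstream")
    if table.ttl_days > MAX_TTL_DAYS:
        issues.append("ttl_too_long")
    if not table.compressed:
        issues.append("uncompressed")
    return issues


example = TableMeta("dwd_play_log", size_tb=120.0, ttl_days=730,
                    days_since_last_read=5, compressed=False)
print(detect_issues(example))
```

Each detected issue then maps to one of the defined actions: decommission, shorten TTL, or enforce compression.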
Cost‑Aware Billing and Governance Scores
A three‑level billing model (department → space → individual) was introduced to make users aware of data‑related costs. Billing follows usage × unit price, requiring reliable data sources for each component (HDFS, YARN, Kafka, ClickHouse, etc.).
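As an illustration of the usage × unit price rule rolled up across the three levels, consider the sketch below. The resource names, unit prices, and record layout are invented for the example; the source does not describe Bilibili's actual pricing or schema.

```python
from collections import defaultdict

# Hypothetical unit prices per resource; real prices are not given in the source.
UNIT_PRICE = {"hdfs_tb_day": 2.0, "yarn_vcore_hour": 0.5}


def bill(usage_records):
    """Aggregate cost = usage x unit price at department, space, and individual level.

    Each record is (department, space, user, resource, usage).
    """
    by_dept = defaultdict(float)
    by_space = defaultdict(float)
    by_user = defaultdict(float)
    for dept, space, user, resource, usage in usage_records:
        cost = usage * UNIT_PRICE[resource]
        by_dept[dept] += cost
        by_space[(dept, space)] += cost
        by_user[(dept, space, user)] += cost
    return dict(by_dept), dict(by_space), dict(by_user)


records = [
    ("ads", "etl", "alice", "hdfs_tb_day", 10),
    ("ads", "etl", "bob", "yarn_vcore_hour", 40),
    ("live", "bi", "carol", "hdfs_tb_day", 5),
]
dept_bill, space_bill, user_bill = bill(records)
print(dept_bill)
```

Because every level is derived from the same records, the department total always equals the sum of its spaces and individuals, which keeps the bill auditable.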
Governance scores translate compliance into a numeric rating, encouraging departments to follow best practices. Scores are updated each cycle and tied to a semi‑annual data‑governance award.
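The source does not give the scoring formula. One plausible sketch, offered purely as an assumption, scores a department by the share of its storage that is free of open governance issues:

```python
def governance_score(total_tb: float, noncompliant_tb: float) -> float:
    """Hypothetical score in [0, 100]: the share of storage with no open issues."""
    if total_tb <= 0:
        return 100.0  # nothing stored, so nothing to govern
    ratio = min(max(noncompliant_tb / total_tb, 0.0), 1.0)
    return round(100.0 * (1.0 - ratio), 1)


print(governance_score(200.0, 50.0))
```

A real scheme would likely weight issue types differently and cover compute and traffic as well as storage.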
Tooling and Automation
The Governance Center aggregates issue detection, strategy explanation, alerts, execution, and benefit reporting in a single UI, enabling users to receive personalized task lists and track progress.
Automation reduces manual effort, while classification and prioritization guide users on which issues to address first.
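One simple way to build a prioritized task list is to rank issues by estimated saving per unit of effort, in line with the "low implementation cost, high impact" principle stated earlier. The sketch below is illustrative only; the `Task` fields and the effort scale are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical governance task shown to a user."""
    table: str
    issue: str
    saving_tb: float  # estimated storage freed if the task is completed
    effort: int       # assumed scale: 1 (low) to 3 (high)


def prioritize(tasks):
    """Order tasks by estimated saving per unit of effort, highest first."""
    return sorted(tasks, key=lambda t: t.saving_tb / t.effort, reverse=True)


tasks = [
    Task("a", "unused_downstream", saving_tb=30.0, effort=3),
    Task("b", "ttl_too_long", saving_tb=20.0, effort=1),
    Task("c", "uncompressed", saving_tb=5.0, effort=1),
]
print([t.table for t in prioritize(tasks)])
```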
Results
By the end of 2022, storage usage had been reduced by 55%, and the year‑over‑year growth rate had dropped from 226% to 34%, saving billions of yuan in budget. The storage watermark stabilized at around 75%.
From Reactive to Proactive Governance
Proactive measures include long‑term goal planning, predictive watermark alerts with a tiered response, and organization‑wide communication through a Data Committee that sets standards and policies for all resource types (storage, compute, traffic).
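A predictive alert can be as simple as projecting the current watermark forward at the recent growth rate and escalating as the projected breach date approaches. The sketch below assumes linear growth; the 90% threshold echoes the level mentioned earlier, while the 7‑day and 30‑day tiers are invented for illustration.

```python
import math


def days_until(threshold_pct: float, current_pct: float, daily_growth_pct: float):
    """Days until the watermark reaches the threshold, assuming linear growth.

    Returns 0 if already at or above the threshold, None if usage is not growing.
    """
    if current_pct >= threshold_pct:
        return 0
    if daily_growth_pct <= 0:
        return None
    return math.ceil((threshold_pct - current_pct) / daily_growth_pct)


def alert_tier(current_pct: float, daily_growth_pct: float,
               threshold_pct: float = 90.0) -> str:
    """Map the projected time to breach onto hypothetical response tiers."""
    days = days_until(threshold_pct, current_pct, daily_growth_pct)
    if days is None:
        return "ok"
    if days <= 7:
        return "critical"
    if days <= 30:
        return "warning"
    return "ok"


print(alert_tier(75.0, 0.5))
print(alert_tier(88.0, 0.5))
```

Each tier would trigger a different response, for example a notification to owners at "warning" and a mandatory clean‑up plan at "critical".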
Quota limits were introduced for storage, compute, and traffic, automatically throttling usage once thresholds are hit.
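The quota mechanism can be sketched as a counter that rejects requests once the limit would be exceeded. This is a simplified illustration with hypothetical names; a production system might slow or queue requests rather than reject them outright.

```python
from dataclasses import dataclass


@dataclass
class Quota:
    """Hypothetical per-space quota for one resource type (storage, compute, or traffic)."""
    limit: float
    used: float = 0.0

    def request(self, amount: float) -> bool:
        """Grant the request only if it keeps usage within the limit."""
        if self.used + amount > self.limit:
            return False  # throttled: the threshold would be exceeded
        self.used += amount
        return True


storage = Quota(limit=100.0)
print(storage.request(60.0))  # granted
print(storage.request(50.0))  # rejected, would exceed the limit
```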
Multi‑Dimensional Governance
A unified tag model, quota policies, and SOPs for safe data deletion enable scaling governance across many asset types without increasing labor.
Future Roadmap
Shift focus from pure cost to cost‑plus‑value by building a data‑value assessment framework.
Expand governance from fragmented to integrated, covering all asset types.
Adopt lake‑house and one‑service architectures to improve data and metric reuse.
Overall, Bilibili’s "Wanglou" initiative demonstrates how systematic metadata, metric‑driven governance, cost transparency, and automation can transform big‑data operations from crisis‑driven firefighting to sustainable, value‑oriented management.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.