How Bilibili Scaled Big Data Governance: From Reactive to Proactive
This article details Bilibili's journey from rapid data growth to a structured big‑data governance framework, describing the challenges of fragmented ownership, the "Wanglou" project launch, asset metadata modeling, metric design, user engagement strategies, automation tools, and the shift from reactive to proactive, multi‑dimensional resource management.
Background
Bilibili’s rapid traffic growth caused data volumes to explode, reaching EB‑scale by 2023. The company founded a dedicated data team in 2017 and began large‑scale construction in 2019. Pure hardware scaling became unsustainable, prompting a systematic data‑asset governance effort.
Wanglou Project Overview
Named after ancient watchtowers, the "Wanglou" project aims to build a full view of data assets, continuously monitor them for anomalies, and provide authoritative remediation guidance. Two core questions guided the launch:
How to start governance?
How to get users to change habits and participate?
1. Asset Metadata Construction
The first step was to create a comprehensive metadata catalog. A bottom‑up inventory identified key assets (Hive tables, scheduling jobs) and recorded their lifecycle stages—creation, publishing, authorization, consumption, deprecation. This high‑level catalog enabled problem discovery and asset‑level inventory.
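As an illustration of what such a catalog entry might look like, here is a minimal sketch. The lifecycle stages come from the text above; the class names, field names, and the sample table name are assumptions made for the example, not Bilibili's actual schema.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class LifecycleStage(Enum):
    # The five stages named in the article.
    CREATION = "creation"
    PUBLISHING = "publishing"
    AUTHORIZATION = "authorization"
    CONSUMPTION = "consumption"
    DEPRECATION = "deprecation"


@dataclass
class AssetRecord:
    """One catalog entry. Field names are illustrative assumptions."""
    asset_id: str
    asset_type: str  # e.g. "hive_table" or "scheduling_job"
    owner: str
    events: list = field(default_factory=list)  # (stage, date) tuples

    def record(self, stage: LifecycleStage, when: date) -> None:
        """Append a lifecycle event to this asset's history."""
        self.events.append((stage, when))

    def current_stage(self):
        """Return the most recently dated stage, or None if no events exist."""
        if not self.events:
            return None
        return max(self.events, key=lambda e: e[1])[0]


# Hypothetical asset used only for demonstration.
table = AssetRecord("dw.user_play_daily", "hive_table", "alice")
table.record(LifecycleStage.CREATION, date(2021, 3, 1))
table.record(LifecycleStage.CONSUMPTION, date(2021, 4, 15))
print(table.current_stage().value)
```

Keeping lifecycle events as an append-only history, rather than a single status field, is one way to support the later question of when an asset was last consumed.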
After the inventory, a top‑down approach defined governance metrics that drive strategy execution. The metadata model became the foundation for the governance indicator system.
2. Storage Governance as the Initial Leverage Point
Storage cost was the largest single expense. Historical analysis showed that many assets were unused, had excessive TTLs, or were stored uncompressed. The team set a quantitative target: reduce overall storage by 50% within one year while keeping the storage water level (the share of capacity in use) below risk thresholds. Problems were prioritized by the "low‑cost, high‑impact" principle.
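The three problem types above lend themselves to simple rule checks over table statistics. The sketch below is one possible shape for such checks; the thresholds (90 days, 365 days) and the field names are hypothetical values chosen for the example, not figures from the project.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TableStats:
    """Per-table statistics; field names are illustrative assumptions."""
    name: str
    days_since_last_read: int
    ttl_days: int
    compressed: bool


# Hypothetical thresholds, not taken from the article.
UNUSED_AFTER_DAYS = 90
MAX_TTL_DAYS = 365


def find_issues(t: TableStats) -> List[str]:
    """Return the storage problems that apply to one table."""
    issues = []
    if t.days_since_last_read > UNUSED_AFTER_DAYS:
        issues.append("unused")
    if t.ttl_days > MAX_TTL_DAYS:
        issues.append("excessive_ttl")
    if not t.compressed:
        issues.append("uncompressed")
    return issues


print(find_issues(TableStats("tmp.old_export", 200, 730, False)))
print(find_issues(TableStats("dw.active_table", 1, 30, True)))
```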
Governance Metric System
The system consists of three layers:
Governance goals – north‑star metrics for a given period.
Governance strategies – concrete actions derived from the goals.
Strategy evaluation – implementation metrics (whether a strategy actually hit its target assets) and effectiveness metrics (the benefit the strategy delivered).
Metrics flow: Goal → Strategy → Evaluation.
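One way to picture the three layers is as nested records, with evaluation results rolling up into goal progress. This is a sketch under assumptions: the class names, the choice of bytes saved as the north‑star unit, and the sample numbers are all illustrative.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Evaluation:
    """Implementation and effectiveness metrics for one strategy."""
    assets_targeted: int
    assets_hit: int
    bytes_saved: int

    def hit_rate(self) -> float:
        """Implementation metric: share of targeted assets the strategy hit."""
        if self.assets_targeted == 0:
            return 0.0
        return self.assets_hit / self.assets_targeted


@dataclass
class Strategy:
    name: str
    evaluation: Evaluation


@dataclass
class Goal:
    """A north-star goal; the unit (bytes saved) is an assumption."""
    name: str
    target_bytes_saved: int
    strategies: List[Strategy] = field(default_factory=list)

    def progress(self) -> float:
        """Effectiveness rolled up: total savings divided by the target."""
        saved = sum(s.evaluation.bytes_saved for s in self.strategies)
        return saved / self.target_bytes_saved


goal = Goal("reduce storage", target_bytes_saved=1000)
goal.strategies.append(Strategy("drop unused tables", Evaluation(40, 30, 300)))
goal.strategies.append(Strategy("shorten TTLs", Evaluation(10, 10, 200)))
print(goal.strategies[0].evaluation.hit_rate(), goal.progress())
```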
User Engagement via Multi‑Level Billing
To make users aware of cost, a three‑level billing model (department → space → individual) was introduced. Billing follows the formula usage × unit_price. Accurate billing required:
Strict registration of new assets with owner information.
Full coverage of asset metadata across all asset types.
Alignment of production tasks and output tables with owner data.
Integration of client‑side authentication with platform permissions.
Initial billing suffered from missing sources and delays; after stabilizing data pipelines, billing became reliable and was used as a behavioral incentive.
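The billing formula and the three‑level rollup can be sketched as follows. The unit price, the usage unit (TB), and the sample departments, spaces, and people are hypothetical; only the formula usage × unit_price and the department → space → individual hierarchy come from the text.

```python
from collections import defaultdict

UNIT_PRICE_PER_TB = 10.0  # hypothetical price, for illustration only

# (department, space, individual, usage in TB); sample data, not real figures.
usage_records = [
    ("video", "recsys", "alice", 4.0),
    ("video", "recsys", "bob", 1.5),
    ("video", "search", "carol", 2.0),
    ("live", "ops", "dave", 3.0),
]


def build_bills(records, unit_price):
    """Apply cost = usage * unit_price and aggregate at all three levels."""
    by_department = defaultdict(float)
    by_space = defaultdict(float)
    by_individual = defaultdict(float)
    for dept, space, person, usage_tb in records:
        cost = usage_tb * unit_price
        by_department[dept] += cost
        by_space[(dept, space)] += cost
        by_individual[(dept, space, person)] += cost
    return dict(by_department), dict(by_space), dict(by_individual)


dept_bill, space_bill, person_bill = build_bills(usage_records, UNIT_PRICE_PER_TB)
print(dept_bill)
print(space_bill[("video", "recsys")])
```

Every record needs a known owner at all three levels for the totals to reconcile, which is why the registration and owner‑alignment requirements listed above matter.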
Scoring Model and Automation
A scoring model converted governance actions into points, reflecting experience, authority, and incremental improvement. The model guided users toward best practices and highlighted high‑priority actions.
Automation tools performed the following:
Intercept repeatable issues.
Classify problems and assign priority.
Generate guided execution steps with effort estimates.
Quantify expected benefits.
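The classification, effort estimation, and benefit quantification steps listed above could feed a simple ranking. The sketch below orders issues by expected saving per hour of effort, which is one plausible reading of the "low‑cost, high‑impact" principle; the ratio, the field names, and the sample issues are assumptions, not the project's actual scoring rules.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Issue:
    """A detected governance problem; all fields are illustrative."""
    asset: str
    kind: str
    expected_saving_tb: float
    effort_hours: float

    def __post_init__(self):
        # Guard against division by zero in the ranking below.
        if self.effort_hours <= 0:
            raise ValueError("effort_hours must be positive")


def prioritize(issues: List[Issue]) -> List[Issue]:
    """Sort by saving per hour of effort, highest first."""
    return sorted(
        issues,
        key=lambda i: i.expected_saving_tb / i.effort_hours,
        reverse=True,
    )


backlog = [
    Issue("dw.big_archive", "excessive_ttl", 10.0, 5.0),
    Issue("tmp.old_export", "unused", 6.0, 1.0),
    Issue("ods.raw_logs", "uncompressed", 1.0, 1.0),
]
for issue in prioritize(backlog):
    print(issue.asset, issue.kind)
```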
Active Governance Results
By moving from reactive to proactive governance, the team:
Reduced storage water‑level from ~90% to ~75%.
Cut overall storage usage by 55%.
Lowered storage growth rate from 226% to 34%.
Saved billions of RMB in budget.
Multi‑Dimensional Governance Evolution
To expand beyond a single focus, the project introduced:
Unified tag model and tag‑based metric dictionary (7 asset objects, 23 processes, 160 tags).
Standardized control policies such as QUOTA limits for storage, later extended to offline/real‑time compute and traffic.
New technologies (erasure coding (EC), Z‑ORDER) to improve storage and compute efficiency with minimal disruption.
A consistent implementation workflow with rollback capability for destructive operations.
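A QUOTA‑style control policy like the one in the list above might be checked at write or allocation time. The following is a minimal sketch assuming a three‑way outcome and an 80% warning ratio; the function name, the outcomes, and the ratio are hypothetical and not described in the source.

```python
def check_quota(used_tb: float, requested_tb: float, quota_tb: float,
                warn_ratio: float = 0.8) -> str:
    """Decide whether a storage request fits within a space's quota.

    Returns "reject" if the request would exceed the quota, "warn" if it
    would push usage to or past warn_ratio of the quota, else "allow".
    """
    projected = used_tb + requested_tb
    if projected > quota_tb:
        return "reject"
    if projected >= quota_tb * warn_ratio:
        return "warn"
    return "allow"


print(check_quota(70, 5, 100))
print(check_quota(70, 15, 100))
print(check_quota(95, 10, 100))
```

The same check shape would extend to compute or traffic by swapping the measured quantity, which matches the article's note that quotas were later applied beyond storage.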
Future Roadmap
Planned directions include:
Shifting from pure cost focus to a cost‑plus‑value assessment by building a data‑value evaluation framework.
Integrating governance across all asset types (metadata, compute, traffic) through the Governance Center.
Adopting lake‑house and one‑service architectures to increase pipeline reuse and metric sharing.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
