Bilibili Data Quality Assurance: Architecture, Goals, Core Capabilities, and Future Outlook
This article outlines Bilibili's data quality assurance framework: how it evolved across four development stages, the current data platform architecture, the pain points identified, four key quality objectives, the core capabilities (a quality data warehouse, comprehensive monitoring, digital‑driven optimization, and fault handling), and future directions.
Background and Goals : Bilibili's data quality assurance aims to continuously improve data quality, reduce incident correction costs, and enhance business service satisfaction.
Data Construction Evolution : The data platform has progressed through four stages (Database, Data Warehouse, Data Platform, and Middle Platform). Each stage increased data volume and complexity and introduced new technologies such as Hadoop and real‑time pipelines.
Current Architecture : The architecture has four layers: data sources (account, event, CRM, and third‑party systems), the data platform, the middle platform, and data applications, which serve both PC and mobile dashboards.
Identified Pain Points : Unclear quality scope, insufficient monitoring coverage, high night‑shift alarm rates, and coordination challenges across upstream/downstream teams.
Four Quality Objectives :
Accurately identify core scenarios and enable metric‑driven measurement.
Ensure data meets requirements for completeness, accuracy, consistency, and timeliness while supporting customized user needs.
Guarantee end‑to‑end lifecycle coverage (before, during, and after processing).
Codify methodology and tool capabilities for prevention, response, recovery, and review.
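The four quality dimensions in the second objective can each be expressed as a simple rule. The sketch below is illustrative only and not Bilibili's implementation: the row layout, the field name `play_count`, the value range, and the SLA deadline are all hypothetical, and consistency is reduced to a row-count comparison between an upstream and a downstream table.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CheckResult:
    dimension: str
    passed: bool
    detail: str


def check_completeness(rows, required_fields):
    """Completeness: every required field is present and non-null."""
    missing = sum(1 for r in rows for f in required_fields if r.get(f) is None)
    return CheckResult("completeness", missing == 0, f"{missing} missing/null values")


def check_accuracy(rows, field, low, high):
    """Accuracy: non-null values of `field` fall inside [low, high]."""
    bad = sum(1 for r in rows
              if r.get(field) is not None and not (low <= r[field] <= high))
    return CheckResult("accuracy", bad == 0, f"{bad} out-of-range values in {field}")


def check_consistency(source_count, target_count, tolerance=0.0):
    """Consistency: upstream and downstream row counts agree within a relative tolerance."""
    if source_count == 0:
        ok = target_count == 0
    else:
        ok = abs(source_count - target_count) / source_count <= tolerance
    return CheckResult("consistency", ok, f"source={source_count}, target={target_count}")


def check_timeliness(finished_at, deadline):
    """Timeliness: the partition finished before its agreed (hypothetical) SLA deadline."""
    return CheckResult("timeliness", finished_at <= deadline,
                       f"finished {finished_at:%H:%M}, deadline {deadline:%H:%M}")


# Hypothetical sample partition: one null and one negative play_count.
rows = [
    {"user_id": 1, "play_count": 10},
    {"user_id": 2, "play_count": None},
    {"user_id": 3, "play_count": -5},
]
results = [
    check_completeness(rows, ["user_id", "play_count"]),
    check_accuracy(rows, "play_count", 0, 10**9),
    check_consistency(source_count=3, target_count=3),
    check_timeliness(datetime(2024, 1, 1, 5, 30), deadline=datetime(2024, 1, 1, 6, 0)),
]
for r in results:
    print(f"{r.dimension:<12} passed={r.passed}  {r.detail}")
```

Keeping each dimension as a separate function makes it easy to attach different subsets of rules to different tables, which is the kind of customization the objective calls for.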
Core Capabilities :
Quality Data Warehouse – unified ingestion, layered storage (detail, summary, aggregate), and dashboards for quality metrics.
Comprehensive Quality Assurance System – monitoring, rule libraries, incident‑attribution knowledge base, and cross‑team SLA coordination.
Digital‑Driven Continuous Optimization – metric definition, current‑state analysis, problem discovery, solution implementation, and impact tracking.
Efficient Fault Handling – night‑shift process, rapid root‑cause analysis, automated recovery, and post‑mortem review.
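The layered storage of the quality data warehouse (detail, summary, aggregate) can be pictured as successive roll-ups of individual check results. In this minimal sketch, which assumes a hypothetical `CheckRecord` schema, hypothetical table names, and hypothetical tier labels `P0`/`P1`, the detail layer holds one record per executed check, the summary layer counts checks per table and day, and the aggregate layer computes a pass rate per tier for a dashboard.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckRecord:
    """Detail layer: one row per executed quality check (hypothetical schema)."""
    table: str
    date: str
    tier: str  # hypothetical grading, e.g. "P0" for core tables
    rule: str
    passed: bool


def summarize(records):
    """Summary layer: (table, date) -> (checks run, checks passed)."""
    summary = defaultdict(lambda: [0, 0])
    for rec in records:
        counts = summary[(rec.table, rec.date)]
        counts[0] += 1
        counts[1] += int(rec.passed)
    return {key: tuple(v) for key, v in summary.items()}


def aggregate_by_tier(records):
    """Aggregate layer: tier -> pass rate, the kind of figure a quality dashboard shows."""
    totals = defaultdict(lambda: [0, 0])
    for rec in records:
        counts = totals[rec.tier]
        counts[0] += 1
        counts[1] += int(rec.passed)
    return {tier: passed / run for tier, (run, passed) in totals.items()}


records = [
    CheckRecord("dwd_play_log", "2024-01-01", "P0", "not_null_user_id", True),
    CheckRecord("dwd_play_log", "2024-01-01", "P0", "row_count_vs_ods", False),
    CheckRecord("dws_user_daily", "2024-01-01", "P0", "not_null_mid", True),
    CheckRecord("ads_report", "2024-01-01", "P1", "sla_0600", True),
]
summary = summarize(records)
rates = aggregate_by_tier(records)
print(summary)
print(rates)
```

Keeping the detail layer intact means dashboard figures can always be traced back to the individual failing rule.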
Case Study : A typical Bilibili data development workflow (task deployment → monitoring → alarm handling → data recovery → post‑mortem) reveals challenges such as low monitoring coverage (below 50%), fragmented SOPs, and a night‑shift alarm (wake‑up) rate of about 50%.
Future Outlook : Expand quality coverage, enrich assurance strategies, strengthen tool support, and move from manual operations to information‑driven and, ultimately, intelligent quality assurance.
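The two figures in the case study are ratios that can be recomputed every day as tracking metrics. The article does not define them precisely, so the definitions below are assumptions: coverage is the share of tasks with at least one quality rule attached, and the night‑shift wake‑up rate is the share of on‑call nights with at least one alarm between 00:00 and 08:00 (a hypothetical window). Task names and dates are made up.

```python
from datetime import date, datetime


def monitoring_coverage(tasks, rules_by_task):
    """Assumed definition: share of tasks that have at least one quality rule."""
    if not tasks:
        return 0.0
    covered = sum(1 for t in tasks if rules_by_task.get(t))
    return covered / len(tasks)


def night_wakeup_rate(duty_nights, alarm_times, start_hour=0, end_hour=8):
    """Assumed definition: share of on-call nights with an alarm in [start_hour, end_hour).

    duty_nights: dates whose early-morning hours someone was on call for.
    alarm_times: datetimes of alarms that needed manual handling.
    """
    if not duty_nights:
        return 0.0
    woken = {ts.date() for ts in alarm_times if start_hour <= ts.hour < end_hour}
    return len(woken & set(duty_nights)) / len(duty_nights)


tasks = ["t1", "t2", "t3", "t4"]
rules = {"t1": ["not_null"], "t3": ["row_count"], "t4": []}  # t4 has an empty rule list
nights = {date(2024, 1, d) for d in range(1, 5)}
alarms = [
    datetime(2024, 1, 1, 2, 15),
    datetime(2024, 1, 1, 3, 0),   # second alarm on the same night counts once
    datetime(2024, 1, 3, 4, 30),
    datetime(2024, 1, 2, 14, 0),  # daytime alarm, outside the night window
]
print(monitoring_coverage(tasks, rules))
print(night_wakeup_rate(nights, alarms))
```

Counting nights rather than alarms keeps the metric focused on how often the on-call engineer is disturbed, which is what the night-shift pain point describes.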
Q&A Highlights :
Aligning quality rules across tables through tiered grading, with a focus on high‑priority online services.
Consolidating cross‑platform real‑time tasks (e.g., Flink) into a unified quality data warehouse for consistent evaluation.
Accelerating incident investigation through an incident knowledge base and automated alarm distribution.
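Tiered grading, alarm distribution, and the incident knowledge base from the Q&A can be combined into one triage step. The sketch below is hypothetical throughout: the tier table, the routing channels, and the knowledge‑base entries are invented for illustration, and the knowledge‑base lookup is a plain keyword match rather than any specific tool.

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    table: str
    rule: str
    message: str


# Hypothetical tier assignment per table.
TABLE_TIER = {"dwd_play_log": "P0", "ads_report": "P2"}
# Hypothetical routing policy: higher tiers reach the on-call engineer faster.
ROUTE = {"P0": "phone_oncall", "P1": "im_oncall", "P2": "daytime_ticket"}
# Hypothetical incident knowledge base: (rule, message keyword) -> known cause and fix.
KNOWLEDGE_BASE = {
    ("row_count_vs_ods", "upstream delayed"): "Upstream partition late; rerun after it lands.",
    ("not_null_user_id", "null"): "Client sent empty IDs; backfill from the source layer.",
}


def triage(alarm, default_tier="P1"):
    """Pick a channel from the table's tier and attach matching known fixes."""
    tier = TABLE_TIER.get(alarm.table, default_tier)
    message = alarm.message.lower()
    hints = [fix for (rule, keyword), fix in KNOWLEDGE_BASE.items()
             if rule == alarm.rule and keyword in message]
    return {"tier": tier, "channel": ROUTE[tier], "hints": hints}


print(triage(Alarm("dwd_play_log", "row_count_vs_ods",
                   "Row count dropped 40%: upstream delayed")))
print(triage(Alarm("unknown_table", "freshness", "partition missing")))
```

Unknown tables fall back to a middle tier so that a missing grading entry neither pages anyone at night nor gets silently dropped.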
This article has been distilled and summarized from source material, then republished for learning and reference.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
