Building Bilibili's Data Service Middle Platform: Architecture, Practices, and Future Plans
This article presents Bilibili's data service middle platform, detailing its background, pain points of traditional data acquisition, the one‑stop service architecture, core processes, model and API construction, query mechanisms, practical solutions for full‑link control, cost reduction, efficiency gains, high‑availability design, and future roadmap.
Introduction – This article shares Bilibili's practice in building its data service middle platform, as presented by senior developer Meng Shuaishuai.
1. Background and Pain Points – Two case studies illustrate the high cost and governance difficulties of traditional data acquisition: long communication chains, duplicated model building, unclear data lineage, and low delivery efficiency.
2. One‑Stop Data Service Platform – The platform aims to achieve unified definition, unified production, and unified consumption of data, reducing cost and improving efficiency.
Unified definition: establish data standards and metric systems to standardize business understanding of data.
Unified production: automate and semi‑automate data processing to clarify lineage and avoid duplicate work.
Unified consumption: provide a common data service gateway for consistent data access.
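Unified definition implies one canonical definition per metric. The sketch below illustrates that idea with a small registry that rejects a second, conflicting definition under the same name. It is a minimal sketch under stated assumptions: `MetricDefinition`, `MetricRegistry`, their fields, and the table name `dwd_video_play` are hypothetical and are not the platform's actual metric management API.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class MetricDefinition:
    """Hypothetical unified metric definition; field names are illustrative."""
    name: str                          # canonical metric name, e.g. "play_count"
    expression: str                    # aggregation over the source, e.g. "SUM(plays)"
    source_table: str                  # warehouse table the metric is computed from
    dimensions: Tuple[str, ...] = ()   # dimensions the metric may be grouped by


class MetricRegistry:
    """Single source of truth: at most one definition per metric name."""

    def __init__(self) -> None:
        self._metrics: Dict[str, MetricDefinition] = {}

    def register(self, metric: MetricDefinition) -> None:
        existing = self._metrics.get(metric.name)
        if existing is not None and existing != metric:
            # A second, different definition under the same name is the kind
            # of inconsistency unified definition is meant to prevent.
            raise ValueError(f"conflicting definition for metric {metric.name!r}")
        self._metrics[metric.name] = metric

    def get(self, name: str) -> MetricDefinition:
        return self._metrics[name]


registry = MetricRegistry()
play = MetricDefinition("play_count", "SUM(plays)", "dwd_video_play", ("date", "region"))
registry.register(play)
registry.register(play)  # re-registering the identical definition is a no-op
```

Rejecting conflicts at registration time pushes disagreements about a metric's meaning to the moment it is defined, rather than leaving them to surface later as mismatched numbers in different reports.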
3. Platform Framework – Built on top of the data warehouse, the platform consists of a data construction layer, a data query layer, a service interface layer, a service gateway, and product systems (metric management, model management, self‑service analysis, etc.).
4. Core Processes
Data developers (producers) follow four steps: metric definition, data model construction, data acceleration, and API publishing. Business developers (consumers) pick ready‑made APIs from the API market, or request new data development when no suitable API exists.
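The producer and consumer paths can be sketched as follows. This is a hypothetical illustration, not the platform's workflow engine: `Step`, `ProducerWorkflow`, and `find_or_request` are invented names, and the sketch only enforces that the four producer steps happen in the stated order and that consumers reuse an existing API before asking for new development.

```python
from enum import IntEnum
from typing import Dict, List


class Step(IntEnum):
    """The four producer steps, in the order described above."""
    METRIC_DEFINITION = 1
    MODEL_CONSTRUCTION = 2
    DATA_ACCELERATION = 3
    API_PUBLISHING = 4


class ProducerWorkflow:
    """Hypothetical tracker that enforces the producer step order."""

    def __init__(self) -> None:
        self.completed: List[Step] = []

    def complete(self, step: Step) -> None:
        if len(self.completed) == len(Step):
            raise RuntimeError("all steps are already complete")
        expected = Step(len(self.completed) + 1)
        if step != expected:
            raise RuntimeError(f"expected {expected.name}, got {step.name}")
        self.completed.append(step)

    @property
    def published(self) -> bool:
        return Step.API_PUBLISHING in self.completed


def find_or_request(api_market: Dict[str, str], need: str) -> str:
    """Consumer side: reuse a published API if one exists, else request one."""
    if need in api_market:
        return api_market[need]
    return f"request: new data development for {need!r}"
```

For example, completing `Step.API_PUBLISHING` on a fresh workflow raises an error, because no metric has been defined or modeled yet.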
5. Model Construction – The platform supports several modeling styles (single‑table, star, snowflake, and constellation) and two acceleration methods: detail acceleration, which mirrors detail data from cold storage to hot storage, and pre‑calculation acceleration, which aggregates data to the target granularity ahead of query time.
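Pre‑calculation acceleration amounts to a GROUP BY performed before queries arrive. The sketch below rolls hypothetical hourly detail rows up to a (date, region) granularity; the function name, column names, and sample data are illustrative assumptions, not the platform's implementation.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def precalculate(rows: List[Dict], group_by: Sequence[str], measure: str) -> List[Dict]:
    """Roll detail rows up to the granularity given by `group_by`.

    Equivalent to SUM(measure) ... GROUP BY group_by: only the aggregated
    rows need to be written to the serving store.
    """
    totals: Dict[tuple, float] = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in group_by)
        totals[key] += row[measure]
    return [{**dict(zip(group_by, key)), measure: total} for key, total in totals.items()]


# Hypothetical hourly detail rows.
detail = [
    {"date": "2023-01-01", "hour": 1, "region": "cn", "plays": 3},
    {"date": "2023-01-01", "hour": 2, "region": "cn", "plays": 4},
    {"date": "2023-01-01", "hour": 1, "region": "us", "plays": 5},
]
daily = precalculate(detail, ["date", "region"], "plays")
```

The trade‑off is that queries can no longer drill below the chosen granularity (here, the hour is gone), which is why detail acceleration remains the option for flexible analysis.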
Recommended engine combinations:
Online: pre‑calculation + KV store.
Near‑online: pre‑calculation + TiDB/MySQL.
OLAP: detail + ClickHouse/Iceberg.
Offline: direct Hive access.
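The recommended combinations listed above can be encoded as a lookup table. In the sketch below, the table mirrors the list, while the `classify` heuristic that maps query traits to a scenario is a hypothetical assumption added for illustration; the platform's actual routing rules are not described here.

```python
from enum import Enum
from typing import List, Tuple


class Scenario(Enum):
    ONLINE = "online"
    NEAR_ONLINE = "near_online"
    OLAP = "olap"
    OFFLINE = "offline"


# (acceleration method, candidate engines) per scenario, as recommended above.
RECOMMENDED = {
    Scenario.ONLINE: ("pre-calculation", ["KV"]),
    Scenario.NEAR_ONLINE: ("pre-calculation", ["TiDB", "MySQL"]),
    Scenario.OLAP: ("detail", ["ClickHouse", "Iceberg"]),
    Scenario.OFFLINE: ("none", ["Hive"]),
}


def classify(point_lookup: bool, ad_hoc_analysis: bool, realtime_serving: bool) -> Scenario:
    """Hypothetical heuristic mapping query traits to a scenario."""
    if realtime_serving and point_lookup:
        return Scenario.ONLINE
    if realtime_serving:
        return Scenario.NEAR_ONLINE
    if ad_hoc_analysis:
        return Scenario.OLAP
    return Scenario.OFFLINE


def recommend(scenario: Scenario) -> Tuple[str, List[str]]:
    return RECOMMENDED[scenario]
```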
6. API Construction – There are two approaches, visual model‑based configuration and metric‑and‑dimension‑based configuration, and both generate APIs without hand‑written SQL.
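To illustrate the metric‑and‑dimension approach, the sketch below turns a small API config (metrics, dimensions, filters) into parameterized SQL. The catalog contents, the table name `ads_video_daily`, the config keys, and the `?` placeholder style are all hypothetical; the point is only that the API author selects catalogued names and the SQL is generated for them.

```python
from typing import Dict, List, Tuple

# Hypothetical catalog: metric name -> (SQL expression, model table).
METRICS: Dict[str, Tuple[str, str]] = {
    "play_count": ("SUM(plays)", "ads_video_daily"),
    "like_count": ("SUM(likes)", "ads_video_daily"),
}
# Hypothetical dimensions exposed by the model.
DIMENSIONS = {"date", "region", "category"}


def build_query(config: Dict) -> Tuple[str, List]:
    """Generate parameterized SQL from a metric-dimension API config."""
    metrics = config["metrics"]
    dims = config.get("dimensions", [])
    filters = config.get("filters", {})
    unknown = [m for m in metrics if m not in METRICS]
    if unknown:
        raise ValueError(f"unknown metrics: {unknown}")
    for name in list(dims) + list(filters):
        # Only catalogued names reach the SQL text; filter values are bound as parameters.
        if name not in DIMENSIONS:
            raise ValueError(f"unknown dimension {name!r}")
    tables = {METRICS[m][1] for m in metrics}
    if len(tables) != 1:
        raise ValueError("this sketch only supports metrics from a single model")
    select = list(dims) + [f"{METRICS[m][0]} AS {m}" for m in metrics]
    sql = f"SELECT {', '.join(select)} FROM {tables.pop()}"
    params: List = []
    if filters:
        sql += " WHERE " + " AND ".join(f"{k} = ?" for k in filters)
        params = list(filters.values())
    if dims:
        sql += " GROUP BY " + ", ".join(dims)
    return sql, params
```

Validating every identifier against the catalog and binding filter values as parameters keeps generated APIs safe from SQL injection even though no one writes the SQL by hand.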
7. Data Query – Queries pass through a five‑step pipeline: DSL parsing, task splitting, result processing, translation to engine‑specific SQL via a two‑layer AST, and execution on multiple engines (KV, TiDB, MySQL, ClickHouse, Iceberg) with connection pooling and fault tolerance.
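The two‑layer idea can be approximated as follows: the DSL is first parsed into an engine‑agnostic logical query, and a per‑engine dialect then renders it as SQL. This is a simplified sketch, not Bilibili's AST. The JSON DSL shape and class names are hypothetical, and only two dialect differences are modeled: identifier quoting (backticks for MySQL/TiDB, double quotes for ClickHouse) and exact distinct counting (`COUNT(DISTINCT ...)` versus ClickHouse's `uniqExact`).

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Aggregate:
    func: str      # logical function: "sum" or "count_distinct"
    column: str
    alias: str


@dataclass
class LogicalQuery:
    """First layer: engine-agnostic query parsed from the DSL."""
    table: str
    dimensions: List[str]
    aggregates: List[Aggregate]
    limit: Optional[int] = None


def parse_dsl(text: str) -> LogicalQuery:
    doc = json.loads(text)
    aggs = [Aggregate(a["func"], a["column"], a["alias"]) for a in doc["aggregates"]]
    return LogicalQuery(doc["table"], doc.get("dimensions", []), aggs, doc.get("limit"))


class MySQLDialect:
    """Second layer for MySQL/TiDB: backtick quoting, COUNT(DISTINCT ...)."""

    def quote(self, name: str) -> str:
        return f"`{name}`"

    def aggregate(self, agg: Aggregate) -> str:
        col = self.quote(agg.column)
        if agg.func == "sum":
            return f"SUM({col})"
        if agg.func == "count_distinct":
            return f"COUNT(DISTINCT {col})"
        raise ValueError(f"unsupported function {agg.func!r}")


class ClickHouseDialect(MySQLDialect):
    """Second layer for ClickHouse: double-quote identifiers, uniqExact."""

    def quote(self, name: str) -> str:
        return f'"{name}"'

    def aggregate(self, agg: Aggregate) -> str:
        if agg.func == "count_distinct":
            return f"uniqExact({self.quote(agg.column)})"
        return super().aggregate(agg)


def to_sql(q: LogicalQuery, dialect: MySQLDialect) -> str:
    dims = [dialect.quote(d) for d in q.dimensions]
    cols = dims + [f"{dialect.aggregate(a)} AS {dialect.quote(a.alias)}" for a in q.aggregates]
    sql = f"SELECT {', '.join(cols)} FROM {dialect.quote(q.table)}"
    if dims:
        sql += f" GROUP BY {', '.join(dims)}"
    if q.limit is not None:
        sql += f" LIMIT {int(q.limit)}"
    return sql
```

Keeping the logical layer free of engine details means a new engine only needs a new dialect class, not a new parser.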
8. Practice Solutions
Full‑link control: unified definition, unified production, unified export, and comprehensive monitoring of metric consistency, data warehouse consistency, and service quality.
Cost reduction: standardize models and metrics to avoid duplicate construction, enable API reuse, and shorten development cycles from a week to about a day.
Efficiency improvement: automated metadata management, semi‑automatic model building, and reusable APIs accelerate both data construction and consumption.
High‑availability: service isolation into independent resource groups, storage isolation at the API level, and active‑active multi‑region deployment for disaster recovery.
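Monitoring metric and warehouse consistency, as described under full‑link control, comes down to comparing the same keyed values across stores. The sketch below compares a source snapshot (for example, values recomputed from the warehouse) with a serving store snapshot and reports missing keys and mismatched values. The function name, the key layout, and the default tolerance are hypothetical assumptions, not the platform's monitoring rules.

```python
import math
from typing import Dict, Hashable, List


def check_consistency(source: Dict[Hashable, float],
                      served: Dict[Hashable, float],
                      rel_tol: float = 1e-6) -> List[str]:
    """Compare metric values between two stores keyed by (metric, dims...).

    Returns human-readable discrepancies: keys missing from either side and
    values that differ by more than the relative tolerance.
    """
    issues = []
    for key, expected in source.items():
        if key not in served:
            issues.append(f"missing in serving store: {key}")
        elif not math.isclose(expected, served[key], rel_tol=rel_tol):
            issues.append(f"value mismatch for {key}: {expected} != {served[key]}")
    for key in served.keys() - source.keys():
        issues.append(f"unexpected key in serving store: {key}")
    return issues
```

A relative tolerance rather than exact equality allows for harmless floating‑point differences between engines while still flagging real drift.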
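Resource‑group isolation and active‑active failover can be sketched together. In this hypothetical example, each API is mapped to a `ResourceGroup` with its own bounded concurrency, so a saturated group rejects new requests instead of consuming another group's capacity, and each request is tried against regions in turn. The class and function names, the region names, and the use of `ConnectionError` as the failover signal are assumptions for illustration.

```python
import threading
from typing import Callable, Dict, List, Tuple


class ResourceGroup:
    """Hypothetical isolated capacity pool for a set of APIs."""

    def __init__(self, name: str, max_concurrency: int):
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrency)

    def run(self, fn: Callable, *args):
        # Non-blocking acquire: a saturated group rejects fast instead of
        # borrowing capacity from other groups.
        if not self._slots.acquire(blocking=False):
            raise RuntimeError(f"resource group {self.name!r} is saturated")
        try:
            return fn(*args)
        finally:
            self._slots.release()


def call_with_failover(region_clients: List[Tuple[str, Callable]], request):
    """Try each region in turn; in active-active mode any region can serve."""
    errors = []
    for name, client in region_clients:
        try:
            return client(request)
        except ConnectionError as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all regions failed: " + "; ".join(errors))


def serve(api: str, request, api_groups: Dict[str, str],
          groups: Dict[str, ResourceGroup], region_clients):
    group = groups[api_groups[api]]
    return group.run(call_with_failover, region_clients, request)
```

For example, if the first region's client raises `ConnectionError`, `serve` returns the second region's answer, and the request still counts against only its own resource group.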
9. Achievements & Future Planning
The platform now offers over 600 APIs, supports online, near‑online, OLAP, and real‑time scenarios, and powers BI tools, self‑service analytics, and product dashboards. Development cycles have been reduced to about one day.
Future directions include service governance, expanding to more scenarios (tag platforms, AB testing), service orchestration for on‑demand data, enhanced downgrade and disaster recovery, and continued cost‑efficiency improvements in compute and storage.
Conclusion – The talk concludes with thanks to the audience.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.