Building Bilibili's Data Service Middle Platform: Architecture, Practices, and Future Plans

This article presents Bilibili's data service middle platform, detailing its background, pain points of traditional data acquisition, the one‑stop service architecture, core processes, model and API construction, query mechanisms, practical solutions for full‑link control, cost reduction, efficiency gains, high‑availability design, and future roadmap.

DataFunSummit

Jan 29, 2024

Introduction – This article summarizes a talk by senior developer Meng Shuaishuai on how Bilibili built its data service middle platform.

1. Background and Pain Points – Two case studies illustrate the high cost and governance difficulties of traditional data acquisition: long communication chains, duplicated model building, unclear data lineage, and low delivery efficiency.

2. One‑Stop Data Service Platform – The platform aims to achieve unified definition, unified production, and unified consumption of data, reducing cost and improving efficiency.

Unified definition: establish data standards and metric systems to standardize business understanding of data.

Unified production: fully or partially automate data processing so that lineage stays clear and duplicate work is avoided.

Unified consumption: provide a common data service gateway for consistent data access.

3. Platform Framework – Built on top of the data warehouse, the platform consists of a data construction layer, a data query layer, a service interface layer, a service gateway, and product systems (metric management, model management, self‑service analysis, etc.).

4. Core Processes

Data developers (producers) follow four steps: metric definition, data model construction, data acceleration, and API publishing. Business developers (consumers) retrieve ready‑made APIs from the API market or request new data development when needed.

5. Model Construction – Supports several modeling styles (single‑table, star, snowflake, and constellation schemas) and two acceleration methods: detail acceleration (mirroring detail data from cold to hot storage) and pre‑calculation acceleration (aggregating data to the target granularity ahead of query time).
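To make pre‑calculation acceleration concrete, here is a minimal Python sketch that rolls detail rows up to a target granularity. The function name `pre_aggregate`, the column names, and the use of a plain sum are all assumptions made for illustration; the article does not describe how the platform implements this step.

```python
from collections import defaultdict


def pre_aggregate(rows, dims, measure):
    """Sum `measure` over detail rows, grouped by the target dimensions.

    Hypothetical helper: a real job would run in the warehouse engine,
    not in application code.
    """
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dims)  # one key per target-granularity cell
        totals[key] += row[measure]
    # Emit one aggregated row per key, in first-seen order.
    return [dict(zip(dims, key), **{measure: total}) for key, total in totals.items()]
```

The aggregated rows are what would be loaded into the serving store, so a query at that granularity reads a single row instead of scanning the detail data.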

Recommended engine combinations:

Online: pre‑calculation + KV store.

Near‑online: pre‑calculation + TiDB/MySQL.

OLAP: detail + ClickHouse/Iceberg.

Offline: direct Hive access.
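The recommendations above can be read as a lookup from serving scenario to acceleration method and candidate engines. The sketch below encodes only what the list states; the names `ENGINE_PLAN` and `recommend`, and the label "direct" for the offline case, are assumptions and not part of the platform.

```python
# Scenario -> (acceleration method, candidate engines), copied from the list above.
ENGINE_PLAN = {
    "online": ("pre-calculation", ["KV"]),
    "near-online": ("pre-calculation", ["TiDB", "MySQL"]),
    "olap": ("detail", ["ClickHouse", "Iceberg"]),
    "offline": ("direct", ["Hive"]),  # no acceleration: Hive is queried directly
}


def recommend(scenario):
    """Return the (acceleration, engines) pair for a scenario name."""
    key = scenario.strip().lower()
    if key not in ENGINE_PLAN:
        raise ValueError(f"unknown scenario: {scenario!r}")
    return ENGINE_PLAN[key]
```

For example, `recommend("OLAP")` yields detail acceleration with ClickHouse or Iceberg as the candidate engines.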

6. API Construction – Two approaches: visual model‑based configuration and metric‑dimension based configuration, both generating APIs without manual SQL coding.
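As a rough illustration of metric‑dimension based configuration, the sketch below turns a declarative API definition into a parameterized SQL statement, so that no SQL is written by hand. The `ApiConfig` fields, the `build_sql` function, the table name, and the `%(name)s` placeholder style are hypothetical; the article does not show the platform's actual configuration format.

```python
from dataclasses import dataclass, field


@dataclass
class ApiConfig:
    model: str                                    # data model (table) behind the API
    metrics: dict                                 # metric name -> aggregate expression
    dimensions: list                              # columns to group by
    filters: list = field(default_factory=list)   # columns exposed as request parameters


def build_sql(cfg):
    """Generate a parameterized SELECT from the configuration."""
    select = list(cfg.dimensions)
    select += [f"{expr} AS {name}" for name, expr in cfg.metrics.items()]
    sql = f"SELECT {', '.join(select)} FROM {cfg.model}"
    if cfg.filters:
        # Values are bound at execution time, never interpolated into the text.
        sql += " WHERE " + " AND ".join(f"{col} = %({col})s" for col in cfg.filters)
    if cfg.dimensions:
        sql += " GROUP BY " + ", ".join(cfg.dimensions)
    return sql
```

Keeping the definition declarative is what lets the same metric and dimension metadata back several APIs without duplicated query logic.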

7. Data Query – Five‑step pipeline: DSL parsing, task splitting, result processing, translation to engine‑specific SQL via a two‑layer AST, and execution on multiple engines (KV, TiDB, MySQL, ClickHouse, Iceberg) with connection pooling and fault tolerance.
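One plausible reading of the two‑layer AST is an engine‑neutral tree that is lowered to an engine‑specific tree before being rendered to SQL. The sketch below follows that reading for a trivial query; the class names, the DSL keys (`from`, `select`, `limit`), and the default limit are assumptions, and the only dialect difference modeled is identifier quoting (backticks for MySQL and TiDB, double quotes for ClickHouse).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalQuery:      # layer 1: engine-neutral
    table: str
    columns: tuple
    limit: int


@dataclass(frozen=True)
class EngineQuery:       # layer 2: identifiers already quoted for one engine
    engine: str
    table: str
    columns: tuple
    limit: int


QUOTES = {"mysql": "`", "tidb": "`", "clickhouse": '"'}


def parse_dsl(dsl):
    """Parse a (hypothetical) dict-shaped DSL into the neutral tree."""
    return LogicalQuery(dsl["from"], tuple(dsl["select"]), int(dsl.get("limit", 100)))


def lower(query, engine):
    """Rewrite the neutral tree into an engine-specific one."""
    if engine not in QUOTES:
        raise ValueError(f"unsupported engine: {engine!r}")
    q = QUOTES[engine]
    columns = tuple(f"{q}{c}{q}" for c in query.columns)
    return EngineQuery(engine, f"{q}{query.table}{q}", columns, query.limit)


def render(query):
    """Render the engine-specific tree to SQL text."""
    return f"SELECT {', '.join(query.columns)} FROM {query.table} LIMIT {query.limit}"
```

Separating the two layers means a new engine only needs its own lowering rules, while DSL parsing and task splitting stay unchanged.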

8. Practice Solutions

Full‑link control: unified definition, unified production, unified export, and comprehensive monitoring of metric consistency, data warehouse consistency, and service quality.

Cost reduction: standardize models and metrics to avoid duplicate construction, enable API reuse, and shorten development cycles from a week to about a day.

Efficiency improvement: automated metadata management, semi‑automatic model building, and reusable APIs accelerate both data construction and consumption.

High‑availability: service isolation into independent resource groups, storage isolation at API level, and active‑active multi‑region deployment for disaster recovery.
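The multi‑region part of this design can be sketched as a gateway call that falls back to another region when the preferred one fails. This is only an illustration under assumed names (`call_with_failover`, region labels, handler callables); the article does not describe how the platform routes or retries requests.

```python
def call_with_failover(regions, request):
    """Try each (name, handler) pair in order; return the first success.

    `regions` is a sequence of (region_name, callable) pairs. In an
    active-active setup every region serves traffic, so any of them can
    answer when another is unavailable.
    """
    errors = []
    for name, handler in regions:
        try:
            return name, handler(request)
        except Exception as exc:  # a real gateway would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all regions failed: " + "; ".join(errors))
```

Collecting per‑region errors before raising keeps the failure visible to monitoring instead of hiding it behind the fallback.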

9. Achievements & Future Planning

The platform now offers over 600 APIs, supports online, near‑online, OLAP, and real‑time scenarios, and powers BI tools, self‑service analytics, and product dashboards. Development cycles have dropped from about a week to about one day.

Future directions include service governance, expanding to more scenarios (tag platforms, AB testing), service orchestration for on‑demand data, enhanced downgrade and disaster recovery, and continued cost‑efficiency improvements in compute and storage.

Conclusion – The talk closes with thanks to the audience.
