Data Serviceization at Didi: Architecture, Phases, and Standard Metric Service
Didi’s data serviceization converts raw business data into consumable services through a four‑stage pipeline of integration, development, production, and back‑flow. The Data Dream Factory and the Shu‑Chain platform automate synchronization, provide a unified access gateway serving thousands of APIs, and introduce a standard metric service that abstracts storage complexity and delivers data securely and at high performance.
The article introduces the concept of data serviceization, describing how Didi transforms raw business data into consumable services through a four‑stage data development process: data integration, data development, data production, and data back‑flow.
Data integration connects business systems to the big‑data environment via periodic offline imports, real‑time cleaning and ingestion, or direct real‑time writes. Didi’s internal Synchronization Center already supports sources such as MySQL, Oracle, MongoDB, and Publiclog.
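To make the integration step concrete, here is a minimal Python sketch of what a declarative sync‑task definition might look like. The SyncTask class and its fields are illustrative assumptions, not the Synchronization Center’s actual schema.

```python
# Hypothetical sync-task spec for pulling a business table into the warehouse.
# Field names and defaults are assumptions for illustration only.
from dataclasses import dataclass

@dataclass
class SyncTask:
    source_type: str              # "mysql", "oracle", "mongodb", or "publiclog"
    source_uri: str               # connection string of the business system
    target_table: str             # destination table in the big-data environment
    mode: str = "offline"         # "offline" periodic import or "realtime" write
    schedule: str = "0 2 * * *"   # cron expression, used only in offline mode

task = SyncTask(
    source_type="mysql",
    source_uri="mysql://orders-db:3306/trade",
    target_table="ods.trade_orders",
)
print(task)
```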
During data development/production, users can build both real‑time and offline data warehouses using SQL, native scripts, or shell tasks. Data back‑flow exports processed data to OLAP or RDBMS stores (e.g., MySQL, ClickHouse, Druid, StarRocks) to improve query performance for downstream services.
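As a rough illustration of back‑flow, the sketch below copies a pre‑aggregated Hive result into an OLAP store; read_hive and write_olap are hypothetical stand‑ins for the platform’s real connectors, and the table names are invented.

```python
# Minimal back-flow sketch: move a pre-aggregated warehouse result into an
# OLAP store so downstream services avoid scanning Hive at query time.
def read_hive(sql: str) -> list[tuple]:
    # Stand-in for a Hive query; a real task would use the warehouse connector.
    return [("2024-01-01", "beijing", 12345)]

def write_olap(table: str, rows: list[tuple]) -> None:
    # Stand-in for a bulk insert into MySQL/ClickHouse/Druid/StarRocks.
    print(f"loaded {len(rows)} rows into {table}")

rows = read_hive("SELECT dt, city, finished_orders FROM dws.trade_daily")
write_olap("clickhouse.trade_daily", rows)
```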
Didi’s “Data Dream Factory” provides a one‑stop solution for data development and production, focusing on efficiency, security, and stability.
To deliver data to users, Didi built a unified data consumption platform that includes generic products such as Shuyi, a data‑intelligent Q&A bot, anomaly analysis, and domain‑specific products like Polaris and the Group Exhibition Hall. The platform queries structured, standardized data and routes queries to appropriate multi‑dimensional storage engines (MySQL, ClickHouse, Druid, StarRocks) based on performance requirements.
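The routing decision can be pictured as a simple rule over the query’s shape. The thresholds and engine choices below are assumptions made for the sketch, not Didi’s actual routing policy.

```python
# Illustrative engine routing: choose a store from the query's characteristics.
def pick_engine(point_lookup: bool, scan_rows: int, needs_join: bool) -> str:
    if point_lookup:
        return "mysql"        # low-latency key/value reads
    if needs_join or scan_rows > 10_000_000:
        return "starrocks"    # MPP-style joins and large scans
    return "clickhouse"       # fast single-table aggregation (Druid fits here too)

print(pick_engine(point_lookup=False, scan_rows=500_000, needs_join=False))
```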
The roadmap is divided into three phases:
Phase 1 – Build Synchronization Center Back‑flow Capability: Automate data sync tasks, eliminate manual ticket‑based sync, and enable Hive‑to‑MySQL/ClickHouse/Druid/HBase/ES back‑flow. This reduced query P90 latency for Shuyi from 5 s to under 2 s.
Phase 2 – Build the “Shu‑Chain” Platform for Unified Data Service: Provide a unified access gateway, a standard data access protocol, and API management. Support diverse data sources (ES, MySQL, ClickHouse, HBase, Druid) as well as high‑concurrency key/value queries, multi‑dimensional analysis, and data export. The platform accelerated API creation from days to minutes and now serves >4,000 APIs and >200 applications; a request sketch follows this list.
Phase 3 – Build Standard Metric Service: Extend metadata management to express indicators, dimensions, and logical models. Introduce derived, computed, and composite metrics, and support four dimension types (dimension tables, enumerations, degraded dimensions, derived dimensions). Logical models bind metrics and dimensions to physical tables (Hive, StarRocks, ClickHouse) and define the storage engine, data layout, and data‑warehouse layer (APP, DM, DWS, DWD).
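To make the standard access protocol from Phase 2 tangible, here is a hypothetical shape for a unified request against a registered API; the JSON fields are assumptions for illustration, not Shu‑Chain’s real protocol.

```python
# Hypothetical unified data-access request: callers reference a registered API
# and pass structured parameters instead of engine-specific SQL.
import json

request = {
    "api_id": "order_kpis_v1",   # API registered on the gateway
    "query_type": "multi_dim",   # or "kv" for high-concurrency point reads
    "filters": {"dt": "2024-01-01", "city": "beijing"},
    "fields": ["finished_orders", "gmv"],
}
print(json.dumps(request, indent=2))
```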
Automation of query logic enables users to request data by specifying metrics and dimensions only; the system selects the optimal logical model, handles federated queries, and abstracts away data layout complexities.
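A sketch of how that selection could work: logical models declare which metrics and dimensions they cover and which physical table they bind to, and the service picks the most pre‑aggregated covering model. Class names, the layer ranking, and the sample tables are assumptions for illustration, not the platform’s schema.

```python
# Illustrative query-by-metric resolution: given requested metrics and
# dimensions, pick a covering logical model, preferring higher (more
# pre-aggregated) warehouse layers to minimize scan cost.
from dataclasses import dataclass

@dataclass
class LogicalModel:
    metrics: set[str]
    dimensions: set[str]
    physical_table: str   # bound Hive / StarRocks / ClickHouse table
    layer: str            # warehouse layer: APP, DM, DWS, or DWD

LAYER_RANK = {"APP": 0, "DM": 1, "DWS": 2, "DWD": 3}

MODELS = [
    LogicalModel({"gmv"}, {"dt"}, "app.gmv_daily", "APP"),
    LogicalModel({"gmv", "finished_orders"}, {"dt", "city"},
                 "starrocks.dm_trade_city_daily", "DM"),
]

def choose_model(metrics: set[str], dims: set[str]) -> LogicalModel | None:
    """Return the cheapest logical model covering the requested fields."""
    candidates = [m for m in MODELS
                  if metrics <= m.metrics and dims <= m.dimensions]
    return min(candidates, key=lambda m: LAYER_RANK[m.layer], default=None)

# The user names only metrics and dimensions; no table is specified.
chosen = choose_model({"gmv"}, {"dt", "city"})
print(chosen.physical_table)  # -> starrocks.dm_trade_city_daily
```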
Consistency verification is achieved through passive checks (user‑configured metric validation) and automatic checks (system‑generated model decomposition). The platform also integrates a unified query middleware, DiQuery, which leverages MPP capabilities and supports federated queries, LOD functions, and advanced time‑window calculations.
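An automatic check might evaluate the same metric through two covering models and compare the results within a tolerance. In the sketch below, run_metric is a hypothetical stand‑in for a query issued via the middleware, and the tolerance is an assumed value.

```python
# Sketch of an automatic consistency check: compute one metric through two
# logical models and flag any divergence beyond a relative tolerance.
def run_metric(model_table: str, metric: str, dt: str) -> float:
    # Stand-in for a federated query issued via the query middleware.
    sample = {"app.gmv_daily": 1_000_000.0,
              "starrocks.dm_trade_city_daily": 1_000_050.0}
    return sample[model_table]

def consistent(a: float, b: float, rel_tol: float = 1e-3) -> bool:
    return abs(a - b) <= rel_tol * max(abs(a), abs(b))

v1 = run_metric("app.gmv_daily", "gmv", "2024-01-01")
v2 = run_metric("starrocks.dm_trade_city_daily", "gmv", "2024-01-01")
print("consistent" if consistent(v1, v2) else "inconsistent")
```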
Overall, the Data Dream Factory and Shu‑Chain platform aim to decouple data production from consumption, provide a secure, stable, and high‑performance data service ecosystem, and lay the groundwork for future standard metric services and further data‑quality improvements.