
Design and Evolution of Didi's Real‑Time Data Computing Platform

The article details how Didi built and iterated its real‑time data platform, describing the shift from MySQL‑based batch processing to a Kafka‑Samza‑Druid architecture with Spark Streaming and Flink, the challenges addressed, and the current capabilities and operational metrics.

DataFunTalk

Mar 7, 2019


The talk is given by Zhang Tingting, a senior R&D engineer at Didi, and covers three topics: how time‑series data relates to Didi's real‑time computing platform, the current technical solution, and the results it has achieved.

Initially, the platform acted as a "business eye" to detect data changes and trigger alerts. Its early architecture scanned MySQL tables, pre‑computed data, stored results in local files, and periodically uploaded them to a CKV database for querying.

This design quickly hit bottlenecks as data volume grew: development and extension required heavy effort, computation and storage costs grew exponentially, and latency was high while query performance was unstable.

To improve timeliness and stability, the data flow was re‑engineered: HTTP‑based data transfer was replaced by a message queue (Kafka), which increases throughput and uses acknowledgments (ACKs) to ensure that messages are actually consumed, and offline batch processing was replaced by real‑time stream computation.
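The ACK‑based consumption guarantee can be illustrated with a commit‑after‑process loop. The sketch below is a minimal model in plain Python under stated assumptions, not the Kafka client API: `PartitionLog` and `consume` are hypothetical stand‑ins for a single partition and a consumer that advances its committed offset only after a message has been handled. A crash between processing and committing causes redelivery, so no message is lost but duplicates are possible (at‑least‑once delivery).

```python
from dataclasses import dataclass


@dataclass
class PartitionLog:
    """Hypothetical in-memory stand-in for one Kafka partition (no real broker)."""
    messages: list
    committed_offset: int = 0  # next offset a restarted consumer will read


def consume(log, handle, crash_at=None):
    """Process messages from the committed offset, committing only after handle() returns.

    crash_at simulates the consumer dying right after processing that offset
    but before committing (acknowledging) it.
    """
    offset = log.committed_offset
    while offset < len(log.messages):
        handle(log.messages[offset])
        if crash_at is not None and offset == crash_at:
            return False  # crashed before the commit, so this message will be redelivered
        offset += 1
        log.committed_offset = offset  # the commit acts as the acknowledgment
    return True


log = PartitionLog(messages=["order-1", "order-2", "order-3"])
seen = []
consume(log, seen.append, crash_at=1)  # handles order-2, then dies before committing it
consume(log, seen.append)              # restart resumes from the last committed offset
print(seen)  # ['order-1', 'order-2', 'order-2', 'order-3']
```

Because redelivery can produce duplicates, downstream aggregation has to tolerate them or rely on an engine with stronger processing guarantees.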

For metric calculation, the platform adopted a query‑time computation model: it persists only lightweight ETL output (such as order tables) and computes the various metrics on top of it at query time, which reduces resource consumption and lets many metrics reuse the same data.
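A minimal sketch of the query‑time model, assuming a hypothetical `Order` row type and a generic `query` helper (neither comes from the source): the ETL output is stored once, and each metric is simply a different filter, grouping, and aggregate applied when the query runs, rather than a separately pre‑computed table.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """One row of a hypothetical lightweight ETL output (order detail table)."""
    ts_minute: int  # event time truncated to the minute
    city: str
    status: str     # e.g. "completed" or "cancelled"
    fare: float


def query(orders, group_by, predicate, agg):
    """Compute a metric at query time: filter rows, group them, then aggregate each group."""
    groups = defaultdict(list)
    for o in orders:
        if predicate(o):
            groups[group_by(o)].append(o)
    return {key: agg(rows) for key, rows in sorted(groups.items())}


orders = [
    Order(0, "beijing", "completed", 30.0),
    Order(0, "beijing", "cancelled", 0.0),
    Order(0, "shanghai", "completed", 25.0),
    Order(1, "beijing", "completed", 40.0),
]

# Different metrics reuse the same stored rows; nothing is materialized per metric.
completed = lambda o: o.status == "completed"
completed_per_city = query(orders, lambda o: o.city, completed, len)
gmv_per_minute = query(orders, lambda o: o.ts_minute, completed,
                       lambda rows: sum(r.fare for r in rows))
print(completed_per_city)  # {'beijing': 2, 'shanghai': 1}
print(gmv_per_minute)      # {0: 55.0, 1: 40.0}
```

The trade‑off is that every query pays the aggregation cost, which is why this model pairs naturally with a fast OLAP engine.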

Given the time‑series nature of the data, an OLAP engine was introduced; Druid was selected as the time‑series database, combined with Kafka and Samza for ingestion and processing.

The second‑stage architecture (illustrated in the source) still faced development bottlenecks such as long implementation cycles and fragmented ownership, prompting the creation of the Woater real‑time computing development platform to lower entry barriers and improve asset management.

Woater’s architecture ingests data into Kafka, processes it with Spark Streaming or Flink, writes results to Druid, and optionally performs offline Hive‑Druid jobs for historical data. A unified real‑time data API exposes the data to authorized third parties, with additional modules for permission and lineage management.
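The permission and lineage modules are not described in detail. As an illustration only, the sketch below assumes a hypothetical `Catalog` that records which upstream nodes each dataset reads from (for example a Kafka topic feeding a Flink job that writes to a Druid datasource, plus an offline Hive backfill) and which users may query each datasource through the unified API. All node and user names are invented.

```python
from collections import defaultdict


class Catalog:
    """Hypothetical metadata catalog: lineage edges plus per-datasource read grants."""

    def __init__(self):
        self.upstream = defaultdict(set)  # node -> nodes it reads from
        self.grants = defaultdict(set)    # datasource -> users allowed to query it

    def add_edge(self, src, dst):
        """Record that dst is produced from src."""
        self.upstream[dst].add(src)

    def lineage(self, node):
        """Return every transitive upstream dependency of node (iterative DFS)."""
        seen, stack = set(), [node]
        while stack:
            for parent in self.upstream[stack.pop()]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def query(self, user, datasource):
        """Gatekeeper for a unified data API: reject callers without a grant."""
        if user not in self.grants[datasource]:
            raise PermissionError(f"{user} may not read {datasource}")
        return f"results from {datasource}"


cat = Catalog()
cat.add_edge("kafka:order_events", "flink:order_etl")
cat.add_edge("flink:order_etl", "druid:order_detail")
cat.add_edge("hive:order_history", "druid:order_detail")  # offline backfill path
cat.grants["druid:order_detail"].add("dashboard-team")

print(sorted(cat.lineage("druid:order_detail")))
# ['flink:order_etl', 'hive:order_history', 'kafka:order_events']
print(cat.query("dashboard-team", "druid:order_detail"))  # results from druid:order_detail
```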

Druid provides low‑latency writes and fast interactive queries for time‑series data, using columnar storage, roll‑up pre‑aggregation, bitmap indexes, and time‑based sharding, achieving sub‑second query latency and high compression (≈1/30 of raw size).
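To make roll‑up and bitmap indexes concrete, here is a toy model in plain Python. It sketches the concepts under simplifying assumptions and is not Druid's implementation: `rollup` truncates timestamps to a query granularity and merges rows whose dimensions match, and `build_bitmap_index` maps each dimension value to a bitmap of row ids (Druid itself uses compressed bitmap formats such as Roaring). Columnar storage and time‑based segment sharding are not modeled, and the sample events are invented.

```python
from collections import defaultdict


def rollup(events, granularity_s):
    """Pre-aggregate raw events: truncate time, then merge rows with equal dimensions.

    events: iterable of (epoch_seconds, dims_tuple, fare)
    Returns sorted rows of (bucket_start, dims, event_count, fare_sum).
    """
    agg = defaultdict(lambda: [0, 0.0])
    for ts, dims, fare in events:
        bucket = ts - ts % granularity_s
        agg[(bucket, dims)][0] += 1
        agg[(bucket, dims)][1] += fare
    return [(t, d, c, s) for (t, d), (c, s) in sorted(agg.items())]


def build_bitmap_index(rows, dim_pos):
    """Map each value of one dimension to a bitmap (a Python int) of the row ids holding it."""
    index = defaultdict(int)
    for row_id, (_, dims, _, _) in enumerate(rows):
        index[dims[dim_pos]] |= 1 << row_id
    return index


def row_ids(bitmap):
    """Decode a bitmap back into the list of set row ids."""
    return [i for i in range(bitmap.bit_length()) if bitmap >> i & 1]


events = [
    (0, ("beijing", "express"), 10.0),
    (20, ("beijing", "express"), 15.0),
    (45, ("shanghai", "express"), 12.0),
    (65, ("beijing", "premier"), 30.0),
    (70, ("beijing", "express"), 8.0),
]
rows = rollup(events, granularity_s=60)  # 5 raw events collapse into 4 rows
city = build_bitmap_index(rows, dim_pos=0)
product = build_bitmap_index(rows, dim_pos=1)
hits = row_ids(city["beijing"] & product["express"])  # AND of bitmaps = filter intersection
print(hits, sum(rows[i][3] for i in hits))  # [0, 2] 33.0
```

Roll‑up trades away per‑event detail for fewer stored rows, which is one source of the compression the platform reports; bitmap filters then reduce to cheap bitwise operations.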

Because Druid’s compute capabilities are limited (e.g., lacking complex joins), Spark Streaming and Flink supplement the platform: Spark Streaming handles micro‑batch processing suitable for developers familiar with Spark, while Flink offers true stream processing with exactly‑once semantics for low‑latency scenarios.
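The latency difference between the two models can be seen in a simplified simulation. This sketch only models when results become visible: it assumes a running sum as the computation and ignores fault tolerance, state backends, and exactly‑once guarantees. `micro_batch` and `per_event` are hypothetical names, not Spark or Flink APIs.

```python
def micro_batch(events, batch_interval):
    """Micro-batch style (conceptually): buffer events, emit one result per batch boundary.

    events: list of (arrival_time, value) sorted by arrival_time.
    Returns (emit_time, running_total) pairs.
    """
    out, total, i = [], 0, 0
    batch_end = batch_interval
    while i < len(events):
        while i < len(events) and events[i][0] < batch_end:
            total += events[i][1]
            i += 1
        out.append((batch_end, total))  # results surface only when the batch closes
        batch_end += batch_interval
    return out


def per_event(events):
    """True streaming style (conceptually): update state and emit on every event."""
    out, total = [], 0
    for t, v in events:
        total += v
        out.append((t, total))
    return out


events = [(0.2, 1), (0.7, 1), (1.1, 1), (2.5, 1)]
print(micro_batch(events, batch_interval=1.0))  # [(1.0, 2), (2.0, 3), (3.0, 4)]
print(per_event(events))  # [(0.2, 1), (0.7, 2), (1.1, 3), (2.5, 4)]
```

With a one‑second interval, the event arriving at 0.2 s is reflected in output only at the 1.0 s boundary, whereas the per‑event model emits it immediately; shortening the interval narrows the gap at the cost of scheduling more, smaller batches.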

The platform supports three development modes: web‑based online development for experienced engineers, StreamSQL + DruidSQL for users with SQL skills, and visual drag‑and‑drop for beginners, along with configuration and resource governance features.

Currently, the platform covers all of Didi's core business lines and delivers second‑level real‑time monitoring with 99.995% availability, SLA‑bound alert quality (no more than 3 false alerts and 1 missed alert per month), and additional features such as integrated dashboards and rich charting.
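The availability figure translates into a concrete downtime budget. The arithmetic below assumes a 30‑day month and a 365‑day year (the source does not state the measurement window) and uses `Fraction` to keep it exact; `meets_alert_sla` is a hypothetical helper that encodes the stated monthly alert‑quality limits.

```python
from fractions import Fraction


def downtime_budget_minutes(availability_pct: str, period_days: int) -> Fraction:
    """Maximum downtime allowed in a period for a given availability target.

    availability_pct is a decimal string such as "99.995" so Fraction parses it exactly.
    """
    unavailable_share = 1 - Fraction(availability_pct) / 100
    return unavailable_share * period_days * 24 * 60


def meets_alert_sla(false_alerts: int, missed_alerts: int) -> bool:
    """Hypothetical check of the monthly limits: at most 3 false and 1 missed alert."""
    return false_alerts <= 3 and missed_alerts <= 1


monthly = downtime_budget_minutes("99.995", 30)   # Fraction(54, 25)
yearly = downtime_budget_minutes("99.995", 365)   # Fraction(657, 25)
print(float(monthly), float(yearly))              # 2.16 26.28
print(meets_alert_sla(false_alerts=2, missed_alerts=1))  # True
```

In other words, 99.995% availability leaves roughly 2 minutes of downtime per 30‑day month.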


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
