Didi's Big Data Technology and Architecture Overview
This article gives an overview of how Didi processes traffic data at scale. It covers the big‑data technology stack (Hadoop, Hive, Flink, Spark, HBase, Presto and ClickHouse), the intelligent data catalog, lean data production practices, and the operational metrics behind the company's real‑time analytics.
Author: Zhang Maosen, Chief Engineer at Didi.
Scenario: Didi processes over 4800 TB of traffic data daily, with more than 150 billion vehicle location records and 400 billion route‑planning requests, achieving 85 % accuracy in 15‑minute supply‑demand forecasts.
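To put the daily figures above on a per‑second scale, here is a minimal arithmetic sketch. It takes the quoted numbers at face value and assumes that all three are daily totals and that load is spread evenly over the day. Real traffic has peaks, so these are averages only. The names `DAILY_FIGURES` and `average_per_second` are illustrative and do not come from Didi.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_per_second(daily_total: float) -> float:
    """Convert a daily total into a mean per-second rate (assumes even load)."""
    return daily_total / SECONDS_PER_DAY


# Figures as quoted in the text, treated here as daily lower bounds (assumption).
DAILY_FIGURES = {
    "data_processed_tb": 4800,          # "over 4800 TB"
    "vehicle_location_records": 150e9,  # "more than 150 billion"
    "route_planning_requests": 400e9,   # "400 billion"
}

rates = {name: average_per_second(total) for name, total in DAILY_FIGURES.items()}

if __name__ == "__main__":
    for name, rate in rates.items():
        print(f"{name}: {rate:,.2f} per second on average")
```

Under these assumptions the platform averages roughly 0.056 TB (about 56 GB, decimal units) of data, about 1.7 million location records, and about 4.6 million route‑planning requests per second.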
Key drivers: digitization of the business, datafication of information, turning data into assets, and monetizing those assets.
Core challenges: meeting complex, end‑to‑end product requirements across many scenarios, and coordinating many teams with differing goals.
Data platform components: offline processing with Hadoop and Hive, real‑time computation with Flink and Spark, OLAP with HBase, Presto and ClickHouse, plus an intelligent data catalog offering unified metadata search, value‑based ranking, and crowdsourced knowledge.
Lean data production: focus on data quality, stability, full‑link SLA, event tracking, rapid data collection, operational monitoring, and a 90 % review rate, resulting in reduced incidents and higher NPS.
Achievements over two years include: data infrastructure offered externally, more than 150 data‑culture improvements, a higher DataRank asset score, D0‑level incidents cut from more than 10 to 1, DataGraph used by 20 % of staff, and NPS growth from 19 % to 60 %.
Real‑time data integration service statistics: ~300 collection clusters, >4 500 data sources, 27 000 agents, peak ingestion 29 million rows/s, 20 million daily queries, average response <1 s, and 99.996 % stability.
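The statistics above are easier to interpret after a little arithmetic. The sketch below rests on stated assumptions: "stability" is read as availability, the peak ingestion is spread evenly across all 27 000 agents, and queries arrive uniformly over the day. None of these readings is confirmed by the source, and the function names are illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes (non-leap year)
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400 seconds


def downtime_minutes(availability_pct: float, period_minutes: float) -> float:
    """Downtime budget implied by an availability percentage over a period."""
    return (100.0 - availability_pct) / 100.0 * period_minutes


def even_share(total: float, parts: int) -> float:
    """Split a total evenly across `parts` (a simplifying assumption)."""
    return total / parts


# 99.996 % availability over one year (assumes "stability" means availability).
yearly_budget = downtime_minutes(99.996, MINUTES_PER_YEAR)

# Peak ingestion per agent if all 27,000 agents carried equal load.
rows_per_agent = even_share(29_000_000, 27_000)

# Average query rate if the 20 million daily queries arrived uniformly.
queries_per_second = even_share(20_000_000, SECONDS_PER_DAY)

if __name__ == "__main__":
    print(f"Downtime budget: {yearly_budget:.1f} min/year")
    print(f"Peak rows per agent: {rows_per_agent:.0f} rows/s")
    print(f"Average query rate: {queries_per_second:.0f} queries/s")
```

Read this way, 99.996 % leaves a budget of about 21 minutes of downtime per year, each agent handles on the order of a thousand rows per second at peak, and the query load averages a few hundred requests per second.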
Overall, Didi's data middle‑platform demonstrates a scalable, high‑performance architecture that supports massive traffic analytics, rapid feature delivery, and continuous data‑driven optimization.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
