
Evolution and Architecture of Major Chinese Big Data Platforms: Taobao, Didi, Meituan, 360, Kuaishou, and JD

This article reviews the evolution, architecture, and key components of major Chinese big‑data platforms—including those of Taobao, Didi, Meituan, 360, Kuaishou, and JD—highlighting data ingestion, storage, processing engines, scheduling systems, and service‑oriented designs that underpin their large‑scale data operations.

Big Data Technology & Architecture

Jan 15, 2021


Introduction: Drawing on the author's experience and on published accounts of how Taobao, Didi, Meituan, 360, Kuaishou, and JD built their big‑data platforms, this article outlines what a complete big‑data platform consists of and how it develops over time.

Taobao: Taobao built one of the earliest Hadoop‑based big‑data platforms. Data from Oracle, MySQL, logs, and crawlers is synchronized into HDFS by DataExchange, DBSync, and TimeTunnel; processing jobs on that data are orchestrated by the Tianwang scheduling system, and the results are exported to downstream applications.
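The role of a scheduling system in a pipeline like this can be illustrated with a minimal sketch. It is not a description of how Tianwang works internally; the job names below are hypothetical, and the only assumption is that jobs declare which upstream jobs they depend on, so the scheduler can derive a valid run order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical job names: each job maps to the set of jobs it depends on.
dependencies = {
    "sync_mysql_to_hdfs": set(),
    "sync_logs_to_hdfs": set(),
    "build_daily_report": {"sync_mysql_to_hdfs", "sync_logs_to_hdfs"},
    "export_to_downstream": {"build_daily_report"},
}


def execution_order(deps):
    """Return the jobs in an order that respects every declared dependency."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
for job in order:
    print("run", job)
```

Both synchronization jobs run before the report job, and the export always runs last; a production scheduler would add retries, time triggers, and parallel execution on top of this ordering.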

Didi: Didi’s platform evolved through three stages (self‑built clusters, a centralized platform, and a SQL‑centric design), using Hadoop 2, Spark, Hive, a custom scheduler, and a visual SQL IDE. It also relies heavily on HBase and Phoenix for real‑time and offline data storage.

Meituan: Meituan’s platform ingests data from MySQL (via Canal) and logs (via Flume) into Kafka, then processes streams with Storm and batch jobs with Hive, delivering results to HBase, relational databases, BI tools, and a “Tianji” reporting system.
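The pattern described here, where several sources write to a shared log and both a streaming path and a batch path read from it, can be sketched in a few lines. This is a toy model only: `InMemoryLog`, the topic names, and the record fields are hypothetical stand-ins, and no real Kafka, Storm, or Hive API is used.

```python
from collections import defaultdict


class InMemoryLog:
    """Toy stand-in for a Kafka-like log: append-only lists keyed by topic."""

    def __init__(self):
        self.topics = defaultdict(list)

    def publish(self, topic, record):
        self.topics[topic].append(record)

    def read(self, topic, offset=0):
        # Each consumer reads independently from its own offset.
        return self.topics[topic][offset:]


log = InMemoryLog()
# Hypothetical records: database changes and application logs share the log.
log.publish("orders_binlog", {"order_id": 1, "amount": 30})
log.publish("orders_binlog", {"order_id": 2, "amount": 12})
log.publish("app_logs", {"page": "/home"})


def stream_running_count(records):
    """Streaming-style consumer: emit an updated count after every record."""
    counts = []
    for index, _ in enumerate(records, start=1):
        counts.append(index)
    return counts


def batch_total_amount(records):
    """Batch-style consumer: compute one aggregate over the full data set."""
    return sum(record["amount"] for record in records)


stream_view = stream_running_count(log.read("orders_binlog"))
batch_view = batch_total_amount(log.read("orders_binlog"))
```

Because the log decouples producers from consumers, the streaming and batch paths can be developed and scaled separately while reading the same records.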

360 (Qirin): The Qirin platform provides an end‑to‑end big‑data solution covering resource management, metadata, data collection, task development, interactive analysis, data services, permission control, and system management, supporting over 30 departments and EB‑scale storage.

Kuaishou: Kuaishou’s data‑service platform stores raw data in a data lake, transforms it into domain‑oriented assets, and serves them via RPC and HTTP APIs; it ensures high availability through elastic container services, resource isolation, and full‑link monitoring of synchronization, stability, and data correctness.

JD: JD’s platform has progressed through scale‑up, systematic, real‑time, intelligent, and commercialization stages, featuring compute‑storage separation, massive HDFS clusters with erasure coding, container‑native elastic scheduling, Easy Realtime SQL‑based development, federated learning engines, and cross‑domain data services.
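Why erasure coding matters on very large HDFS clusters comes down to simple arithmetic. The sketch below compares 3x replication with a Reed-Solomon layout of 6 data units and 3 parity units, which is one of the built-in HDFS erasure coding policies. The article does not say which policy JD uses, so the RS(6,3) choice and the 100 PB figure are assumptions for illustration only.

```python
def replication_overhead(replicas):
    """Raw bytes stored per logical byte under n-way replication."""
    return float(replicas)


def erasure_coding_overhead(data_units, parity_units):
    """Raw bytes stored per logical byte under a (data, parity) striping layout."""
    return (data_units + parity_units) / data_units


logical_pb = 100  # hypothetical logical data volume, in petabytes

raw_replicated = logical_pb * replication_overhead(3)
raw_erasure_coded = logical_pb * erasure_coding_overhead(6, 3)

print(f"3x replication: {raw_replicated:.0f} PB raw")
print(f"RS(6,3) coding: {raw_erasure_coded:.0f} PB raw")
```

Under these assumptions the erasure-coded layout needs half the raw capacity of 3x replication while still tolerating the loss of any three units in a stripe; the trade-off is extra CPU and network cost when data must be reconstructed.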

Overall, the article extracts common architectural patterns—data ingestion pipelines, distributed storage (HDFS, HBase), processing engines (MapReduce, Spark, Flink, Storm), scheduling systems, and service‑oriented APIs—that can guide the construction of robust big‑data platforms.




Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
