
Architecture Overview of Taobao, Meituan, and Didi Big Data Platforms

This article examines the big‑data architectures of three leading Chinese internet companies—Taobao, Meituan, and Didi—detailing their data sources, synchronization mechanisms, batch and streaming processing layers, and the common scheduling components that unify their Hadoop‑based ecosystems.

Architecture Digest

To learn from the design patterns of large‑scale data platforms, we review the architectures of Taobao, Meituan, and Didi, three of China’s most prominent internet companies. Each platform reflects its own business scenarios and technology stack, but all three follow a similar three‑tier structure: data ingestion, a Hadoop‑based processing cluster, and downstream applications.

Taobao Big Data Platform

Taobao was one of the earliest companies to build an internal big‑data platform. Its architecture consists of three parts: upstream data sources, the Hadoop cluster, and downstream applications. The data sources (Oracle, MySQL replicas, log systems, and crawlers) are synchronized to Hadoop through a gateway made up of three components: DataExchange (full‑batch sync), DBSync (real‑time incremental sync), and TimeTunnel (real‑time log and crawler sync). All data is stored in HDFS. Jobs are scheduled by the proprietary "Tianwang" scheduler, which orders tasks based on resource availability and priority. Job results are written back to HDFS and then exported to MySQL/Oracle via DataExchange. Downstream services such as recommendation engines read the results directly from those databases.
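
The scheduling idea described above (ordering tasks by priority within the available resources) can be sketched in a few lines of Python. This is only an illustration, not Tianwang's actual algorithm: the Task fields, the notion of "slots" as the unit of cluster capacity, the dependency handling, and the task names are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int      # assumption: a higher number runs earlier
    slots: int         # assumption: abstract units of cluster capacity
    deps: tuple = ()   # names of tasks that must finish first


def plan_waves(tasks, total_slots):
    """Group tasks into waves that run one after another.

    A task joins a wave only if all of its dependencies finished in an
    earlier wave and the wave still has enough free slots for it.
    """
    pending = {t.name: t for t in tasks}
    done = set()
    waves = []
    while pending:
        # Ready tasks, highest priority first (name breaks ties).
        ready = sorted(
            (t for t in pending.values() if all(d in done for d in t.deps)),
            key=lambda t: (-t.priority, t.name),
        )
        if not ready:
            raise ValueError("dependency cycle or missing dependency")
        free = total_slots
        wave = []
        for t in ready:
            if t.slots <= free:
                wave.append(t.name)
                free -= t.slots
        if not wave:
            raise ValueError("a ready task needs more slots than the cluster has")
        for name in wave:
            del pending[name]
        done.update(wave)
        waves.append(wave)
    return waves


# Hypothetical job graph: two sync jobs, one compute job, one export job.
tasks = [
    Task("sync_orders", priority=5, slots=2),
    Task("sync_logs", priority=3, slots=2),
    Task("build_features", priority=9, slots=3, deps=("sync_orders", "sync_logs")),
    Task("export_results", priority=1, slots=1, deps=("build_features",)),
]
waves = plan_waves(tasks, total_slots=4)
```

With four slots the two sync jobs share the first wave. With three slots the lower-priority sync job has to wait for a later wave, which is the kind of trade-off a resource-aware scheduler makes.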

Meituan Big Data Platform

Meituan’s data originates from MySQL (captured from the binlog via Canal) and from logs (collected by Flume); both are fed into a Kafka message queue. The data in Kafka is consumed by two processing engines: Storm handles real‑time stream processing and writes its results to HBase or relational databases, while Hive handles batch processing and feeds BI dashboards. Users and executives access the processed data through BI tools and a custom "Tianji" reporting system. The entire workflow is orchestrated by an internal scheduling platform, and developers use an ETL development portal to build and submit jobs.
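
One reason a message queue sits in the middle of this design is that several engines can read the same data independently, each at its own pace. The sketch below imitates that with an in-memory log and one read offset per consumer group. It is not the Kafka client API: the TopicLog class, the group names, and the sample events are hypothetical.

```python
class TopicLog:
    """In-memory stand-in for one topic partition (illustrative only)."""

    def __init__(self):
        self._records = []
        self._offsets = {}  # consumer group -> next offset to read

    def append(self, record):
        self._records.append(record)

    def poll(self, group, max_records=100):
        """Return the next records for this group and advance its offset."""
        start = self._offsets.get(group, 0)
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch


topic = TopicLog()
for event in ["binlog:order#1 created", "log:GET /home", "binlog:order#1 paid"]:
    topic.append(event)

# The streaming engine reads record by record as data arrives.
stream_seen = []
while True:
    batch = topic.poll("stream-engine", max_records=1)
    if not batch:
        break
    stream_seen.extend(batch)

# The batch engine reads everything in one pass, unaffected by the other group.
batch_seen = topic.poll("batch-engine")
```

Both consumers end up with all three events, because each group tracks its own position in the log rather than removing records from it.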

Didi Big Data Platform

Didi separates its platform into a real‑time streaming layer and an offline batch layer. In the streaming layer, data is ingested into Kafka and then follows one of two paths: Spark Streaming or Flink jobs perform ETL and persist the output to HDFS for later batch jobs, while Druid computes real‑time metrics that feed alerting systems and dashboards. The offline layer is built on Hadoop 2 (HDFS, YARN, MapReduce) together with Spark and Hive, complemented by a custom scheduler and a visual SQL editor for job submission.
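
To make the real‑time metrics path more concrete, the sketch below rolls events up into per-minute counts and flags the buckets that exceed a threshold, roughly the kind of aggregate an alerting system would watch. It is plain Python rather than Druid, and the event shape, metric names, and threshold are assumptions chosen for the example.

```python
from collections import Counter


def rollup_per_minute(events):
    """Count events per (minute_start, metric).

    events: iterable of (epoch_seconds, metric_name) pairs.
    """
    counts = Counter()
    for ts, metric in events:
        minute_start = ts - ts % 60  # truncate the timestamp to its minute
        counts[(minute_start, metric)] += 1
    return counts


def alerts(counts, threshold):
    """Return the (minute_start, metric) buckets whose count exceeds threshold."""
    return sorted(key for key, n in counts.items() if n > threshold)


# Hypothetical events: (epoch seconds, metric name).
events = [
    (120, "order_failed"),
    (125, "order_created"),
    (130, "order_failed"),
    (175, "order_failed"),
    (181, "order_failed"),
]
counts = rollup_per_minute(events)
fired = alerts(counts, threshold=2)
```

Here only the bucket starting at second 120 for order_failed holds more than two events, so it is the only one flagged.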

Didi also heavily utilizes HBase and Phoenix; the results from both streaming and batch computations are stored in HBase, and applications access the data via Phoenix, a SQL layer on top of HBase.
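
A SQL layer over a key-value store has to translate between relational rows and the store's cells. The sketch below shows that idea in a deliberately simplified form: primary-key columns are joined into a row key and every other column becomes one cell. It does not reproduce Phoenix's real encoding. The "|" separator, the column family name "d", and the sample order row are assumptions for illustration.

```python
def row_to_cells(row, pk_columns, family="d", sep="|"):
    """Flatten one relational row into {(row_key, 'family:column'): value}."""
    row_key = sep.join(str(row[c]) for c in pk_columns)
    return {
        (row_key, f"{family}:{col}"): value
        for col, value in row.items()
        if col not in pk_columns
    }


def cells_to_row(cells, pk_columns, sep="|"):
    """Rebuild a row from cells that all share one row key."""
    row_keys = {key for key, _ in cells}
    if len(row_keys) != 1:
        raise ValueError("expected cells for exactly one row key")
    row = dict(zip(pk_columns, row_keys.pop().split(sep)))
    for (_, qualified), value in cells.items():
        row[qualified.split(":", 1)[1]] = value
    return row


# Hypothetical order row with a composite primary key (city, order_id).
order = {"city": "beijing", "order_id": "42", "status": "paid", "fare": "18.5"}
cells = row_to_cells(order, pk_columns=("city", "order_id"))
restored = cells_to_row(cells, pk_columns=("city", "order_id"))
```

Because the row key leads with the primary-key columns, a lookup by primary key becomes a single key read in the underlying store, which is what makes a SQL interface over this kind of storage practical for applications.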

Conclusion

Despite minor differences in product selection and detailed configurations, the three platforms share a common architectural philosophy: a unified scheduling system coordinates data ingestion, processing (both real‑time and batch), and result delivery. Understanding these patterns provides deeper insight into the design of modern big‑data infrastructures.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
