Enterprise Big Data Platform Architecture: Insights from Taobao, Meituan, and Didi

This article examines the architecture of enterprise-level big data platforms at leading Chinese tech firms—Taobao, Meituan, and Didi—detailing their data sources, synchronization components, batch and streaming processing layers, scheduling systems, and common design patterns, while highlighting shared principles across these implementations.

Big Data Technology & Architecture

The article, excerpted from the book Big Data Technology Architecture: Core Principles and Practice by Li Zhihui, presents a comparative study of enterprise big‑data platforms built by major Chinese internet companies, illustrating how their designs follow similar architectural patterns.

Typical big‑data platforms adopt a Lambda architecture, combining batch and real‑time processing. By reviewing the platforms of Taobao, Meituan, and Didi, readers can learn both the high‑level structure and the concrete engineering solutions used by these leading firms.
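As a rough illustration of the Lambda idea, the sketch below keeps a batch view that is recomputed from the full dataset and a real-time view that only covers events newer than the last batch run, then merges the two at query time. The `Event` type, the page-view counting use case, and all function names are assumptions made for this example, not details taken from any of the three platforms.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical page-view event; field names are assumptions."""
    page: str
    timestamp: int


def build_batch_view(master_dataset, cutoff):
    """Batch layer: recompute counts from every event before the cutoff."""
    return Counter(e.page for e in master_dataset if e.timestamp < cutoff)


class SpeedLayer:
    """Speed layer: incrementally count events the batch view has not covered."""

    def __init__(self, cutoff):
        self.cutoff = cutoff
        self.realtime_view = Counter()

    def on_event(self, event):
        if event.timestamp >= self.cutoff:
            self.realtime_view[event.page] += 1


def query(page, batch_view, speed_layer):
    """Serving layer: merge the batch and real-time views at query time."""
    return batch_view[page] + speed_layer.realtime_view[page]


events = [Event("home", 10), Event("home", 50), Event("cart", 120), Event("home", 130)]
batch_view = build_batch_view(events, cutoff=100)
speed = SpeedLayer(cutoff=100)
for e in events:
    speed.on_event(e)

print(query("home", batch_view, speed))  # 3: two from the batch view, one from the speed layer
```

In a real deployment the batch view would come from a Hadoop or Hive job and the real-time view from a stream processor; the merge step is the part this sketch is meant to show.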

Taobao's platform consists of three layers. First, data sources (Oracle, MySQL replicas, log and crawler systems) feed a data‑exchange gateway that writes to HDFS via components such as DataExchange, DBSync, and TimeTunnel. Second, Hadoop jobs process the ingested data. Third, results are exported back to relational databases to support downstream applications such as recommendation engines. The Tianwang scheduler orchestrates all three steps: data ingestion, Hadoop job execution, and result export.
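The ordering a scheduler has to enforce (ingest first, then compute, then export) can be sketched as a dependency graph. This is only an illustration using Python's standard `graphlib` module; the task names are hypothetical and nothing here reflects how Tianwang is actually implemented.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it starts.
# All task names are hypothetical.
dependencies = {
    "sync_mysql_to_hdfs": set(),
    "sync_logs_to_hdfs": set(),
    "hadoop_compute_recommendations": {"sync_mysql_to_hdfs", "sync_logs_to_hdfs"},
    "export_results_to_mysql": {"hadoop_compute_recommendations"},
}


def run_pipeline(dependencies, run_task):
    """Run each task only after all of its upstream tasks have run."""
    executed = []
    for task in TopologicalSorter(dependencies).static_order():
        run_task(task)
        executed.append(task)
    return executed


# run_task is a placeholder; a real scheduler would submit a job here.
order = run_pipeline(dependencies, run_task=lambda name: None)
print(order)
```

The relative order of the two sync tasks is not fixed, since neither depends on the other; only the ingest, compute, export sequence is guaranteed.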

Many of Taobao’s synchronization components (DBSync, TimeTunnel, DataExchange) are internally developed and have been open‑sourced, providing reusable tools for other projects.

Meituan ingests data from MySQL (via Canal) and logs (via Flume) into Kafka. Streaming jobs run on Storm, while batch analytics use Hive. Results are stored in HBase or relational databases and accessed through BI dashboards. An internal scheduling platform manages ETL development, job submission, and data governance.
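One reason to land both database changes and logs in a message queue is that streaming and batch consumers can then read the same records independently, each at its own pace. The sketch below imitates that behavior with an in-memory append-only log and per-consumer offsets. It is a simplified stand-in, not the Kafka client API, and the `Topic` class, group names, and record fields are all assumptions.

```python
class Topic:
    """In-memory stand-in for a message-queue topic: an append-only log."""

    def __init__(self):
        self._log = []
        self._offsets = {}  # consumer group name -> next offset to read

    def publish(self, record):
        self._log.append(record)

    def poll(self, group, max_records=100):
        """Return unread records for this group and advance its offset."""
        start = self._offsets.get(group, 0)
        batch = self._log[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch


orders_topic = Topic()
orders_topic.publish({"source": "mysql", "op": "INSERT", "order_id": 1})
orders_topic.publish({"source": "log", "line": "GET /order/1"})

# The streaming job reads records as they arrive...
streaming_seen = orders_topic.poll("streaming-job", max_records=1)
orders_topic.publish({"source": "mysql", "op": "UPDATE", "order_id": 1})
streaming_seen += orders_topic.poll("streaming-job")

# ...while the batch job reads the same log later, from its own offset.
batch_seen = orders_topic.poll("nightly-batch")
print(len(streaming_seen), len(batch_seen))
```

Because each group tracks its own offset, the slower batch reader does not hold back the streaming reader, and neither one removes records the other still needs.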

Didi separates its platform into a real‑time layer (Spark Streaming or Flink feeding Kafka and Druid for monitoring) and an offline layer built on Hadoop 2 (HDFS, YARN, MapReduce) with Spark and Hive. A custom scheduler controls job priority and execution order, and a visual SQL editor simplifies query development. Didi also runs a dedicated HBase/Phoenix service for low‑latency access to processed data.
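A scheduler that controls both job priority and execution order has to respect dependencies first and then pick the most important job among those that are ready. The following is a minimal sketch of that rule using a heap. The job names, the priority numbers, and the convention that a lower number means higher priority are assumptions for illustration and do not describe Didi's scheduler.

```python
import heapq


def schedule(jobs):
    """Return an execution order: dependencies first, then higher priority first.

    jobs maps a job name to (priority, upstream_job_names);
    a lower priority number means the job runs earlier.
    """
    remaining = {name: set(deps) for name, (_, deps) in jobs.items()}
    ready = [(jobs[name][0], name) for name, deps in remaining.items() if not deps]
    heapq.heapify(ready)

    order = []
    while ready:
        _, name = heapq.heappop(ready)
        order.append(name)
        del remaining[name]
        # Unblock any job that was waiting only on the one that just ran.
        for other, deps in remaining.items():
            if name in deps:
                deps.remove(name)
                if not deps:
                    heapq.heappush(ready, (jobs[other][0], other))

    if remaining:
        raise ValueError(f"cyclic or missing dependencies: {sorted(remaining)}")
    return order


# Hypothetical jobs: (priority, upstream jobs).
jobs = {
    "hourly_report": (2, {"clean_orders"}),
    "clean_orders": (1, set()),
    "adhoc_query": (5, set()),
    "billing": (0, {"clean_orders"}),
}
print(schedule(jobs))
```

Here `billing` has the highest priority but still waits for `clean_orders`, while the low-priority `adhoc_query` runs last even though it has no dependencies.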

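The three platforms draw on a common pool of open‑source technologies (Hadoop, YARN, Spark, Flink, Storm, Kafka, Hive) and make similar design choices: each separates data ingestion, batch and streaming computation, and result serving, and each relies on a central scheduler. This illustrates that large‑scale data systems often converge on a set of proven patterns. Recognizing these patterns helps engineers understand and replicate successful architectures.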

For aspiring architects, the article advises improving programming skills, studying architecture documentation, and participating in technical conferences to deepen understanding of design patterns and real‑world implementations.

Author bio: Li Zhihui, Chief Architect of Tongcheng Travel, former architect at Alibaba and Intel, contributor to Apache Spark, and author of the bestseller Large‑Scale Website Architecture: Core Principles and Case Studies.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
