Overview of Taobao Cloud Computing Architecture and Data Synchronization Solutions
This article presents a comprehensive overview of Taobao's cloud computing architecture, detailing system components, various data synchronization methods such as TimeTunnel, Dbsync, and DataX, the scheduling system design, and metadata-driven analysis platforms for performance optimization and monitoring.
1. System Architecture
Data flows from top to bottom, moving from multiple data sources through a Gateway and the Cloud Ladder to various application scenarios.
2. Taobao Cloud Computing Overview
The platform is composed of three main parts: data sources, a data platform, and data clusters.
3. Data Synchronization Solutions
3.1 Overview
3.2 Real‑time vs. Non‑real‑time Synchronization
3.3 TimeTunnel2
TimeTunnel is a real‑time data transmission platform whose main functions are publishing data to the platform and subscribing to data of interest.
Key characteristics include high efficiency (a single node can handle up to 40k TPS), high reliability (master‑slave (M‑S) mode guarantees no data loss), high availability (the failure of a single node does not affect the cluster), and ordered delivery as long as no failures occur.
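The publish/subscribe model can be illustrated with a minimal in‑process sketch. This is not TimeTunnel's API: the class `MiniTunnel` and its methods are hypothetical, and the M‑S replication that provides reliability is not modeled. It only shows how an append‑only log per topic plus a per‑subscriber read offset yields ordered delivery.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class MiniTunnel:
    """Toy publish/subscribe broker (hypothetical; not TimeTunnel's real API).

    Each topic is an append-only log and every subscriber keeps its own
    read offset, so messages reach each subscriber in publish order.
    """

    def __init__(self) -> None:
        self._logs: Dict[str, List[str]] = defaultdict(list)
        self._offsets: Dict[Tuple[str, str], int] = {}

    def publish(self, topic: str, message: str) -> None:
        # The order of appends defines the delivery order.
        self._logs[topic].append(message)

    def subscribe(self, topic: str, subscriber: str) -> None:
        # A new subscriber starts reading at the current end of the log.
        self._offsets[(topic, subscriber)] = len(self._logs[topic])

    def poll(self, topic: str, subscriber: str) -> List[str]:
        # Return everything published since this subscriber's last poll.
        key = (topic, subscriber)
        start = self._offsets[key]
        batch = self._logs[topic][start:]
        self._offsets[key] = start + len(batch)
        return batch


tunnel = MiniTunnel()
tunnel.subscribe("orders", "billing")
tunnel.publish("orders", "o1")
tunnel.publish("orders", "o2")
print(tunnel.poll("orders", "billing"))  # ['o1', 'o2']
print(tunnel.poll("orders", "billing"))  # []
```

Because offsets are tracked per subscriber, several consumers can read the same topic independently without affecting each other's position.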
3.4 Dbsync
Dbsync synchronizes data from online business databases to HDFS: it parses the database server's log files, extracts the database operations recorded there, and delivers the resulting incremental data to Hadoop.
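The log‑to‑increment idea can be sketched as follows. Real database logs are binary and engine‑specific; the `OP|key|value` text format, the `ChangeEvent` type, and both functions here are hypothetical simplifications that only show how parsed insert/update/delete actions are merged, in log order, into a previous snapshot.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class ChangeEvent:
    op: str               # "INSERT", "UPDATE" or "DELETE"
    key: str              # primary key of the affected row
    value: Optional[str]  # new row value (None for DELETE)


def parse_log(lines: Iterable[str]) -> List[ChangeEvent]:
    """Parse a hypothetical 'OP|key|value' log format into change events."""
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split("|")
        op, key = parts[0].upper(), parts[1]
        value = parts[2] if op != "DELETE" else None
        events.append(ChangeEvent(op, key, value))
    return events


def apply_increment(snapshot: Dict[str, str],
                    events: Iterable[ChangeEvent]) -> Dict[str, str]:
    """Merge incremental events into a copy of the previous snapshot, in log order."""
    merged = dict(snapshot)
    for ev in events:
        if ev.op == "DELETE":
            merged.pop(ev.key, None)
        else:  # INSERT and UPDATE both upsert the latest row value
            merged[ev.key] = ev.value
    return merged


log = ["INSERT|1|alice", "INSERT|2|bob", "UPDATE|1|alice2", "DELETE|2"]
print(apply_increment({"3": "carol"}, parse_log(log)))  # {'3': 'carol', '1': 'alice2'}
```

Shipping only the parsed events, rather than full table dumps, is what keeps the transferred volume proportional to the amount of change.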
Performance examples:
2 KB record size → 4 MB/s
9 KB record size → 10 MB/s
Typical scenario: for 800 GB of data, non‑real‑time synchronization completes in 55 minutes and real‑time synchronization in 25 minutes.
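The figures above can be converted into comparable rates with a little arithmetic. The sketch assumes binary units (1 MB = 1024 KB, 1 GB = 1024 MB); the function names are illustrative only. It shows records per second for the two record sizes and the average throughput implied by the 800 GB scenario; the source does not say how the per‑record‑size figures relate to the scenario.

```python
def records_per_second(record_kb: float, mb_per_s: float) -> float:
    """Records per second implied by a record size (KB) and a throughput (MB/s)."""
    return mb_per_s * 1024 / record_kb


def average_mb_per_s(total_gb: float, minutes: float) -> float:
    """Average throughput (MB/s) needed to move total_gb within the given minutes."""
    return total_gb * 1024 / (minutes * 60)


print(round(records_per_second(2, 4)))      # 2048
print(round(records_per_second(9, 10)))     # 1138
print(round(average_mb_per_s(800, 55), 1))  # 248.2
print(round(average_mb_per_s(800, 25), 1))  # 546.1
```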
3.5 DataX
DataX is a tool for exchanging data between heterogeneous data stores (RDBMS, NoSQL, file systems). It follows a framework‑plus‑plugin design: the framework handles high‑speed data exchange, while plugins provide access to specific systems.
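The framework/plugin split can be sketched as a reader plugin and a writer plugin connected by a framework‑owned bounded channel. The class names `Reader`, `Writer`, `ListReader`, `ListWriter` and the function `run_job` are hypothetical and do not reproduce DataX's actual Java interfaces; the point is that plugins only know their own system, while the framework owns buffering and concurrency.

```python
import queue
import threading
from typing import Iterable, Iterator, List

_END = object()  # sentinel that marks the end of the record stream


class Reader:
    """Plugin interface (hypothetical): yields records from a source system."""

    def read(self) -> Iterator[dict]:
        raise NotImplementedError


class Writer:
    """Plugin interface (hypothetical): stores records in a target system."""

    def write(self, record: dict) -> None:
        raise NotImplementedError


class ListReader(Reader):
    """Example plugin that reads from an in-memory list."""

    def __init__(self, rows: Iterable[dict]) -> None:
        self.rows = list(rows)

    def read(self) -> Iterator[dict]:
        yield from self.rows


class ListWriter(Writer):
    """Example plugin that collects records into a list."""

    def __init__(self) -> None:
        self.rows: List[dict] = []

    def write(self, record: dict) -> None:
        self.rows.append(record)


def run_job(reader: Reader, writer: Writer, capacity: int = 1024) -> int:
    """Framework side: move records from reader to writer via a bounded channel."""
    channel: "queue.Queue[object]" = queue.Queue(maxsize=capacity)

    def produce() -> None:
        try:
            for record in reader.read():
                channel.put(record)  # blocks while the channel is full
        finally:
            channel.put(_END)  # always unblock the consumer, even on error

    producer = threading.Thread(target=produce)
    producer.start()
    count = 0
    while True:
        item = channel.get()
        if item is _END:
            break
        writer.write(item)
        count += 1
    producer.join()
    return count


writer = ListWriter()
print(run_job(ListReader({"id": i} for i in range(5)), writer, capacity=2))  # 5
```

Supporting a new store then means writing one more `Reader` or `Writer` subclass; `run_job` stays unchanged.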
DataX can run stand‑alone or on Hadoop and offers both a Web UI and a CLI. Configuration is fast: for example, a sharded table spread across 32 databases and 1,024 tables can be configured in under a minute.
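One way such a sharded table can be configured so quickly is to describe the shard set with a rule instead of listing 1,024 entries by hand. The sketch below assumes, hypothetically, that tables are spread evenly (32 per database) and named `order_db_NN` / `order_NNNN`; the layout format is illustrative and is not DataX's job‑file schema.

```python
from typing import Dict, List


def shard_layout(db_prefix: str, table_prefix: str,
                 num_dbs: int = 32, num_tables: int = 1024) -> List[Dict[str, object]]:
    """Expand a sharded logical table into per-database entries.

    Assumes (hypothetically) an even spread: database i holds tables
    i*k .. i*k + k - 1, where k = num_tables // num_dbs.
    """
    if num_tables % num_dbs:
        raise ValueError("num_tables must be a multiple of num_dbs")
    per_db = num_tables // num_dbs
    layout = []
    for db in range(num_dbs):
        tables = [f"{table_prefix}_{t:04d}"
                  for t in range(db * per_db, (db + 1) * per_db)]
        layout.append({"database": f"{db_prefix}_{db:02d}", "tables": tables})
    return layout


layout = shard_layout("order_db", "order")
print(len(layout), sum(len(e["tables"]) for e in layout))  # 32 1024
print(layout[1]["database"], layout[1]["tables"][0])       # order_db_01 order_0032
```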
4. Scheduling System
4.1 Production‑rate Silver Bullet
4.2 Modules / Sub‑systems
4.3 Task Trigger Methods
Tasks are started in two ways, flow control / data trigger and time trigger, as illustrated in the original article's diagrams.
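The two trigger methods can be sketched as two conditions checked by the scheduler: a data trigger fires when all upstream tasks have finished, and a time trigger fires when the scheduled time is reached. The function `ready_tasks`, the task names, and the hour‑granularity schedule are hypothetical; in this sketch a task with both kinds of condition must satisfy both.

```python
from typing import Dict, List, Set


def ready_tasks(deps: Dict[str, List[str]], scheduled_hour: Dict[str, int],
                done: Set[str], now_hour: int) -> List[str]:
    """Return the unfinished tasks whose trigger conditions hold at now_hour.

    Data trigger: every upstream task listed in deps has finished.
    Time trigger: the task's scheduled hour, if it has one, has been reached.
    Tasks absent from scheduled_hour are purely data-triggered.
    """
    ready = []
    for task, upstream in deps.items():
        if task in done:
            continue
        data_ok = all(u in done for u in upstream)
        time_ok = now_hour >= scheduled_hour.get(task, 0)
        if data_ok and time_ok:
            ready.append(task)
    return sorted(ready)


deps = {"sync_orders": [], "sync_users": [], "daily_report": ["sync_orders", "sync_users"]}
schedule = {"sync_orders": 1, "sync_users": 1}  # hypothetical 01:00 start
print(ready_tasks(deps, schedule, set(), 0))  # []
print(ready_tasks(deps, schedule, set(), 1))  # ['sync_orders', 'sync_users']
print(ready_tasks(deps, schedule, {"sync_orders", "sync_users"}, 1))  # ['daily_report']
```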
4.4 Scheduling Modes
4.5 Gateway Definition
A Gateway is a resource participating in the scheduling system, providing functions such as data synchronization (DataX, Dbsync, TimeTunnel2), data upload/download (hadoop fs -put/-get/-getmerge), log collection, Hive SQL execution, MapReduce job submission, and inter‑cluster data sync (hadoop distcp).
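A gateway's job can be pictured as translating a task description into one of the command lines listed above. The task dictionary format and `build_command` are hypothetical; the commands themselves (`hadoop fs -put/-get/-getmerge`, `hadoop distcp`, `hive -e`, `hadoop jar`) are standard Hadoop and Hive CLI invocations. The sketch only builds the argument list and does not execute anything.

```python
import shlex
from typing import Dict, List


def build_command(task: Dict[str, str]) -> List[str]:
    """Translate a (hypothetical) task description into a gateway command line."""
    kind = task["type"]
    if kind in ("put", "get", "getmerge"):
        return ["hadoop", "fs", f"-{kind}", task["src"], task["dst"]]
    if kind == "distcp":  # inter-cluster copy
        return ["hadoop", "distcp", task["src"], task["dst"]]
    if kind == "hive":  # run a Hive SQL statement
        return ["hive", "-e", task["sql"]]
    if kind == "mapreduce":  # submit a MapReduce job from a jar
        return ["hadoop", "jar", task["jar"], task["main_class"],
                *shlex.split(task.get("args", ""))]
    raise ValueError(f"unsupported task type: {kind}")


print(shlex.join(build_command({"type": "getmerge", "src": "/logs/20120101", "dst": "day.log"})))
# hadoop fs -getmerge /logs/20120101 day.log
```

Returning an argument list rather than a shell string avoids quoting problems when the list is later passed to a process launcher.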
4.6 Gateway Scale and Planning
Approximately 30 Gateways are used in production; they are managed centrally, which handles task distribution and controls parallelism.
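Central task distribution with parallelism control can be sketched as a dispatcher that sends each task to the least‑loaded gateway and refuses new work once every gateway reaches a per‑gateway slot limit. The `Dispatcher` class, the least‑loaded rule, gateway names, and `max_slots` are assumptions for illustration; the source does not describe the actual algorithm.

```python
from typing import Dict, List, Optional


class Dispatcher:
    """Central dispatcher sketch: least-loaded gateway plus a per-gateway slot cap.

    Gateway names and max_slots are hypothetical illustration values.
    """

    def __init__(self, gateways: List[str], max_slots: int) -> None:
        self.max_slots = max_slots
        self.running: Dict[str, int] = {g: 0 for g in gateways}
        self.assignments: Dict[str, str] = {}  # task -> gateway

    def assign(self, task: str) -> Optional[str]:
        # Choose the gateway running the fewest tasks (ties broken by name).
        gateway = min(self.running, key=lambda g: (self.running[g], g))
        if self.running[gateway] >= self.max_slots:
            return None  # all gateways are full, so the task waits
        self.running[gateway] += 1
        self.assignments[task] = gateway
        return gateway

    def finish(self, task: str) -> None:
        # Release the slot held by a completed task.
        gateway = self.assignments.pop(task)
        self.running[gateway] -= 1


gateways = [f"gw{i:02d}" for i in range(1, 31)]  # roughly 30 gateways, as in the text
d = Dispatcher(gateways, max_slots=8)
print(d.assign("etl_orders"), d.assign("etl_users"))  # gw01 gw02
```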
4.7 Gateway Standardization
4.8 Dynamic Load Balancing Implementation
4.9 Priority Strategy Implementation
4.10 Priority Strategy Significance
4.11 Monitoring Panorama
5. Metadata Applications
A key question is whether to rely on experienced architects or on intelligent analysis systems.
5.1 Mining the Metadata Goldmine
5.2 Metadata‑Based Development Platform
Features include automatic code generation, input location, code optimization, automated deployment and scheduling, pairwise analysis, hotspot detection, field‑change impact analysis, and transformation tracing.
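Field‑change impact analysis can be sketched as a breadth‑first traversal of column‑level lineage metadata: every field reachable downstream of the changed field is potentially affected. The lineage edges and table names below are hypothetical, and the source does not state how the platform stores lineage.

```python
from collections import deque
from typing import Dict, List, Set


def impacted_fields(lineage: Dict[str, List[str]], changed: str) -> Set[str]:
    """Return every downstream field reachable from `changed`.

    `lineage` maps a field ("table.column") to the fields derived from it.
    """
    seen: Set[str] = set()
    frontier = deque([changed])
    while frontier:
        field = frontier.popleft()
        for child in lineage.get(field, []):
            if child not in seen:  # also guards against cycles
                seen.add(child)
                frontier.append(child)
    return seen


lineage = {  # hypothetical lineage metadata
    "ods_order.amount": ["dw_order.amount"],
    "dw_order.amount": ["rpt_daily_gmv.gmv", "rpt_seller.sales"],
    "rpt_seller.sales": ["rpt_seller_rank.rank"],
}
print(sorted(impacted_fields(lineage, "ods_order.amount")))
# ['dw_order.amount', 'rpt_daily_gmv.gmv', 'rpt_seller.sales', 'rpt_seller_rank.rank']
```

Traversing the same graph in the reverse direction would give transformation tracing (where a field's value comes from).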
5.3 Metadata‑Based Analysis Platform – Runtime Analysis System
5.4 Analysis Strategy Overview
5.5 Runtime Data Collection
5.6 Macro Analysis Strategy
5.7 Bottleneck Localization
Each stage’s throughput varies dynamically; the overall system throughput is limited by the stage with the smallest capacity. Visualizing throughput curves for each stage helps identify and address bottlenecks, and buffering queues can be added for stages with high variance.
Methods include plotting per‑stage throughput curves, monitoring the status of buffer queues between stages, and normalizing metrics to the task level.
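The bottleneck rule above (overall throughput is bounded by the slowest stage) and the variance check can be sketched from per‑stage throughput samples. The stage names and sample values are made up, and the coefficient‑of‑variation threshold of 0.3 used to flag stages for buffering is an arbitrary assumption, not a figure from the source.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def analyze_stages(samples: Dict[str, List[float]],
                   cv_threshold: float = 0.3) -> Tuple[str, float, List[str]]:
    """Locate the bottleneck stage and flag unstable stages.

    samples maps each stage to throughput readings (e.g. records/s) over time.
    Returns (bottleneck_stage, pipeline_throughput, stages_to_buffer).
    """
    means = {stage: mean(vals) for stage, vals in samples.items()}
    bottleneck = min(means, key=means.get)  # slowest stage bounds the pipeline
    unstable = sorted(
        stage for stage, vals in samples.items()
        if means[stage] > 0 and pstdev(vals) / means[stage] > cv_threshold
    )
    return bottleneck, means[bottleneck], unstable


samples = {  # hypothetical throughput readings
    "extract": [900, 1000, 1100],
    "transform": [400, 420, 410],
    "load": [200, 900, 700],
}
bottleneck, rate, buffer_candidates = analyze_stages(samples)
print(bottleneck, buffer_candidates)  # transform ['load']
```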
5.8 Most Worthwhile Optimization Targets
From a critical‑path perspective, tasks that have long runtimes, appear on multiple critical paths, and have high variability are prioritized for optimization.
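The three criteria can be combined into a ranking heuristic. In this sketch, "critical path" means the most expensive upstream chain (by mean runtime) ending at each output task, and a task's score is mean runtime × number of critical paths it lies on × (1 + coefficient of variation). The scoring formula, DAG, and runtimes are hypothetical; the source names the criteria but not how they are weighted.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def critical_path(deps: Dict[str, List[str]], cost: Dict[str, float], sink: str) -> List[str]:
    """Most expensive upstream chain ending at `sink` in a DAG."""
    memo: Dict[str, Tuple[float, List[str]]] = {}

    def longest(task: str) -> Tuple[float, List[str]]:
        # (total cost, path) of the most expensive chain ending at task.
        if task not in memo:
            best: Tuple[float, List[str]] = (0.0, [])
            for up in deps.get(task, []):
                cand = longest(up)
                if cand[0] > best[0]:
                    best = cand
            memo[task] = (best[0] + cost[task], best[1] + [task])
        return memo[task]

    return longest(sink)[1]


def rank_targets(deps: Dict[str, List[str]], runtimes: Dict[str, List[float]],
                 sinks: List[str]) -> List[str]:
    """Rank tasks by mean runtime x critical-path count x (1 + CV)."""
    avg = {t: mean(r) for t, r in runtimes.items()}
    hits = {t: 0 for t in runtimes}
    for sink in sinks:
        for task in critical_path(deps, avg, sink):
            hits[task] += 1
    score = {
        t: avg[t] * hits[t] * (1 + pstdev(r) / avg[t])
        for t, r in runtimes.items() if hits[t] and avg[t] > 0
    }
    return sorted(score, key=score.get, reverse=True)


deps = {"report_a": ["join"], "report_b": ["join", "dim"],
        "join": ["sync_orders", "sync_users"]}
runtimes = {"sync_orders": [30, 50, 70], "sync_users": [20, 20, 20],
            "join": [40, 45, 50], "dim": [10, 10, 10],
            "report_a": [5, 5, 5], "report_b": [8, 8, 8]}
print(rank_targets(deps, runtimes, ["report_a", "report_b"]))
# ['sync_orders', 'join', 'report_b', 'report_a']
```

Here `sync_orders` ranks first because it is long, sits on both critical paths, and has the most variable runtime.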
Source: Compiled from internet resources titled “Taobao Cloud Ladder Distributed Computing Platform Overall Architecture”. Original link: https://www.afenxi.com/64409.html
© Content sourced from the internet; copyright belongs to the original authors. We strive to credit sources and will remove any infringing material upon request.
