How Alibaba’s TDDL Evolved from Cobar to Power Billions of Daily Queries
This article traces the evolution of Alibaba’s distributed data layer—from the early Cobar system to the modern TDDL framework and DRDS service—explaining their architectures, limitations, sharding principles, transaction‑boundary strategies, heterogeneous index tables, and the Jingwei data‑replication platform that together enable seamless scaling and high‑performance SQL processing across thousands of databases.
Originally, business data at Alibaba was stored in a single‑database‑single‑table model. As data volume grew, it was split across multiple databases and tables, which made the data‑access layer harder to develop and raised the risk that a data‑access problem would affect the wider platform.
In 2006, Alibaba’s B2B team open‑sourced Cobar, a distributed relational data processing system that alleviated Oracle’s scalability issues and handled about 5 billion SQL operations daily. However, Cobar had notable limitations. It did not support cross‑database joins, pagination, sorting, or sub‑queries. It ignored SET statements (except for transaction and charset settings). It required the split‑field column to be present in INSERT statements and prohibited updating split fields in UPDATE statements. It also lacked support for SAVEPOINT, certain JDBC parameters, and BLOB/BINARY/VARBINARY handling.
In 2008, to meet Taobao’s growing needs, Alibaba rebuilt the distributed data layer as TDDL (Taobao Distributed Data Layer). TDDL improved support for sharding scenarios, offered a more developer‑friendly experience, and dramatically enhanced control capabilities.
Today, TDDL is the default distributed data‑layer middleware for Alibaba Group, serving thousands of applications and processing over a trillion SQL calls per day. Its architecture inherits Cobar’s positioning between applications and backend databases, adding precise SQL parsing for routing, cross‑database join and aggregation support, and full SQL‑compatible syntax.
TDDL’s three‑layer data source design follows JDBC standards, ensuring no code intrusion for front‑end applications. The Matrix layer (TDataSource) implements sharding logic and holds multiple GroupDs instances. The Group layer (TGroupDataSource) provides master‑slave and read‑write separation, containing several Atom instances. The Atom layer (TAtomDataSource) manages low‑level connection details (IP, port, password, etc.).
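The nesting of the three layers can be pictured with a small data model. This is only an illustrative sketch in Python, not TDDL's actual Java classes: the class names, the `build_matrix` helper, the host names, and the modulo lookup in `group_for` are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AtomDataSource:
    """Hypothetical stand-in for the Atom layer: one physical database endpoint."""
    host: str
    port: int
    db_name: str


@dataclass
class GroupDataSource:
    """Hypothetical stand-in for the Group layer: a master plus its read replicas."""
    master: AtomDataSource
    replicas: List[AtomDataSource] = field(default_factory=list)

    def atoms(self) -> List[AtomDataSource]:
        # Every physical endpoint that holds this shard's data.
        return [self.master, *self.replicas]


@dataclass
class MatrixDataSource:
    """Hypothetical stand-in for the Matrix layer: owns one group per shard."""
    groups: List[GroupDataSource]

    def group_for(self, shard_index: int) -> GroupDataSource:
        # Assumed lookup rule: wrap the shard index onto the available groups.
        return self.groups[shard_index % len(self.groups)]


def build_matrix(shards: int, replicas_per_shard: int) -> MatrixDataSource:
    """Build a toy topology with made-up host names."""
    groups = []
    for s in range(shards):
        master = AtomDataSource(f"db{s}-master.example", 3306, f"orders_{s}")
        replicas = [
            AtomDataSource(f"db{s}-replica{r}.example", 3306, f"orders_{s}")
            for r in range(replicas_per_shard)
        ]
        groups.append(GroupDataSource(master, replicas))
    return MatrixDataSource(groups)
```

For example, `build_matrix(4, 2)` yields four groups, each with one master and two replicas, and `group_for(5)` resolves to the group holding `orders_1`.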
A single SQL request from an application is parsed by TDDL, which routes it precisely to the target database(s), aggregates results if needed, and returns them to the application, as illustrated in the processing flow diagrams.
Key design principles include:
Database master‑slave and dynamic switching.
Weighted read‑write separation.
Single‑threaded read retry.
Centralized data‑source management and dynamic changes.
Support for MySQL and Oracle.
JDBC‑compliant extensibility.
No separate server: TDDL ships as a client jar, and applications connect directly to the database.
Control over read/write concurrency and flow.
Analyzable log printing and flow control.
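Two of the principles above, weighted read‑write separation and read retry, can be sketched together. The following Python is a minimal illustration under assumptions, not TDDL's implementation: the function names, the weight dictionary, and the choice to treat `ConnectionError` as the retryable failure are all hypothetical.

```python
import random
from typing import Callable, Dict, Optional


def pick_weighted(weights: Dict[str, int], rng: random.Random) -> str:
    """Pick one data source name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def read_with_retry(
    weights: Dict[str, int],
    run_query: Callable[[str], str],
    rng: Optional[random.Random] = None,
) -> str:
    """Try weighted targets one at a time, in a single thread, until one succeeds."""
    rng = rng or random.Random()
    # A weight of 0 means "never read from this source" (e.g. a write-only master).
    remaining = {name: w for name, w in weights.items() if w > 0}
    last_error: Optional[Exception] = None
    while remaining:
        target = pick_weighted(remaining, rng)
        try:
            return run_query(target)
        except ConnectionError as exc:
            # Assumed policy: drop the failed source and retry on another one.
            last_error = exc
            del remaining[target]
    raise ConnectionError("all read targets failed") from last_error
```

With weights such as `{"master": 0, "replica1": 5, "replica2": 5}`, reads are spread evenly across the two replicas, and a failure on one replica falls through to the other.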
As Alibaba’s business diversified beyond e‑commerce, a new generation distributed database product, DRDS (Distributed Relational Database Service), was launched in 2014. DRDS builds on TDDL’s capabilities, offering enhanced business‑scenario support, fault isolation, and operational control, and is now a standard cloud product on Alibaba Cloud serving external customers.
When sharding data, the principle of “as even as possible” is crucial. Simple hash‑modulo on an auto‑increment ID generally spreads rows evenly, whereas sharding on a business dimension such as seller ID may cause hotspots for high‑volume sellers. Therefore, the choice of sharding key (order ID vs. buyer ID) must consider both data distribution and query patterns.
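The effect of the key choice on evenness can be checked with a small experiment. This Python sketch uses synthetic data in which one hypothetical seller (ID 7) owns half of all orders; the shard count, the data, and the function names are assumptions for illustration only.

```python
from collections import Counter
from typing import Iterable, List


def shard_for(key: int, shard_count: int) -> int:
    """Plain modulo sharding on an integer key."""
    return key % shard_count


def shard_load(keys: Iterable[int], shard_count: int) -> List[int]:
    """Count how many rows each shard would receive for the given keys."""
    counts = Counter(shard_for(k, shard_count) for k in keys)
    return [counts.get(i, 0) for i in range(shard_count)]


# Synthetic (order_id, seller_id) pairs: seller 7 owns every even-numbered order.
orders = [(oid, 7 if oid % 2 == 0 else oid % 100) for oid in range(1, 10001)]

# Sharding by the auto-increment order ID spreads rows evenly.
by_order = shard_load((order_id for order_id, _ in orders), 8)

# Sharding by seller ID piles the big seller's rows onto a single shard.
by_seller = shard_load((seller_id for _, seller_id in orders), 8)
```

Here `by_order` is perfectly balanced, while in `by_seller` the shard that hosts seller 7 receives at least half of all rows.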
Reducing transaction boundaries is another principle, where the transaction boundary is the number of backend databases a single SQL statement has to run on. SQL statements that include the sharding key can be routed to a single database, avoiding large‑scale cross‑database scans. When queries lack a sharding key (e.g., a buyer viewing recent orders when the data is sharded by order ID), the distributed layer must broadcast the query to all shards and aggregate the results, which may incur higher lock contention and reduced scalability.
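The difference between a single‑shard route and a broadcast can be sketched as follows. This is a simplified Python illustration, not TDDL's router: it assumes equality conditions only, modulo sharding on `order_id`, in‑memory lists standing in for databases, and hypothetical function names.

```python
from typing import Dict, List

SHARD_COUNT = 4  # assumed number of physical databases


def route(conditions: Dict[str, int], sharding_key: str = "order_id") -> List[int]:
    """Return the shard indexes a query must visit."""
    if sharding_key in conditions:
        # The sharding key is present: exactly one shard can hold the row.
        return [conditions[sharding_key] % SHARD_COUNT]
    # No sharding key: the query has to be broadcast to every shard.
    return list(range(SHARD_COUNT))


def run_query(shards: List[List[dict]], conditions: Dict[str, int],
              sharding_key: str = "order_id") -> List[dict]:
    """Execute on the routed shards and merge the partial results."""
    matched = []
    for idx in route(conditions, sharding_key):
        for row in shards[idx]:
            if all(row.get(col) == val for col, val in conditions.items()):
                matched.append(row)
    # Aggregation step: merge per-shard results into one ordered result set.
    return sorted(matched, key=lambda row: row["order_id"])


def make_shards(orders: List[dict]) -> List[List[dict]]:
    """Distribute rows across shards by order_id, as the router expects."""
    shards: List[List[dict]] = [[] for _ in range(SHARD_COUNT)]
    for order in orders:
        shards[order["order_id"] % SHARD_COUNT].append(order)
    return shards
```

A lookup by `order_id` touches one shard, while a lookup by `buyer_id` alone visits all four, which is the cost the principle aims to avoid.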
To mitigate frequent full‑table scans, heterogeneous index tables are employed. An index table stores a copy of the primary data keyed by an alternative dimension (e.g., buyer ID). This “space‑for‑time” trade‑off spends extra storage so that such queries hit a single shard, and the index table only needs the data required for that access path rather than a duplicate of the entire row set.
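A heterogeneous index table can be sketched as a second, differently sharded copy of selected columns. The Python below is an illustration under assumptions: the column choice, the shard count, and the function names are hypothetical, and the index is built in one batch here rather than maintained by replication.

```python
from typing import Dict, List, Sequence

INDEX_SHARD_COUNT = 4  # assumed number of index shards


def build_buyer_index(
    order_shards: List[List[dict]],
    columns: Sequence[str] = ("buyer_id", "order_id"),
) -> List[List[Dict[str, int]]]:
    """Copy selected columns of every order into shards keyed by buyer_id.

    `columns` must include "buyer_id", since lookups filter on it.
    """
    index_shards: List[List[Dict[str, int]]] = [[] for _ in range(INDEX_SHARD_COUNT)]
    for shard in order_shards:
        for row in shard:
            entry = {col: row[col] for col in columns}
            index_shards[row["buyer_id"] % INDEX_SHARD_COUNT].append(entry)
    return index_shards


def order_ids_for_buyer(index_shards: List[List[Dict[str, int]]], buyer_id: int) -> List[int]:
    """Answer 'which orders belong to this buyer' from a single index shard."""
    shard = index_shards[buyer_id % INDEX_SHARD_COUNT]
    return sorted(e["order_id"] for e in shard if e["buyer_id"] == buyer_id)
```

The buyer query now reads one index shard instead of every order shard; the returned order IDs can then be used for single‑shard lookups against the primary tables if more columns are needed.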
The Jingwei platform implements heterogeneous indexing at the database layer. It is a MySQL‑based real‑time data replication framework consisting of an Extractor (captures binlog events), a Pipeline (filters and transforms events), and an Applier (writes to target databases). Jingwei supports multi‑threaded pipelines for high throughput, and it preserves ordering for changes to the same record by hashing the combination of database, table, and primary key to a fixed thread.
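The ordering guarantee can be illustrated with a small dispatcher. This Python sketch is not Jingwei's code: the event dictionary layout, the use of CRC32 as the hash, and the function names are assumptions, and per‑worker lists stand in for real threads.

```python
import zlib
from collections import defaultdict
from typing import Dict, List


def worker_for(database: str, table: str, primary_key: int, worker_count: int) -> int:
    """Map a record identity to a fixed worker using a stable hash."""
    identity = f"{database}.{table}.{primary_key}".encode("utf-8")
    # CRC32 is used because it is stable across processes, unlike Python's hash().
    return zlib.crc32(identity) % worker_count


def dispatch(events: List[dict], worker_count: int) -> Dict[int, List[dict]]:
    """Split a binlog-ordered event stream into per-worker queues."""
    queues: Dict[int, List[dict]] = defaultdict(list)
    for event in events:
        worker = worker_for(event["db"], event["table"], event["pk"], worker_count)
        # Appending in stream order keeps each record's changes in sequence.
        queues[worker].append(event)
    return dict(queues)
```

Because every change to a given record lands in the same queue, an insert, update, and delete of that record are applied in their original order even though different records are processed in parallel.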
Jingwei provides a user‑friendly web UI for configuring data sources, selecting event types (insert, update, delete), and defining sharding keys. Advanced features include field filtering, mapping, and custom transformation code. Monitoring is handled via Zookeeper heartbeats, latency accumulation, and task status metrics (TPS, errors).
For high‑frequency search scenarios (e.g., product search on Taobao), database‑level full‑table scans are impractical. Alibaba therefore uses a dedicated search platform (based on technologies like Lucene, Solr, Elasticsearch) that synchronizes data from the database to a search index, providing fast, scalable search capabilities.
In summary, Alibaba’s distributed data layer journey—from Cobar to TDDL, DRDS, and the Jingwei replication platform—demonstrates how careful sharding design, transaction‑boundary reduction, heterogeneous indexing, and specialized search engines together achieve high performance, scalability, and operational stability for massive e‑commerce workloads.
21CTO
