Databases 19 min read

Design and Architecture of TDSQL: A Distributed MySQL‑Based SQL System

The article describes how the limitations of an in‑memory NoSQL HOLD platform led to the creation of TDSQL, a distributed MySQL‑based SQL system featuring ZooKeeper‑coordinated Scheduler, Agent, and Gateway modules, automatic cross‑IDC disaster recovery, transparent horizontal scaling, strong synchronous replication, and future integration with container technologies.

Qunar Tech Salon

Nov 5, 2015

Design and Architecture of TDSQL: A Distributed MySQL‑Based SQL System

As business demand grew, the in‑memory NoSQL HOLD platform could handle up to 300 billion read/write operations per day, proving the value of distributed cache, but its limitations—restricted to get/set operations, inefficient memory usage for non‑hot data, and lack of flexible indexing—became apparent.

Leveraging extensive MySQL experience, the team decided to build a distributed SQL system on top of the MySQL storage engine while preserving the MySQL protocol, allowing existing C++ and Java clients and DBAs to continue using familiar tools.

The new system provides automatic cross‑IDC disaster recovery with bank‑level consistency, transparent capacity scaling to overcome MySQL’s rigid expansion, and strong support for OLTP workloads.

Overall Architecture : TDSQL consists of three modules—Scheduler, Agent, and Gateway—whose interactions are coordinated through ZooKeeper, simplifying inter‑node communication compared with the previous HOLD generation.

Scheduler manages sets (clusters), handles creation/deletion/replacement of nodes, dispatches all DDL operations, monitors node health, triggers high‑consistency master‑slave failover, tracks CPU, disk, and table resource usage, initiates automatic scaling, and ensures its own high availability via ZooKeeper leader election.

Agent monitors the local MySQL instance, periodically checks read/write availability, reports replication lag and transaction counts, reports CPU and table usage to ZooKeeper, executes scaling tasks, and initiates failover procedures when required.

Gateway is built on MySQL Proxy, performing SQL parsing, routing, authentication, query aggregation, and detailed execution logging; it supports read/write separation, stateless deployment, and integrates with a second‑level monitoring platform for real‑time metrics.

Automatic Scaling : Two strategies exist—vertical (hardware upgrade) and horizontal (sharding or distributing tables across instances). TDSQL adopts horizontal scaling, automatically handling shard creation, data migration, and routing updates without business involvement.

Sharding Logic : Logical tables can be split into up to 10,000 physical shards, each a MySQL table. The gateway parses incoming SQL, extracts the shard field, and routes the request to the appropriate set; if no shard field is present, the request is broadcast to all relevant shards and results are aggregated.

Expansion Process (Strategy 2) : When a set exceeds resource thresholds, the Scheduler creates the target shard on a new set, the Agent uses mysqldump with a WHERE clause to export the relevant data, streams it directly to the new set, records the binlog position, replays binlog entries until the end, renames the original shard, and finally updates the ZooKeeper routing table. The entire switchover typically takes around 200 ms.

Disaster Recovery : The system aims for automatic failover, data consistency, and cross‑IDC resilience. It evaluates MySQL asynchronous replication, semi‑synchronous replication, and cluster solutions (NDB, Percona XtraDB, MariaDB Galera). Performance tests show significant latency penalties for semi‑sync and Galera in cross‑IDC scenarios, leading to a custom strong‑synchronization design.

Strong Synchronization Scheme : By splitting the SQL execution into two phases—writing the binlog and suspending the session, then letting a dump thread send the binlog to replicas and receive UDP acknowledgments—TDSQL achieves high performance, as demonstrated by benchmark graphs.

High‑Availability Mechanisms : Each set should contain at least three cross‑IDC nodes, with flexible node addition using Xtrabackup for full and incremental data copy (≈500 GB per hour). The system also implements flashback for row‑based binlog recovery and uses ZooKeeper for node monitoring and automated failover.

Future Directions : Plans include tighter integration with Docker for resource isolation, continued exploration of Galera clustering, combining thread pools with coroutines to improve external I/O handling, and extending support for distributed transactions and cross‑database joins, while leveraging TDW for OLAP analysis.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

scalability Sharding High Availability distributed database mysql TDSQL

Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.