How TDSQL Achieves Scalable, High‑Availability Distributed SQL on MySQL
This article explains how TDSQL transforms MySQL into a distributed, high‑availability SQL system by addressing NoSQL limitations, introducing a Scheduler‑Agent‑Gateway architecture, automatic scaling, sharding, robust disaster‑recovery mechanisms, and future integration with container technologies.
As business grew, the in‑memory NoSQL HOLD platform proved valuable but showed limitations such as limited use cases, inability to index non‑hot data, and insufficient capacity for large datasets, prompting a shift to a distributed SQL solution built on MySQL.
By retaining the MySQL protocol, existing C++/Java clients and DBA habits remain unchanged while adding features like automatic cross‑IDC disaster recovery, transparent capacity scaling, and strong OLTP support.
Overall Architecture
The TDSQL system consists of three modules—Scheduler, Agent, and Gateway—communicating via ZooKeeper, simplifying node interaction compared to the second‑generation HOLD.
Scheduler
Manages sets (creation, deletion, node replacement).
Dispatches all DDL operations.
Monitors node health and triggers high‑consistency master‑slave failover.
Tracks CPU, disk, and table resource usage, initiating auto‑scaling when needed.
Ensures scheduler itself has no single point of failure through ZooKeeper election.
Agent
Periodically checks local MySQL instance health via short connections and reports anomalies to ZooKeeper.
Monitors master‑slave replication status, reports lag, and automatically rebuilds replication using xtrabackup.
Reports CPU usage, request volume, and data size to ZooKeeper for global scaling decisions.
Executes assigned scaling tasks.
Handles planned failover procedures.
Gateway
Parses SQL, stores identified DDL statements in ZooKeeper for unified scheduling.
Watches ZooKeeper for routing updates and caches the latest routing table.
Routes SQL to the appropriate set, supporting read/write separation.
Performs IP, username, and password authentication.
Records full SQL execution details for real‑time monitoring.
Aggregates results for queries involving count, distinct, sum, avg, max, min, order/group by across multiple sets (no cross‑set join or distributed transaction support).
Stateless deployment, either co‑located with business services or standalone behind TGW/LVS for disaster recovery.
Automatic Scaling Mechanism
Two scaling strategies are considered: vertical scaling (hardware upgrades) and horizontal scaling (sharding or distributing tables across instances). Horizontal scaling is preferred for internet‑scale workloads, and TDSQL implements a transparent, automated horizontal scaling process.
Sharding Logic
Each logical table can be split into up to 10,000 physical shards, each stored as a separate MySQL table. Shard fields are embedded in CREATE statements via comments, enabling the gateway to route queries directly to the appropriate set.
Scaling Process
When Scheduler detects resource thresholds are exceeded, it initiates a two‑step expansion: (1) create target shard tables on a new set, (2) use the agent to export data with mysqldump filtered by shard range and import it into the new set while tracking binlog positions. After data migration, the original shard is renamed, a final binlog catch‑up is performed, and the routing table is updated, resulting in a ~200 ms downtime for the affected shard.
Disaster Recovery Mechanisms
TDSQL aims for automatic failover, recovery, strong consistency, and cross‑IDC protection. It evaluates MySQL asynchronous replication (fast but unsafe for finance), semi‑synchronous replication (safer but performance‑limited across IDC), and cluster solutions such as Percona XtraDB and MariaDB Galera (strong sync but high latency in cross‑IDC environments). Performance tests show significant overhead for semi‑sync and Galera, making them unsuitable for low‑latency billing scenarios.
Strong Synchronous Solution Adopted by TDSQL
Building on the thread‑pool model, TDSQL splits the SQL execution path: the first half writes to binlog and saves the session, allowing the thread to process other requests; a dump thread immediately streams the binlog to replicas; a dedicated response thread processes acknowledgments and completes the commit, eliminating blocking waits and boosting performance.
High‑Availability Guarantees
Each set should have at least three cross‑IDC nodes, optionally exposing read‑only queries on replicas.
Nodes can be added flexibly; TDSQL uses Xtrabackup for full and incremental replication, copying ~500 GB/hour.
Row‑based binlog flashback handles transactions lost due to abrupt master termination, with manual confirmation for non‑reversible operations.
ZooKeeper monitors node health and orchestrates strict automated failover procedures.
Future Directions
Integrate TDSQL with Docker for better resource isolation as cluster size grows.
Continue exploring Galera cluster adoption.
Investigate thread‑pool plus coroutine models to further improve performance.
Address cross‑database joins and limited distributed transactions by offloading OLAP workloads to TDW.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
