Architecture and Design of TDSQL: A Distributed MySQL‑Based SQL System for High‑Consistency Billing
The article analyzes the architecture of TDSQL, a MySQL‑based distributed SQL system designed for Tencent’s billing platform, detailing its transition from an in‑memory NoSQL solution to a high‑consistency, auto‑scaling, fault‑tolerant database with sharding, scheduler, agent, and gateway components.
TDSQL is a distributed SQL system built on the MySQL storage engine to replace the in‑memory NoSQL HOLD platform used by Tencent's billing department. The new system addresses HOLD's limitations such as limited query capabilities, inefficient memory usage for cold data, and lack of flexibility for diverse business scenarios.
The design goals are to preserve the MySQL protocol so that existing clients keep working, to provide automatic cross‑IDC disaster recovery with strong consistency, to scale capacity transparently, and to focus on OLTP workloads.
The overall architecture consists of three modules (Scheduler, Agent, and Gateway) coordinated via ZooKeeper. The Scheduler manages sets, DDL distribution, node health, resource monitoring, and automatic scaling. The Agent monitors the local MySQL instance, replication lag, and CPU and I/O usage, and reports them to ZooKeeper. The Gateway, built on MySQL Proxy, handles SQL parsing, routing, authentication, and aggregation of results.
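The division of labor between Agent and Scheduler can be illustrated with a minimal sketch. Everything named here is hypothetical: `CoordinationStore` is an in-memory stand-in for ZooKeeper, and the report fields, `max_lag_s`, and `heartbeat_timeout_s` are assumed values, not TDSQL's actual schema or thresholds.

```python
from dataclasses import dataclass


@dataclass
class AgentReport:
    """One Agent's view of its local MySQL node (field names are assumptions)."""
    node: str
    set_name: str
    mysql_alive: bool
    replication_lag_s: float
    cpu_pct: float
    io_util_pct: float
    reported_at: float  # timestamp of the report, in seconds


class CoordinationStore:
    """In-memory stand-in for ZooKeeper: keeps the latest report per node."""

    def __init__(self):
        self._reports = {}

    def publish(self, report):
        # Called by an Agent; overwrites the node's previous report.
        self._reports[report.node] = report

    def reports(self):
        return list(self._reports.values())


def unhealthy_nodes(store, now, max_lag_s=5.0, heartbeat_timeout_s=10.0):
    """Scheduler-side check: flag nodes that are down, lagging, or silent."""
    bad = []
    for r in store.reports():
        stale = now - r.reported_at > heartbeat_timeout_s
        if stale or not r.mysql_alive or r.replication_lag_s > max_lag_s:
            bad.append(r.node)
    return sorted(bad)
```

In this sketch the Agent only publishes observations and the Scheduler alone decides what counts as unhealthy, which mirrors the separation described above.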
Scaling is achieved primarily through horizontal expansion using sharding. A logical table can be split into up to 10,000 physical shards, which are distributed across sets. When the system detects that a resource threshold has been crossed, it automatically starts a multi-step expansion: record the binlog position, dump the relevant shard data, insert it into the target set, replay the binlog from the recorded position, and update the routing.
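The routing and expansion steps above can be sketched as follows. This is an illustration under stated assumptions, not TDSQL's implementation: the CRC32 hash, the round-robin initial placement, and the choice to move half of a set's shards are all assumptions, and `RoutingTable` and `expand` are hypothetical names. The data-movement steps are only recorded as descriptions; the routing update is the one step actually performed.

```python
import zlib

NUM_SHARDS = 10_000  # upper bound on physical shards per logical table


def shard_of(shard_key, num_shards=NUM_SHARDS):
    """Map a shard key to a shard number (CRC32 is an assumed hash)."""
    return zlib.crc32(shard_key.encode("utf-8")) % num_shards


class RoutingTable:
    """Shard-to-set mapping as a gateway might hold it (hypothetical)."""

    def __init__(self, set_names, num_shards=NUM_SHARDS):
        self.num_shards = num_shards
        # Assumption: shards are initially spread round-robin over the sets.
        self.shard_to_set = {
            shard: set_names[shard % len(set_names)]
            for shard in range(num_shards)
        }

    def route(self, shard_key):
        return self.shard_to_set[shard_of(shard_key, self.num_shards)]

    def shards_on(self, set_name):
        return [s for s, owner in self.shard_to_set.items() if owner == set_name]


def expand(routing, source_set, target_set):
    """Move half of source_set's shards to target_set; return the step log."""
    owned = routing.shards_on(source_set)
    moving = owned[len(owned) // 2:]  # assumption: move the upper half
    steps = [
        f"record binlog position on {source_set}",
        f"dump {len(moving)} shards from {source_set}",
        f"insert dumped rows into {target_set}",
        f"replay {source_set} binlog from recorded position on {target_set}",
        f"update routing for {len(moving)} shards",
    ]
    # Routing changes last, after the target has caught up with the binlog.
    for shard in moving:
        routing.shard_to_set[shard] = target_set
    return steps
```

Because clients address shards rather than sets, only the shard-to-set mapping changes during expansion, which is what keeps the scaling transparent to applications.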
For disaster recovery, TDSQL adopts a strong synchronous replication scheme rather than MySQL's asynchronous or semi‑synchronous replication. It modifies the thread‑pool model to separate binlog writing from transaction commit: after writing the binlog, a worker thread is free to process new requests, while a dedicated thread waits for acknowledgment from the replicas.
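The idea of decoupling binlog writing from the wait for replica acknowledgment can be shown with a small threading sketch. It is a toy model, not TDSQL's code: `Master`, `handle`, and the `replica_ack` callback are hypothetical, a single thread stands in for the worker pool, and marking an unacknowledged transaction as `"failed"` is an assumption, since the source does not say how that case is handled.

```python
import queue
import threading


class Master:
    """Toy master: workers write the binlog, one thread waits for acks."""

    def __init__(self, replica_ack):
        self.binlog = []              # ordered (txn_id, statement) entries
        self.pending = queue.Queue()  # transactions awaiting a replica ack
        self.replies = {}             # txn_id -> result sent to the client
        self._lock = threading.Lock()
        self._replica_ack = replica_ack  # callable(txn_id) -> bool (assumed)
        self._ack_thread = threading.Thread(target=self._ack_loop, daemon=True)
        self._ack_thread.start()

    def handle(self, txn_id, statement):
        """Worker-side path: write the binlog, hand off, return immediately."""
        with self._lock:
            self.binlog.append((txn_id, statement))
        self.pending.put(txn_id)  # the worker does not block on the replica

    def _ack_loop(self):
        """Dedicated thread: reply to the client only after the replica acks."""
        while True:
            txn_id = self.pending.get()
            if txn_id is None:  # shutdown sentinel
                break
            ok = self._replica_ack(txn_id)
            self.replies[txn_id] = "committed" if ok else "failed"

    def shutdown(self):
        self.pending.put(None)
        self._ack_thread.join()
```

The client still sees the result only after a replica has confirmed the transaction, so consistency is preserved, but the worker that wrote the binlog is not tied up during the network round trip.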
Additional high‑availability mechanisms include requiring at least three cross‑IDC nodes per set, seamless node addition using XtraBackup for full and incremental data copy, and automated failover procedures managed by ZooKeeper.
Future work aims to integrate Docker for resource isolation, explore Galera‑style clustering, combine thread pools with coroutines for higher performance, and extend support for limited cross‑shard joins and distributed transactions.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
