How TiDB Solved Our Billion‑Row Sharding Challenge: A Real‑World NewSQL Case Study
Facing billions of rows spread across 1024 tables in 64 databases, a Chinese gaming platform replaced its custom sharding framework with TiDB, achieving real‑time OLTP consistency, elastic scaling, robust backup, high availability, and lower migration costs while preserving MySQL compatibility.
Author: Tao Zheng, MySQL DBA lead at Youzu Network Platform Department, former DBA for Tongcheng Travel system architecture.
Background
In early 2017 the company’s user‑center system stored over a hundred million core records under a hash‑key sharding scheme: 1024 tables distributed across 64 MySQL databases, accessed through a custom code framework. Queries that did not filter on the hash key, DDL changes that had to be rolled out to every shard, and the rigidity of the sharding logic drove up operational costs and limited flexibility.
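To make the pain point concrete, here is a minimal sketch of how a hash‑key router for 1024 tables in 64 databases might look. The hash function (crc32), the naming pattern, and the even split of 16 tables per database are assumptions for illustration; the article does not describe the internals of the custom framework.

```python
import zlib
from typing import List, Tuple

NUM_DATABASES = 64
NUM_TABLES = 1024
TABLES_PER_DB = NUM_TABLES // NUM_DATABASES  # 16 tables per database (assumed even split)


def route(user_id: str) -> Tuple[str, str]:
    """Map a shard key to a (database, table) pair using a stable hash."""
    # crc32 is an illustrative choice; the real framework's hash is not documented.
    slot = zlib.crc32(user_id.encode("utf-8")) % NUM_TABLES
    db_index = slot // TABLES_PER_DB
    # Database and table names are hypothetical.
    return f"user_db_{db_index:02d}", f"user_{slot:04d}"


def tables_for_non_hash_query() -> List[Tuple[str, str]]:
    """A query without the shard key cannot be routed, so it must visit every table."""
    return [
        (f"user_db_{slot // TABLES_PER_DB:02d}", f"user_{slot:04d}")
        for slot in range(NUM_TABLES)
    ]


if __name__ == "__main__":
    print(route("player-42"))                # one database, one table
    print(len(tables_for_non_hash_query()))  # every table must be scanned
```

A lookup by shard key touches exactly one table, while any other predicate fans out to all 1024 tables, and every schema change has to be repeated the same number of times.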
The team needed a distributed database cluster that could provide real‑time OLTP consistency, elastic architecture, integrated backup/monitoring, high availability, and low migration cost.
Early Options
Two initial solutions were evaluated and rejected:
Open‑source sharding middleware: relied on 2PC‑based XA transactions with weak consistency guarantees; had a complex high‑availability design in which a single shard failure could affect the whole system; made backup and restore cumbersome; and still required new sharding‑key and join‑key design, which increased migration difficulty.
MySQL Cluster (NDB): required changing table engines, had unknown performance characteristics, risked split‑brain in HA, lacked mature monitoring/backup tooling, and had limited production experience in China, especially for the non‑enterprise edition.
Exploration of NewSQL
Inspired by Google Spanner, the team discovered TiDB, an open‑source NewSQL database that offers:
MySQL‑compatible SQL support
Linear horizontal scalability
Distributed transactions
Strong consistency across data centers
Self‑recovering high availability via Raft
HTAP capability for massive concurrent writes and real‑time queries
These features aligned well with the platform’s pain points, prompting a pilot deployment.
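The self‑recovering behavior attributed to Raft comes down to majority arithmetic: a group can keep electing a leader as long as more than half of its replicas are reachable. The sketch below shows only that arithmetic, not a Raft implementation, and the replica counts are example values rather than figures from the article.

```python
def quorum(replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many replicas can be lost while a majority still exists."""
    return replicas - quorum(replicas)


def can_elect_leader(replicas: int, alive: int) -> bool:
    """A leader can be elected only if the surviving replicas form a majority."""
    return alive >= quorum(replicas)


if __name__ == "__main__":
    for n in (3, 5):  # example group sizes, not taken from the article
        print(n, quorum(n), tolerated_failures(n))
```

With three replicas, losing one node still leaves a majority of two, so the group can fail over automatically; losing two does not.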
Implementation Details
Backup & migration: used the official mydumper / myloader logical backup suite for easy data transfer.
Monitoring: integrated Prometheus‑based metrics collection with existing monitoring systems.
High availability: the KV layer employed Raft for leader election and automatic failover.
Stateless TiDB‑Server layer: fronted by LVS or HAProxy to enable seamless failover.
Region splitting in the KV layer allowed the cluster to scale out without the application noticing.
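The idea behind Region splitting can be sketched with a toy key‑value store that keeps data in contiguous key ranges and splits a range once it grows too large. This is a simplified model, not TiKV code: the class names are hypothetical, and the threshold here counts keys, whereas the real KV layer splits on data size.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(eq=False)  # compare regions by identity, not by contents
class Region:
    start: str  # inclusive lower bound of the key range
    end: str    # exclusive upper bound; "" means unbounded
    data: Dict[str, str] = field(default_factory=dict)


class ToyKV:
    """Toy range-partitioned store that splits a region when it gets too big."""

    def __init__(self, max_keys_per_region: int = 4):
        self.max_keys = max_keys_per_region
        self.regions: List[Region] = [Region(start="", end="")]

    def _find(self, key: str) -> Region:
        for region in self.regions:
            if region.start <= key and (region.end == "" or key < region.end):
                return region
        raise KeyError(key)

    def put(self, key: str, value: str) -> None:
        region = self._find(key)
        region.data[key] = value
        if len(region.data) > self.max_keys:
            self._split(region)

    def get(self, key: str) -> str:
        return self._find(key).data[key]

    def _split(self, region: Region) -> None:
        # Split at the median key so both halves hold roughly equal data.
        keys = sorted(region.data)
        mid = keys[len(keys) // 2]
        right = Region(
            start=mid,
            end=region.end,
            data={k: v for k, v in region.data.items() if k >= mid},
        )
        region.data = {k: v for k, v in region.data.items() if k < mid}
        region.end = mid
        self.regions.insert(self.regions.index(region) + 1, right)


if __name__ == "__main__":
    kv = ToyKV(max_keys_per_region=4)
    for i in range(20):
        kv.put(f"user:{i:04d}", str(i))
    print([(r.start, r.end, len(r.data)) for r in kv.regions])
```

Callers of `put` and `get` never see the split, which is the property the team relied on: new ranges can be placed on new nodes while the application keeps issuing the same reads and writes.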
After testing, the team observed:
Because TiDB is distributed, it pays network and replication overhead, so its single‑node Sysbench OLTP throughput is lower than MySQL’s; however, performance remains stable at large data volumes and scales out to meet peak load.
On large datasets, TiDB outperforms MySQL for OLAP queries, and its OLAP performance has continued to improve.
Because TiDB‑Server is stateless, the query layer can be scaled further by adding instances behind a load balancer.
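Why statelessness makes this easy can be shown with a toy round‑robin pool: any healthy instance can serve any connection, so failover means skipping a dead backend and scaling means appending a new one. This is a stand‑in for what LVS or HAProxy does, not their configuration; the host names are hypothetical, and 4000 is TiDB's default SQL port.

```python
from typing import Dict, List


class RoundRobinPool:
    """Toy load balancer over interchangeable (stateless) backends."""

    def __init__(self, backends: List[str]):
        self._order: List[str] = list(backends)
        self._healthy: Dict[str, bool] = {b: True for b in backends}
        self._next = 0

    def add(self, backend: str) -> None:
        """Scale out: a new instance is usable immediately, with no data to copy."""
        self._order.append(backend)
        self._healthy[backend] = True

    def mark_down(self, backend: str) -> None:
        """Record a failed health check for one backend."""
        self._healthy[backend] = False

    def pick(self) -> str:
        """Return the next healthy backend in round-robin order."""
        for _ in range(len(self._order)):
            backend = self._order[self._next % len(self._order)]
            self._next += 1
            if self._healthy[backend]:
                return backend
        raise RuntimeError("no healthy TiDB-Server instance available")


if __name__ == "__main__":
    pool = RoundRobinPool(["tidb-1:4000", "tidb-2:4000"])  # hypothetical hosts
    pool.mark_down("tidb-1:4000")
    pool.add("tidb-3:4000")
    print([pool.pick() for _ in range(4)])
```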
Ongoing Adoption
With performance and requirements satisfied, business‑level migration proceeded. Because TiDB speaks the MySQL protocol, migration cost dropped dramatically; the official Syncer tool enabled near‑zero‑downtime master‑slave switchovers.
Key business systems were refactored:
Login state system: switched from REPLACE to INSERT, preserving every login record; a single table now holds over a billion rows without performance issues.
Gift‑code system: collapsed 100 sharded tables into a single archive table projected to exceed 2 billion rows.
User tracking project: leveraged the elastic KV layer to store high‑frequency behavior data.
When the KV layer was not a bottleneck, the team reused it for multiple services, effectively offering a DBaaS‑like platform.
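The login state change described above is easiest to see side by side. In the sketch below, REPLACE on a table keyed by user keeps only the latest login, while plain INSERT into a history table keeps every record. SQLite (which also accepts REPLACE INTO) is used here only so the example runs without a server; the table names, columns, and sample rows are made up for illustration and are not the team's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Old design (assumed shape): one row per user, overwritten on each login.
conn.execute(
    "CREATE TABLE login_state (user_id INTEGER PRIMARY KEY, login_at TEXT)"
)
# New design (assumed shape): one row per login event, indexed by user.
conn.execute(
    "CREATE TABLE login_history ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "user_id INTEGER NOT NULL, "
    "login_at TEXT)"
)
conn.execute("CREATE INDEX idx_history_user ON login_history (user_id)")

# Made-up sample events: user 1001 logs in twice, user 1002 once.
logins = [
    (1001, "2017-06-01 09:00:00"),
    (1001, "2017-06-02 21:30:00"),
    (1002, "2017-06-02 22:00:00"),
]
for user_id, login_at in logins:
    conn.execute(
        "REPLACE INTO login_state (user_id, login_at) VALUES (?, ?)",
        (user_id, login_at),
    )
    conn.execute(
        "INSERT INTO login_history (user_id, login_at) VALUES (?, ?)",
        (user_id, login_at),
    )

state_rows = conn.execute("SELECT COUNT(*) FROM login_state").fetchone()[0]
history_rows = conn.execute("SELECT COUNT(*) FROM login_history").fetchone()[0]
latest_1001 = conn.execute(
    "SELECT login_at FROM login_state WHERE user_id = 1001"
).fetchone()[0]
print(state_rows, history_rows, latest_1001)
```

The history table grows with every login, which is why the design only became practical once a single table could hold over a billion rows.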
Current Situation and Outlook
Having followed TiDB from version RC2.2 to GA1.0, the team now runs three TiDB clusters in production, which have supported six OLTP services continuously for nearly a year. Both the vendor and the open‑source community provide robust deployment strategies, bug‑fix assistance, upgrade paths, and migration tooling. For future analytical workloads, the team plans to adopt TiSpark on top of TiDB.
Overall, NewSQL gave the platform a fresh direction for handling large‑scale tables and databases, and the team expects it to influence many future design choices.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
