Databases 9 min read

How TiDB Solved Our Billion‑Row Sharding Challenge: A Real‑World NewSQL Case Study

Facing billions of rows spread across 1024 tables in 64 databases, a Chinese gaming platform replaced its custom sharding framework with TiDB, achieving real‑time OLTP consistency, elastic scaling, robust backup, high availability, and lower migration costs while preserving MySQL compatibility.

dbaplus Community
dbaplus Community
dbaplus Community
How TiDB Solved Our Billion‑Row Sharding Challenge: A Real‑World NewSQL Case Study

Author: Tao Zheng, MySQL DBA lead at Youzu Network Platform Department, former DBA for Tongcheng Travel system architecture.

Background

In early 2017 the company’s user‑center system stored over a hundred million core records using a hash‑key based sharding scheme: 1024 tables distributed across 64 MySQL databases, accessed via a custom code framework. Non‑hash queries, DDL changes, and the rigidity of the sharding logic caused high operational costs and limited flexibility.

Architecture diagram of the original sharding solution
Architecture diagram of the original sharding solution

The team needed a distributed database cluster that could provide real‑time OLTP consistency, elastic architecture, integrated backup/monitoring, high availability, and low migration cost.

Early Options

Two initial solutions were evaluated and rejected:

Open‑source sharding middleware: suffered from weak 2PC‑based XA consistency, complex high‑availability design where a single shard failure could affect the whole system, cumbersome backup/restore, and introduced new sharding and join‑key design challenges, increasing migration difficulty.

MySQL Cluster (NDB): required changing table engines, had unknown performance characteristics, risked split‑brain in HA, lacked mature monitoring/backup tooling, and had limited production experience in China, especially for the non‑enterprise edition.

Exploration of NewSQL

Inspired by Google Spanner, the team discovered TiDB, an open‑source NewSQL database that offers:

MySQL‑compatible SQL support

Linear horizontal scalability

Distributed transactions

Strong consistency across data centers

Self‑recovering high availability via Raft

HTAP capability for massive concurrent writes and real‑time queries

These features aligned well with the platform’s pain points, prompting a pilot deployment.

Implementation Details

Backup & migration: used the official mydumper / myloader logical backup suite for easy data transfer.

Monitoring: integrated Prometheus‑based metrics collection with existing monitoring systems.

High availability: the KV layer employed Raft for leader election and automatic failover.

Stateless TiDB‑Server layer: fronted by LVS or HAProxy to enable seamless failover.

Region splitting in the KV layer provided seamless, perception‑free scaling.

After testing, the team observed:

TiDB’s distributed nature adds network and replica overhead, so single‑node Sysbench OLTP throughput is lower than MySQL, but performance remains stable at large data volumes and scales to meet peak load.

TiDB’s OLAP performance outperforms MySQL on big datasets, showing continuous improvement.

The stateless TiDB‑Server allows adding load balancers to further scale the query layer.

Ongoing Adoption

With performance and requirements satisfied, business‑level migration proceeded. Because TiDB speaks the MySQL protocol, migration cost dropped dramatically; the official Syncer tool enabled near‑zero‑downtime master‑slave switchovers.

Key business systems were refactored:

Login state system: switched from REPLACE to INSERT, preserving every login record; a single table now holds over a billion rows without performance issues.

Gift‑code system: collapsed 100 sharded tables into a single archive table projected to exceed 2 billion rows.

User tracking project: leveraged the elastic KV layer to store high‑frequency behavior data.

When the KV layer was not a bottleneck, the team reused it for multiple services, effectively offering a DBaaS‑like platform.

Multi‑service TiDB deployment diagram
Multi‑service TiDB deployment diagram

Current Situation and Outlook

From version RC2.2 to GA1.0, three TiDB clusters now run in production, supporting six OLTP services continuously for nearly a year. The ecosystem provides robust deployment strategies, bug‑fix assistance, upgrade paths, and migration tooling from both the vendor and the open‑source community. For future analytical workloads, the team plans to adopt TiSpark on top of TiDB.

Overall, NewSQL gave the platform a fresh direction for handling large‑scale tables and databases, and the team expects it to influence many future design choices.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Distributed SystemsTiDBdatabase shardingNewSQLOLTPMySQL Migration
dbaplus Community
Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.