Databases 14 min read

When Should You Split Your Database Tables? Practical Guidelines and Real‑World Cases

This article examines the signs that a database table has reached its limits, explains why vertical and horizontal sharding are needed, offers concrete sizing formulas, compares hash, range and consistent‑hash partitioning, and shares large‑scale case studies from Suning, JD, Meituan, Ant Financial and Taobao.

dbaplus Community

Nov 18, 2021

When Should You Split Your Database Tables? Practical Guidelines and Real‑World Cases

1. When Do Tables Reach a Bottleneck?

Suning PINGGOU, an e‑commerce app under Suning.com, surpassed 30 million users by July 2018, handling tens of millions of daily active users, new SKUs and orders. Its single database grew by over 100 million rows per day, peaked at 100 k QPS, and required monthly data migrations, causing high operational cost and risk of cache‑miss disasters.

JD’s distribution platform originally used SQL Server, then migrated to an Oracle/IBM AIX RAC + DataGuard setup to support millions of orders, but the monolithic architecture became costly and hard to scale.

Meituan Dianping’s order database exceeded 200 GB sixteen years ago; despite adding indexes and read replicas, traffic spikes and rapid feature growth made the single‑table design untenable, prompting a decision to split the tables.

2. Purpose and Solutions for Splitting

2.1 Business Data Decoupling – Vertical Splitting

Separate distinct business domains into independent databases, aligning with micro‑service decomposition. This isolates services, enables independent scaling, and prevents a single database from becoming a performance bottleneck when services are horizontally scaled.

2.2 Capacity and Performance – Horizontal Splitting

When a single table or database reaches size or QPS limits, split the data across multiple tables (sharding) or databases. Typically start with table‑level sharding; once the single database becomes a bottleneck, consider database‑level sharding.

2.3 How Many Shards Are Appropriate

Empirical rule: a table’s performance degrades after ~10 million rows; aim for ~6 million rows per table. For connections, keep per‑database concurrent connections around 4 000. Example calculation for a platform generating 100 k orders daily, 8 k QPS peak, and projected growth to 250 k orders/day in three years:

Table: (3 years * 365 days * 350 k) / 6 M ≈ 63 → about 64 tables</code>
<code>Database: 10 k concurrent / 4 k ≈ 2.5 → use 4 databases

2.4 Choosing a Splitting Method

Hash Modulo – Even distribution, simple routing; drawback is costly data migration if business volume exceeds estimates.

Range Partitioning – Data stored sequentially; easy to grow without migration, but can cause hotspoting on a single shard.

Consistent Hashing – Virtual nodes provide balanced distribution and minimal data movement during scaling; routing logic is more complex.

3. New Issues Introduced by Splitting

3.1 Partition Key Selection

Choose a uniformly distributed identifier (e.g., user ID, order ID) as the shard key; time‑based tables may use date as the key.

3.2 Global Unique Primary Key

Generate distributed IDs (e.g., Snowflake‑style) to ensure uniqueness across shards.

3.3 Data Migration Strategies

Two approaches: stop‑the‑world migration (simple but service‑disruptive) or smooth migration (dual‑write, checkpoint, gradual cut‑over), similar to Kafka cluster expansion.

3.4 Distributed Transaction Challenges

When data spans multiple databases, adopt patterns such as TCC, saga, or local‑transaction‑plus‑message to maintain consistency.

3.5 Query Limitations

Cross‑shard joins and aggregates must be re‑implemented at the application layer or via middleware, as a single database can no longer answer them directly.

4. Large‑Company Case Studies

4.1 Ant Financial Routing Rules

Uses user‑ID suffixes for shard routing, often employing a “hundred‑database‑hundred‑table” pattern; IDs are generated to guarantee up to 10 trillion unique values.

4.2 Meituan Dianping Data Migration

Three‑phase migration: (1) dual‑write with reconciliation, (2) import historical data and switch reads, (3) stop dual‑write and delete old data.

4.3 Taobao Trillion‑Level Order Storage Engine

Adopts similar historical‑table migration and sharding techniques to handle massive order volumes.

These examples illustrate practical sizing, partitioning strategies, and operational considerations for large‑scale database sharding.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Performance Optimization large-scale systems distributed transactions Horizontal Partitioning vertical splitting

Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.

1. When Do Tables Reach a Bottleneck?

2. Purpose and Solutions for Splitting

2.1 Business Data Decoupling – Vertical Splitting

2.2 Capacity and Performance – Horizontal Splitting

2.3 How Many Shards Are Appropriate

2.4 Choosing a Splitting Method

3. New Issues Introduced by Splitting

3.1 Partition Key Selection

3.2 Global Unique Primary Key

3.3 Data Migration Strategies

3.4 Distributed Transaction Challenges

3.5 Query Limitations

4. Large‑Company Case Studies

4.1 Ant Financial Routing Rules

4.2 Meituan Dianping Data Migration

4.3 Taobao Trillion‑Level Order Storage Engine

dbaplus Community

How this landed with the community

Was this worth your time?

0 Comments

4.2 Meituan Dianping Data Migration