When Should You Split Your Database Tables? Practical Guidelines and Real‑World Cases
This article examines the signs that a database table has reached its limits, explains why vertical and horizontal sharding are needed, offers concrete sizing formulas, compares hash, range and consistent‑hash partitioning, and shares large‑scale case studies from Suning, JD, Meituan, Ant Financial and Taobao.
1. When Do Tables Reach a Bottleneck?
Suning PINGGOU, an e‑commerce app under Suning.com, surpassed 30 million users by July 2018, handling tens of millions of daily active users, new SKUs and orders. Its single database grew by over 100 million rows per day, peaked at 100 k QPS, and required monthly data migrations, causing high operational cost and risk of cache‑miss disasters.
JD’s distribution platform originally used SQL Server, then migrated to an Oracle/IBM AIX RAC + DataGuard setup to support millions of orders, but the monolithic architecture became costly and hard to scale.
Meituan Dianping’s order database exceeded 200 GB sixteen years ago; despite adding indexes and read replicas, traffic spikes and rapid feature growth made the single‑table design untenable, prompting a decision to split the tables.
2. Purpose and Solutions for Splitting
2.1 Business Data Decoupling – Vertical Splitting
Separate distinct business domains into independent databases, aligning with micro‑service decomposition. This isolates services, enables independent scaling, and prevents a single database from becoming a performance bottleneck when services are horizontally scaled.
2.2 Capacity and Performance – Horizontal Splitting
When a single table or database reaches size or QPS limits, split the data across multiple tables (sharding) or databases. Typically start with table‑level sharding; once the single database becomes a bottleneck, consider database‑level sharding.
2.3 How Many Shards Are Appropriate
Empirical rule: a table’s performance degrades after ~10 million rows; aim for ~6 million rows per table. For connections, keep per‑database concurrent connections around 4 000. Example calculation for a platform generating 100 k orders daily, 8 k QPS peak, and projected growth to 250 k orders/day in three years:
Table: (3 years * 365 days * 350 k) / 6 M ≈ 63 → about 64 tables</code>
<code>Database: 10 k concurrent / 4 k ≈ 2.5 → use 4 databases2.4 Choosing a Splitting Method
Hash Modulo – Even distribution, simple routing; drawback is costly data migration if business volume exceeds estimates.
Range Partitioning – Data stored sequentially; easy to grow without migration, but can cause hotspoting on a single shard.
Consistent Hashing – Virtual nodes provide balanced distribution and minimal data movement during scaling; routing logic is more complex.
3. New Issues Introduced by Splitting
3.1 Partition Key Selection
Choose a uniformly distributed identifier (e.g., user ID, order ID) as the shard key; time‑based tables may use date as the key.
3.2 Global Unique Primary Key
Generate distributed IDs (e.g., Snowflake‑style) to ensure uniqueness across shards.
3.3 Data Migration Strategies
Two approaches: stop‑the‑world migration (simple but service‑disruptive) or smooth migration (dual‑write, checkpoint, gradual cut‑over), similar to Kafka cluster expansion.
3.4 Distributed Transaction Challenges
When data spans multiple databases, adopt patterns such as TCC, saga, or local‑transaction‑plus‑message to maintain consistency.
3.5 Query Limitations
Cross‑shard joins and aggregates must be re‑implemented at the application layer or via middleware, as a single database can no longer answer them directly.
4. Large‑Company Case Studies
4.1 Ant Financial Routing Rules
Uses user‑ID suffixes for shard routing, often employing a “hundred‑database‑hundred‑table” pattern; IDs are generated to guarantee up to 10 trillion unique values.
4.2 Meituan Dianping Data Migration
Three‑phase migration: (1) dual‑write with reconciliation, (2) import historical data and switch reads, (3) stop dual‑write and delete old data.
4.3 Taobao Trillion‑Level Order Storage Engine
Adopts similar historical‑table migration and sharding techniques to handle massive order volumes.
These examples illustrate practical sizing, partitioning strategies, and operational considerations for large‑scale database sharding.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
