Understanding Database Sharding: When and How to Split Databases and Tables
The article explains why and when to apply database sharding and table partitioning, illustrates real‑world growth scenarios, compares horizontal and vertical splitting methods, reviews common sharding middleware such as Cobar, TDDL, Atlas, Sharding‑jdbc (ShardingSphere) and Mycat, and offers practical recommendations for different company sizes.
In this technical guide, a senior architect explains the need for database sharding and table partitioning, emphasizing that they are distinct concepts and should not be confused.
He presents a growth scenario: a startup starts with 200 k registered users and modest traffic, but as the business scales to millions of users and thousands of QPS, a single database can no longer handle the load, leading to storage exhaustion and performance degradation.
Why Split Databases and Tables?
When a single table reaches tens of millions of rows, query performance degrades sharply; when a single database cannot sustain the required concurrency (typically >2000 QPS), horizontal scaling becomes necessary.
Table Partitioning (分表)
Table partitioning moves data from one large table into multiple smaller tables, often based on a user ID, keeping each table’s row count within a manageable range (e.g., 2 million rows).
Database Splitting (分库)
Database splitting distributes data across multiple databases, each handling a portion of the overall traffic; a healthy single database should stay around 1000 QPS.
Common Sharding Middleware
Cobar : Alibaba’s proxy‑layer solution, now largely abandoned and lacking features such as read/write separation and cross‑database joins.
TDDL : Taobao’s client‑layer solution supporting basic CRUD and read/write separation but not joins; depends on Alibaba’s Diamond configuration system.
Atlas : 360’s proxy‑layer project, unmaintained for several years.
Sharding‑jdbc (now ShardingSphere ): Dangdang’s client‑layer solution, actively maintained, supporting sharding, read/write separation, distributed ID generation, and flexible transactions (up to version 4.0.0‑RC1 ).
Mycat : A proxy‑layer project derived from Cobar, widely used, offering comprehensive features but requiring deployment and operational effort.
Choosing a Solution
For small‑to‑medium companies, Sharding‑jdbc is recommended due to its low operational cost and ease of integration. Large enterprises with many projects may prefer Mycat for its transparent proxy approach.
How to Perform Database Splitting
Horizontal splitting distributes rows of a single table across multiple databases with identical schemas, increasing concurrency capacity and storage.
Vertical splitting separates a wide table into multiple tables (or databases) with different schemas, placing frequently accessed columns together to improve cache efficiency.
Two common sharding strategies are:
Range‑based: each database holds a continuous range (e.g., by time); simple to expand but can create hotspot issues.
Hash‑based: data is distributed by hashing a key (e.g., user ID); balances load but makes scaling more complex due to data migration.
Middleware can automatically route queries to the correct database and table based on the sharding key.
Overall, the guide stresses that sharding decisions should follow business growth, data volume, and concurrency requirements, and that both sharding middleware and proper splitting strategies are essential for maintaining performance at scale.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.