Why and How to Perform Database Sharding and Table Partitioning
Database sharding and table partitioning are essential scaling techniques that separate data across multiple databases and tables to handle growing user volume, data size, and concurrency, with explanations of when to split, methods (horizontal vs vertical), and a comparison of popular middleware solutions.
Why Split Database and Table?
In short, sharding (splitting databases) and table partitioning are two different concepts and should not be confused; you may split only databases, only tables, or both.
Imagine a small startup that initially has 200,000 registered users, 10,000 daily active users, and each table receives about 1,000 new rows per day. The system can be handled by a few engineers and a single MySQL instance.
After rapid growth, the user base expands to 20 million registered users and 1 million daily active users, with each table receiving 100,000 new rows per day and peak concurrency reaching 1,000 requests per second. The single‑node MySQL starts to feel the pressure.
When the user count further explodes to 100 million and daily active users reach 10 million, a single table can contain 20–30 million rows, and peak concurrency climbs to 5,000–8,000 QPS, the database can no longer sustain the load.
Table Partitioning
When a single table reaches tens of millions of rows, query performance degrades sharply. As a rule of thumb, once a table exceeds a few hundred thousand rows, you should consider partitioning it.
Table partitioning means moving rows of a large table into multiple smaller tables, usually based on a key such as user ID, so that each partition contains at most a few million rows (e.g., 2 million). This keeps each table’s size manageable and improves SQL execution speed.
Database Sharding
Database sharding means distributing data across multiple physical databases. A single healthy MySQL instance can comfortably handle around 1,000–2,000 QPS; beyond that you need to add more instances. By splitting data into several databases, you increase both concurrency capacity and storage.
Together, sharding and partitioning form the classic "split database and table" strategy.
Before Sharding/Partitioning
After Sharding/Partitioning
Concurrency Support
Single‑machine MySQL cannot handle high concurrency
Multiple MySQL instances increase concurrency manyfold
Disk Usage
Single‑machine disk space nearly full
Data spread across several machines, disk usage drops
SQL Performance
Large tables make SQL slower
Smaller tables improve SQL execution efficiency
Common Sharding Middleware
The following middleware solutions are widely used for database sharding and table partitioning:
Cobar
TDDL
Atlas
Sharding‑jdbc (now called ShardingSphere )
Mycat
Cobar
Developed by Alibaba's B2B team, Cobar is a proxy‑layer solution that sits between the application and MySQL. It parses SQL, applies sharding rules, and forwards the query to the appropriate MySQL instance. It is now rarely updated and lacks features such as read/write splitting, stored procedures, cross‑database joins, and pagination.
TDDL
Created by the Taobao team, TDDL is a client‑side solution that supports basic CRUD and read/write splitting but does not support joins or multi‑table queries. It also depends on Alibaba's Diamond configuration service.
Atlas
Open‑sourced by 360, Atlas is another proxy‑layer product. Its community has not been active for several years, so adoption is low.
Sharding‑jdbc / ShardingSphere
Originally open‑sourced by Dangdang, Sharding‑jdbc is a client‑side library that has evolved into ShardingSphere . It supports sharding, read/write splitting, distributed ID generation, and flexible transactions. As of version 4.0.0‑RC1 (released April 2019), it remains actively maintained and is a popular choice.
Mycat
Mycat is a proxy‑layer middleware derived from Cobar. It offers a rich feature set and has an active community, making it a hot choice for many companies, although it is newer than Sharding‑jdbc and has undergone less real‑world testing.
Summary of Middleware Choices
For most scenarios, Sharding‑jdbc (client‑side) and Mycat (proxy‑side) are the two recommended options.
Sharding‑jdbc’s advantages include zero deployment overhead, low operational cost, and high performance because it avoids an extra proxy layer. However, any version upgrade requires updating every client that uses it, creating tight coupling.
Mycat requires a separate deployment and higher operational effort, but it is transparent to applications and centralizes sharding logic, which is beneficial for large organizations with many projects.
In practice, small‑to‑medium companies often prefer Sharding‑jdbc for its simplicity, while large enterprises may choose Mycat to handle complex, high‑traffic environments.
How to Split a Database
Horizontal Splitting copies the same schema across multiple databases and distributes rows based on a sharding key, thereby increasing both storage capacity and concurrency.
Vertical Splitting separates a wide table into multiple tables (or databases) with different column sets, typically moving frequently accessed columns to one table and less‑used columns to another, which improves cache efficiency.
Additionally, table‑level splitting (i.e., creating N tables from one large table) keeps each table’s row count within a manageable range (commonly around 2 million rows) to maintain query performance.
Both sharding and partitioning can be driven by middleware that automatically routes queries based on a chosen key (e.g., user_id ).
Two common sharding strategies are:
Range‑based sharding: each database stores a continuous range of values (often time‑based). It simplifies scaling but can create hotspot problems because recent data receives most traffic.
Hash‑based sharding: a hash function distributes rows evenly across databases, balancing load but making scaling more complex due to data migration.
Range sharding is easy to extend by adding a new database for each new time period, whereas hash sharding offers better load distribution at the cost of more involved re‑sharding when capacity grows.
Architect's Guide
Dedicated to sharing programmer-architect skills—Java backend, system, microservice, and distributed architectures—to help you become a senior architect.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.