Why and How to Implement Database Sharding: Strategies, Middleware, and Best Practices
Database sharding—splitting data across multiple databases and tables—is essential as applications scale, and this article explains the motivations, differences between sharding and partitioning, horizontal and vertical splits, common middleware options like Cobar, TDDL, Atlas, ShardingSphere, and Mycat, and how to choose the right solution.
Why Do We Need Database Sharding?
Sharding (splitting databases) and table partitioning are two distinct concepts; they should not be confused. A system may split only databases, only tables, or both.
Consider a startup that begins with 200,000 registered users, 10,000 daily active users, and 1,000 new rows per day. After rapid growth, the user base can reach tens of millions, daily active users in the millions, and peak concurrency of thousands of requests per second, quickly exhausting a single database’s capacity.
When a single table reaches tens of millions of rows and concurrent requests hit several thousand, the database’s disk usage and query performance degrade sharply.
Table Partitioning
When a table grows to tens of millions of rows, performance suffers; splitting the table is necessary. Table partitioning means storing a table’s data across multiple tables, often by user ID, keeping each table’s row count within a manageable range (e.g., under 2 million rows).
Database Partitioning
One database typically handles up to about 2,000 concurrent requests; to stay healthy, keep per‑database concurrency around 1,000. Splitting data across multiple databases distributes load and storage.
Thus, sharding (both database and table splitting) becomes essential as business scales.
Before Sharding
After Sharding
Concurrency Support
MySQL single‑instance cannot handle high concurrency
MySQL scaled to multiple instances, supporting many times higher concurrency
Disk Usage
Single‑instance disk nearly full
Multiple databases reduce disk usage per server
SQL Performance
Large single table makes SQL slow
Reduced table size improves SQL execution speed
Which Sharding Middleware Have You Used?
Common middleware options include:
Cobar
TDDL
Atlas
Sharding‑jdbc (now ShardingSphere)
Mycat
Cobar
Developed by Alibaba’s B2B team, a proxy‑layer solution that parses SQL and routes it to appropriate MySQL instances. It is now rarely used and lacks support for read/write splitting, stored procedures, cross‑database joins, and pagination.
TDDL
Developed by the Taobao team, a client‑layer solution supporting basic CRUD and read/write splitting, but not joins or multi‑table queries, and depends on Alibaba’s Diamond configuration system.
Atlas
360’s open‑source proxy solution, once used by some companies but no longer actively maintained.
Sharding‑jdbc / ShardingSphere
Originally an open‑source client‑layer solution from Dangdang, now renamed ShardingSphere. It supports sharding, read/write splitting, distributed ID generation, and flexible transactions. The latest version (4.0.0‑RC1) is actively maintained and widely adopted.
Mycat
Based on Cobar, a proxy‑layer solution with comprehensive features. It is popular and actively maintained, though younger than ShardingSphere.
Summary
For most cases, Sharding‑jdbc and Mycat are the recommended choices. Sharding‑jdbc’s client‑layer approach offers low operational cost and high performance but introduces coupling when upgrading. Mycat’s proxy‑layer approach requires deployment and higher maintenance effort but provides transparency to applications.
Generally, small‑to‑medium companies benefit from Sharding‑jdbc, while larger enterprises may prefer Mycat for its scalability and team‑level support.
How to Split a Database?
Horizontal Partitioning distributes rows of a table across multiple databases/tables with identical schemas, balancing data and load.
Vertical Partitioning splits a wide table into multiple tables or databases, each containing a subset of columns, placing frequently accessed columns in a separate table to improve cache efficiency.
Common splitting strategies include:
Range‑based splitting (e.g., by time range) – simple to expand but may cause hotspot traffic on recent data.
Hash‑based splitting – evenly distributes load but requires data migration when scaling.
Middleware can automatically route queries to the appropriate database and table based on a chosen sharding key (e.g., user ID).
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
