How to Scale Your Database: Sharding Strategies and Middleware Comparison
This article explains the difference between database sharding and table partitioning, illustrates scaling challenges with a fast‑growing startup scenario, and compares popular sharding middleware such as Cobar, TDDL, Atlas, ShardingSphere, and Mycat, offering practical recommendations for different company sizes.
The article explains the difference between database sharding (分库) and table partitioning (分表), using a fast‑growing startup scenario to show when scaling becomes necessary.
Initially, a company with 200,000 registered users and modest traffic can operate with a single database, but rapid growth to millions of users and thousands of concurrent requests quickly exhausts a single table’s capacity and disk space.
Table Partitioning (分表)
When a single table reaches tens of millions of rows, query performance degrades. Partitioning splits a large table into multiple smaller tables (e.g., by user ID) so each table stays within a manageable size, typically around 2 million rows.
Database Sharding (分库)
Sharding distributes data across multiple databases. A healthy single database should handle around 1,000 QPS; beyond that, splitting the data into several databases improves concurrency and reduces disk usage.
This is the essence of "分库分表".
Before Sharding
After Sharding
Concurrency Support
MySQL single‑node cannot handle high concurrency
MySQL scaled from single node to multiple nodes, greatly increasing concurrency
Disk Usage
Single‑node disk nearly full
Multiple databases reduce disk usage per node
SQL Performance
Large tables make SQL slow
Smaller tables improve SQL execution efficiency
Common Sharding Middleware
Cobar
TDDL
Atlas
Sharding‑jdbc (now ShardingSphere)
Mycat
Cobar
Developed by Alibaba B2B team, a proxy‑layer solution that parses SQL and routes it to MySQL clusters. It is largely unmaintained, lacks read/write splitting, stored procedures, cross‑database joins, and pagination.
TDDL
Created by the Taobao team, a client‑layer solution supporting basic CRUD and read/write splitting but not joins or multi‑table queries; it depends on Taobao's Diamond configuration system.
Atlas
360’s open‑source proxy solution, once used by some companies but the project has not been updated for years.
Sharding‑jdbc / ShardingSphere
Originally open‑sourced by Dangdang, now called ShardingSphere. It supports sharding, read/write splitting, distributed ID generation, and flexible transactions. The latest release (4.0.0‑RC1) is actively maintained and considered a viable choice.
Mycat
Based on Cobar, a proxy‑layer solution with comprehensive features and an active community. It requires deployment and operational overhead but offers transparent sharding for applications.
Summary
For most cases, Sharding‑jdbc (client‑side) and Mycat (proxy‑side) are the two primary options. Sharding‑jdbc is lightweight, requires no extra deployment, and offers high performance, but upgrades require changes in every client. Mycat needs its own deployment and higher operational cost, yet it provides transparent sharding for all projects.
Recommendation: Small‑to‑medium companies should prefer Sharding‑jdbc for its simplicity and low maintenance, while large enterprises with many projects and ample ops resources may benefit from Mycat’s proxy architecture.
How to Perform Database Splitting?
Horizontal Sharding distributes rows of a table across multiple databases/tables with identical schemas, increasing concurrency and storage capacity.
Vertical Sharding splits a wide table into multiple tables or databases, each containing a subset of columns, typically separating frequently accessed fields from less‑used ones to improve cache efficiency.
Table‑level splitting (分表) further breaks a large table into N smaller tables to keep each table’s row count within a manageable range (e.g., 1–5 million rows), ensuring consistent SQL performance.
Sharding middleware can automatically route queries based on a chosen sharding key (e.g., user_id) to the appropriate database and table.
Two common sharding strategies are:
Range‑based sharding (e.g., by time period), simple to expand but may cause hotspot traffic on recent data.
Hash‑based sharding, which evenly distributes data and load but requires data migration when scaling.
Java High-Performance Architecture
Sharing Java development articles and resources, including SSM architecture and the Spring ecosystem (Spring Boot, Spring Cloud, MyBatis, Dubbo, Docker), Zookeeper, Redis, architecture design, microservices, message queues, Git, etc.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
