Databases 12 min read

How to Scale Your Database: Sharding Strategies and Middleware Comparison

This article explains the difference between database sharding and table partitioning, illustrates scaling challenges with a fast‑growing startup scenario, and compares popular sharding middleware such as Cobar, TDDL, Atlas, ShardingSphere, and Mycat, offering practical recommendations for different company sizes.

Java High-Performance Architecture
Java High-Performance Architecture
Java High-Performance Architecture
How to Scale Your Database: Sharding Strategies and Middleware Comparison

The article explains the difference between database sharding (分库) and table partitioning (分表), using a fast‑growing startup scenario to show when scaling becomes necessary.

Initially, a company with 200,000 registered users and modest traffic can operate with a single database, but rapid growth to millions of users and thousands of concurrent requests quickly exhausts a single table’s capacity and disk space.

Table Partitioning (分表)

When a single table reaches tens of millions of rows, query performance degrades. Partitioning splits a large table into multiple smaller tables (e.g., by user ID) so each table stays within a manageable size, typically around 2 million rows.

Database Sharding (分库)

Sharding distributes data across multiple databases. A healthy single database should handle around 1,000 QPS; beyond that, splitting the data into several databases improves concurrency and reduces disk usage.

This is the essence of "分库分表".

Before Sharding

After Sharding

Concurrency Support

MySQL single‑node cannot handle high concurrency

MySQL scaled from single node to multiple nodes, greatly increasing concurrency

Disk Usage

Single‑node disk nearly full

Multiple databases reduce disk usage per node

SQL Performance

Large tables make SQL slow

Smaller tables improve SQL execution efficiency

Common Sharding Middleware

Cobar

TDDL

Atlas

Sharding‑jdbc (now ShardingSphere)

Mycat

Cobar

Developed by Alibaba B2B team, a proxy‑layer solution that parses SQL and routes it to MySQL clusters. It is largely unmaintained, lacks read/write splitting, stored procedures, cross‑database joins, and pagination.

TDDL

Created by the Taobao team, a client‑layer solution supporting basic CRUD and read/write splitting but not joins or multi‑table queries; it depends on Taobao's Diamond configuration system.

Atlas

360’s open‑source proxy solution, once used by some companies but the project has not been updated for years.

Sharding‑jdbc / ShardingSphere

Originally open‑sourced by Dangdang, now called ShardingSphere. It supports sharding, read/write splitting, distributed ID generation, and flexible transactions. The latest release (4.0.0‑RC1) is actively maintained and considered a viable choice.

Mycat

Based on Cobar, a proxy‑layer solution with comprehensive features and an active community. It requires deployment and operational overhead but offers transparent sharding for applications.

Summary

For most cases, Sharding‑jdbc (client‑side) and Mycat (proxy‑side) are the two primary options. Sharding‑jdbc is lightweight, requires no extra deployment, and offers high performance, but upgrades require changes in every client. Mycat needs its own deployment and higher operational cost, yet it provides transparent sharding for all projects.

Recommendation: Small‑to‑medium companies should prefer Sharding‑jdbc for its simplicity and low maintenance, while large enterprises with many projects and ample ops resources may benefit from Mycat’s proxy architecture.

How to Perform Database Splitting?

Horizontal Sharding distributes rows of a table across multiple databases/tables with identical schemas, increasing concurrency and storage capacity.

Vertical Sharding splits a wide table into multiple tables or databases, each containing a subset of columns, typically separating frequently accessed fields from less‑used ones to improve cache efficiency.

Table‑level splitting (分表) further breaks a large table into N smaller tables to keep each table’s row count within a manageable range (e.g., 1–5 million rows), ensuring consistent SQL performance.

Sharding middleware can automatically route queries based on a chosen sharding key (e.g., user_id) to the appropriate database and table.

Two common sharding strategies are:

Range‑based sharding (e.g., by time period), simple to expand but may cause hotspot traffic on recent data.

Hash‑based sharding, which evenly distributes data and load but requires data migration when scaling.

MiddlewareMySQLShardingSpheredatabase shardingTable PartitioningMyCat
Java High-Performance Architecture
Written by

Java High-Performance Architecture

Sharing Java development articles and resources, including SSM architecture and the Spring ecosystem (Spring Boot, Spring Cloud, MyBatis, Dubbo, Docker), Zookeeper, Redis, architecture design, microservices, message queues, Git, etc.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.