
Why and How to Perform Database Sharding and Table Partitioning

Database sharding and table partitioning are essential scaling techniques that distribute data across multiple databases and tables to handle growing user volume, data size, and concurrency. This guide explains when to split, the main methods (horizontal vs. vertical), and compares popular middleware solutions.


Why Split Database and Table?

In short, sharding (splitting databases) and table partitioning are two different concepts and should not be confused; you may split only databases, only tables, or both.

Imagine a small startup that initially has 200,000 registered users, 10,000 daily active users, and each table receives about 1,000 new rows per day. The system can be handled by a few engineers and a single MySQL instance.

After rapid growth, the user base expands to 20 million registered users and 1 million daily active users, with each table receiving 100,000 new rows per day and peak concurrency reaching 1,000 requests per second. The single‑node MySQL starts to feel the pressure.

When the user count further explodes to 100 million and daily active users reach 10 million, a single table can contain 20–30 million rows, and peak concurrency climbs to 5,000–8,000 QPS, the database can no longer sustain the load.

Table Partitioning

When a single table reaches tens of millions of rows, query performance degrades sharply. As a rule of thumb, once a table approaches a few million rows, you should start planning to partition it.

Table partitioning means moving rows of a large table into multiple smaller tables, usually based on a key such as user ID, so that each partition contains at most a few million rows (e.g., 2 million). This keeps each table’s size manageable and improves SQL execution speed.
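As a minimal sketch of this routing idea (the table count, the `orders` name, and the modulo rule are illustrative, not prescribed by any particular middleware):

```python
NUM_TABLES = 32  # sized so each partition table stays within a few million rows

def partition_table(user_id: int, base: str = "orders") -> str:
    """Route a row to one of NUM_TABLES physical tables by user ID."""
    return f"{base}_{user_id % NUM_TABLES}"

# Every read and write for this user must target the same physical table.
print(partition_table(123457))  # -> orders_1
```

Because the key is deterministic, any component that knows the rule can find the right table without a lookup service.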

Database Sharding

Database sharding means distributing data across multiple physical databases. A single healthy MySQL instance can comfortably handle around 1,000–2,000 QPS; beyond that you need to add more instances. By splitting data into several databases, you increase both concurrency capacity and storage.

Together, sharding and partitioning form the classic "split database and table" strategy.
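To show how the two combine (the instance and table counts here are hypothetical), a two-level router first picks a database, then a table inside it:

```python
NUM_DBS = 4        # physical MySQL instances (sharding)
TABLES_PER_DB = 8  # partition tables inside each database (partitioning)

def route(user_id: int) -> tuple[str, str]:
    """Return (database, table) for a user: 4 x 8 = 32 shards in total."""
    db_index = user_id % NUM_DBS
    table_index = (user_id // NUM_DBS) % TABLES_PER_DB
    return f"db_{db_index}", f"orders_{table_index}"

print(route(10))  # -> ('db_2', 'orders_2')
```

Using the quotient for the table index (rather than the same modulo twice) keeps users of one database spread across all of its tables.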

| Aspect | Before Sharding/Partitioning | After Sharding/Partitioning |
| --- | --- | --- |
| Concurrency support | Single-machine MySQL cannot handle high concurrency | Multiple MySQL instances increase concurrency manyfold |
| Disk usage | Single-machine disk space nearly full | Data spread across several machines, per-machine disk usage drops |
| SQL performance | Large tables make SQL slower | Smaller tables improve SQL execution efficiency |

Common Sharding Middleware

The following middleware solutions are widely used for database sharding and table partitioning:

Cobar

TDDL

Atlas

Sharding‑jdbc (now called ShardingSphere)

Mycat

Cobar

Developed by Alibaba's B2B team, Cobar is a proxy‑layer solution that sits between the application and MySQL. It parses SQL, applies sharding rules, and forwards the query to the appropriate MySQL instance. It is now rarely updated and lacks features such as read/write splitting, stored procedures, cross‑database joins, and pagination.

TDDL

Created by the Taobao team, TDDL is a client‑side solution that supports basic CRUD and read/write splitting but does not support joins or multi‑table queries. It also depends on Alibaba's Diamond configuration service.

Atlas

Open‑sourced by 360, Atlas is another proxy‑layer product. Its community has not been active for several years, so adoption is low.

Sharding‑jdbc / ShardingSphere

Originally open‑sourced by Dangdang, Sharding‑jdbc is a client‑side library that has evolved into ShardingSphere. It supports sharding, read/write splitting, distributed ID generation, and flexible transactions. As of version 4.0.0‑RC1 (released April 2019), it remains actively maintained and is a popular choice.

Mycat

Mycat is a proxy‑layer middleware derived from Cobar. It offers a rich feature set and has an active community, making it a hot choice for many companies, although it is newer than Sharding‑jdbc and has undergone less real‑world testing.

Summary of Middleware Choices

For most scenarios, Sharding‑jdbc (client‑side) and Mycat (proxy‑side) are the two recommended options.

Sharding‑jdbc’s advantages include zero deployment overhead, low operational cost, and high performance because it avoids an extra proxy layer. However, any version upgrade requires updating every client that uses it, creating tight coupling.

Mycat requires a separate deployment and higher operational effort, but it is transparent to applications and centralizes sharding logic, which is beneficial for large organizations with many projects.

In practice, small‑to‑medium companies often prefer Sharding‑jdbc for its simplicity, while large enterprises may choose Mycat to handle complex, high‑traffic environments.

How to Split a Database

Horizontal Splitting copies the same schema across multiple databases and distributes rows based on a sharding key, thereby increasing both storage capacity and concurrency.

Vertical Splitting separates a wide table into multiple tables (or databases) with different column sets, typically moving frequently accessed columns to one table and less‑used columns to another, which improves cache efficiency.
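A minimal sketch of the idea (the column names are invented for illustration): split a wide user record into a hot part of small, frequently read columns and a cold part of rarely read ones, linked by the primary key:

```python
HOT_COLUMNS = {"id", "name", "balance"}  # small fields accessed on every request

def vertical_split(row: dict) -> tuple[dict, dict]:
    """Split one wide row into hot and cold parts sharing the same id."""
    hot = {k: v for k, v in row.items() if k in HOT_COLUMNS}
    cold = {k: v for k, v in row.items() if k not in HOT_COLUMNS}
    cold["id"] = row["id"]  # keep the join key in both tables
    return hot, cold

user = {"id": 1, "name": "alice", "balance": 42, "bio": "...", "avatar_blob": "..."}
hot, cold = vertical_split(user)
```

Keeping rows of hot data short means many more of them fit in the buffer pool, which is where the cache-efficiency gain comes from.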

Additionally, table‑level splitting (i.e., creating N tables from one large table) keeps each table’s row count within a manageable range (commonly around 2 million rows) to maintain query performance.

Both sharding and partitioning can be driven by middleware that automatically routes queries based on a chosen key (e.g., user_id).

Two common sharding strategies are:

Range‑based sharding: each database stores a continuous range of values (often time‑based). It simplifies scaling but can create hotspot problems because recent data receives most traffic.

Hash‑based sharding: a hash function distributes rows evenly across databases, balancing load but making scaling more complex due to data migration.

Range sharding is easy to extend by adding a new database for each new time period, whereas hash sharding offers better load distribution at the cost of more involved re‑sharding when capacity grows.
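The trade-off can be contrasted in a few lines (the database names and the by-year range rule are illustrative assumptions, not a standard):

```python
import datetime

def range_shard(created: datetime.date) -> str:
    """Range sharding by time: each year gets its own database.
    Scaling out is trivial (just add next year's database), but the
    newest database absorbs almost all writes -- the hotspot problem."""
    return f"db_{created.year}"

def hash_shard(user_id: int, num_dbs: int = 4) -> str:
    """Hash sharding: load spreads evenly across databases, but
    increasing num_dbs changes the mapping, forcing data migration."""
    return f"db_{user_id % num_dbs}"

print(range_shard(datetime.date(2024, 3, 1)))  # -> db_2024
print(hash_shard(7))                           # -> db_3
```

Note how `hash_shard` depends on `num_dbs`: growing from 4 to 5 databases remaps most keys, which is exactly the re-sharding cost the text describes.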

Tags: middleware, database sharding, horizontal scaling, table partitioning, vertical splitting
Written by Architect's Guide

Dedicated to sharing programmer-architect skills—Java backend, system, microservice, and distributed architectures—to help you become a senior architect.