Databases 19 min read

Mastering Sharding: When and How to Split Databases and Tables with ShardingSphere

This article explains the fundamentals of database sharding, why and when to apply it, various vertical and horizontal splitting strategies, routing algorithms, common pitfalls such as pagination and distributed transactions, and compares client‑side and proxy‑based ShardingSphere deployment patterns.

ITPUB
ITPUB
ITPUB
Mastering Sharding: When and How to Split Databases and Tables with ShardingSphere

What is Sharding (分库分表)

Sharding splits a large database into multiple databases and tables to avoid performance degradation caused by a single oversized instance. It consists of two independent concepts: 分库 (database splitting) and 分表 (table splitting), which are usually applied together.

Why Sharding

Single‑node databases have limited storage and connection capacity. When a table reaches millions of rows, adding read replicas or tuning indexes is insufficient; sharding reduces load and speeds up queries.

Reasons to Split Databases

Capacity – Disk space is fixed; rapid data growth quickly exceeds a single node.

Connection Count – High concurrency can exhaust max_connections. Example query:

show variables like '%max_connections%';

Reasons to Split Tables

Large tables cause slow queries due to deep B‑tree indexes in InnoDB. Alibaba recommends sharding when a table exceeds 5 million rows or 2 GB, though many systems operate with tens of millions of rows without sharding.

When to Shard

Sharding becomes necessary when data volume continuously grows and traditional optimizations (indexing, read replicas) no longer improve performance. A practical threshold is a MySQL table approaching the hundred‑million‑row scale.

Sharding Mechanisms

Sharding can be performed vertically (by business domain) or horizontally (by data range or hash).

Vertical Splitting

Vertical Database Splitting – Separate databases by business function (order, payment, member, points). Each service accesses its own DB via APIs, aligning with micro‑service principles.

Vertical Table Splitting – Move rarely used columns to separate tables. Example: split t_order amount‑related fields into t_order_price_expansion linked by order_no. This reduces column count per table, allowing more data to fit in memory and improving index hit rates.

Horizontal Splitting

Horizontal Database Sharding – Duplicate the same table across multiple databases (e.g., db_order_1, db_order_2) and route rows based on order_no % N, where N is the number of databases.

Horizontal Table Sharding – Within a single database, split a large table into identically structured shards (e.g., t_order_1, t_order_2, t_order_3) each holding a subset of rows. This reduces per‑table size but still shares the same DB instance.

Data Placement Algorithms

Routing rules decide which database and table a row belongs to. Common algorithms:

Modulo (hash % N)

Range (by time or ID range)

Range + Modulo (range to select a DB, then modulo to select a table)

Geographic (by region or city)

Predefined (fixed mapping when shard count is known)

1. Modulo Algorithm

Hash the sharding key (e.g., order_no) and take the remainder modulo the number of shards. The remainder determines the target DB/table. Advantages: simple and evenly distributes data. Disadvantages: scaling out requires re‑hashing; node failure changes the modulo base, causing data to map to different shards.

2. Range Algorithm

Split by a continuous field such as time or ID range. Example: t_user_1 stores IDs 1‑10 million, t_user_2 stores 10‑20 million, etc. Advantages: controllable table size and easy horizontal scaling. Disadvantages: possible hotspot if a specific range receives a burst of writes.

3. Range + Modulo Algorithm

First apply a range to assign a user to a specific database, then use modulo within that DB to select the exact table, mitigating hotspot risk while keeping distribution even.

4. Geographic Sharding

Assign shards based on region or city (e.g., East China vs. North China).

5. Predefined Algorithm

When the number of shards is known in advance, data can be routed directly to a fixed DB/table.

Challenges After Sharding

Pagination, Sorting, Cross‑Node Joins – Aggregating results from multiple shards requires additional merging logic.

Distributed Transactions – Cross‑shard operations need coordination; solutions include Alibaba’s Seata and MySQL’s XA protocol.

Global Unique Primary Keys – Auto‑increment IDs may collide across shards; a distributed ID generator (snowflake) is required.

Governance of Many Shards – Large systems can have thousands of shard tables, making manual management impractical.

Historical Data Migration – Moving existing data to a sharded architecture involves both full‑load and incremental sync, which can be complex.

Sharding Architecture Patterns

Two main deployment modes exist:

Client Mode

The application embeds sharding logic, connects directly to multiple databases, and performs local result aggregation. This mode offers slightly better performance because it eliminates an extra network hop.

Proxy Mode

A proxy service sits between the application and MySQL, handling the MySQL protocol, routing SQL to the appropriate shard, and returning unified results. It centralizes SQL throttling, permission control, monitoring, and alerting.

Comparison

Performance – Client mode is marginally faster; proxy adds a hop (app → proxy → MySQL).

Complexity – Client mode requires adding a library to each service; proxy mode requires deploying and maintaining a separate, highly‑available service.

Upgrade Impact – Client mode upgrades affect every dependent application; proxy mode upgrades only the proxy cluster, minimizing downstream impact.

Governance & Monitoring – Proxy mode provides centralized management, while client mode distributes these concerns across each application instance.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

mysqlShardingSpheredatabase partitioning
ITPUB
Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.