Databases 13 min read

Mastering Database Sharding: When and How to Split Databases and Tables

This article explains the concepts, motivations, strategies, and challenges of database and table sharding, covering vertical and horizontal partitioning, range and hash algorithms, when to adopt them, and popular middleware solutions for scalable MySQL architectures.

Java Backend Technology
Java Backend Technology
Java Backend Technology
Mastering Database Sharding: When and How to Split Databases and Tables

Preface

Hello everyone, today we discuss database sharding.

What is sharding

Why sharding is needed

How to shard

When to consider sharding

Problems caused by sharding

Sharding middleware overview

1. What is sharding

Database splitting : dividing a single database into multiple databases deployed on different machines.

Table splitting : dividing a single table into multiple tables.

2. Why sharding is needed

2.1 Why split databases

When traffic surges, a single MySQL instance may hit storage or connection limits. Splitting databases reduces disk usage and distributes concurrent connections, which aligns with micro‑service architectures that separate order, user, and product data into distinct databases.

2.2 Why split tables

Large tables cause slow queries, especially when indexes are not hit or when a table exceeds ten million rows. B+‑tree height grows, increasing disk reads. Splitting tables keeps each table size manageable and maintains query performance.

InnoDB stores data in 16 KB pages. For a leaf node storing 1 KB rows, each leaf can hold 16 rows. Assuming an 8‑byte bigint primary key and a 6‑byte pointer, a non‑leaf node can hold about 1 170 pointers (16 KB/14 B). Thus a B+‑tree of height 2 can store roughly 18 720 rows, while height 3 can store about 21 902 400 rows. Beyond this, query speed degrades.

3. How to shard

3.1 Vertical partitioning

Vertical splitting separates different functional modules into separate databases (e.g., user, order, points, product). This reduces pressure on a single database and supports high‑concurrency scenarios.

Vertical table splitting moves rarely used or large columns (e.g., email, address, description) to a separate detail table, reducing I/O for common queries.

3.2 Horizontal partitioning

3.2.1 Horizontal database splitting

Data is distributed across multiple database servers, each holding the same schema but different data subsets, alleviating single‑node bottlenecks.

3.2.2 Horizontal table splitting

Data is divided into multiple tables based on a rule such as hash modulo or range. For example, an order table can be split by time range into separate tables.

3.3 Sharding strategies

3.3.1 Range

Data is partitioned by a numeric or time range (e.g., IDs 0‑10 million in one table, 10‑20 million in another). This simplifies scaling but can create hotspot tables.

3.3.2 Hash modulo

Rows are assigned to tables based on key % tableCount. This distributes load evenly and avoids hotspots, but expanding the number of tables requires data migration.

3.3.3 Range + Hash

Combine range and hash: first split by range (e.g., 0‑40 million, 40‑80 million) into separate databases, then apply hash modulo within each database to distribute tables, balancing scalability and hotspot avoidance.

4. When to consider sharding

4.1 When to split tables

If a table reaches tens of millions of rows and query performance degrades (B+‑tree height exceeds 3), or if daily order volume grows to hundreds of thousands, table sharding should be planned. Industry practice suggests considering sharding around 5 million rows.

4.2 When to split databases

When multiple services share a single database and it becomes a performance bottleneck, separate databases per domain (order, user, etc.) should be created, often as part of a micro‑service transition.

5. Problems introduced by sharding

Transaction management – local transactions become ineffective, requiring distributed transactions.

Cross‑database joins – need to be performed in multiple steps.

Sorting and aggregation – must be merged at the application layer.

Pagination – either aggregate results then paginate, or let the front‑end handle pagination across nodes.

Distributed IDs – cannot rely on auto‑increment; use UUIDs or Snowflake algorithm.

6. Sharding middleware

Popular open‑source solutions include:

cobar

Mycat

Sharding‑JDBC

Atlas

TDDL (Taobao)

Vitess

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

mysqldatabase shardinghorizontal scalingPartitioningvertical splitting
Java Backend Technology
Written by

Java Backend Technology

Focus on Java-related technologies: SSM, Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading. Occasionally cover DevOps tools like Jenkins, Nexus, Docker, and ELK. Also share technical insights from time to time, committed to Java full-stack development!

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.