Mastering Database Sharding: When and How to Split Databases and Tables
This article explains the concepts, motivations, strategies, and challenges of database and table sharding, covering vertical and horizontal partitioning, range and hash algorithms, when to adopt them, and popular middleware solutions for scalable MySQL architectures.
Preface
Hello everyone, today we discuss database sharding.
What is sharding
Why sharding is needed
How to shard
When to consider sharding
Problems caused by sharding
Sharding middleware overview
1. What is sharding
Database splitting: dividing a single database into multiple databases deployed on different machines.
Table splitting: dividing a single table into multiple tables.
2. Why sharding is needed
2.1 Why split databases
When traffic surges, a single MySQL instance may hit storage or connection limits. Splitting databases spreads the data across machines, which lowers disk usage per instance and distributes concurrent connections. It also fits micro-service architectures, which keep order, user, and product data in distinct databases.
2.2 Why split tables
Large tables cause slow queries, especially when a query misses its indexes or when the table exceeds roughly ten million rows. As the row count grows, the B+-tree gets taller and each lookup needs more disk reads. Splitting tables keeps each table at a manageable size and maintains query performance.
InnoDB stores data in 16 KB pages. If each row takes about 1 KB, a leaf page holds 16 rows (16 KB / 1 KB). Assuming an 8-byte bigint primary key and a 6-byte page pointer, a non-leaf page holds about 1,170 entries (16 KB / 14 B). A B+-tree of height 2 can therefore store roughly 1,170 × 16 = 18,720 rows, and a tree of height 3 about 1,170 × 1,170 × 16 = 21,902,400 rows. Beyond that, the tree needs a fourth level, every lookup costs an extra page read, and query speed degrades.
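The arithmetic above can be reproduced with a short script. This is a back-of-the-envelope sketch only: it reuses the article's assumed sizes (1 KB rows, 8-byte key, 6-byte pointer) and ignores page headers and fill factor, so real capacities will be somewhat lower. The names `FANOUT` and `max_rows` are illustrative.

```python
# Rough B+-tree capacity estimate using the figures quoted in the text.
PAGE_SIZE = 16 * 1024   # InnoDB page size in bytes
ROW_SIZE = 1024         # assumed average row size (1 KB)
KEY_SIZE = 8            # bigint primary key
POINTER_SIZE = 6        # pointer to a child page

ROWS_PER_LEAF = PAGE_SIZE // ROW_SIZE              # rows in one leaf page
FANOUT = PAGE_SIZE // (KEY_SIZE + POINTER_SIZE)    # entries in one non-leaf page


def max_rows(height: int) -> int:
    """Rows addressable by a tree with `height` levels, leaf level included."""
    if height < 1:
        raise ValueError("height must be at least 1")
    return FANOUT ** (height - 1) * ROWS_PER_LEAF


for h in (1, 2, 3):
    print(f"height {h}: {max_rows(h):,} rows")
```

With wider rows the leaf capacity drops and the three-level limit arrives sooner, which is why the "tens of millions" figure should be read as an estimate rather than a hard threshold.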
3. How to shard
3.1 Vertical partitioning
Vertical database splitting places different functional modules in separate databases (e.g., user, order, points, product). This reduces pressure on any single database and supports high-concurrency scenarios.
Vertical table splitting moves rarely used or large columns (e.g., email, address, description) to a separate detail table, reducing I/O for common queries.
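A minimal sketch of vertical table splitting follows. It uses Python's built-in SQLite module purely so the example runs without a server; the article targets MySQL, and the table and column names (`users`, `user_detail`) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Narrow table: columns read by most queries.
    CREATE TABLE users (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- Detail table: large or rarely used columns, keyed by the same id.
    CREATE TABLE user_detail (
        user_id     INTEGER PRIMARY KEY REFERENCES users(id),
        email       TEXT,
        address     TEXT,
        description TEXT
    );
""")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.execute(
    "INSERT INTO user_detail VALUES (?, ?, ?, ?)",
    (1, "alice@example.com", "1 Main St", "a long free-text description"),
)

# Common query: reads only the narrow table.
name = conn.execute("SELECT name FROM users WHERE id = ?", (1,)).fetchone()[0]

# Occasional query: joins in the detail columns when they are needed.
profile = conn.execute(
    "SELECT u.name, d.email FROM users u "
    "JOIN user_detail d ON d.user_id = u.id WHERE u.id = ?",
    (1,),
).fetchone()
```

The trade-off is that any query needing the detail columns now pays for a join, so the split only helps when those columns really are read infrequently.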
3.2 Horizontal partitioning
3.2.1 Horizontal database splitting
Data is distributed across multiple database servers, each holding the same schema but different data subsets, alleviating single‑node bottlenecks.
3.2.2 Horizontal table splitting
Data is divided into multiple tables based on a rule such as hash modulo or range. For example, an order table can be split by time range into separate tables.
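As an illustration of the time-range example, the sketch below maps an order's creation date to a monthly table. The monthly granularity and the `orders_YYYYMM` naming scheme are assumptions made for this example, not something the article prescribes.

```python
from datetime import date


def order_table_for(created: date) -> str:
    """Return the (hypothetical) monthly table that holds an order created on `created`."""
    return f"orders_{created.year:04d}{created.month:02d}"


print(order_table_for(date(2024, 1, 15)))   # orders_202401
print(order_table_for(date(2024, 12, 31)))  # orders_202412
```

Queries that filter on a date range only need to touch the tables for the months in that range.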
3.3 Sharding strategies
3.3.1 Range
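Data is partitioned by a numeric or time range (e.g., IDs 0-10 million in one table, 10-20 million in another). Scaling is simple, because a new range just gets a new table and existing data does not move. The drawback is hotspots: with increasing IDs or timestamps, most reads and writes land on the newest table.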
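A minimal sketch of range routing, assuming 10 million IDs per table as in the example and half-open ranges (table 0 holds IDs 0 to 9,999,999). The function and table names are hypothetical.

```python
ROWS_PER_TABLE = 10_000_000  # range width taken from the example in the text


def range_table_for(order_id: int) -> str:
    """Return the table whose ID range contains `order_id`."""
    if order_id < 0:
        raise ValueError("order_id must be non-negative")
    return f"orders_{order_id // ROWS_PER_TABLE}"


print(range_table_for(9_999_999))    # orders_0
print(range_table_for(10_000_000))   # orders_1

# Hotspot illustration: a burst of newly issued, increasing IDs
# all maps to the same (newest) table.
recent_ids = range(20_000_000, 20_001_000)
print({range_table_for(i) for i in recent_ids})  # {'orders_2'}
```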
3.3.2 Hash modulo
Rows are assigned to tables by key % tableCount. This spreads load evenly and avoids hotspots, but changing the number of tables changes the result of the modulo for most keys, so expansion requires data migration.
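The sketch below shows both properties on integer keys: the even spread, and the share of keys that would have to move when the table count changes. The helper `moved_fraction` is an illustrative name, not part of any middleware.

```python
from collections import Counter


def hash_table_for(key: int, table_count: int) -> int:
    """Return the index of the table that holds `key`."""
    return key % table_count


def moved_fraction(keys, old_count: int, new_count: int) -> float:
    """Fraction of keys whose table changes when the table count changes."""
    keys = list(keys)
    moved = sum(
        1 for k in keys
        if hash_table_for(k, old_count) != hash_table_for(k, new_count)
    )
    return moved / len(keys)


keys = range(12_000)

# Sequential keys are spread evenly over 4 tables.
print(Counter(hash_table_for(k, 4) for k in keys))

# Growing from 4 to 5 tables relocates 80% of these keys;
# doubling from 4 to 8 relocates 50%.
print(moved_fraction(keys, 4, 5))
print(moved_fraction(keys, 4, 8))
```

Doubling the table count is a common way to limit the migration, since every key either stays put or moves to exactly one new table.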
3.3.3 Range + Hash
Combine range and hash: first split by range (e.g., 0‑40 million, 40‑80 million) into separate databases, then apply hash modulo within each database to distribute tables, balancing scalability and hotspot avoidance.
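A minimal sketch of the combined scheme, assuming 40 million IDs per database as in the example and four tables per database. The table count and the `order_db_N` / `orders_N` names are assumptions made for illustration.

```python
from typing import Tuple

IDS_PER_DATABASE = 40_000_000  # range width from the example in the text
TABLES_PER_DATABASE = 4        # assumed for this sketch


def route(order_id: int) -> Tuple[str, str]:
    """Pick the database by ID range, then the table by hash modulo."""
    if order_id < 0:
        raise ValueError("order_id must be non-negative")
    db_index = order_id // IDS_PER_DATABASE
    table_index = order_id % TABLES_PER_DATABASE
    return f"order_db_{db_index}", f"orders_{table_index}"


print(route(5))            # ('order_db_0', 'orders_1')
print(route(39_999_999))   # ('order_db_0', 'orders_3')
print(route(40_000_001))   # ('order_db_1', 'orders_1')
```

Adding capacity means adding a database for the next ID range, with no migration of existing rows, while writes inside the newest database are still spread over several tables.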
4. When to consider sharding
4.1 When to split tables
If a table reaches tens of millions of rows and query performance degrades (the B+-tree height exceeds 3), or if daily order volume grows to hundreds of thousands, table sharding should be planned. A more conservative rule of thumb in the industry is to start considering sharding at around 5 million rows, before performance actually degrades.
4.2 When to split databases
When multiple services share a single database and it becomes a performance bottleneck, separate databases per domain (order, user, etc.) should be created, often as part of a micro‑service transition.
5. Problems introduced by sharding
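Transaction management – a local transaction cannot span multiple databases, so writes that touch more than one shard require distributed transactions.
Cross-database joins – a single SQL join is no longer possible; the data must be fetched in multiple steps and combined by the application.
Sorting and aggregation – each node returns its own partial result, and the results must be merged at the application layer.
Pagination – either merge the results from all nodes and then paginate, or let the front end page through each node's results.
Distributed IDs – per-table auto-increment values are no longer unique across shards; use UUIDs or the Snowflake algorithm instead.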
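For the distributed ID problem, the following is a minimal sketch of a Snowflake-style generator. It assumes the commonly cited 64-bit layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit sequence). The custom epoch is an arbitrary choice for this example, and the class name is hypothetical; a production system would also need a way to assign unique worker IDs and to handle clock rollback more gracefully than raising an error.

```python
import threading
import time


class SnowflakeGenerator:
    """Snowflake-style IDs: timestamp (41 bits) | worker (10 bits) | sequence (12 bits)."""

    EPOCH_MS = 1_577_836_800_000  # 2020-01-01T00:00:00Z, arbitrary custom epoch
    WORKER_BITS = 10
    SEQUENCE_BITS = 12
    MAX_WORKER_ID = (1 << WORKER_BITS) - 1
    MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1

    def __init__(self, worker_id: int) -> None:
        if not 0 <= worker_id <= self.MAX_WORKER_ID:
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    @staticmethod
    def _now_ms() -> int:
        return time.time_ns() // 1_000_000

    def next_id(self) -> int:
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & self.MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._now_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - self.EPOCH_MS) << (self.WORKER_BITS + self.SEQUENCE_BITS))
                | (self.worker_id << self.SEQUENCE_BITS)
                | self._sequence
            )


gen = SnowflakeGenerator(worker_id=3)
first, second = gen.next_id(), gen.next_id()
print(first < second)  # True: IDs from one worker are strictly increasing
```

Unlike random UUIDs, these IDs fit in a bigint column and increase roughly with time, which keeps inserts near the right-hand edge of the primary-key B+-tree.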
6. Sharding middleware
Popular open‑source solutions include:
Cobar
Mycat
Sharding‑JDBC
Atlas
TDDL (Taobao)
Vitess
Java Backend Technology
