Database Sharding: Principles, Methods, and Tools
The article explains why and when to apply database sharding, describes horizontal and vertical partitioning methods, lists common sharding tools, and discusses challenges such as distributed transactions, join avoidance, pagination, and global primary key generation.
When a database grows to a size that causes performance bottlenecks, sharding (splitting databases and tables) becomes a viable solution. The article first outlines the typical IO and CPU bottlenecks that increase active connections and can lead to service degradation.
IO Bottlenecks
Disk read IO: hot data exceeds cache, causing many reads; solution – horizontal sharding or vertical partitioning.
Network IO: excessive request volume exceeds bandwidth; solution – horizontal sharding.
CPU Bottlenecks
SQL inefficiencies (joins, GROUP BY, ORDER BY, non‑indexed queries); solution – SQL optimization, proper indexing, or moving calculations to the service layer.
Large single‑table scans; solution – horizontal partitioning.
Sharding Types
Horizontal Sharding (Horizontal Database) : split data across multiple databases based on a field (hash, range, etc.). All databases share the same schema, contain disjoint data, and their union equals the full dataset.
Horizontal Partitioning (Horizontal Table) : split a single table into multiple tables using similar strategies; each table has the same structure, disjoint rows, and together form the full dataset.
Vertical Sharding (Vertical Database) : split tables into different databases according to business modules; each database may have a different schema.
Vertical Partitioning (Vertical Table) : split a table into a main table and extension tables based on field activity; schemas differ, but a common key (usually the primary key) links the data.
Common Sharding Tools
Sharding‑JDBC (Dangdang)
TSharding (Mogujie)
Atlas (360)
Cobar (Alibaba)
MyCAT (based on Cobar)
Oceanus (58.com)
Vitess (Google)
Problems Introduced by Sharding
Distributed Transaction Consistency
Updates that span multiple shards require distributed transactions (e.g., XA protocol, two‑phase commit), which increase latency and risk of deadlocks.
Join Queries Across Shards
Cross‑shard joins become costly; recommended solutions include using global tables, field redundancy, data assembly in the service layer, or ER‑sharding to keep related tables on the same shard.
Cross‑Shard Pagination, Sorting, and Aggregation
Paging and ordering must be performed per shard and then merged, which can be CPU‑ and memory‑intensive for large page numbers. Aggregations (MAX, MIN, SUM, COUNT) also require per‑shard computation followed by a final merge.
Global Primary Key Duplication
Auto‑increment IDs are not globally unique across shards. Strategies include:
UUIDs – simple but storage‑heavy and index‑unfriendly.
Sequence table (MyISAM) – example below:
CREATE TABLE `sequence` (
`id` bigint(20) unsigned NOT NULL auto_increment,
`stub` char(1) NOT NULL default '',
PRIMARY KEY (`id`),
UNIQUE KEY `stub` (`stub`)
) ENGINE=MyISAM; REPLACE INTO sequence (stub) VALUES ('a');
SELECT LAST_INSERT_ID();Snowflake algorithm – 64‑bit IDs composed of timestamp, datacenter ID, worker ID, and sequence within a millisecond.
Data Migration and Scaling
When sharding is introduced, historical data must be read from the original tables and written to the appropriate shards according to the chosen rule. Capacity planning should aim for each shard to hold no more than ~10 million rows.
When to Consider Sharding
Only after exhausting other optimizations (hardware upgrades, indexing, read‑write splitting).
When table size or data volume severely impacts performance or maintenance (backups, DDL, lock contention).
When specific fields become hot and benefit from vertical splitting.
When rapid data growth approaches the limits of a single instance.
The article concludes with a reminder to avoid premature sharding and to weigh the added complexity against actual performance needs.
Laravel Tech Community
Specializing in Laravel development, we continuously publish fresh content and grow alongside the elegant, stable Laravel framework.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.