Databases 14 min read

Database Sharding: Principles, Methods, and Tools

The article explains why and when to apply database sharding, describes horizontal and vertical partitioning methods, lists common sharding tools, and discusses challenges such as distributed transactions, join avoidance, pagination, and global primary key generation.

Laravel Tech Community
Laravel Tech Community
Laravel Tech Community
Database Sharding: Principles, Methods, and Tools

When a database grows to a size that causes performance bottlenecks, sharding (splitting databases and tables) becomes a viable solution. The article first outlines the typical IO and CPU bottlenecks that increase active connections and can lead to service degradation.

IO Bottlenecks

Disk read IO: hot data exceeds cache, causing many reads; solution – horizontal sharding or vertical partitioning.

Network IO: excessive request volume exceeds bandwidth; solution – horizontal sharding.

CPU Bottlenecks

SQL inefficiencies (joins, GROUP BY, ORDER BY, non‑indexed queries); solution – SQL optimization, proper indexing, or moving calculations to the service layer.

Large single‑table scans; solution – horizontal partitioning.

Sharding Types

Horizontal Sharding (Horizontal Database) : split data across multiple databases based on a field (hash, range, etc.). All databases share the same schema, contain disjoint data, and their union equals the full dataset.

Horizontal Partitioning (Horizontal Table) : split a single table into multiple tables using similar strategies; each table has the same structure, disjoint rows, and together form the full dataset.

Vertical Sharding (Vertical Database) : split tables into different databases according to business modules; each database may have a different schema.

Vertical Partitioning (Vertical Table) : split a table into a main table and extension tables based on field activity; schemas differ, but a common key (usually the primary key) links the data.

Common Sharding Tools

Sharding‑JDBC (Dangdang)

TSharding (Mogujie)

Atlas (360)

Cobar (Alibaba)

MyCAT (based on Cobar)

Oceanus (58.com)

Vitess (Google)

Problems Introduced by Sharding

Distributed Transaction Consistency

Updates that span multiple shards require distributed transactions (e.g., XA protocol, two‑phase commit), which increase latency and risk of deadlocks.

Join Queries Across Shards

Cross‑shard joins become costly; recommended solutions include using global tables, field redundancy, data assembly in the service layer, or ER‑sharding to keep related tables on the same shard.

Cross‑Shard Pagination, Sorting, and Aggregation

Paging and ordering must be performed per shard and then merged, which can be CPU‑ and memory‑intensive for large page numbers. Aggregations (MAX, MIN, SUM, COUNT) also require per‑shard computation followed by a final merge.

Global Primary Key Duplication

Auto‑increment IDs are not globally unique across shards. Strategies include:

UUIDs – simple but storage‑heavy and index‑unfriendly.

Sequence table (MyISAM) – example below:

CREATE TABLE `sequence` (
  `id` bigint(20) unsigned NOT NULL auto_increment,
  `stub` char(1) NOT NULL default '',
  PRIMARY KEY (`id`),
  UNIQUE KEY `stub` (`stub`)
) ENGINE=MyISAM;
REPLACE INTO sequence (stub) VALUES ('a');
SELECT LAST_INSERT_ID();

Snowflake algorithm – 64‑bit IDs composed of timestamp, datacenter ID, worker ID, and sequence within a millisecond.

Data Migration and Scaling

When sharding is introduced, historical data must be read from the original tables and written to the appropriate shards according to the chosen rule. Capacity planning should aim for each shard to hold no more than ~10 million rows.

When to Consider Sharding

Only after exhausting other optimizations (hardware upgrades, indexing, read‑write splitting).

When table size or data volume severely impacts performance or maintenance (backups, DDL, lock contention).

When specific fields become hot and benefit from vertical splitting.

When rapid data growth approaches the limits of a single instance.

The article concludes with a reminder to avoid premature sharding and to weigh the added complexity against actual performance needs.

database shardingdistributed transactionshorizontal partitioningvertical shardingglobal primary keysharding tools
Laravel Tech Community
Written by

Laravel Tech Community

Specializing in Laravel development, we continuously publish fresh content and grow alongside the elegant, stable Laravel framework.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.