
When and How to Shard Databases: A Practical Guide to Splitting Tables and Schemas

This article explains why relational databases hit performance bottlenecks at large scale, introduces vertical and horizontal sharding techniques, compares their pros and cons, discusses common challenges such as distributed transactions, joins, pagination and global key generation, and offers practical solutions and middleware options.

dbaplus Community

Why Sharding Is Needed

Relational databases become bottlenecks when a single node reaches its limits in storage, connections, or CPU. As a rough rule of thumb, once a single table grows beyond about 10 million rows or 100 GB, read replicas and index tuning may no longer prevent severe performance degradation, and splitting the data becomes worth considering.

Core Concept: Data Sharding

Sharding distributes data across multiple databases so each node holds a smaller subset, reducing load and improving query latency. Two main sharding strategies exist:

1. Vertical (Column‑wise) Sharding

Vertical sharding separates data by business domains or column groups.

Vertical database split: Loosely coupled tables are placed in different databases, similar to micro-service isolation.

Vertical table split: Rarely accessed or large columns (for example, long text fields) are moved to an extension table that shares the primary key of the main table. The main table's rows become smaller, so more of them fit in memory and cache hit rates improve.
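
A vertical table split can be sketched with Python's built-in sqlite3 module. The table and column names here (user_base, user_profile_ext, bio) are hypothetical and chosen only for illustration; a production schema would differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Narrow main table: columns read on almost every request.
    CREATE TABLE user_base (
        uid        INTEGER PRIMARY KEY,
        login_name TEXT NOT NULL
    );
    -- Extension table: wide, rarely read columns, keyed by the same uid.
    CREATE TABLE user_profile_ext (
        uid INTEGER PRIMARY KEY REFERENCES user_base(uid),
        bio TEXT
    );
""")
conn.execute("INSERT INTO user_base VALUES (1, 'alice')")
conn.execute("INSERT INTO user_profile_ext VALUES (1, 'long biography ...')")


def get_login_name(uid):
    # Hot path: touches only the narrow table.
    row = conn.execute(
        "SELECT login_name FROM user_base WHERE uid = ?", (uid,)
    ).fetchone()
    return row[0]


def get_profile(uid):
    # Cold path: joins in the extension table only when it is needed.
    row = conn.execute(
        "SELECT b.login_name, e.bio FROM user_base b "
        "LEFT JOIN user_profile_ext e ON e.uid = b.uid WHERE b.uid = ?",
        (uid,),
    ).fetchone()
    return {"login_name": row[0], "bio": row[1]}
```

Both tables still live in one database here, so the join stays cheap; the benefit is only that the hot path no longer reads the wide columns.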

Advantages

Clear business boundaries and easier micro‑service governance.

Improved I/O and connection scalability for high‑concurrency workloads.

Disadvantages

Tables in different databases can no longer be joined in a single SQL statement; the application has to query each database and combine the results itself.

Distributed transactions are more complex.

Large tables may still need horizontal sharding.

2. Horizontal (Row‑wise) Sharding

Horizontal sharding splits rows of a single logical table across multiple databases or tables.

Two common patterns:

Range‑based sharding : Rows are allocated according to a numeric range (e.g., user ID 0‑9,999,999 goes to DB‑1).

Modulo‑based sharding : Rows are assigned by id % N, distributing data evenly across N shards.
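
The two routing rules can be sketched as small functions. The shard names and range boundaries below are assumptions made up for the example, not values from any particular system.

```python
import bisect

# Hypothetical layout: exclusive upper bound of each range shard, ascending.
RANGE_UPPER_BOUNDS = [10_000_000, 20_000_000, 30_000_000]
RANGE_SHARDS = ["DB-1", "DB-2", "DB-3"]


def route_by_range(user_id: int) -> str:
    """Return the shard whose ID range contains user_id."""
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    idx = bisect.bisect_right(RANGE_UPPER_BOUNDS, user_id)
    if idx >= len(RANGE_SHARDS):
        raise ValueError(f"no shard covers user_id {user_id}")
    return RANGE_SHARDS[idx]


def route_by_modulo(user_id: int, shard_count: int) -> int:
    """Return the shard index for user_id under id % N routing."""
    return user_id % shard_count
```

With range routing, new IDs keep landing on the newest shard, which can make it a hotspot; modulo routing spreads sequential IDs evenly but ties the mapping to the shard count N.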

Advantages

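Each shard holds only a fraction of the rows, so no single table has to grow without bound, and capacity can be expanded by adding shards.

Application changes are comparatively small: the table schema and business modules stay the same, and only data access needs routing logic.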

Disadvantages

Cross‑shard transaction consistency is hard.

Cross‑shard joins are costly.

Data migration and rebalancing can be complex.
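
To see why rebalancing is costly under modulo routing, the sketch below counts how many keys change shard when the shard count changes. It assumes plain id % N routing and dense sequential IDs; the function name is illustrative.

```python
def moved_fraction(ids, old_n, new_n):
    """Fraction of ids whose shard index changes when N goes from old_n to new_n."""
    ids = list(ids)
    moved = sum(1 for i in ids if i % old_n != i % new_n)
    return moved / len(ids)


# Growing from 4 to 5 shards relocates most rows; doubling to 8 relocates half.
four_to_five = moved_fraction(range(20_000), 4, 5)
four_to_eight = moved_fraction(range(20_000), 4, 8)
```

Under these assumptions, going from 4 to 5 shards moves 80% of the rows, while doubling from 4 to 8 moves 50%, which is one reason shard counts are often doubled rather than incremented.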

Typical Sharding Rules

Range partitioning by date or ID.

Modulo partitioning for uniform distribution.

Cold‑hot data separation, moving infrequently accessed data to separate shards.

Challenges Introduced by Sharding

1. Distributed Transactions

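Updates that span multiple shards need a distributed transaction, typically XA with two-phase commit. The extra coordination round trips add latency, and because locks are held until every participant has voted, the risk of lock contention and deadlocks rises as the number of nodes grows.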

2. Eventual Consistency

Systems tolerant of slight consistency delays can use compensation transactions, periodic reconciliation, or log‑based sync instead of immediate rollback.
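
A compensation transaction can be sketched as follows. Two dictionaries stand in for two shards, and the fail_credit flag simulates an unavailable shard; all names are hypothetical. A real system would also need to record the pending compensation durably and make every step idempotent, which this sketch omits.

```python
class ShardUnavailable(Exception):
    """Raised when the second shard cannot apply its local transaction."""


def transfer(shard_a, shard_b, src, dst, amount, fail_credit=False):
    """Move amount from src (on shard_a) to dst (on shard_b) without 2PC.

    Returns True on success, False if the transfer was compensated.
    """
    if shard_a[src] < amount:
        raise ValueError("insufficient funds")
    shard_a[src] -= amount            # step 1: local transaction on shard A
    try:
        if fail_credit:               # simulated failure for the example
            raise ShardUnavailable("shard B did not respond")
        shard_b[dst] += amount        # step 2: local transaction on shard B
    except ShardUnavailable:
        shard_a[src] += amount        # compensation: reverse step 1
        return False
    return True
```

Between step 1 and the compensation, other readers can observe the debited balance; this window of inconsistency is the price paid for avoiding a distributed lock.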

3. Cross‑Shard Joins

Joins become impractical; common mitigations include:

Duplicating small reference tables (global tables) in every shard.

Denormalizing data by storing redundant columns.

Two‑step query: fetch IDs from one shard, then retrieve full rows from the appropriate shard.

ER‑aware sharding: keep tables with strong relationships on the same shard.
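
The two-step query from the list above can be sketched like this. In-memory dictionaries stand in for user shards, users are assumed to be routed by uid % N, and the data is made up for the example.

```python
from collections import defaultdict

# Hypothetical user shards, routed by uid % 2.
USER_SHARDS = [
    {2: {"uid": 2, "name": "bob"}},     # shard 0: even uids
    {1: {"uid": 1, "name": "alice"}},   # shard 1: odd uids
]


def user_shard(uid):
    return uid % len(USER_SHARDS)


def orders_with_user_names(orders):
    """Attach user names to order rows without a cross-shard SQL join."""
    # Step 1: collect the uids referenced by the orders, grouped by shard.
    uids_by_shard = defaultdict(set)
    for order in orders:
        uids_by_shard[user_shard(order["uid"])].add(order["uid"])
    # Step 2: one batched lookup per shard (an IN (...) query in practice).
    users = {}
    for shard_idx, uids in uids_by_shard.items():
        shard = USER_SHARDS[shard_idx]
        users.update({uid: shard[uid] for uid in uids if uid in shard})
    # Join in the application, dropping orders whose user was not found.
    return [
        {**order, "name": users[order["uid"]]["name"]}
        for order in orders
        if order["uid"] in users
    ]


orders = [
    {"order_id": 100, "uid": 1},
    {"order_id": 101, "uid": 2},
    {"order_id": 102, "uid": 1},
]
joined = orders_with_user_names(orders)
```

Grouping the IDs by shard keeps the number of round trips bounded by the number of shards rather than the number of rows.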

4. Pagination, Sorting, and Aggregation

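When a query sorts by a column other than the shard key, each shard must sort locally and return its first offset + limit rows; a central node then merge-sorts those partial results and discards everything before the offset. The cost grows with the page number: the deeper the page, the more rows every shard must return, which makes deep pagination CPU- and memory-intensive. Aggregations (MAX, SUM, COUNT) follow the same pattern: compute locally, then combine the partial results.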
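
A minimal sketch of cross-shard pagination and aggregation, with Python lists standing in for shards. The function names and sample values are illustrative only.

```python
import heapq
from itertools import islice


def fetch_page(shards, offset, limit, key):
    """Return rows [offset, offset + limit) of the globally sorted result."""
    # Each shard sorts locally and returns only its first offset + limit rows;
    # any row on the requested page must be among those on its own shard.
    per_shard = [sorted(rows, key=key)[: offset + limit] for rows in shards]
    # Central step: k-way merge of the already-sorted partial results.
    merged = heapq.merge(*per_shard, key=key)
    return list(islice(merged, offset, offset + limit))


def count_and_max(shards, key):
    """Combine per-shard partial aggregates (assumes a non-empty shard exists)."""
    partials = [(len(rows), max(key(r) for r in rows)) for rows in shards if rows]
    return sum(c for c, _ in partials), max(m for _, m in partials)


shards = [[5, 1, 9], [4, 8], [7, 2, 6, 3]]   # made-up sort-key values per shard
page = fetch_page(shards, offset=2, limit=3, key=lambda v: v)
```

Note that an average cannot be combined from per-shard averages; it has to be derived from the combined SUM and COUNT.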

5. Global Primary Key Generation

Auto‑increment IDs are unsuitable across shards. Common solutions:

UUID (simple but large and index‑unfriendly).

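Central sequence table: a dedicated table with an auto-increment ID and a stub column; each write to the stub row (for example, a REPLACE INTO in MySQL) yields a new unique ID. Simple, but the table is a single point of failure and a potential write bottleneck.

Multi-node ID generators (e.g., Flickr's two-server approach): each server runs its own sequence with the same step but a different offset, so one issues odd IDs and the other even IDs, removing the single point of failure.

Snowflake algorithm: 64-bit IDs composed of a timestamp, datacenter ID, worker ID, and per-millisecond sequence (41, 5, 5, and 12 bits in the original Twitter layout, plus an unused sign bit). IDs are roughly time-ordered and collision-free as long as each worker ID is unique and the clock does not move backwards.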
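
A Snowflake-style generator can be sketched as below, assuming the common 41/5/5/12 bit layout. The class name and the custom epoch (2020-01-01 UTC) are arbitrary choices for this example, and the clock handling is deliberately simple: the generator refuses to issue IDs if the clock moves backwards.

```python
import threading
import time


class SnowflakeGenerator:
    EPOCH_MS = 1_577_836_800_000          # 2020-01-01T00:00:00Z, arbitrary choice
    DATACENTER_BITS = 5
    WORKER_BITS = 5
    SEQUENCE_BITS = 12
    MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1
    WORKER_SHIFT = SEQUENCE_BITS                      # 12
    DATACENTER_SHIFT = WORKER_SHIFT + WORKER_BITS     # 17
    TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS  # 22

    def __init__(self, datacenter_id, worker_id):
        if not 0 <= datacenter_id < (1 << self.DATACENTER_BITS):
            raise ValueError("datacenter_id out of range")
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.datacenter_id = datacenter_id
        self.worker_id = worker_id
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    @staticmethod
    def _now_ms():
        return time.time_ns() // 1_000_000

    def next_id(self):
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards; refusing to generate IDs")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & self.MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._now_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - self.EPOCH_MS) << self.TIMESTAMP_SHIFT)
                | (self.datacenter_id << self.DATACENTER_SHIFT)
                | (self.worker_id << self.WORKER_SHIFT)
                | self._sequence
            )
```

Uniqueness across machines depends entirely on every process being given a distinct (datacenter_id, worker_id) pair; how those are assigned is outside this sketch.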

When to Consider Sharding

Only shard when necessary; first try hardware upgrades, read/write splitting, and index tuning.

When table size or QPS causes backup, DDL, or lock contention problems.

When specific columns grow rapidly and can be vertically split.

When overall data volume threatens future scalability.

To improve availability: isolate failures by distributing data across multiple nodes.

Case Study: User Center

A typical user service stores User(uid, login_name, passwd, …). As the user base grows from 100 k to 1 billion, the last_login_time column becomes a hotspot. Vertical splitting creates user_time for login timestamps and user_ext for rarely accessed profile data. Horizontal sharding then distributes users across shards by range or modulo of uid.

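For lookups by a non-uid attribute (e.g., login_name), either introduce a mapping table or cache from login_name → uid, or use a “gene” function: derive a few bits from login_name, embed them in the uid when it is generated, and shard on those bits, so the shard can be computed from either login_name or uid.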
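
One possible reading of the gene approach is sketched below: the low bits of the uid are taken from a stable hash of login_name, and the shard is chosen from those bits. The bit width, the hash function, and the counter that stands in for a real global ID generator are all assumptions for illustration.

```python
import hashlib
import itertools

GENE_BITS = 3                    # low bits reserved for the gene
SHARD_COUNT = 1 << GENE_BITS     # 8 shards; must not exceed 2 ** GENE_BITS

_counter = itertools.count(1)    # stand-in for a real global ID generator


def gene(login_name: str) -> int:
    """Derive a stable GENE_BITS-wide value from login_name."""
    digest = hashlib.md5(login_name.encode("utf-8")).digest()
    return int.from_bytes(digest[-4:], "big") & (SHARD_COUNT - 1)


def new_uid(login_name: str) -> int:
    """Build a uid whose low bits carry the gene of login_name."""
    return (next(_counter) << GENE_BITS) | gene(login_name)


def shard_for_uid(uid: int) -> int:
    return uid % SHARD_COUNT


def shard_for_login_name(login_name: str) -> int:
    # No mapping table needed: the gene alone identifies the shard.
    return gene(login_name) % SHARD_COUNT
```

Python's built-in hash() is randomized per process for strings, so a fixed digest is used instead. The trade-off is that the gene is baked into the uid, so the scheme assumes login_name does not change after registration.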

Sharding Middleware Options

sharding-jdbc (Dangdang; now part of Apache ShardingSphere)

TSharding (Mogujie)

Atlas (360)

Cobar (Alibaba) / MyCAT (community project derived from Cobar)

Oceanus (58.com)

Vitess (Google)

These open-source projects provide, to varying degrees, transparent routing, SQL rewriting, and connection pooling for vertical and horizontal sharding scenarios.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
