Databases 14 min read

When to Shard Your Database? A Practical Guide to Partitioning Strategies

This article explains database bottlenecks caused by IO and CPU limits, introduces horizontal and vertical sharding for databases and tables, compares popular sharding tools, discusses challenges such as distributed transactions, cross‑node joins, pagination and global ID generation, and offers guidance on when and how to apply sharding in real‑world systems.

Java Backend Technology

Apr 26, 2020

When to Shard Your Database? A Practical Guide to Partitioning Strategies

Database Bottlenecks

IO and CPU bottlenecks increase active connections, leading to performance issues.

IO Bottleneck

Disk read IO : hot data exceeds cache, causing many reads – solve with horizontal sharding and vertical partitioning.

Network IO : excessive request data – solve with horizontal sharding.

CPU Bottleneck

SQL issues : joins, group by, order by, non‑indexed queries – optimize SQL, add indexes, move calculations to service layer.

Large single table : many rows scanned – use horizontal table sharding.

Sharding Types

Horizontal Database Sharding

Concept: split data across multiple databases based on a field using hash/range strategies.

Result: identical schema per DB, disjoint data, union equals full dataset.

Scenario: high concurrent load without clear business boundaries.

Analysis: more DBs reduce IO and CPU pressure.

Horizontal Table Sharding

Concept: split a single table into multiple tables using hash/range.

Result: identical table schema, disjoint rows, union equals full dataset.

Scenario: single table too large, affecting SQL efficiency and CPU.

Analysis: smaller tables improve query speed and reduce CPU load.

Vertical Database Sharding

Concept: separate tables into different databases according to business domains.

Result: different schemas per DB, disjoint data, union equals full dataset.

Scenario: high concurrency and clear module boundaries.

Analysis: enables service‑oriented architecture.

Vertical Table Sharding

Concept: split columns of a table into a main table and extension tables based on column activity.

Result: different table structures, some shared key (usually primary key), union equals full dataset.

Scenario: large rows with many columns, hot and cold data mixed, causing cache pressure and IO bottlenecks.

Analysis: keep hot columns in main table to improve cache hit rate and reduce random reads.

Sharding Tools

Sharding‑JDBC (Dangdang)

TSharding (Mogujie)

Atlas (360)

Cobar (Alibaba)

MyCAT (based on Cobar)

Oceanus (58.com)

Vitess (Google)

Problems Introduced by Sharding

Transaction Consistency

Cross‑database transactions require XA or two‑phase commit, increasing latency and risk of deadlocks.

Eventual Consistency

For high‑performance, low‑consistency needs, use compensation mechanisms such as data reconciliation or log‑based sync.

Cross‑Node Join Issues

Avoid joins across shards; use global tables, field redundancy, data assembly in service layer, or ER‑sharding to keep related data in the same shard.

Cross‑Node Pagination, Sorting, Aggregation

Require sorting on each shard then merging results; large page numbers heavily impact CPU and memory.

Global Primary Key Duplication

Strategies: UUID, a dedicated sequence table, or Snowflake algorithm.

CREATE TABLE `sequence` (
  `id` bigint(20) unsigned NOT NULL auto_increment,
  `stub` char(1) NOT NULL default '',
  PRIMARY KEY (`id`),
  UNIQUE KEY `stub` (`stub`)
) ENGINE=MyISAM;
REPLACE INTO sequence (stub) VALUES ('a');
SELECT LAST_INSERT_ID();

Data Migration & Scaling

When traffic grows, migrate historical data according to sharding rules and plan capacity (suggested less than 10 million rows per shard).

When to Consider Sharding

Avoid premature sharding; only when data size or performance hits limits.

When large tables hinder backup, DDL, or cause lock contention.

When certain columns become hot and should be vertically split.

When rapid data growth approaches bottlenecks.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Distributed Systems Scalability database sharding Partitioning

Written by

Java Backend Technology

Focus on Java-related technologies: SSM, Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading. Occasionally cover DevOps tools like Jenkins, Nexus, Docker, and ELK. Also share technical insights from time to time, committed to Java full-stack development!

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.