When to Shard Your Database? A Practical Guide to Partitioning Strategies
This article explains database bottlenecks caused by IO and CPU limits, introduces horizontal and vertical sharding for databases and tables, compares popular sharding tools, discusses challenges such as distributed transactions, cross‑node joins, pagination and global ID generation, and offers guidance on when and how to apply sharding in real‑world systems.
Database Bottlenecks
IO and CPU bottlenecks increase active connections, leading to performance issues.
IO Bottleneck
Disk read IO : hot data exceeds cache, causing many reads – solve with horizontal sharding and vertical partitioning.
Network IO : excessive request data – solve with horizontal sharding.
CPU Bottleneck
SQL issues : joins, group by, order by, non‑indexed queries – optimize SQL, add indexes, move calculations to service layer.
Large single table : many rows scanned – use horizontal table sharding.
Sharding Types
Horizontal Database Sharding
Concept: split data across multiple databases based on a field using hash/range strategies.
Result: identical schema per DB, disjoint data, union equals full dataset.
Scenario: high concurrent load without clear business boundaries.
Analysis: more DBs reduce IO and CPU pressure.
Horizontal Table Sharding
Concept: split a single table into multiple tables using hash/range.
Result: identical table schema, disjoint rows, union equals full dataset.
Scenario: single table too large, affecting SQL efficiency and CPU.
Analysis: smaller tables improve query speed and reduce CPU load.
Vertical Database Sharding
Concept: separate tables into different databases according to business domains.
Result: different schemas per DB, disjoint data, union equals full dataset.
Scenario: high concurrency and clear module boundaries.
Analysis: enables service‑oriented architecture.
Vertical Table Sharding
Concept: split columns of a table into a main table and extension tables based on column activity.
Result: different table structures, some shared key (usually primary key), union equals full dataset.
Scenario: large rows with many columns, hot and cold data mixed, causing cache pressure and IO bottlenecks.
Analysis: keep hot columns in main table to improve cache hit rate and reduce random reads.
Sharding Tools
Sharding‑JDBC (Dangdang)
TSharding (Mogujie)
Atlas (360)
Cobar (Alibaba)
MyCAT (based on Cobar)
Oceanus (58.com)
Vitess (Google)
Problems Introduced by Sharding
Transaction Consistency
Cross‑database transactions require XA or two‑phase commit, increasing latency and risk of deadlocks.
Eventual Consistency
For high‑performance, low‑consistency needs, use compensation mechanisms such as data reconciliation or log‑based sync.
Cross‑Node Join Issues
Avoid joins across shards; use global tables, field redundancy, data assembly in service layer, or ER‑sharding to keep related data in the same shard.
Cross‑Node Pagination, Sorting, Aggregation
Require sorting on each shard then merging results; large page numbers heavily impact CPU and memory.
Global Primary Key Duplication
Strategies: UUID, a dedicated sequence table, or Snowflake algorithm.
CREATE TABLE `sequence` (
`id` bigint(20) unsigned NOT NULL auto_increment,
`stub` char(1) NOT NULL default '',
PRIMARY KEY (`id`),
UNIQUE KEY `stub` (`stub`)
) ENGINE=MyISAM;
REPLACE INTO sequence (stub) VALUES ('a');
SELECT LAST_INSERT_ID();Data Migration & Scaling
When traffic grows, migrate historical data according to sharding rules and plan capacity (suggested less than 10 million rows per shard).
When to Consider Sharding
Avoid premature sharding; only when data size or performance hits limits.
When large tables hinder backup, DDL, or cause lock contention.
When certain columns become hot and should be vertically split.
When rapid data growth approaches bottlenecks.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Java Backend Technology
Focus on Java-related technologies: SSM, Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading. Occasionally cover DevOps tools like Jenkins, Nexus, Docker, and ELK. Also share technical insights from time to time, committed to Java full-stack development!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
