When and How to Shard Your Database: Strategies, Tools, and Pitfalls
This article explains why large databases hit IO or CPU bottlenecks, outlines horizontal and vertical sharding at the database, table, and column level, and lists popular sharding tools. It then covers the problems sharding introduces (distributed transactions, cross‑node joins, pagination, and global primary‑key generation) and closes with migration and capacity‑planning advice.
Database Bottlenecks
When a database grows too large, IO or CPU bottlenecks slow queries down, so active connections pile up until they approach the maximum the database can handle. The service then sees few or no available connections: concurrency and throughput drop, and in the worst case the service crashes.
IO Bottleneck
Disk‑read overload caused by hot data that cannot fit in cache; each query generates heavy IO → consider horizontal database sharding or vertical table sharding.
Network‑IO overload when request volume exceeds bandwidth → horizontal database sharding.
CPU Bottleneck
Complex SQL (joins, GROUP BY, ORDER BY, non‑indexed filters) increases CPU usage → optimize SQL, add proper indexes, or move calculations to the service layer.
A very large single table forces queries to scan too many rows, which makes SQL slow and CPU‑bound → horizontal table sharding.
Sharding Strategies
Horizontal Database Sharding
Concept: split rows of a single logical database into multiple physical databases based on a field using strategies such as hash or range.
All databases share the same schema.
Data sets are disjoint; the union of all databases equals the full dataset.
Scenario: high absolute concurrency without clear business boundaries for vertical sharding.
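Analysis: with more databases, each one holds a smaller share of the data and serves a smaller share of the traffic, so IO and CPU pressure per database drops.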
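As a minimal sketch of the two field‑based strategies named above, the following Python routes a key to a database index either by hash or by range. The function names, the choice of CRC32, and the range boundaries are illustrative assumptions, not the behavior of any particular sharding tool.

```python
import bisect
import zlib
from typing import Sequence


def hash_shard(key: str, shard_count: int) -> int:
    """Map a sharding key to a database index with a stable hash."""
    # crc32 is deterministic across processes, unlike Python's built-in
    # hash() for strings, which is randomized per interpreter run.
    return zlib.crc32(key.encode("utf-8")) % shard_count


def range_shard(user_id: int, upper_bounds: Sequence[int]) -> int:
    """Map a numeric key to the first range whose exclusive upper bound exceeds it."""
    index = bisect.bisect_right(upper_bounds, user_id)
    if index >= len(upper_bounds):
        raise ValueError(f"user_id {user_id} is beyond the last configured range")
    return index


# Hypothetical layout: db 0 holds ids below 1,000,000, db 1 ids below 2,000,000, ...
UPPER_BOUNDS = [1_000_000, 2_000_000, 3_000_000]
```

Hash routing tends to spread load evenly but makes adding databases harder, because most keys map to a different index once shard_count changes. Range routing keeps neighboring keys together and can be extended by appending a new bound, at the risk of hot spots in the newest range.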
Horizontal Table Sharding
Concept: split a single logical table into multiple physical tables using a field‑based strategy.
All tables share the same structure.
Data sets are disjoint; the union of all tables equals the full dataset.
Scenario: a single table becomes too large, degrading SQL efficiency and increasing CPU load.
Analysis: smaller tables improve query performance and lower CPU usage.
Vertical Database Sharding
Concept: group tables by business domain and place each group into a separate database.
Each database may have a different schema.
Data sets are disjoint; the union of all databases equals the full dataset.
Scenario: absolute concurrency is high and business modules can be isolated.
Analysis: enables service‑oriented architecture; shared configuration or dictionary tables can be moved to dedicated databases.
Vertical Table (Column) Sharding
Concept: split a wide table into a main table and one or more extension tables based on column activity.
Each table has a different structure.
Tables share at least a primary‑key column for joining.
The union of all tables equals the full dataset.
Scenario: large rows contain many rarely accessed columns, causing cache pressure and random‑read IO.
Analysis: keep hot columns in the main table to improve cache hit rate; retrieve extension tables in the service layer and join in memory, avoiding cross‑node SQL joins.
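The service‑layer join described above could look roughly like the sketch below. Two in‑memory SQLite databases stand in for the nodes holding the main and extension tables; the table names (user_main, user_ext), columns, and the load_users helper are hypothetical and exist only for this illustration.

```python
import sqlite3

# Two in-memory SQLite databases stand in for separate nodes (demo assumption).
main_db = sqlite3.connect(":memory:")
ext_db = sqlite3.connect(":memory:")
main_db.executescript("""
    CREATE TABLE user_main (id INTEGER PRIMARY KEY, name TEXT, status TEXT);
    INSERT INTO user_main VALUES (1, 'alice', 'active'), (2, 'bob', 'inactive');
""")
ext_db.executescript("""
    CREATE TABLE user_ext (id INTEGER PRIMARY KEY, bio TEXT);
    INSERT INTO user_ext VALUES (1, 'long biography text');
""")


def load_users(ids, with_ext=False):
    """Read hot columns from the main table; attach cold columns only on request."""
    ids = list(ids)
    if not ids:
        return {}
    marks = ",".join("?" * len(ids))
    users = {
        row_id: {"id": row_id, "name": name, "status": status}
        for row_id, name, status in main_db.execute(
            f"SELECT id, name, status FROM user_main WHERE id IN ({marks})", ids
        )
    }
    if with_ext:
        # Second query goes to the extension table; rows are merged in memory
        # by primary key instead of through a SQL join.
        for row_id, bio in ext_db.execute(
            f"SELECT id, bio FROM user_ext WHERE id IN ({marks})", ids
        ):
            if row_id in users:
                users[row_id]["bio"] = bio
    return users
```

Most calls would leave with_ext at its default, so the common path touches only the narrow main table.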
Sharding Tools
Sharding‑JDBC
TSharding
Atlas
Cobar
MyCAT
Oceanus
Vitess
Each tool has its own trade‑offs; evaluate based on your environment.
Problems Introduced by Sharding
Distributed Transaction Consistency
Updates that span multiple shards require distributed transactions (e.g., XA protocol, two‑phase commit), which increase latency and risk deadlocks as the number of nodes grows.
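To make the two‑phase commit idea concrete, here is a deliberately simplified in‑memory sketch. The Participant class and two_phase_commit function are hypothetical; the sketch ignores durable logging, timeouts, and coordinator failure, all of which a real XA implementation has to handle.

```python
class Participant:
    """Stands in for one shard holding an account balance."""

    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self.pending = None  # staged change awaiting the coordinator's decision

    def prepare(self, delta):
        # Phase 1: vote "no" if the change would overdraw the account.
        if self.balance + delta < 0:
            return False
        self.pending = delta
        return True

    def commit(self):
        # Phase 2: apply the staged change.
        self.balance += self.pending
        self.pending = None

    def rollback(self):
        # Phase 2 (abort): discard the staged change.
        self.pending = None


def two_phase_commit(changes):
    """changes: list of (participant, delta). Returns True if all shards committed."""
    prepared = []
    for participant, delta in changes:
        if participant.prepare(delta):
            prepared.append(participant)
        else:
            # Any "no" vote aborts the whole transaction.
            for p in prepared:
                p.rollback()
            return False
    for p in prepared:
        p.commit()
    return True
```

Every participant must answer the prepare round before anyone commits, which is where the extra latency comes from as nodes are added.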
Eventual Consistency
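For systems that can tolerate brief inconsistency, use transaction compensation: instead of rolling back immediately when a step fails, detect the failure afterwards and apply compensating operations so that the data converges to a consistent state.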
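One way to picture compensation is as a list of steps, each paired with an undo operation that is applied if a later step fails. The sketch below is a toy version under that assumption; run_with_compensation, the inventory dictionary, and the order functions are hypothetical names, and a production system would also need to persist progress so compensation survives a crash.

```python
def run_with_compensation(steps):
    """steps: list of (action, compensate) callables. Returns True if all actions succeed."""
    completed = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            # Undo the already-committed steps in reverse order.
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True


# Hypothetical two-shard flow: reserve stock on one shard, create an order on another.
inventory = {"sku-1": 5}
orders = []


def reserve_stock():
    inventory["sku-1"] -= 1


def release_stock():
    inventory["sku-1"] += 1


def create_order():
    raise RuntimeError("order shard unavailable")


def cancel_order():
    orders.clear()


ok = run_with_compensation([(reserve_stock, release_stock), (create_order, cancel_order)])
```

Here the stock reservation is committed first and then undone by release_stock once the order step fails, so other readers may briefly observe the intermediate state.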
Cross‑Node Join Issues
After sharding, data that used to be joined across tables may reside on different nodes; avoid joins or use alternatives such as global tables, field redundancy, data assembly in the service layer, or ER‑sharding (store related tables in the same shard).
Pagination, Sorting, and Function Aggregation
Limit/offset pagination and order‑by across shards require sorting each shard’s result set and merging them, which can be CPU‑ and memory‑intensive for large page numbers. Aggregate functions (MAX, MIN, SUM, COUNT) must be computed per shard and then combined.
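The merge step can be sketched as follows, assuming each shard has already returned its own rows sorted by the ordering key. For a page at a given offset, every shard has to return its first offset + limit rows, because the global page could in principle come entirely from one shard; that is why deep pages get expensive. The function names are illustrative, and plain integers stand in for rows.

```python
import heapq
from itertools import islice


def paginate_across_shards(shard_results, offset, limit):
    """Merge per-shard sorted lists and return one global page.

    Each inner list must hold that shard's first offset + limit rows, sorted.
    """
    merged = heapq.merge(*shard_results)  # lazy k-way merge of sorted inputs
    return list(islice(merged, offset, offset + limit))


def combine_aggregates(partials):
    """partials: list of (count, total, maximum) per shard. Returns global values."""
    count = sum(c for c, _, _ in partials)
    total = sum(t for _, t, _ in partials)
    maximum = max(m for _, _, m in partials)
    # An average has to be derived from the global sum and count,
    # not by averaging the per-shard averages.
    average = total / count if count else None
    return count, total, maximum, average


shard_a = [1, 4, 7, 10]
shard_b = [2, 5, 8, 11]
shard_c = [3, 6, 9, 12]
page = paginate_across_shards([shard_a, shard_b, shard_c], offset=3, limit=3)
```

With the sample data, page holds the fourth through sixth values of the global ordering.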
Global Primary‑Key Duplication
Auto‑increment IDs are not unique across shards. Strategies:
UUID – simple but large and index‑unfriendly.
Sequence table – a dedicated table with an auto‑increment ID column and a unique stub column. Running REPLACE INTO sequence (stub) VALUES ('a'); followed by SELECT LAST_INSERT_ID(); returns a new ID each time. The table uses MyISAM for table‑level locking.
Snowflake – 64‑bit IDs composed of timestamp, datacenter ID, worker ID, and sequence counter.
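A Snowflake‑style generator might be sketched as below. The bit widths (41‑bit millisecond timestamp, 5‑bit datacenter ID, 5‑bit worker ID, 12‑bit sequence) follow the commonly cited layout, while the custom epoch and the class name are assumptions made for this example.

```python
import threading
import time

EPOCH_MS = 1_577_836_800_000  # 2020-01-01T00:00:00Z, an arbitrary assumed epoch


class SnowflakeGenerator:
    def __init__(self, datacenter_id, worker_id):
        if not 0 <= datacenter_id < 32 or not 0 <= worker_id < 32:
            raise ValueError("datacenter_id and worker_id must fit in 5 bits each")
        self.datacenter_id = datacenter_id
        self.worker_id = worker_id
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    @staticmethod
    def _now_ms():
        return int(time.time() * 1000)

    def next_id(self):
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                # Real deployments need a policy for clock rollback; this sketch refuses.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & 0xFFF
                if self._sequence == 0:
                    # 4096 ids already issued in this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._now_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << 22)
                | (self.datacenter_id << 17)
                | (self.worker_id << 12)
                | self._sequence
            )
```

Because the timestamp occupies the high bits, IDs from one generator increase over time, which is friendlier to B‑tree indexes than random UUIDs. Uniqueness across machines depends on every generator having a distinct datacenter and worker ID pair.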
Data Migration & Capacity Planning
When adopting sharding, migrate existing data by reading from the old store and writing to target shards according to the chosen rule. Estimate required shard count based on current data volume, QPS, and growth rate; a common guideline is to keep each shard’s single‑table size below 10 million rows.
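The row‑count part of that estimate is simple arithmetic, sketched below. The linear growth model, the sample figures, and the rounding up to a power of two are all assumptions for illustration; QPS is not modeled here and would need its own estimate.

```python
import math


def estimate_table_count(current_rows, monthly_new_rows, horizon_months,
                         max_rows_per_table=10_000_000):
    """Return (projected_rows, tables_needed, tables_rounded_to_power_of_two)."""
    # Assumes linear growth over the planning horizon.
    projected_rows = current_rows + monthly_new_rows * horizon_months
    needed = max(1, math.ceil(projected_rows / max_rows_per_table))
    # Rounding up to a power of two is one common convention, not a requirement.
    rounded = 1 << (needed - 1).bit_length()
    return projected_rows, needed, rounded


# Hypothetical workload: 125M rows today, 5M new rows per month, 3-year horizon.
plan = estimate_table_count(125_000_000, 5_000_000, 36)
```

For these made‑up numbers the projection is 305 million rows, which needs 31 tables at 10 million rows each and rounds up to 32.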
When to Consider Sharding
Do not shard prematurely; first exhaust hardware upgrades, read/write separation, and index optimization.
Sharding is justified when a single table reaches performance limits or when absolute concurrency spikes.
Consider vertical column sharding when a table contains many rarely used fields that inflate row size and cause cache pressure.
dbaplus Community
