Database Bottlenecks and Sharding Strategies: IO, CPU, and Partitioning Solutions
This article analyzes common database performance bottlenecks such as I/O and CPU limits, explains horizontal and vertical sharding techniques, reviews available sharding tools, discusses the challenges of distributed transactions, cross‑node joins, pagination, and global primary‑key generation, and offers practical mitigation strategies.
Database Bottlenecks
Both I/O and CPU bottlenecks can increase the number of active database connections, eventually reaching the maximum connection threshold and causing service outages, reduced throughput, and crashes.
IO Bottleneck
Disk read I/O bottleneck: hot data exceeds cache capacity, causing heavy read I/O; solution – sharding databases and vertical partitioning.
Network I/O bottleneck: request volume exceeds bandwidth; solution – database sharding.
CPU Bottleneck
SQL issues (joins, GROUP BY, ORDER BY, non‑indexed queries) increase CPU load; solution – SQL optimization, proper indexing, and moving calculations to the service layer.
Large single‑table size leads to full‑table scans and high CPU usage; solution – horizontal table sharding.
Sharding and Partitioning
Horizontal Database Sharding
Concept: split a database into multiple databases based on a field using strategies such as hash or range.
All databases share the same schema.
Data in each database is distinct with no overlap.
The union of all databases equals the full dataset.
Scenario: when absolute concurrency rises but there is no clear business module for vertical sharding.
Analysis: more databases reduce I/O and CPU pressure proportionally.
Horizontal Table Sharding
Concept: split a single table into multiple tables based on a field.
All tables share the same structure.
Data in each table is distinct; the union of all tables equals the full dataset.
Scenario: large single‑table size slows SQL performance and adds CPU load.
Analysis: smaller tables improve query efficiency and reduce CPU usage.
Vertical Database Sharding
Concept: separate tables into different databases according to business domains.
Each database has a different schema.
Data in each database is distinct with no overlap.
The union of all databases equals the full dataset.
Scenario: high absolute concurrency with clear business module separation, enabling service‑oriented architecture.
Vertical Table Sharding
Concept: split a table into a main table and extension tables based on field activity.
Each table has a different structure.
Tables share at least a primary‑key column for association.
The union of all tables equals the full dataset.
Scenario: large rows with many fields cause cache pressure and random‑read I/O.
Analysis: hot fields stay in the main table (cached), reducing random I/O; joins should be performed in the service layer, not with SQL JOIN.
Sharding Tools
sharding-jdbc (Dangdang)
TSharding (Mogujie)
Atlas (Qihoo 360)
Cobar (Alibaba)
MyCAT (based on Cobar)
Oceanus (58.com)
Vitess (Google) – evaluate pros and cons yourself
Problems Introduced by Sharding
Transaction Consistency
Distributed Transactions
Cross‑database updates require distributed transactions, typically handled via XA protocol or two‑phase commit, which increase latency and risk of deadlocks as the number of nodes grows.
Eventual Consistency
For systems tolerant to latency, eventual consistency can be achieved through compensation mechanisms such as reconciliation, log‑based comparison, or periodic synchronization.
Cross‑Node Join Issues
After sharding, data that previously could be joined may reside on different nodes, making joins expensive. Mitigation strategies include:
Global tables (duplicate small reference data across all databases).
Field redundancy (denormalization).
Data assembly in the service layer (multiple queries followed by in‑memory merging).
ER‑sharding (place related tables on the same shard).
Cross‑Node Pagination, Sorting, and Aggregation
Pagination and ordering across shards require sorting results from each shard and merging them, which can be costly for large page numbers; aggregate functions (MAX, MIN, SUM, COUNT) must be computed per shard and then combined.
Global Primary‑Key Collision
Auto‑increment IDs are insufficient across shards; common solutions include:
UUID
Simple but storage‑heavy and index‑unfriendly.
Sequence Table
CREATE TABLE `sequence` (
`id` bigint(20) unsigned NOT NULL auto_increment,
`stub` char(1) NOT NULL default '',
PRIMARY KEY (`id`),
UNIQUE KEY `stub` (`stub`)
) ENGINE=MyISAM;Generate a global ID with:
REPLACE INTO sequence (stub) VALUES ('a');
SELECT LAST_INSERT_ID();This approach has a single‑point failure risk.
Snowflake Distributed ID
Twitter's Snowflake generates 64‑bit IDs composed of timestamp, datacenter ID, worker ID, and sequence within a millisecond, supporting up to 1024 nodes and 4096 IDs per node per millisecond.
Data Migration and Scaling
When sharding is introduced, historical data must be migrated according to the new sharding rules, and capacity planning should keep each shard under ~10 million rows.
When to Consider Sharding
Only after exhausting simpler optimizations (hardware upgrades, read/write splitting, index tuning).
When table size hampers backup, DDL, or causes lock contention.
When certain fields become hot and benefit from vertical splitting.
When rapid data growth pushes performance toward bottlenecks.
References
https://www.cnblogs.com/butterfly100/p/9034281.html
https://www.cnblogs.com/littlecharacter/p/9342129.html
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
