How to Overcome Database Bottlenecks with Sharding: Strategies and Tools
This article explains common I/O and CPU bottlenecks in databases, compares horizontal and vertical sharding techniques, outlines practical partitioning strategies, introduces popular sharding tools, and provides step‑by‑step guidance for implementing and scaling sharded architectures.
1. Database Bottlenecks
Both I/O and CPU bottlenecks can increase the number of active connections, eventually reaching the database's connection limit and causing service degradation or crashes.
1. I/O Bottleneck
• Disk read I/O overload when hot data exceeds cache capacity, leading to many reads and slower queries → Horizontal sharding and vertical partitioning .
• Network I/O overload when request volume exceeds bandwidth → Sharding .
2. CPU Bottleneck
• Inefficient SQL (joins, GROUP BY, ORDER BY, non‑indexed filters) increases CPU usage → Optimize SQL, add proper indexes, move calculations to the service layer.
• Large single tables cause full‑table scans and high CPU → Horizontal partitioning .
2. Sharding and Partitioning
1. Horizontal Sharding (Database Level)
Concept: Split a single database into multiple databases based on a chosen field using strategies such as hash or range.
Result:
All databases share the same schema.
Data in each database is distinct with no overlap.
The union of all databases represents the full dataset.
Scenario: High overall concurrency where table‑level partitioning cannot solve the problem and no clear business modules exist for vertical sharding.
Analysis: Adding more databases reduces I/O and CPU pressure proportionally.
2. Horizontal Partitioning (Table Level)
Concept: Split a single table into multiple tables based on a field using hash, range, etc.
Result:
All tables share the same schema.
Data in each table is distinct with no overlap.
The union of all tables represents the full dataset.
Scenario: A single table grows large, degrading SQL efficiency and increasing CPU load.
Analysis: Smaller tables improve query speed and lower CPU usage.
3. Vertical Sharding (Database Level)
Concept: Split tables into different databases according to business domains.
Result:
Each database has a different schema.
Data in each database is distinct with no overlap.
The union of all databases represents the full dataset.
Scenario: System concurrency spikes and distinct business modules can be isolated.
Analysis: Enables service‑oriented architecture; shared configuration or dictionary tables can be moved to dedicated databases.
4. Vertical Partitioning (Table Level)
Concept: Split a table's columns into a main table (hot fields) and an extension table (cold fields) based on field activity.
Result:
Structures of the tables differ.
Data sets differ but share at least one common column (usually the primary key) for joining.
The union of both tables equals the full dataset.
Scenario: Large row size due to many columns causes cache pressure and random‑read I/O bottlenecks.
Analysis: Keep hot data in the main table to improve cache hit rate; retrieve full data by joining in the service layer without using database joins.
3. Sharding Tools
Sharding‑Sphere (formerly Sharding‑JDBC) – Java jar.
TDDL – Taobao Distributed Data Layer, Java jar.
Mycat – Middleware solution.
Note: Evaluate the pros and cons of each tool yourself; prioritize official documentation and community support.
4. Sharding Implementation Steps
Assess capacity and growth → Choose a uniform key → Define sharding rule (hash, range, etc.) → Execute (usually with dual‑write) → Plan expansion to minimize data movement.
5. Common Sharding Issues
1. Queries on Non‑Partition Keys
Typical horizontal sharding uses hash routing.
Client‑side only one non‑partition key condition:
Mapping method (illustrated with diagram).
Gene method – generate user_id via Snowflake algorithm; route by modulo of generated id.
Client‑side multiple non‑partition key conditions:
Mapping method (diagram).
Redundancy method – duplicate data to route based on different keys.
Backend queries with various non‑partition key combinations:
NoSQL approach (e.g., Elasticsearch).
Redundancy method.
2. Cross‑Shard Pagination on Non‑Partition Keys
Use the NoSQL method (e.g., Elasticsearch) to handle pagination across shards.
3. Scaling Issues
Horizontal database scaling (adding read replicas) and horizontal table scaling (dual‑write migration) are common solutions; dual‑write is a generic approach.
6. Summary of Sharding and Partitioning
Identify the real bottleneck before deciding how to split data; avoid splitting for its own sake.
Key selection is critical for balanced distribution and for supporting non‑partition key queries.
Simpler sharding rules are preferable as long as they meet requirements.
7. Example Repository
GitHub example: https://github.com/littlecharacter4s/study-sharding
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
