How to Identify Database Bottlenecks and Choose the Right Sharding Strategy
This article explains common I/O and CPU bottlenecks in databases, compares horizontal and vertical sharding techniques, introduces practical tools, outlines migration steps, and discusses typical challenges such as non‑partition‑key queries, cross‑shard pagination, and scaling.
1. Database Bottlenecks
Whether the bottleneck is I/O or CPU, the symptom is the same: the number of active database connections climbs toward the maximum the database can carry, and once it gets there the service starts to fail.
1.1 I/O Bottleneck
Disk read I/O: the hot data no longer fits in the database cache, so queries trigger many disk reads → shard the database or split tables vertically.
Network I/O: request volume exceeds the available network bandwidth → shard the database.
1.2 CPU Bottleneck
SQL issues (joins, GROUP BY, ORDER BY, non‑indexed queries) increase CPU load → optimize SQL, add proper indexes, move calculations to the service layer.
A single table holds so much data that queries scan too many rows, which lowers SQL efficiency and drives up CPU → horizontal table sharding.
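The effect of a missing index can be seen in a query plan. The sketch below uses SQLite from the Python standard library as a stand-in for a production database. The orders table, its columns, and the index name are hypothetical, and other databases expose the same information through their own EXPLAIN output.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the detail strings of SQLite's query plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of every plan row.
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (user_id, amount) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

sql = "SELECT * FROM orders WHERE user_id = ?"

# Without an index on user_id, the plan reports a scan of the whole table.
plan_before = query_plan(conn, sql, (42,))

# With the index in place, the plan switches to a search that uses it.
conn.execute("CREATE INDEX idx_orders_user ON orders (user_id)")
plan_after = query_plan(conn, sql, (42,))

print(plan_before)
print(plan_after)
```

If the plan still shows a scan after indexing and query tuning, the table itself may be too large, which is the case the horizontal table sharding option addresses.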
2. Sharding Strategies
2.1 Horizontal Database Sharding
Concept: split a database into multiple databases based on a field using hash/range.
All databases share the same schema.
Data in each database is disjoint.
The union of all databases equals the full dataset.
Scenario: overall concurrency is so high that table sharding alone cannot solve the problem.
Analysis: with more databases, the I/O and CPU load on each one drops.
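The hash and range rules mentioned above can be sketched as two small routing functions. This is a minimal illustration, not a production router: the database count, the rows-per-database figure, and the choice of CRC32 as the hash are all assumptions.

```python
import zlib

DB_COUNT = 4  # assumed number of databases

def route_by_hash(key: str, db_count: int = DB_COUNT) -> int:
    """Hash routing: the same key always maps to the same database index."""
    # zlib.crc32 is stable across processes, unlike Python's built-in
    # hash() on strings, which is randomized per interpreter run.
    return zlib.crc32(key.encode("utf-8")) % db_count

def route_by_range(user_id: int, rows_per_db: int = 1_000_000) -> int:
    """Range routing: ids 0..999_999 go to db 0, the next million to db 1."""
    return user_id // rows_per_db

# Partition some keys and check the two properties listed above:
# the per-database sets are disjoint and their union is the full set.
keys = [f"user-{i}" for i in range(1000)]
partitions = [set() for _ in range(DB_COUNT)]
for key in keys:
    partitions[route_by_hash(key)].add(key)

print([len(p) for p in partitions])
```

Hash routing tends to spread load evenly, while range routing keeps neighbouring ids together and makes it easy to append a new database for new ids. Which one fits depends on the access pattern.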
2.2 Horizontal Table Sharding
Concept: split a single table into multiple tables based on a field.
Identical schema across tables.
Data in each table is disjoint.
Union of all tables equals the full dataset.
Scenario: single table grows large, hurting SQL efficiency and CPU.
Analysis: smaller tables improve query speed and lower CPU load.
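A minimal sketch of horizontal table sharding inside one database, again using in-memory SQLite as a stand-in. The orders_N table names, the table count, and the choice of user_id as the sharding field are assumptions made for the example.

```python
import sqlite3

TABLE_COUNT = 4  # assumed number of physical tables

def table_for(user_id: int) -> str:
    """Pick the physical table for a user by taking the id modulo the count."""
    # The name is built from an integer only, so it is safe to interpolate.
    return f"orders_{user_id % TABLE_COUNT}"

conn = sqlite3.connect(":memory:")
for i in range(TABLE_COUNT):
    # Every shard table has an identical schema.
    conn.execute(
        f"CREATE TABLE orders_{i} "
        "(id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL)"
    )

def insert_order(order_id: int, user_id: int, amount: float) -> None:
    conn.execute(
        f"INSERT INTO {table_for(user_id)} (id, user_id, amount) VALUES (?, ?, ?)",
        (order_id, user_id, amount),
    )

def orders_of(user_id: int):
    """A query that carries the sharding field touches exactly one table."""
    return conn.execute(
        f"SELECT id, amount FROM {table_for(user_id)} "
        "WHERE user_id = ? ORDER BY id",
        (user_id,),
    ).fetchall()

insert_order(1, 7, 10.0)
insert_order(2, 7, 20.0)
insert_order(3, 8, 5.0)
print(orders_of(7))
```

Because all tables still live in one database, this lowers the per-query scan cost but does not add I/O or connection capacity.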
2.3 Vertical Database Sharding
Concept: separate tables with different business domains into different databases.
Each database has a different schema.
Data sets are independent.
Union of all databases equals the full dataset.
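Scenario: overall concurrency has grown and distinct business modules can be separated out.
Analysis: once modules are separated this way, each can move behind its own service, which is the step toward a service‑oriented architecture.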
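In application code, vertical database sharding often reduces to a lookup from table to owning database. The sketch below assumes three hypothetical business domains, and all table and database names are made up for illustration.

```python
# Hypothetical mapping from table name to the database that owns it.
TABLE_TO_DB = {
    "users": "user_db",
    "user_addresses": "user_db",
    "orders": "order_db",
    "order_items": "order_db",
    "products": "product_db",
}

def database_for(table: str) -> str:
    """Return the database that owns a table, failing loudly if unmapped."""
    try:
        return TABLE_TO_DB[table]
    except KeyError:
        raise LookupError(f"no database configured for table {table!r}") from None

def tables_in(database: str) -> list:
    """List the tables assigned to one database, sorted by name."""
    return sorted(t for t, db in TABLE_TO_DB.items() if db == database)

print(database_for("orders"))
print(tables_in("user_db"))
```

Each table belongs to exactly one database, so queries that used to join across domains now have to be composed by the services that own those domains.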
2.4 Vertical Table Sharding
Concept: split a table’s columns into a main table (hot fields) and an extension table (cold fields) based on column activity.
Table structures differ.
The tables hold different columns but share at least one column, usually the primary key, which is used to join them.
Union of tables equals the full dataset.
Scenario: many columns, hot and cold data mixed, causing large rows and random‑read I/O.
Analysis: keep hot columns in the main table to improve cache hit rate and reduce I/O; join tables in the service layer and avoid database joins.
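The service-layer join described above might look like the following sketch. It uses in-memory SQLite, and the user_main and user_ext tables and their columns are hypothetical: the hot path reads only the main table, and the extension table is fetched in a second query only when the cold fields are needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets rows be converted to dicts by column name

# Main table: small, frequently read columns.
conn.execute("CREATE TABLE user_main (id INTEGER PRIMARY KEY, name TEXT, status TEXT)")
# Extension table: large or rarely read columns, linked by the same primary key.
conn.execute("CREATE TABLE user_ext (id INTEGER PRIMARY KEY, bio TEXT, settings TEXT)")
conn.execute("INSERT INTO user_main VALUES (1, 'alice', 'active')")
conn.execute("INSERT INTO user_ext VALUES (1, 'a long biography', '{}')")

def get_summary(user_id: int):
    """Hot path: touches only the main table."""
    row = conn.execute(
        "SELECT id, name, status FROM user_main WHERE id = ?", (user_id,)
    ).fetchone()
    return dict(row) if row else None

def get_full_profile(user_id: int):
    """Cold path: two single-table queries merged in the service layer."""
    profile = get_summary(user_id)
    if profile is None:
        return None
    ext = conn.execute(
        "SELECT bio, settings FROM user_ext WHERE id = ?", (user_id,)
    ).fetchone()
    if ext is not None:
        profile.update(dict(ext))
    return profile

print(get_summary(1))
print(get_full_profile(1))
```

Keeping the merge in the service also means the two tables can later move to different databases without changing callers.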
3. Sharding Tools
ShardingSphere (now Apache ShardingSphere; grew out of Sharding‑JDBC)
TDDL (Taobao Distributed Data Layer)
Mycat (middleware)
Choose tools after evaluating pros and cons.
4. Sharding Process
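Assess capacity (current data volume and expected growth).
Decide the number of databases or tables.
Select a key that distributes data evenly.
Define the sharding rule (hash or range).
Migrate, typically with dual writes.
Plan for expansion so that as little data as possible has to move.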
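The dual-write migration step can be sketched with plain dictionaries standing in for the old table and the new shards. The class and its method names are hypothetical, and a real migration would also have to handle failures, deletes, and writes that race with the backfill.

```python
class DualWriteStore:
    """Toy model of migrating from one unsharded store to several shards."""

    def __init__(self, old, new_shards):
        self.old = old                # dict standing in for the old table
        self.new_shards = new_shards  # list of dicts standing in for shards
        self.dual_write = False       # phase 1 switch
        self.read_from_new = False    # final cutover switch

    def _shard(self, key: int):
        return self.new_shards[key % len(self.new_shards)]

    def write(self, key: int, value) -> None:
        if not self.read_from_new:
            self.old[key] = value
        if self.dual_write or self.read_from_new:
            self._shard(key)[key] = value

    def read(self, key: int):
        source = self._shard(key) if self.read_from_new else self.old
        return source.get(key)

    def backfill(self) -> None:
        """Copy historical rows; rows already dual-written are left alone."""
        for key, value in list(self.old.items()):
            self._shard(key).setdefault(key, value)

    def verify(self) -> bool:
        """Check that every old row is present and identical in its shard."""
        return all(self._shard(k).get(k) == v for k, v in self.old.items())


old_table = {1: "a", 2: "b", 3: "c"}
store = DualWriteStore(old_table, [{}, {}])

store.dual_write = True          # 1. start writing to both sides
store.write(2, "B")              #    live traffic during the migration
store.write(4, "d")
store.backfill()                 # 2. copy historical data
migration_ok = store.verify()    # 3. compare old and new
store.read_from_new = True       # 4. cut over; the old table stops receiving writes
store.write(5, "e")
print(migration_ok, store.read(2), store.read(5))
```

The order matters in this sketch: dual writes start before the backfill so that no write made during the copy is lost.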
5. Common Sharding Issues
5.1 Queries without partition key
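Common methods: a mapping from the non‑partition field to the partition key; the gene method, which embeds bits derived from the non‑partition field in the generated ID so that the shard can be computed from either value; redundant copies of the data sharded by the other field; or a NoSQL store that serves such lookups.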
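The mapping and gene methods can be sketched as follows. This is one possible reading of the gene method, under several assumptions: users are sharded by user_id, lookups by user_name must also find the shard, the shard count is a power of two, and CRC32 is used as the hash. All function names are hypothetical.

```python
import itertools
import zlib

SHARD_BITS = 3                  # assumed: 2**3 = 8 shards
SHARD_COUNT = 1 << SHARD_BITS
_sequence = itertools.count(1)  # stand-in for a real id generator

def gene_of(user_name: str) -> int:
    """The 'gene': the low SHARD_BITS bits of a stable hash of the name."""
    return zlib.crc32(user_name.encode("utf-8")) & (SHARD_COUNT - 1)

def new_user_id(user_name: str) -> int:
    """Generate an id whose low bits carry the gene of the user name."""
    return (next(_sequence) << SHARD_BITS) | gene_of(user_name)

def shard_of_id(user_id: int) -> int:
    return user_id % SHARD_COUNT

def shard_of_name(user_name: str) -> int:
    """Gene method: the shard is computable from the name alone."""
    return gene_of(user_name)

# Mapping method: keep an explicit name -> id table and route through it.
name_to_id = {}

def register(user_name: str) -> int:
    user_id = new_user_id(user_name)
    name_to_id[user_name] = user_id
    return user_id

def shard_via_mapping(user_name: str) -> int:
    return shard_of_id(name_to_id[user_name])

alice_id = register("alice")
print(shard_of_id(alice_id), shard_of_name("alice"), shard_via_mapping("alice"))
```

The mapping method costs an extra lookup and an extra table to keep consistent. The gene method avoids both but has to be designed in before ids are issued.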
5.2 Cross‑shard pagination
Typically solved by serving such queries from a NoSQL store or a search engine such as Elasticsearch rather than from the shards themselves.
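To see why pagination across shards is usually handed off to another system, consider the naive scatter-gather approach. The sketch below is an illustration of the cost, not a recommended solution. Shards are modelled as already-sorted Python lists, and the function names are hypothetical.

```python
import heapq
from itertools import islice

def global_page(shards, offset: int, limit: int):
    """Return rows [offset, offset + limit) of the globally sorted result."""
    # Any of a shard's first offset + limit rows might belong to the page,
    # so every shard has to return that many rows, not just `limit`.
    per_shard = [shard[: offset + limit] for shard in shards]
    merged = heapq.merge(*per_shard)  # k-way merge of sorted inputs
    return list(islice(merged, offset, offset + limit))

def rows_fetched(shard_count: int, offset: int, limit: int) -> int:
    """Upper bound on rows pulled from all shards for one page."""
    return shard_count * (offset + limit)

shards = [[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]]
page = global_page(shards, offset=4, limit=3)
print(page)
print(rows_fetched(shard_count=8, offset=10_000, limit=20))
```

The number of rows fetched grows with both the page depth and the shard count, which is why deep pages become expensive under this approach.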
5.3 Scaling
Horizontal database scaling: the replica‑promotion method, in which existing replicas are promoted to serve as additional databases.
Horizontal table scaling: dual‑write migration.
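One reason replica promotion fits the goal of minimal data movement: under modulo routing, doubling the number of databases sends each key either to its old database or to exactly one new one. The sketch below checks that property. It assumes integer keys, modulo routing, and that the replica of database i becomes database i + old_count, none of which the text above specifies.

```python
def old_shard(key: int, old_count: int) -> int:
    return key % old_count

def new_shard(key: int, old_count: int) -> int:
    """Routing after the number of databases has been doubled."""
    return key % (2 * old_count)

def doubling_targets(index: int, old_count: int):
    """The only two places a key from old database `index` can end up."""
    return (index, index + old_count)

OLD_COUNT = 4
moved = 0
for key in range(1000):
    before = old_shard(key, OLD_COUNT)
    after = new_shard(key, OLD_COUNT)
    # A replica of `before` already holds this key, so if the replica is
    # promoted to be database before + OLD_COUNT, nothing needs copying.
    assert after in doubling_targets(before, OLD_COUNT)
    moved += after != before

print(moved)
```

After the switch, each database still holds the half of the keys that now route to its sibling. That stale half can be deleted later without affecting reads.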
6. Summary
Identify the real bottleneck before deciding between database or table sharding, and between horizontal or vertical approaches.
When choosing the key, weigh even data distribution against how queries that lack the partition key will be served.
Keep sharding rules as simple as the requirements allow.
21CTO
