Database Bottlenecks and Sharding Strategies
This article explains how I/O and CPU bottlenecks drive up the number of active database connections, then covers horizontal and vertical sharding of databases and tables (concept, result, scenario, and analysis for each), followed by tools, implementation steps, common issues, and best-practice recommendations.
Database Bottlenecks
Both I/O and CPU bottlenecks can cause the number of active database connections to rise, eventually reaching the maximum capacity and leading to service degradation, reduced concurrency, and possible crashes.
1. I/O Bottleneck
• Disk read I/O bottleneck: hot data exceeds the database cache, so each query generates heavy disk I/O –> solution: database sharding and vertical table partitioning.
• Network I/O bottleneck: request volume exceeds available bandwidth –> solution: database sharding.
2. CPU Bottleneck
First type: SQL statements with joins, GROUP BY, ORDER BY, or non‑indexed conditions increase CPU load –> solution: SQL optimization, proper indexing, and moving calculations to the service layer.
Second type: A single table is so large that queries scan too many rows, lowering SQL efficiency and raising CPU load –> solution: horizontal table partitioning.
Sharding and Partitioning
1. Horizontal Sharding (Database)
Concept: Split a single database into multiple databases based on a key using strategies such as hash or range.
Result:
Each database has the same schema.
Data in each database is disjoint.
The union of all databases equals the full dataset.
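Scenario: Overall system concurrency has risen and there is no clear business boundary along which to shard vertically.
Analysis: Adding databases spreads I/O and CPU pressure across them, roughly in proportion to the number of databases when the key distributes load evenly.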
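The routing idea above can be sketched as follows. This is a minimal illustration, assuming an integer `user_id` as the sharding key and four hypothetical databases named `order_db_0` to `order_db_3`; none of these names come from the article.

```python
from collections import defaultdict

DB_COUNT = 4  # assumed number of databases, not from the article


def route_database(user_id: int, db_count: int = DB_COUNT) -> str:
    """Pick the database that owns a key using simple modulo routing."""
    return f"order_db_{user_id % db_count}"


def distribute(user_ids, db_count: int = DB_COUNT):
    """Group keys by owning database; every key lands in exactly one group."""
    shards = defaultdict(list)
    for user_id in user_ids:
        shards[route_database(user_id, db_count)].append(user_id)
    return dict(shards)
```

Because each key maps to exactly one database, the groups are disjoint and their union is the full key set, which matches the three result properties listed above.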
2. Horizontal Partitioning (Table)
Concept: Split a single table into multiple tables based on a key using hash, range, etc.
Result:
Each table shares the same structure.
Data in each table is disjoint.
The union of all tables equals the full dataset.
Scenario: High data volume in a single table degrades SQL efficiency and increases CPU load.
Analysis: Smaller tables improve query speed and lower CPU usage.
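As a sketch of the range strategy for tables, the snippet below maps an id to a table suffix. The boundaries in `RANGE_BOUNDS` and the base name `t_order` are assumptions chosen for illustration only.

```python
import bisect

# Assumed exclusive upper bounds of the first three ranges;
# ids at or above the last bound fall into the final table.
RANGE_BOUNDS = [1_000_000, 2_000_000, 3_000_000]


def route_table(order_id: int, base_name: str = "t_order") -> str:
    """Return the hypothetical table that holds this id under range partitioning."""
    index = bisect.bisect_right(RANGE_BOUNDS, order_id)
    return f"{base_name}_{index}"
```

With these bounds, id 999999 lands in `t_order_0` and id 1000000 in `t_order_1`. Range rules make it easy to add a new table for new ids, while hash rules tend to spread load more evenly; which one fits depends on the access pattern.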
3. Vertical Sharding (Database)
Concept: Split tables into different databases according to business domains.
Result:
Each database may have a different schema.
Data is isolated per business module.
The union of all databases equals the full dataset.
Scenario: System concurrency rises and distinct business modules can be isolated.
Analysis: Enables service‑oriented architecture and easier scaling of individual modules.
4. Vertical Partitioning (Table)
Concept: Split a table into a main table and one or more extension tables based on column activity.
Result:
Tables have different structures.
Each table contains a distinct set of columns; they share a primary key for joining.
The union of all tables equals the full dataset.
Scenario: Table has many columns, with hot and cold data mixed, causing large rows and random‑read I/O.
Analysis: Hot columns are placed in the main table so that more rows fit in cache, which reduces random-read I/O. When a query needs columns from both, the service layer reads the main and extension tables separately and combines the results, avoiding a costly database join.
Sharding Tools
sharding-sphere: Java library (JAR), successor of sharding-jdbc.
TDDL: Java library (JAR), Taobao Distributed Data Layer.
Mycat: Proxy middleware.
Note: Evaluate the pros and cons of each tool yourself; prioritize official documentation and community support.
Sharding Implementation Steps
Assess capacity and growth → choose a uniform key → define sharding rule (hash, range, etc.) → execute (usually with dual‑write) → handle scaling while minimizing data movement.
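One common way to limit data movement when scaling, which the article does not spell out, is to double the shard count under modulo routing. The sketch below measures how many keys change shard. The helper names are hypothetical, and modulo routing is an assumption.

```python
def shard_of(key: int, shard_count: int) -> int:
    """Modulo routing: the shard index for an integer key."""
    return key % shard_count


def moved_fraction(keys, old_count: int, new_count: int) -> float:
    """Fraction of keys whose shard index changes after resizing."""
    keys = list(keys)
    moved = sum(1 for k in keys if shard_of(k, old_count) != shard_of(k, new_count))
    return moved / len(keys)
```

For keys 0 to 999, growing from 4 to 8 shards moves half of the keys, while growing from 4 to 5 moves 80% of them. This is one reason shard counts are often planned as powers of two.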
Common Sharding Issues
1. Queries without the partition key
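When a query filters only on a non-partition key, it can still be routed in two ways: the mapping method keeps a lookup from the non-partition key to the partition key, and the gene method embeds the routing bits of the partition key in the other key when it is generated.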
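A sketch of one common reading of the gene method, assuming orders are sharded by `user_id` and that order ids are generated by the application. `SHARD_BITS`, the in-process counter, and the function names are all hypothetical; a real system would use a distributed id generator.

```python
import itertools

SHARD_BITS = 4                    # assumed: 2**4 = 16 shards
SHARD_COUNT = 1 << SHARD_BITS
_sequence = itertools.count(1)    # stand-in for a real distributed id generator


def shard_of_user(user_id: int) -> int:
    """Shard index derived from the partition key."""
    return user_id % SHARD_COUNT


def new_order_id(user_id: int) -> int:
    """Build an order id whose low bits carry the user's shard 'gene'."""
    gene = shard_of_user(user_id)
    return (next(_sequence) << SHARD_BITS) | gene


def shard_of_order(order_id: int) -> int:
    """Route by order id alone: the low bits equal the owning user's shard."""
    return order_id & (SHARD_COUNT - 1)
```

A lookup by order id can then go straight to one shard without knowing the user id. The mapping method instead needs an extra lookup, but it works for keys that the application does not generate itself.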
2. Cross‑shard pagination
Use NoSQL solutions such as Elasticsearch to handle pagination across shards.
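To see why pagination is often pushed to a separate store, consider the naive alternative, which is not the approach the article recommends: ask every shard for its leading rows and merge them. In the sketch below, plain lists stand in for shards and the function name is hypothetical.

```python
import heapq
from itertools import islice


def paginate_across_shards(shards, page: int, page_size: int):
    """Return one globally sorted page from several shards.

    Each shard must supply its first page * page_size rows, because any of
    them could belong to the requested global page.
    """
    needed = page * page_size
    # Simulates running ORDER BY ... LIMIT needed on every shard.
    per_shard = [sorted(rows)[:needed] for rows in shards]
    merged = heapq.merge(*per_shard)
    return list(islice(merged, needed - page_size, needed))
```

The rows fetched per shard grow with the page number, so deep pages become expensive as the shard count rises.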
3. Scaling
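Horizontal scaling of databases uses the upgrade-slave method, in which replicas (slaves) are promoted to serve as new databases. Horizontal scaling of tables uses dual-write migration: turn on synchronized dual-write, copy the old data to the new tables, verify the new data against the old, and finally remove dual-write.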
Note: Dual‑write is a generic approach for scaling.
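The dual-write sequence can be sketched with two dictionaries standing in for the old and new stores. The class and method names are hypothetical. A real migration would also re-route keys to the new layout and handle concurrent writes and failures.

```python
class DualWriteMigration:
    """Toy model of dual-write migration between an old and a new store."""

    def __init__(self, old: dict, new: dict):
        self.old, self.new = old, new
        self.dual_write = False
        self.cut_over_done = False

    def write(self, key, value):
        if self.cut_over_done:
            self.new[key] = value       # after cut-over only the new store is written
            return
        self.old[key] = value           # the old store stays authoritative
        if self.dual_write:
            self.new[key] = value       # step 1: mirror every new write

    def backfill(self):
        """Step 2: copy historical rows that the new store has not seen yet."""
        for key, value in list(self.old.items()):
            self.new.setdefault(key, value)

    def verify(self) -> bool:
        """Step 3: check the new store against the old one."""
        return self.old == self.new

    def cut_over(self):
        """Step 4: remove dual-write once verification passes."""
        if not self.verify():
            raise RuntimeError("stores differ; keep dual-write on and re-check")
        self.dual_write = False
        self.cut_over_done = True
```

`backfill` uses `setdefault` so that a row already mirrored by dual-write is not overwritten by an older copy.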
Sharding Summary
Identify the real bottleneck before deciding how to shard.
Choosing the right sharding key is crucial for even distribution and query performance.
Simplicity in sharding rules leads to easier maintenance.
Code Ape Tech Column
Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn