
Database Bottlenecks and Sharding Strategies

The article explains how I/O and CPU bottlenecks increase active database connections, then details various sharding techniques—including horizontal and vertical database/table partitioning, their concepts, results, scenarios, and analysis—followed by tools, implementation steps, common issues, and best‑practice recommendations.

Code Ape Tech Column

Database Bottlenecks

Both I/O and CPU bottlenecks can cause the number of active database connections to rise, eventually reaching the maximum capacity and leading to service degradation, reduced concurrency, and possible crashes.

1. I/O Bottleneck

• Disk read I/O bottleneck: the hot data set no longer fits in the database cache, so each query triggers heavy disk I/O and slows down. Solution: database sharding and vertical table partitioning.

• Network I/O bottleneck: request volume exceeds the available bandwidth. Solution: database sharding.

2. CPU Bottleneck

First type: SQL statements with joins, GROUP BY, ORDER BY, or non-indexed conditions increase CPU load. Solution: SQL optimization, proper indexing, and moving calculations to the service layer.

Second type: a single table holds so much data that queries scan too many rows, SQL efficiency drops, and the CPU becomes the first bottleneck. Solution: horizontal table partitioning.

Sharding and Partitioning

1. Horizontal Sharding (Database)

Concept: Split a single database into multiple databases based on a key using strategies such as hash or range.

Result:

Each database has the same schema.

Data in each database is disjoint.

The union of all databases equals the full dataset.

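Scenario: Overall system concurrency has risen to the point where table partitioning alone cannot solve the problem, and there is no clear business boundary along which to shard vertically.

Analysis: With more databases, the I/O and CPU load on each one drops, roughly in proportion to the number of databases when the key distributes data evenly.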
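As a minimal sketch of the two routing strategies named above, the following maps a key to a database index by hash and by range. The database count, the range boundaries, and all function names are assumptions made for illustration; they do not come from any of the tools discussed later.

```python
import zlib
from bisect import bisect_right

DB_COUNT = 4  # hypothetical number of databases


def stable_hash(key: str) -> int:
    """Process-independent hash (the built-in hash() is salted for strings)."""
    return zlib.crc32(key.encode("utf-8"))


def route_by_hash(user_id: int, db_count: int = DB_COUNT) -> int:
    """Hash strategy: spreads keys across databases; range scans hit every one."""
    return stable_hash(str(user_id)) % db_count


# Range strategy: exclusive upper bounds of the first three databases' ranges.
RANGE_BOUNDS = [1_000_000, 2_000_000, 3_000_000]  # hypothetical boundaries


def route_by_range(user_id: int) -> int:
    """Range strategy: ids below 1M go to db 0, 1M up to 2M to db 1, and so on."""
    return bisect_right(RANGE_BOUNDS, user_id)
```

Hash routing tends to balance load but scatters neighbouring keys, while range routing keeps neighbouring keys together at the risk of a hot "latest" database.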

2. Horizontal Partitioning (Table)

Concept: Split a single table into multiple tables based on a key using hash, range, etc.

Result:

Each table shares the same structure.

Data in each table is disjoint.

The union of all tables equals the full dataset.

Scenario: Overall concurrency is still manageable, but the sheer volume of data in a single table degrades SQL efficiency and increases CPU load until it becomes the bottleneck.

Analysis: Smaller tables improve query speed and lower CPU usage.
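The three result properties above (same structure, disjoint data, union equals the full set) can be checked with a small sketch. The table prefix `t_order`, the table count, and the helper names are hypothetical.

```python
TABLE_COUNT = 8  # hypothetical number of physical tables


def table_for(order_id: int, base: str = "t_order",
              table_count: int = TABLE_COUNT) -> str:
    """Pick the physical table for a row, e.g. t_order_02."""
    return f"{base}_{order_id % table_count:02d}"


def partition(order_ids, table_count: int = TABLE_COUNT) -> dict:
    """Group ids by the table they would be stored in."""
    tables = {}
    for oid in order_ids:
        name = table_for(oid, table_count=table_count)
        tables.setdefault(name, []).append(oid)
    return tables


tables = partition(range(100))
# Disjoint: every id lands in exactly one table.
# Union: putting the tables back together yields the full data set.
all_ids = sorted(oid for rows in tables.values() for oid in rows)
```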

3. Vertical Sharding (Database)

Concept: Split tables into different databases according to business domains.

Result:

Each database may have a different schema.

Data is isolated per business module.

The union of all databases equals the full dataset.

Scenario: System concurrency rises and distinct business modules can be isolated.

Analysis: Enables service‑oriented architecture and easier scaling of individual modules.
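In its simplest form, vertical sharding is a static mapping from table to database by business domain. The sketch below assumes hypothetical table and database names; a real deployment would normally hold this mapping in the configuration of the sharding tool or in each service's own data source settings.

```python
# Hypothetical mapping: each business domain owns one database.
TABLE_TO_DB = {
    "user": "user_db",
    "user_address": "user_db",
    "order": "order_db",
    "order_item": "order_db",
    "product": "product_db",
}


def database_for(table: str) -> str:
    """Return the database that owns a table, failing loudly if unmapped."""
    try:
        return TABLE_TO_DB[table]
    except KeyError:
        raise ValueError(f"no database configured for table {table!r}") from None
```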

4. Vertical Partitioning (Table)

Concept: Split a table into a main table and one or more extension tables based on column activity.

Result:

Tables have different structures.

Each table contains a distinct set of columns; they share a primary key for joining.

The union of all tables equals the full dataset.

Scenario: Table has many columns, with hot and cold data mixed, causing large rows and random‑read I/O.

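Analysis: Hot columns go in the main table so that more hot rows fit in the cache, which reduces random-read I/O. To assemble a full record, read the main and extension tables separately and combine the rows in the service layer instead of using a database JOIN, which is costly and ties both tables to the same database instance.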
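A minimal sketch of combining the two tables in the service layer, using in-memory dictionaries as stand-ins for the main and extension tables. The column names and helper functions are assumptions for illustration only.

```python
# Stand-ins for the two tables, both keyed by the shared primary key.
USER_MAIN = {1: {"id": 1, "name": "alice", "status": "active"}}      # hot columns
USER_EXT = {1: {"id": 1, "bio": "long profile text", "theme": "dark"}}  # cold columns


def get_user_summary(user_id: int):
    """Hot path: touches only the small main table."""
    return USER_MAIN.get(user_id)


def get_user_detail(user_id: int):
    """Cold path: two primary-key lookups merged in the service, no JOIN."""
    main = USER_MAIN.get(user_id)
    if main is None:
        return None
    ext = USER_EXT.get(user_id, {})
    return {**main, **ext}
```

Most requests would call only `get_user_summary`, so the wide, rarely read columns never compete with hot rows for cache space.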

Sharding Tools

sharding-sphere: a Java library (JAR), the successor of sharding-jdbc.

TDDL: a Java library (JAR), Taobao Distributed Data Layer.

Mycat: a middleware solution.

Note: Evaluate the pros and cons of each tool yourself; prioritize official documentation and community support.

Sharding Implementation Steps

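Assess capacity and growth (to decide how many databases or tables are needed) → choose a sharding key that distributes data evenly → define the sharding rule (hash, range, etc.) → execute (usually with dual-write) → plan for scaling so that as little data as possible has to move.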
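To make "minimizing data movement" concrete, the sketch below counts how many keys change shard under modulo routing when the shard count grows. It assumes integer keys and plain `key % count` routing; the function name is hypothetical.

```python
def moved_keys(keys, old_count: int, new_count: int) -> list:
    """Keys whose shard index changes when the shard count changes."""
    return [k for k in keys if k % old_count != k % new_count]


keys = range(10_000)

# Doubling 4 -> 8: half the keys move, and each goes to "old shard + 4".
doubled = moved_keys(keys, 4, 8)

# Growing 4 -> 5: most keys move, and they scatter across all shards.
plus_one = moved_keys(keys, 4, 5)

print(len(doubled), len(plus_one))  # 5000 8000
```

Under these assumptions, doubling moves only half of the data, and every moved key goes from shard `i` to shard `i + old_count`, which is why growing by a factor of two is a common choice with hash-modulo rules.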

Common Sharding Issues

1. Queries without the partition key

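When a query filters only on a non-partition key, it cannot be routed to a shard directly. Two common fixes are the mapping method, which keeps a lookup from the non-partition key to the partition key, and the gene method, which embeds the shard-determining bits of one key in the other so that both route to the same shard.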
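Both approaches can be sketched for a user table that is sharded by `user_id` but also queried by `user_name`. This is a simplified illustration under stated assumptions: the shard count is a power of two, a local counter stands in for a distributed ID generator, and all names are hypothetical.

```python
import itertools
import zlib

GENE_BITS = 4                    # 2**4 = 16 shards (hypothetical)
SHARD_COUNT = 1 << GENE_BITS
_sequence = itertools.count(1)   # stand-in for a distributed ID generator


def gene_of(user_name: str) -> int:
    """Low bits of a stable hash of the name: the 'gene'."""
    return zlib.crc32(user_name.encode("utf-8")) & (SHARD_COUNT - 1)


def new_user_id(user_name: str) -> int:
    """Gene method: put the name's gene into the low bits of the new id."""
    return (next(_sequence) << GENE_BITS) | gene_of(user_name)


def shard_by_id(user_id: int) -> int:
    return user_id % SHARD_COUNT


def shard_by_name(user_name: str) -> int:
    """No lookup needed: the name alone determines the shard."""
    return gene_of(user_name)


# Mapping method: an explicit lookup from name to id, consulted before routing.
NAME_TO_ID = {}


def register(user_name: str) -> int:
    user_id = new_user_id(user_name)
    NAME_TO_ID[user_name] = user_id
    return user_id


alice_id = register("alice")
```

The mapping method costs an extra lookup per query and a table to maintain, while the gene method needs no lookup but only works if the id is generated with the other key in hand.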

2. Cross‑shard pagination

Use NoSQL solutions such as Elasticsearch to handle pagination across shards.
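One way to see why pagination is usually pushed to a separate store is to look at what a naive service-layer merge has to do. In the sketch below, plain sorted lists stand in for shards and the function name is hypothetical; each shard must supply its first `offset + limit` rows before the requested page can be cut out.

```python
import heapq
from itertools import islice


def fetch_page(shards, offset: int, limit: int) -> list:
    """Naive cross-shard page: every shard returns offset + limit sorted rows."""
    per_shard = [sorted(rows)[: offset + limit] for rows in shards]
    merged = heapq.merge(*per_shard)  # lazy k-way merge of sorted inputs
    return list(islice(merged, offset, offset + limit))


shards = [[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]]
page = fetch_page(shards, offset=3, limit=3)
```

The rows pulled from each shard grow with the offset, so deep pages become expensive, which is the cost that an external index such as Elasticsearch avoids.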

3. Scaling

Databases are scaled out horizontally with the upgrade-slave method (promoting existing replicas to primaries), and tables with dual-write migration: enable dual writes, copy the existing data to the new tables, verify the copy against the old data, and finally remove dual writes.

Note: Dual‑write is a generic approach for scaling.
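The dual-write sequence can be sketched with two dictionaries standing in for the old and new storage. This is a single-threaded illustration only: the class, its phase names, and its methods are hypothetical, and a real migration also has to deal with concurrent writes and failures between the two stores.

```python
class MigratingStore:
    """Walks through old_only -> dual_write -> new_only."""

    def __init__(self):
        self.old = {}
        self.new = {}
        self.phase = "old_only"

    def write(self, key, value):
        if self.phase in ("old_only", "dual_write"):
            self.old[key] = value
        if self.phase in ("dual_write", "new_only"):
            self.new[key] = value

    def read(self, key):
        source = self.new if self.phase == "new_only" else self.old
        return source.get(key)

    def backfill(self):
        """Copy existing data, treating the old store as authoritative."""
        for key, value in self.old.items():
            self.new[key] = value

    def mismatches(self) -> list:
        """Keys that are missing from or different in the new store."""
        return [k for k, v in self.old.items()
                if k not in self.new or self.new[k] != v]


store = MigratingStore()
store.write("a", 1)              # written before the migration starts
store.phase = "dual_write"       # step 1: turn on dual writes
store.write("b", 2)
store.backfill()                 # step 2: copy the old data
assert not store.mismatches()    # step 3: verify against the old store
store.phase = "new_only"         # step 4: remove dual writes
```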

Sharding Summary

Identify the real bottleneck before deciding how to shard.

Choosing the right sharding key is crucial for even distribution and query performance.

Simplicity in sharding rules leads to easier maintenance.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.


Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn
