Databases 10 min read

Database Bottlenecks and Sharding: Strategies, Tools, and Implementation Steps

This article explains common I/O and CPU bottlenecks in databases, introduces horizontal and vertical sharding concepts, compares sharding tools, outlines practical sharding steps, discusses typical sharding issues such as non‑partition queries and expansion, and provides a concise summary and example implementation.

Architect

Aug 16, 2020

Database Bottlenecks and Sharding: Strategies, Tools, and Implementation Steps

1. Database Bottlenecks

Both I/O and CPU bottlenecks increase the number of active connections, eventually reaching the database's capacity limit and causing connection exhaustion for services.

1.1 I/O Bottleneck

Disk read I/O bottleneck: hot data exceeds cache, generating massive I/O and slowing queries → sharding and vertical table partitioning.

Network I/O bottleneck: request volume exceeds bandwidth → sharding.

1.2 CPU Bottleneck

SQL issues such as joins, GROUP BY, ORDER BY, or non‑indexed conditions increase CPU load → SQL optimization, proper indexing, and moving calculations to the service layer.

Large tables cause full scans, low SQL efficiency, and CPU becomes the bottleneck → horizontal sharding.

2. Sharding (Database Partitioning)

2.1 Horizontal Sharding (Database)

Concept: based on a field, split a single database into multiple databases using strategies like hash or range.

Result: each database has identical schema, different data, and the union of all databases equals the full dataset.

Scenario: high concurrent load where sharding alone cannot solve the problem and there is no clear business boundary for vertical sharding.

Analysis: adding more databases reduces I/O and CPU pressure.

2.2 Horizontal Sharding (Table)

Concept: based on a field, split a single table into multiple tables.

Result: each table has identical schema, different data, and the union of all tables equals the full dataset.

Scenario: a single table becomes too large, degrading SQL efficiency and increasing CPU load; see the linked article for SQL optimization principles.

Analysis: smaller tables improve query efficiency and naturally reduce CPU load.

2.3 Vertical Sharding (Database)

Concept: based on business domains, split tables into different databases.

Result: each database has a different schema, different data, and the union of all databases equals the full dataset.

Scenario: absolute concurrency rises and distinct business modules can be isolated.

Analysis: enables service‑orientation, allowing configuration or dictionary tables to be separated into dedicated databases.

2.4 Vertical Sharding (Table)

Concept: based on field activity, split a table into a main table and extension tables.

Result: each table has a different schema, at least one common column (usually the primary key) for joining, and the union of all tables equals the full dataset.

Scenario: many fields with mixed hot and cold data cause large rows, reducing cache effectiveness and generating random read I/O.

Analysis: keep hot data in the main table to improve cache hit rate and reduce random I/O; perform joins in the service layer and avoid database joins.

3. Sharding Tools

sharding‑sphere (formerly sharding‑jdbc)

TDDL (Taobao Distributed Data Layer)

Mycat (middleware)

Note: evaluate the advantages and disadvantages of each tool yourself; prioritize official documentation and community support.

4. Sharding Steps

Assess current capacity and growth to determine the number of shards → choose a uniform key → define sharding rule (hash, range, etc.) → execute (usually double‑write) → handle expansion while minimizing data movement.

Extension: see the linked MySQL article on the differences between sharding and partitioning.

5. Sharding Issues

5.1 Queries without Partition Key

When only a non‑partition key is used, mapping or gene methods can be applied; avoid database joins and perform joins in the service layer.

Examples include mapping method, gene method, redundancy method, and NoSQL approach (illustrated with images in the original article).

5.2 Cross‑Shard Pagination

Use NoSQL solutions such as Elasticsearch to handle pagination across shards.

5.3 Expansion

Horizontal database expansion via replica upgrade (illustrated with an image).

Horizontal table expansion via double‑write migration:

Step 1: Enable double‑write in application configuration and code, then deploy.

Step 2: Copy old data from the old database to the new one.

Step 3: Verify new database data against the old database.

Step 4: Remove double‑write from configuration and code, then deploy.

Note: double‑write is a generic solution for migration.

6. Summary

Identify the real bottleneck before deciding how to shard; unnecessary sharding should be avoided.

Key selection is critical for even distribution and for handling non‑partition queries.

Simpler sharding rules are preferable as long as they meet the requirements.

7. Example

GitHub example repository: https://github.com/littlecharacter4s/study-sharding

·END·

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Performance scaling Partitioning

Written by

Architect

Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.