Database Bottlenecks and Sharding: Strategies for Horizontal and Vertical Partitioning
This article examines common database performance bottlenecks such as I/O and CPU limits, explains the concept, resulting data layout, and applicable scenarios of horizontal and vertical sharding at both the database and table level, and outlines the tools, implementation steps, and challenges of an effective sharding rollout.
1. Database Bottlenecks
Whether the bottleneck is I/O or CPU, both ultimately increase the number of active database connections until it approaches or exceeds the maximum the database can handle, causing degraded service, rising latency, falling throughput, or outright crashes.
1. I/O Bottleneck
First case: a disk read I/O bottleneck, where hot data cannot fit into the cache and every query triggers heavy disk reads → horizontal database sharding and vertical table partitioning.
Second case: a network I/O bottleneck, where the amount of data requested exceeds the available bandwidth → horizontal database sharding.
2. CPU Bottleneck
First case: inefficient SQL (joins, GROUP BY, ORDER BY, non‑indexed conditions) drives up CPU load → SQL optimization, proper indexing, and moving business calculations to the service layer.
Second case: a single table so large that queries scan excessive rows, lowering SQL efficiency → horizontal table partitioning.
2. Sharding Strategies
1. Horizontal Database Sharding
Concept: Use a field as the sharding key and split one database into multiple databases according to a strategy (hash, range, etc.).
Result:
Each database has the same schema.
Data in each database is distinct with no overlap.
The union of all databases represents the full dataset.
Scenario: When absolute concurrency spikes and there is no clear business module to justify vertical sharding.
Analysis: Adding more databases reduces I/O and CPU pressure proportionally.
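As a sketch, hash-based routing by the sharding key might look like the following; the shard count, DSN format, and function name are illustrative assumptions, not part of any particular tool:

```python
# Hash-based routing for horizontal database sharding.
# SHARD_COUNT and the DSN format are hypothetical.
SHARD_COUNT = 4
SHARD_DSNS = [f"mysql://db-{i}/app" for i in range(SHARD_COUNT)]

def route_database(user_id: int) -> str:
    """Map the sharding key to one of the physical databases."""
    return SHARD_DSNS[user_id % SHARD_COUNT]
```

Every database holds the same schema; only the rows differ, so two adjacent user IDs may be routed to different DSNs.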
2. Horizontal Table Partitioning
Concept: Use a field as the sharding key and split one table into multiple tables according to a strategy (hash, range, etc.).
Result:
Each table shares the same schema.
Data in each table is distinct with no overlap.
The union of all tables represents the full dataset.
Scenario: The system’s absolute concurrency is not high, but a single table’s size hurts SQL efficiency and increases CPU load.
Analysis: Smaller tables improve single‑query performance and reduce CPU usage.
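The same idea applies one level down: route the key to a physical table name instead of a database. A minimal sketch, where the table prefix and shard count are assumptions:

```python
TABLE_SHARDS = 16  # hypothetical number of physical tables

def route_table(order_id: int) -> str:
    """Map the sharding key to a physical table; all tables share one schema."""
    return f"t_order_{order_id % TABLE_SHARDS:02d}"
```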
3. Vertical Database Sharding
Concept: Use the table as the sharding unit and move tables belonging to different business domains into separate databases.
Result:
Each database has a different schema.
Data in each database is distinct with no overlap.
The union of all databases represents the full dataset.
Scenario: Absolute concurrency rises and distinct business modules can be isolated.
Analysis: Enables service‑oriented architecture; shared configuration or dictionary tables can be moved to dedicated databases for further decoupling.
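One way to picture vertical database sharding is a static map from business domain to database, with shared dictionary tables given a home of their own; all names and the table-naming convention below are hypothetical:

```python
# Hypothetical domain -> database mapping for vertical sharding.
DOMAIN_DATABASES = {
    "user": "mysql://user-db/app",
    "order": "mysql://order-db/app",
    "config": "mysql://config-db/app",  # shared dictionary tables, decoupled
}

def database_for(table_name: str) -> str:
    """Assume tables are named <domain>_<entity>, e.g. order_item."""
    domain = table_name.split("_", 1)[0]
    return DOMAIN_DATABASES[domain]
```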
4. Vertical Table Partitioning
Concept: Use column activity as the basis to separate hot and cold columns, placing hot columns in a master table and cold columns in an extension table.
Result:
Each table’s schema differs.
Data in each table overlaps on at least one column (usually the primary key) for joining.
The union of all tables equals the full dataset.
Scenario: High absolute concurrency is not the issue; instead, many columns cause large rows, leading to cache misses and random‑read I/O.
Analysis: By keeping hot data in the master table, cache hit rate improves and random I/O decreases; data must be joined at the service layer, avoiding costly joins in the database.
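A sketch of that service-layer join between the master and extension tables; the in-memory dicts stand in for real table reads:

```python
# Hot columns live in the master table, cold columns in the extension
# table; both share the primary key. Dicts stand in for real tables.
master = {1: {"id": 1, "name": "alice", "status": "active"}}
extension = {1: {"id": 1, "bio": "long, rarely read text"}}

def load_user(user_id: int) -> dict:
    """Join hot and cold rows in the service layer, not in the database."""
    row = dict(master[user_id])             # fast, cache-friendly read
    row.update(extension.get(user_id, {}))  # extra read only when needed
    return row
```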
3. Sharding Tools
1. Sharding-Sphere (jar; formerly Sharding-JDBC)
2. TDDL (Taobao Distributed Data Layer, jar)
3. Mycat (middleware)
Note: Evaluate each tool's pros and cons yourself; prioritize official documentation and community support.
4. Sharding Implementation Steps
1. Assess capacity (current volume and growth).
2. Choose a sharding key that distributes data uniformly.
3. Define the sharding rule (hash, range, etc.).
4. Execute the migration (usually via dual-write).
5. Plan for expansion so later scaling minimizes data movement.
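The capacity-assessment step can be made concrete with a back-of-the-envelope calculation; rounding the shard count up to a power of two is one common way to keep later expansion cheap. The thresholds and growth model below are illustrative assumptions:

```python
import math

def plan_shards(rows_now: int, yearly_growth: float, years: int,
                rows_per_shard: int = 10_000_000) -> int:
    """Estimate a shard count for the projected volume, rounded up to a
    power of two so a future doubling leaves half the keys in place."""
    projected = rows_now * (1 + yearly_growth) ** years
    needed = math.ceil(projected / rows_per_shard)
    return 1 << (needed - 1).bit_length()
```

For example, 100M rows growing 50% per year for three years projects to ~337.5M rows, needing 34 shards of 10M rows, which rounds up to 64.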
5. Sharding Challenges
1. Queries without the partition key
Case 1: the client side queries by a single non-partition key.
Solutions: mapping method (maintain a lookup table from the non-partition key to the partition key) or gene method.
Note: With the gene method, a distributed ID generator (e.g., Snowflake) creates user IDs whose low-order "gene" bits are derived from the non-partition key, so both keys route to the same shard.
Case 2: the client side queries by more than one non-partition key.
Solutions: mapping method or redundancy method (store an extra copy of the data keyed by each query field).
Case 3: back-office queries combine various non-partition key conditions.
Solutions: NoSQL method (e.g., sync the data to a search store for ad-hoc queries) or redundancy method.
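The gene method can be sketched as follows: the low bits of the generated user ID are derived from the non-partition key, so queries by either key land on the same shard. The byte-sum hash here is a toy stand-in; a real system would combine Snowflake with a stable hash:

```python
GENE_BITS = 4          # supports up to 16 shards
SHARDS = 1 << GENE_BITS

def gene_of(username: str) -> int:
    """Toy stable hash of the non-partition key (illustrative only)."""
    return sum(username.encode()) % SHARDS

def make_user_id(sequence: int, username: str) -> int:
    """Append the username's gene bits to a sequence (e.g., Snowflake) ID."""
    return (sequence << GENE_BITS) | gene_of(username)

def shard_of(key: int) -> int:
    return key % SHARDS

# A query by user_id and a query by username now land on the same shard.
```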
2. Cross‑shard pagination
Use the NoSQL approach (e.g., Elasticsearch) to handle pagination across shards.
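Even with a search store, the underlying merge is worth seeing: every shard must return at least its own top (offset + limit) rows, and the service layer merges them and re-applies the slice, which is why deep pagination across shards gets expensive. A sketch with integer sort keys:

```python
import heapq

def global_page(shard_results, offset: int, limit: int):
    """Merge per-shard sorted results and re-apply offset/limit globally.
    Each shard must have returned at least its top (offset + limit) rows."""
    merged = heapq.merge(*shard_results)   # streams rows in global order
    return list(merged)[offset:offset + limit]
```

For example, merging three sorted shards and asking for page two of size three returns the global rows 4-6, not each shard's rows 4-6.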
3. Scaling (horizontal sharding)
1. Horizontal database scaling (replica-promotion method): promote each shard's read replica to a new primary, then split that shard's key range between the old and new primaries.
Note: Each round of scaling multiplies capacity, typically doubling the number of databases.
2. Horizontal table scaling (dual‑write migration).
Steps: (1) Enable dual‑write in the application and deploy; (2) Copy old data to the new database; (3) Verify data consistency; (4) Remove dual‑write and finalize deployment. Dual‑write is a common solution.
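A minimal sketch of the dual-write switch from step (1); the repository class and flags are hypothetical, and a real system would gate them with dynamic configuration:

```python
class DualWriteRepo:
    """Write-path switch used during a dual-write migration (sketch)."""

    def __init__(self, old_store, new_store, dual_write=False, read_new=False):
        self.old, self.new = old_store, new_store
        self.dual_write, self.read_new = dual_write, read_new

    def save(self, key, row):
        self.old[key] = row
        if self.dual_write:          # step (1): mirror writes to the new table
            self.new[key] = row

    def load(self, key):
        # step (4): after backfill and verification, flip reads to the new table
        return (self.new if self.read_new else self.old).get(key)
```

Between steps (2) and (3), old rows are backfilled into the new store and compared; only then is `read_new` flipped and the old path removed.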
6. Sharding Summary
1. Identify the real bottleneck before deciding how to partition; avoid splitting just for its own sake.
2. Choose a sharding key that balances even data distribution and supports non‑partition key queries.
3. Keep sharding rules as simple as possible while meeting requirements.
Architect
Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.