Databases 10 min read

Database Bottlenecks and Sharding Strategies (Horizontal & Vertical Partitioning)

The article explains common database performance bottlenecks such as I/O and CPU limits, and details horizontal and vertical sharding techniques—including database and table partitioning—along with tools, implementation steps, common issues, scaling strategies, and practical examples for improving scalability and reliability.

Architecture Digest

Apr 26, 2019

Database Bottlenecks and Sharding Strategies (Horizontal & Vertical Partitioning)

1. Database Bottlenecks

Both I/O and CPU bottlenecks increase the number of active database connections, eventually reaching the maximum threshold the database can handle, which leads to connection shortages, reduced concurrency, lower throughput, and possible crashes.

1.1 I/O Bottleneck

Case 1: Disk read I/O bottleneck caused by hot data that cannot fit into cache, resulting in heavy I/O on each query. Solution: Horizontal sharding (database partitioning) and vertical table partitioning.

Case 2: Network I/O bottleneck when the amount of requested data exceeds available bandwidth. Solution: Database sharding.

1.2 CPU Bottleneck

Case 1: Inefficient SQL (joins, GROUP BY, ORDER BY, non‑indexed conditions) increases CPU load. Solution: Optimize SQL, create appropriate indexes, and move business calculations to the service layer.

Case 2: Large single‑table size leads to full‑table scans and high CPU usage. Solution: Horizontal table partitioning.

2. Sharding (Database & Table Partitioning)

2.1 Horizontal Database Sharding

Concept: Split data of one database into multiple databases based on a chosen field using strategies such as hash or range.

All databases share the same schema.

Data in each database is disjoint.

The union of all databases equals the full dataset.

Scenario: System concurrency spikes and vertical sharding is not feasible.

Analysis: More databases reduce I/O and CPU pressure proportionally.

2.2 Horizontal Table Partitioning

Concept: Split a single table into multiple tables based on a field using hash, range, etc.

All tables share the same structure.

Data in each table is disjoint, with at least one common column (usually the primary key) for joins.

The union of all tables equals the full dataset.

Scenario: Single‑table size is large, causing low SQL efficiency and high CPU load.

Analysis: Smaller tables improve query performance and reduce CPU usage.

2.3 Vertical Database Sharding

Concept: Split tables into different databases according to business domains.

Each database has a different schema.

Data in each database is distinct with no overlap.

The union of all databases equals the full dataset.

Scenario: High concurrency with clear business module boundaries.

Analysis: Enables service‑oriented architecture; shared configuration or dictionary tables can be moved to separate databases.

2.4 Vertical Table Partitioning

Concept: Separate hot fields into a main table and less‑used fields into an extension table based on field activity.

Tables have different structures.

Data sets differ but share at least one column (usually the primary key) for association.

The union of both tables equals the full dataset.

Scenario: Large row size due to many fields causes cache pressure and random read I/O.

Analysis: Keep hot data in the main table for caching; join tables in the service layer rather than using database joins.

3. Sharding Tools

Sharding‑Sphere (formerly Sharding‑JDBC) – Java jar.

TDDL – Taobao Distributed Data Layer, Java jar.

Mycat – middleware solution.

Evaluate pros and cons of each tool yourself; prioritize official documentation and community support.

4. Sharding Implementation Steps

Assess capacity and growth → choose a uniform key → define sharding rule (hash/range) → execute (usually double‑write) → handle scaling while minimizing data movement.

5. Sharding Issues

5.1 Queries Without Partition Key

Various strategies such as mapping method, gene method, redundancy method, and NoSQL approaches (e.g., Elasticsearch) are discussed, with diagrams illustrating each.

5.2 Cross‑Table Pagination for Non‑Partition Keys

Recommendation: use NoSQL solutions like Elasticsearch for efficient pagination.

5.3 Scaling (Expansion) Problems

Horizontal scaling of databases (upgrade replica method) and tables (double‑write migration) are illustrated with step‑by‑step procedures; double‑write is a common technique.

6. Summary

Identify the true bottleneck before deciding on sharding strategy (horizontal vs. vertical, database vs. table, number of shards).

Select keys that balance data distribution and support non‑partition key queries.

Simplicity in sharding rules is preferable as long as requirements are met.

7. Example

GitHub repository with a sample implementation: https://github.com/littlecharacter4s/study-sharding

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Performance scalability Database Sharding Horizontal Partitioning Vertical Partitioning

Written by

Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.