Databases 9 min read

Database Bottlenecks and Sharding: Strategies for Horizontal and Vertical Partitioning

This article explains common database performance bottlenecks such as I/O and CPU limits, describes how horizontal and vertical sharding (both database and table level) can alleviate these issues, outlines practical tools, implementation steps, and addresses typical challenges like non‑partition key queries and scaling.

Java Captain

Oct 2, 2019

Database Bottlenecks and Sharding: Strategies for Horizontal and Vertical Partitioning

1. Database Bottlenecks

Both I/O and CPU bottlenecks increase the number of active connections, eventually reaching the database's connection‑limit threshold, which leads to service degradation or crashes.

1.1 I/O Bottleneck

Disk read I/O: hot data exceeds cache, causing heavy I/O on each query → Database sharding and vertical partitioning .

Network I/O: excessive data transfer exceeds bandwidth → Database sharding .

1.2 CPU Bottleneck

SQL issues (JOIN, GROUP BY, ORDER BY, non‑indexed filters) increase CPU usage → SQL optimization, proper indexing, move calculations to the service layer.

Large single‑table size leads to full‑table scans → Horizontal table partitioning .

2. Sharding & Partitioning

2.1 Horizontal Database Sharding

Based on a chosen field, data from one logical database is split into multiple physical databases using strategies such as hash or range.

All databases share the same schema.

Data in each database is disjoint; the union of all databases equals the full dataset.

Suitable when overall concurrency spikes but business modules are not clearly separable for vertical sharding.

Increasing the number of databases reduces I/O and CPU pressure proportionally.

2.2 Horizontal Table Partitioning

Based on a chosen field, rows of a single logical table are distributed across multiple physical tables.

All tables share the same structure.

Each table holds a distinct, non‑overlapping data slice; the union of all tables equals the full dataset.

Useful when a single table becomes too large, degrading SQL efficiency and increasing CPU load.

Smaller tables improve single‑query performance and lower CPU usage.

2.3 Vertical Database Sharding

Based on business domain, different groups of tables are placed into separate databases.

Each database may have a different schema.

Data sets are disjoint; the union of all databases equals the full dataset.

Applicable when overall concurrency rises and distinct business modules can be isolated.

Often a step toward service‑oriented architecture.

2.4 Vertical Table Partitioning

Based on field activity, hot columns are kept in a primary table while less‑used columns are moved to an extension table.

Primary and extension tables have different structures but share at least one common column (usually the primary key) for joining.

The union of both tables represents the complete dataset.

Reduces row size, improves cache hit rate, and mitigates random read I/O.

Joins should be avoided at the database level; instead, fetch data separately and combine in the service layer.

3. Sharding Tools

Sharding‑Sphere (formerly Sharding‑JDBC) – Java jar.

TDDL – Taobao Distributed Data Layer, Java jar.

Mycat – middleware solution.

Evaluate each tool’s pros and cons yourself; prioritize official documentation and community support.

4. Sharding Implementation Steps

Assess capacity and growth → choose a uniform key → define sharding rule (hash, range, etc.) → implement (usually with double‑write) → plan expansion to minimize data movement.

5. Common Sharding Issues

5.1 Queries without the Partition Key

When only non‑partition keys are used, strategies include mapping methods, gene‑based key generation, redundancy, or NoSQL alternatives (e.g., Elasticsearch) for cross‑shard queries.

5.2 Cross‑Shard Pagination

Typically solved with NoSQL approaches or by aggregating results after individual shard queries.

5.3 Scaling (Horizontal Expansion)

Two main patterns:

Expand databases (upgrade replica set).

Expand tables (double‑write migration): enable double‑write, copy old data, verify consistency, then disable double‑write.

6. Summary

Identify the real bottleneck before deciding on sharding or partitioning.

Select keys that balance even data distribution and support non‑partition‑key queries.

Keep sharding rules as simple as possible while meeting requirements.

7. Example Repository

GitHub: https://github.com/LiHaodong888/SpringBootLear

(End)

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Scalability

Written by

Java Captain

Focused on Java technologies: SSM, the Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading; occasionally covers DevOps tools like Jenkins, Nexus, Docker, ELK; shares practical tech insights and is dedicated to full‑stack Java development.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.