Databases 11 min read

Database Bottlenecks and Sharding Strategies: Horizontal/Vertical Partitioning, Tools, and Best Practices

This article analyzes common database I/O and CPU bottlenecks, explains horizontal and vertical sharding concepts, presents practical sharding rules, discusses tooling such as Sharding‑Sphere, TDDL and Mycat, and outlines step‑by‑step procedures and common pitfalls for scaling relational databases.

Java Captain
Java Captain
Java Captain
Database Bottlenecks and Sharding Strategies: Horizontal/Vertical Partitioning, Tools, and Best Practices

1. Database Bottlenecks

Both I/O and CPU bottlenecks increase the number of active connections, eventually reaching the database's maximum connection threshold, which leads to insufficient or no available connections for business services, causing reduced concurrency, throughput, and possible crashes.

1.1 I/O Bottleneck

Disk read I/O: Hot data exceeds cache capacity, causing massive I/O on each query, slowing down performance → solution: sharding and vertical partitioning.

Network I/O: Excessive data transfer overwhelms bandwidth → solution: sharding.

1.2 CPU Bottleneck

SQL issues such as joins, GROUP BY, ORDER BY, or non‑indexed column conditions increase CPU load → solution: SQL optimization, proper indexing, and moving business calculations to the service layer.

Large single‑table data leads to full‑table scans, causing CPU strain → solution: horizontal sharding.

2. Sharding Strategies

2.1 Horizontal Database Sharding

Concept: Split data from one database into multiple databases based on a field using strategies such as hash or range.

Result:

All databases share the same schema.

Data in each database is disjoint; the union of all databases equals the full dataset.

Scenario: System concurrency spikes and horizontal sharding alone cannot solve the problem, with no clear business module for vertical sharding.

Analysis: Adding more databases reduces I/O and CPU pressure proportionally.

2.2 Horizontal Table Sharding

Concept: Split a single table into multiple tables based on a field using hash, range, etc.

Result:

All tables share the same schema.

Data in each table is disjoint; the union of all tables equals the full dataset.

Scenario: Concurrency is not high, but a single table holds too much data, degrading SQL efficiency and increasing CPU load.

Analysis: Smaller tables improve single‑query efficiency and reduce CPU burden.

2.3 Vertical Database Sharding

Concept: Split tables into different databases according to business domains.

Result:

Each database may have a different schema.

Data in each database is disjoint; the union of all databases equals the full dataset.

Scenario: System concurrency rises and distinct business modules can be isolated.

Analysis: Enables service‑oriented architecture; shared configuration or dictionary tables can be moved to separate databases for independent scaling.

2.4 Vertical Table Sharding

Concept: Split a table into a main table and extension tables based on field activity (hot vs. cold fields).

Result:

Tables have different schemas.

Each table contains a distinct subset of columns; they share at least a primary‑key column for joining.

The union of all tables equals the full dataset.

Scenario: Concurrency is low, but a table has many columns with hot and cold data mixed, causing large rows, cache pressure, and random‑read I/O.

Analysis: Keep hot columns in the main table to improve cache hit rate; retrieve cold columns from extension tables when needed, avoiding joins at the database level.

3. Sharding Tools

sharding‑sphere: a JAR (formerly sharding‑jdbc).

TDDL: Taobao Distributed Data Layer, a JAR.

Mycat: middleware.

Note: Evaluate the pros and cons of each tool yourself; prioritize official documentation and community support.

4. Sharding Implementation Steps

Assess capacity (current and growth) → select a uniform key → define sharding rule (hash, range, etc.) → execute (usually dual‑write) → handle expansion while minimizing data movement.

5. Common Sharding Issues

5.1 Queries Without Partition Key

When using horizontal sharding with a hash strategy, queries that only contain non‑partition keys require mapping or gene methods, often leading to additional joins or data duplication.

Mapping method example:

Gene method example (using bits of user_id for routing):

5.2 Cross‑Shard Pagination

Solution often involves a NoSQL store (e.g., Elasticsearch) to aggregate results across shards.

5.3 Scaling (Expansion)

Horizontal database expansion via replica upgrade:

Horizontal table expansion using dual‑write migration:

Step 1: Enable dual‑write in application configuration and deploy.

Step 2: Copy existing data from old to new table.

Step 3: Verify data consistency.

Step 4: Remove dual‑write configuration and deploy.

Note: Dual‑write is a generic approach for safe migration.

6. Summary

Identify the real bottleneck before deciding how to shard (database vs. table, horizontal vs. vertical).

Choosing the right sharding key is crucial for both even data distribution and handling non‑partition key queries.

Simpler sharding rules are preferable as long as they meet business requirements.

7. Example Repository

GitHub example: https://github.com/littlecharacter4s/study-sharding
performance optimizationscalabilityShardingDatabaseshorizontal partitioningvertical partitioning
Java Captain
Written by

Java Captain

Focused on Java technologies: SSM, the Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading; occasionally covers DevOps tools like Jenkins, Nexus, Docker, ELK; shares practical tech insights and is dedicated to full‑stack Java development.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.