Database Bottlenecks and Sharding: Strategies for Horizontal and Vertical Partitioning
This article explains common database performance bottlenecks such as I/O and CPU limits, describes how horizontal and vertical sharding (both database and table level) can alleviate these issues, outlines practical tools, implementation steps, and addresses typical challenges like non‑partition key queries and scaling.
1. Database Bottlenecks
Both I/O and CPU bottlenecks increase the number of active connections, eventually reaching the database's connection‑limit threshold, which leads to service degradation or crashes.
1.1 I/O Bottleneck
Disk read I/O: hot data exceeds cache, causing heavy I/O on each query → Database sharding and vertical partitioning .
Network I/O: excessive data transfer exceeds bandwidth → Database sharding .
1.2 CPU Bottleneck
SQL issues (JOIN, GROUP BY, ORDER BY, non‑indexed filters) increase CPU usage → SQL optimization, proper indexing, move calculations to the service layer.
Large single‑table size leads to full‑table scans → Horizontal table partitioning .
2. Sharding & Partitioning
2.1 Horizontal Database Sharding
Based on a chosen field, data from one logical database is split into multiple physical databases using strategies such as hash or range.
All databases share the same schema.
Data in each database is disjoint; the union of all databases equals the full dataset.
Suitable when overall concurrency spikes but business modules are not clearly separable for vertical sharding.
Increasing the number of databases reduces I/O and CPU pressure proportionally.
2.2 Horizontal Table Partitioning
Based on a chosen field, rows of a single logical table are distributed across multiple physical tables.
All tables share the same structure.
Each table holds a distinct, non‑overlapping data slice; the union of all tables equals the full dataset.
Useful when a single table becomes too large, degrading SQL efficiency and increasing CPU load.
Smaller tables improve single‑query performance and lower CPU usage.
2.3 Vertical Database Sharding
Based on business domain, different groups of tables are placed into separate databases.
Each database may have a different schema.
Data sets are disjoint; the union of all databases equals the full dataset.
Applicable when overall concurrency rises and distinct business modules can be isolated.
Often a step toward service‑oriented architecture.
2.4 Vertical Table Partitioning
Based on field activity, hot columns are kept in a primary table while less‑used columns are moved to an extension table.
Primary and extension tables have different structures but share at least one common column (usually the primary key) for joining.
The union of both tables represents the complete dataset.
Reduces row size, improves cache hit rate, and mitigates random read I/O.
Joins should be avoided at the database level; instead, fetch data separately and combine in the service layer.
3. Sharding Tools
Sharding‑Sphere (formerly Sharding‑JDBC) – Java jar.
TDDL – Taobao Distributed Data Layer, Java jar.
Mycat – middleware solution.
Evaluate each tool’s pros and cons yourself; prioritize official documentation and community support.
4. Sharding Implementation Steps
Assess capacity and growth → choose a uniform key → define sharding rule (hash, range, etc.) → implement (usually with double‑write) → plan expansion to minimize data movement.
5. Common Sharding Issues
5.1 Queries without the Partition Key
When only non‑partition keys are used, strategies include mapping methods, gene‑based key generation, redundancy, or NoSQL alternatives (e.g., Elasticsearch) for cross‑shard queries.
5.2 Cross‑Shard Pagination
Typically solved with NoSQL approaches or by aggregating results after individual shard queries.
5.3 Scaling (Horizontal Expansion)
Two main patterns:
Expand databases (upgrade replica set).
Expand tables (double‑write migration): enable double‑write, copy old data, verify consistency, then disable double‑write.
6. Summary
Identify the real bottleneck before deciding on sharding or partitioning.
Select keys that balance even data distribution and support non‑partition‑key queries.
Keep sharding rules as simple as possible while meeting requirements.
7. Example Repository
GitHub: https://github.com/LiHaodong888/SpringBootLear
(End)
Java Captain
Focused on Java technologies: SSM, the Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading; occasionally covers DevOps tools like Jenkins, Nexus, Docker, ELK; shares practical tech insights and is dedicated to full‑stack Java development.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.