
How to Diagnose and Fix Database IO/CPU Bottlenecks with Sharding

The article explains how IO and CPU bottlenecks increase active database connections, then details horizontal and vertical sharding techniques, practical tools, step‑by‑step implementation, and common pitfalls to help engineers relieve pressure on database resources.

IT Architects Alliance

Database Bottlenecks

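Both I/O and CPU bottlenecks make queries run longer, so each connection stays busy longer. At the same request rate the number of active connections therefore grows until it reaches the configured maximum, at which point throughput drops or the database crashes.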
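To make the connection growth concrete, here is a sketch based on Little's Law. Little's Law is a general queueing result; the original article does not cite it. It says the average number of busy connections is roughly the request rate multiplied by how long each request holds a connection. The request rate, the latencies, and the limit of 151 connections (MySQL's default `max_connections`) are assumed example values.

```python
def active_connections(requests_per_sec: float, avg_latency_sec: float) -> float:
    """Little's Law: average concurrency = arrival rate x average time in system."""
    return requests_per_sec * avg_latency_sec


# Assumed limit for the example (MySQL's default max_connections is 151).
MAX_CONNECTIONS = 151


def is_saturated(requests_per_sec: float, avg_latency_sec: float,
                 max_connections: int = MAX_CONNECTIONS) -> bool:
    """True when the expected number of busy connections reaches the limit."""
    return active_connections(requests_per_sec, avg_latency_sec) >= max_connections


if __name__ == "__main__":
    qps = 2000  # assumed steady request rate
    # Healthy: about 20 ms per query keeps roughly 40 connections busy.
    print(active_connections(qps, 0.020), is_saturated(qps, 0.020))
    # Bottlenecked: about 100 ms per query needs roughly 200, above the limit.
    print(active_connections(qps, 0.100), is_saturated(qps, 0.100))
```

The multiplier is what matters: a bottleneck that makes queries five times slower needs five times as many connections at the same request rate.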

I/O Bottleneck

Disk read I/O: The hot data no longer fits in the database's buffer cache, so queries trigger many physical disk reads and slow down. Mitigation: Shard across multiple databases (horizontal sharding) or split tables by column (vertical partitioning) to shrink the hot data each node must cache.

Network I/O: Requests return so much data that network bandwidth is saturated. Mitigation: Shard across multiple databases so the traffic is spread over several instances.
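For the disk-read case, a rough way to size horizontal sharding is to pick enough shards that each node's share of the hot data fits in its buffer cache. This back-of-the-envelope sketch is not from the original article. It assumes the sharding key spreads hot rows evenly. The `headroom` factor, meaning the fraction of the cache you are willing to fill, is a hypothetical tuning parameter.

```python
import math


def shards_for_hot_data(hot_data_gb: float, cache_per_node_gb: float,
                        headroom: float = 0.75) -> int:
    """Smallest shard count whose per-node hot set fits in the usable cache.

    Assumes the sharding key spreads hot rows evenly, so each of n shards
    holds roughly hot_data_gb / n of the hot set.
    """
    if hot_data_gb <= 0:
        return 1
    usable = cache_per_node_gb * headroom
    if usable <= 0:
        raise ValueError("cache_per_node_gb and headroom must be positive")
    return max(1, math.ceil(hot_data_gb / usable))


if __name__ == "__main__":
    # 300 GB hot set, 64 GB buffer pool per node, fill at most 75% of it:
    # 300 / 48 = 6.25, rounded up to 7 shards.
    print(shards_for_hot_data(300, 64))
```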

CPU Bottleneck

SQL inefficiencies: Queries containing joins, GROUP BY, ORDER BY, or filters on non-indexed columns add CPU work. Mitigation: Optimize the SQL, add appropriate indexes, and move heavy computation into the application (service) layer.

Large single tables: Each query scans so many rows that SQL efficiency drops and CPU becomes the first bottleneck. Mitigation: Split the table horizontally (horizontal partitioning).

Sharding Concepts (Splitting Databases and Tables)

Sharding splits data across multiple databases or tables according to a sharding key and a routing strategy (hash, range, etc.), so that the load is spread over more resources.
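The two strategies named above can be sketched as small routing functions. This is an illustration only, not the API of any sharding tool. The shard count, the range boundaries, and the use of CRC32 to hash string keys are assumptions chosen for the example.

```python
import bisect
import zlib


def hash_route(key, num_shards: int) -> int:
    """Hash strategy: spreads keys evenly, but neighboring keys land on different shards."""
    if isinstance(key, int):
        return key % num_shards
    # Use a stable hash (CRC32) for strings; Python's built-in hash() is salted per process.
    return zlib.crc32(str(key).encode("utf-8")) % num_shards


# Range strategy: exclusive upper bound of each shard's id range (hypothetical values).
RANGE_BOUNDS = [10_000_000, 20_000_000, 30_000_000]


def range_route(key: int, bounds=RANGE_BOUNDS) -> int:
    """Range strategy: shard 0 holds [0, 10M), shard 1 holds [10M, 20M), and so on."""
    idx = bisect.bisect_right(bounds, key)
    if idx >= len(bounds):
        raise ValueError(f"key {key} is beyond the last range; a new shard is needed")
    return idx


if __name__ == "__main__":
    print(hash_route(12345, 8))      # 12345 % 8 = 1
    print(range_route(15_000_000))   # falls in [10M, 20M), so shard 1
```

Hash routing balances load, but range scans must visit every shard. Range routing keeps a scan on one shard, but with monotonically increasing keys all new writes concentrate on the newest range.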

Horizontal Sharding (Database Level)

Concept: Split the data in one database across multiple databases according to a sharding key and strategy (hash, range, etc.).

Result: Every database has the same schema but holds disjoint data; the union of all databases is the full dataset.

Typical scenario: Absolute concurrency has grown to the point where splitting tables alone cannot solve the problem, and there are no clear business boundaries for vertical splitting.

Effect: With more databases, I/O and CPU pressure is relieved roughly in proportion to the number of shards.

Horizontal Partitioning (Table Level)

Concept: Split the rows of one large table into multiple tables according to a sharding key and strategy (hash, range, etc.).

Result: Every table has the same structure but holds disjoint rows; the union of all tables is the full dataset.

Typical scenario: Overall concurrency is manageable, but a single table has grown so large that SQL efficiency suffers and CPU load rises.

Effect: Smaller tables mean each query scans fewer rows, which lowers latency and CPU consumption.
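When database-level and table-level horizontal splitting are combined, one sharding key has to select both a database and a table. A minimal sketch, assuming a hypothetical layout of 4 databases with 8 tables each and a hypothetical `orders_db{n}.t_order_{m}` naming scheme, derives both indexes from a single global slot number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    db: int
    table: int

    def physical_name(self) -> str:
        # Hypothetical naming scheme: orders_db{db}.t_order_{table}
        return f"orders_db{self.db}.t_order_{self.table}"


def route(user_id: int, num_dbs: int = 4, tables_per_db: int = 8) -> Location:
    """Map a sharding key to (database, table) through one global slot number.

    slot = user_id % (num_dbs * tables_per_db) spreads keys over all physical
    tables; slots 0-7 live in database 0, slots 8-15 in database 1, and so on.
    """
    slot = user_id % (num_dbs * tables_per_db)
    return Location(db=slot // tables_per_db, table=slot % tables_per_db)


if __name__ == "__main__":
    for uid in (7, 8, 35, 1_000_003):
        print(uid, "->", route(uid).physical_name())
```

Because every slot maps to exactly one (database, table) pair, the tables stay disjoint and their union covers every key.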

Vertical Sharding (Database Level)

Concept: Move groups of tables into different databases according to business domain.

Result: Each database has a different schema and holds different data; together they hold the complete dataset.

Typical scenario: Concurrency has grown and distinct business modules can be separated out.

Effect: This paves the way for a service-oriented architecture; shared configuration or dictionary tables can also be moved into a dedicated database.
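Vertical sharding routes by table rather than by row, so the router only needs to know which database owns each table. The table and database names below are hypothetical. Real tools express the same mapping in their own configuration formats.

```python
# Hypothetical domain-to-database assignment for a vertically sharded system.
TABLE_OWNERS = {
    "user": "user_db",
    "user_address": "user_db",
    "order": "order_db",
    "order_item": "order_db",
    "product": "product_db",
    # Shared configuration / dictionary tables moved to a dedicated database.
    "dict_region": "config_db",
    "sys_config": "config_db",
}


def database_for(table: str) -> str:
    """Return the database that owns a table; an unknown table is a config error."""
    try:
        return TABLE_OWNERS[table]
    except KeyError:
        raise LookupError(f"no database assigned to table {table!r}") from None


def requires_cross_db_join(*tables: str) -> bool:
    """Tables owned by different databases can no longer be joined in one SQL statement."""
    return len({database_for(t) for t in tables}) > 1
```

For example, `requires_cross_db_join("order", "user")` is true, so that query must be split and combined in the service layer.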

Vertical Partitioning (Table Level)

Concept: Split a wide table by column activity into a main (hot) table and one or more extension (cold) tables.

Result: The tables have different structures but share the same primary key; each record's columns are divided between the hot and cold tables.

Typical scenario: A table has many columns that mix frequently and rarely accessed fields, so rows are large, fewer rows fit in the cache, and queries cause random-read I/O.

Effect: Keeping hot fields together improves the cache hit rate. When a full record is needed, the application fetches both parts and joins them itself instead of using a database JOIN.
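The hot/cold split and the application-side reassembly can be sketched with plain dictionaries standing in for the two tables. The column names and the choice of which columns are "hot" are hypothetical. In practice that choice comes from observed query patterns.

```python
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]

# Hypothetical: columns read on almost every request stay in the main table.
HOT_COLUMNS = {"id", "name", "status", "price"}


def split_row(row: Row, key: str = "id") -> Tuple[Row, Row]:
    """Split one wide row into a hot-table row and a cold (extension) row.

    Both halves keep the primary key so they can be rejoined later.
    """
    hot = {c: v for c, v in row.items() if c in HOT_COLUMNS or c == key}
    cold = {c: v for c, v in row.items() if c not in HOT_COLUMNS or c == key}
    return hot, cold


def join_in_app(hot_rows: Iterable[Row], cold_rows: Iterable[Row],
                key: str = "id") -> List[Row]:
    """Rebuild full rows in the application instead of issuing a database JOIN."""
    cold_by_key = {r[key]: r for r in cold_rows}
    merged = []
    for hot in hot_rows:
        full = dict(hot)
        full.update(cold_by_key.get(hot[key], {}))
        merged.append(full)
    return merged


if __name__ == "__main__":
    product = {"id": 1, "name": "lamp", "status": "on_sale", "price": 19.9,
               "description": "A long product description", "spec_json": '{"color": "white"}'}
    hot, cold = split_row(product)
    print(hot)                                      # frequently read columns plus id
    print(join_in_app([hot], [cold]) == [product])  # True
```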

Sharding Tools

ShardingSphere (formerly Sharding-JDBC): a Java library, embedded in the application as a JAR.

TDDL (Taobao Distributed Data Layer): a Java library (JAR).

Mycat: a proxy middleware that sits between the application and the databases.

Choose a tool based on documentation quality, community activity, and compatibility with the target DBMS.

Implementation Steps

1. Assess the current data volume and its growth rate to decide how many shards are needed.

2. Select a sharding key whose values are evenly distributed (e.g., user_id, order_id) to minimize hotspots.

3. Define the sharding rule (hash, range, or composite) and document the mapping algorithm.

4. Migrate using a dual-write pattern: the application writes to both the old and the new shards while existing data is copied over.

5. Plan future expansion (adding shards) so that as little data as possible has to move, for example with consistent hashing or re-sharding scripts.
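The expansion step above mentions consistent hashing. Below is a minimal hash ring with virtual nodes. This is a standard technique; the article does not prescribe a particular implementation. The sketch compares how many keys change shards when a fifth shard is added against plain modulo routing. The shard names, the virtual-node count, and the use of MD5 as a stable hash are assumptions.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Stable 64-bit hash (first 8 bytes of MD5); Python's hash() is salted per process.
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring; virtual nodes smooth out the key distribution."""

    def __init__(self, shards, vnodes: int = 100):
        points = [(_hash(f"{shard}#{i}"), shard) for shard in shards for i in range(vnodes)]
        points.sort()
        self._points = points
        self._keys = [h for h, _ in points]

    def route(self, key) -> str:
        """Walk clockwise to the first virtual node at or after the key's hash."""
        idx = bisect.bisect_left(self._keys, _hash(str(key)))
        if idx == len(self._keys):
            idx = 0  # wrap around the ring
        return self._points[idx][1]


def moved_fraction(before, after, keys) -> float:
    """Share of keys whose shard differs between two routing functions."""
    return sum(before(k) != after(k) for k in keys) / len(keys)


if __name__ == "__main__":
    keys = range(100_000)
    old_ring = HashRing([f"db{i}" for i in range(4)])
    new_ring = HashRing([f"db{i}" for i in range(5)])
    ring_moved = moved_fraction(old_ring.route, new_ring.route, keys)
    mod_moved = moved_fraction(lambda k: k % 4, lambda k: k % 5, keys)
    print(f"consistent hashing moved {ring_moved:.1%} of keys")  # roughly one fifth
    print(f"modulo routing moved {mod_moved:.1%} of keys")       # 80.0%
```

With the ring, only keys that now belong to the new shard move. Switching from `% 4` to `% 5` relocates four keys in five.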

Common Sharding Challenges

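Non-partition-key queries: A query that filters on a column other than the sharding key cannot be routed to a single shard, so it needs extra routing logic or duplicated data. Common approaches:

Mapping method: Keep a mapping table from the non-partition key (e.g., user_name) to the sharding key (user_id). A query by user_name first looks up the user_id, then routes by user_id.

Gene method: Hash the non-partition key into a code (e.g., user_name_code from user_name), take its low bits as a "gene" (3 bits suffice for 8 tables), and embed that gene into the user_id when the user_id is generated. The user_id modulo 8 then selects the same table as user_name_code modulo 8, so a query by user_name computes user_name_code first and applies the same modulo routing.

The IDs themselves are commonly generated with the Snowflake algorithm.

Redundancy method: Store the data redundantly under different sharding keys. For example, queries by order_id or buyer_id go to a buyer-side database, while a copy of each order in a seller-side database serves queries by seller_id.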
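The gene method can be sketched as follows. user_name is hashed to a user_name_code, its low 3 bits become the gene, and the gene is spliced into the low bits of the new user_id. CRC32 as the hash and the simple sequence-plus-gene ID layout are assumptions. The counter is a hypothetical stand-in for a Snowflake-style generator that would supply the high-order bits.

```python
import itertools
import zlib

TABLE_COUNT = 8                             # 8 tables -> 3 gene bits
# The trick relies on the table count being a power of two (x % 8 == x & 7).
assert (TABLE_COUNT & (TABLE_COUNT - 1)) == 0
GENE_BITS = TABLE_COUNT.bit_length() - 1    # 3
GENE_MASK = TABLE_COUNT - 1                 # 0b111

_sequence = itertools.count(1)  # hypothetical stand-in for a Snowflake-style ID source


def user_name_code(user_name: str) -> int:
    """Stable hash of the non-partition key (CRC32 assumed here)."""
    return zlib.crc32(user_name.encode("utf-8"))


def new_user_id(user_name: str) -> int:
    """Build a user_id whose low 3 bits equal the gene taken from user_name_code."""
    gene = user_name_code(user_name) & GENE_MASK
    return (next(_sequence) << GENE_BITS) | gene


def table_for_user_id(user_id: int) -> int:
    return user_id % TABLE_COUNT


def table_for_user_name(user_name: str) -> int:
    return user_name_code(user_name) % TABLE_COUNT
```

For any name, `table_for_user_id(new_user_id(name)) == table_for_user_name(name)`, so a lookup by either key lands on the same table.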

Cross‑Shard Queries

When back-office services need to query by various combinations of non-partition keys, a typical solution is to synchronize the data into an external store built for such queries, such as Elasticsearch or another NoSQL store, and serve those queries from there.
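To see why these queries are moved out of the sharded databases, consider the fallback of querying every shard and merging the results in the application (scatter-gather). The sketch below illustrates that baseline; it is not part of the article's method. It answers an `ORDER BY created_at DESC LIMIT n` style query, where each shard returns its own top n and the application merges them, so the work grows with the shard count. The in-memory lists standing in for shards and the `t_order` table name are hypothetical.

```python
import heapq
import itertools
from typing import Any, Dict, List

Row = Dict[str, Any]


def query_shard(rows: List[Row], seller_id: int, limit: int) -> List[Row]:
    """Stand-in for running, on one shard (hypothetical table name):
    SELECT * FROM t_order WHERE seller_id = ? ORDER BY created_at DESC LIMIT ?
    """
    matches = [r for r in rows if r["seller_id"] == seller_id]
    matches.sort(key=lambda r: r["created_at"], reverse=True)
    return matches[:limit]


def scatter_gather(shards: List[List[Row]], seller_id: int, limit: int) -> List[Row]:
    """Fan out to every shard, then merge the per-shard results (each sorted descending)."""
    partials = [query_shard(rows, seller_id, limit) for rows in shards]
    merged = heapq.merge(*partials, key=lambda r: r["created_at"], reverse=True)
    return list(itertools.islice(merged, limit))
```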

Scaling and Expansion

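Horizontal Database Expansion (Replica Promotion)

Promote each shard's replica to a new primary and double the routing modulus (for example, from user_id % 2 to user_id % 4). Each replica already holds a full copy of its primary's data, so after the switch every row is already on the right node and no bulk migration is needed. Each node then deletes the half of its data that now routes elsewhere. This doubles capacity in one step.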
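Replica promotion needs no bulk copy because doubling the modulus only ever sends a key to its old node or to that node's promoted replica. The check below illustrates this for 2 to 4 shards. It assumes that new node i + 2 is the promoted replica of old node i; that numbering is an assumption of the sketch.

```python
def old_shard(user_id: int, old_count: int = 2) -> int:
    return user_id % old_count


def new_shard(user_id: int, old_count: int = 2) -> int:
    return user_id % (old_count * 2)


def replica_of(new_index: int, old_count: int = 2) -> int:
    """Assumed numbering: new node i + old_count is the promoted replica of old node i."""
    return new_index % old_count


def needs_copy(user_id: int, old_count: int = 2) -> bool:
    """True if a row would land on a node that never held a copy of it."""
    return replica_of(new_shard(user_id, old_count), old_count) != old_shard(user_id, old_count)


def rows_to_delete(held_ids, node: int, old_count: int = 2):
    """Rows a node still holds from before the switch that now route to its peer."""
    return [u for u in held_ids if new_shard(u, old_count) != node]
```

`needs_copy` is false for every key, because (k % 2n) % n always equals k % n.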

Horizontal Table Expansion (Dual‑Write Migration)

1. Enable dual-write in the application configuration and deploy.

2. Copy existing rows from the old table into the new sharded tables.

3. Validate the new tables against the old table, which remains the source of truth (row counts, checksums).

4. Remove the dual-write configuration and redeploy.
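These four steps can be sketched with in-memory dictionaries standing in for the old table and the new sharded tables. The `DualWriteRepository` class, its `dual_write` and `read_new` flags, and the SHA-256 row checksum are hypothetical names and choices used only for illustration. The sketch also reads step 4 as switching reads and writes to the new tables. A real migration would read from the actual databases in batches and would also have to handle deletes.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def _digest(row: Row) -> str:
    """Row checksum; sort_keys makes it independent of column order."""
    return hashlib.sha256(json.dumps(row, sort_keys=True).encode("utf-8")).hexdigest()


class DualWriteRepository:
    """Hypothetical repository illustrating the dual-write migration steps."""

    def __init__(self, num_shards: int):
        self.old_table: Dict[int, Row] = {}
        self.new_shards: List[Dict[int, Row]] = [{} for _ in range(num_shards)]
        self.dual_write = False  # step 1 turns this on (config change + deploy)
        self.read_new = False    # step 4 turns this on once the new tables are trusted

    def _shard(self, row_id: int) -> Dict[int, Row]:
        return self.new_shards[row_id % len(self.new_shards)]

    def save(self, row: Row) -> None:
        if not self.read_new:
            self.old_table[row["id"]] = dict(row)
        if self.dual_write or self.read_new:
            self._shard(row["id"])[row["id"]] = dict(row)

    def get(self, row_id: int) -> Optional[Row]:
        source = self._shard(row_id) if self.read_new else self.old_table
        return source.get(row_id)

    def backfill(self) -> None:
        """Step 2: copy existing rows, skipping ids that dual-write already wrote."""
        for row_id, row in self.old_table.items():
            self._shard(row_id).setdefault(row_id, dict(row))

    def verify_and_repair(self) -> List[int]:
        """Step 3: compare checksums against the old table and re-copy mismatches."""
        fixed = []
        for row_id, row in self.old_table.items():
            copy = self._shard(row_id).get(row_id)
            if copy is None or _digest(copy) != _digest(row):
                self._shard(row_id)[row_id] = dict(row)
                fixed.append(row_id)
        if sum(len(s) for s in self.new_shards) != len(self.old_table):
            raise RuntimeError("row counts differ between old and new tables")
        return fixed

    def cut_over(self) -> None:
        """Step 4: stop dual-writing; reads and writes now use only the new tables."""
        self.dual_write = False
        self.read_new = True
```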

Sharding Summary

Identify the primary bottleneck (I/O vs. CPU) before deciding between database-level and table-level sharding, and between horizontal and vertical splitting.

Choose a partition key that distributes data evenly and minimizes the need for non‑partition‑key queries.

Keep sharding rules simple to reduce operational complexity.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
