Database Bottlenecks and Sharding: Strategies for Partitioning, Scaling, and Tooling
This article explains how I/O and CPU bottlenecks drive up the number of active database connections, describes horizontal and vertical sharding techniques, outlines practical steps and tools for implementing sharding, and discusses common challenges such as non‑partition‑key queries and expansion strategies.
Database performance issues, whether caused by I/O or CPU bottlenecks, ultimately increase the number of active connections and can exhaust the database's capacity, leading to service failures, reduced throughput, and possible crashes.
I/O bottlenecks: (1) Disk read I/O becomes a bottleneck when hot data no longer fits in cache, so queries trigger many disk reads and slow down; mitigate this with database‑level sharding or vertical partitioning. (2) Network I/O becomes a bottleneck when request volume exceeds the available bandwidth; this is also mitigated by database‑level sharding.
CPU bottlenecks: (1) Inefficient SQL (joins, GROUP BY, ORDER BY, conditions on non‑indexed columns) increases CPU load; resolve this with SQL optimization, proper indexing, or moving calculations to the service layer. (2) Very large tables mean queries scan too many rows, which puts pressure on the CPU; address this with horizontal table sharding.
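To make the indexing point concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the production database. The `orders` table, its columns, and the index name are hypothetical. It only shows how a query planner switches from a full scan to an index lookup once the filtered column is indexed; the exact plan wording differs between database engines and versions.

```python
import sqlite3

# In-memory SQLite stands in for the real database (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders (user_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

QUERY = "SELECT * FROM orders WHERE user_id = 42"


def plan(sql):
    """Return the planner's description of how it would execute `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the human-readable detail text.
    return " | ".join(row[-1] for row in rows)


plan_without_index = plan(QUERY)  # no index on user_id yet: full table scan
conn.execute("CREATE INDEX idx_orders_user_id ON orders (user_id)")
plan_with_index = plan(QUERY)     # same query, now served by the index

print(plan_without_index)
print(plan_with_index)
```

If the plan still shows a scan after indexing, the condition is probably written in a form the index cannot serve, which is the "inefficient SQL" case rather than a sizing problem.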
Sharding approaches:
Horizontal sharding (database level)
Data is split across multiple databases based on a chosen key using strategies such as hash or range. Each database has identical schema, disjoint data, and together they represent the full dataset. This reduces I/O and CPU pressure when concurrency rises.
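The two strategies named above can be sketched as routing functions. This is an illustration only: the database count, the range boundaries, and the function names are hypothetical, and the key is assumed to be an integer such as a user id.

```python
import bisect

DB_COUNT = 4  # hypothetical number of databases


def hash_shard(user_id: int, db_count: int = DB_COUNT) -> int:
    """Hash strategy: the key modulo the number of databases."""
    return user_id % db_count


# Range strategy: exclusive upper bounds of the first three databases (hypothetical).
# Keys at or above the last bound fall into the final database.
RANGE_UPPER_BOUNDS = [1_000_000, 2_000_000, 3_000_000]


def range_shard(user_id: int) -> int:
    """Range strategy: find the first range whose upper bound exceeds the key."""
    return bisect.bisect_right(RANGE_UPPER_BOUNDS, user_id)


print(hash_shard(10), range_shard(1_500_000))
```

Hash routing tends to spread load evenly but makes range scans touch every database; range routing keeps neighbouring keys together but can concentrate recent, hot keys in the last database.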
Horizontal sharding (table level)
Data is split across multiple tables using the same key‑based strategies. Each table shares the same structure, holds distinct rows, and the union of all tables equals the full dataset, improving query efficiency and reducing CPU load.
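When database‑level and table‑level horizontal sharding are combined, a single key has to select both a database and a table. One possible rule is sketched below; the counts and the `user_db_N` / `t_user_N` naming are assumptions made for illustration, not a scheme taken from any particular tool.

```python
from typing import Tuple

DB_COUNT = 2        # hypothetical number of databases
TABLES_PER_DB = 4   # hypothetical number of tables inside each database


def route(user_id: int) -> Tuple[str, str]:
    """Map a key to a (database, table) pair.

    The remainder picks the database; the quotient is then reused to pick
    the table, so the two choices are independent and all tables get used.
    """
    db_index = user_id % DB_COUNT
    table_index = (user_id // DB_COUNT) % TABLES_PER_DB
    return f"user_db_{db_index}", f"t_user_{table_index}"


print(route(5))
```

Using `user_id % TABLES_PER_DB` for the table as well would be a mistake here: with 2 databases and 4 tables, every key in database 0 is even, so only the even‑numbered tables would ever receive rows.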
Vertical sharding (database level)
Different business modules are placed in separate databases. Schemas differ, data is disjoint, and the combined data set is complete. This enables service‑oriented architectures and isolates high‑traffic modules.
Vertical sharding (table level)
Columns are split into a main table (hot fields) and extension tables (cold fields). Structures differ, but tables share a primary key for joins. This reduces row size, improves cache hit rates, and mitigates random‑read I/O.
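A small sketch of a main/extension split, again using in‑memory SQLite as a stand‑in. The `article` and `article_ext` tables and their columns are hypothetical. The detail lookup combines the two rows in application code rather than with a SQL join, in line with the earlier advice to move work to the service layer; a join on the shared primary key would return the same data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE article (            -- main table: small, frequently read columns
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    view_count INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE article_ext (        -- extension table: large, rarely read columns
    id INTEGER PRIMARY KEY REFERENCES article(id),
    body TEXT NOT NULL
);
""")
conn.execute("INSERT INTO article (id, title, view_count) VALUES (1, 'Sharding notes', 10)")
conn.execute("INSERT INTO article_ext (id, body) VALUES (1, 'long body text ...')")


def list_titles():
    """Hot path: touches only the narrow main table."""
    return conn.execute("SELECT id, title FROM article ORDER BY id").fetchall()


def load_article(article_id):
    """Detail path: two primary-key lookups, merged in application code."""
    main = conn.execute(
        "SELECT id, title, view_count FROM article WHERE id = ?", (article_id,)
    ).fetchone()
    if main is None:
        return None
    ext = conn.execute(
        "SELECT body FROM article_ext WHERE id = ?", (article_id,)
    ).fetchone()
    return {
        "id": main[0],
        "title": main[1],
        "view_count": main[2],
        "body": ext[0] if ext else None,
    }


print(list_titles())
print(load_article(1))
```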
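Sharding tools: Sharding‑Sphere (a jar embedded in the application, formerly sharding‑jdbc), TDDL (Taobao Distributed Data Layer, also a jar), and Mycat (standalone middleware). Evaluate the pros and cons of each before adoption.
Sharding implementation steps: (1) assess current capacity and expected growth to decide how many databases or tables are needed; (2) select a partition key whose values are uniformly distributed; (3) define the sharding rule (hash or range); (4) execute the migration, typically with dual‑write; (5) plan for expansion so that later growth moves as little data as possible.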
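The dual‑write step can be sketched with plain dictionaries standing in for the old database and the new shards. Everything here (the names, the modulo rule, the shard count) is hypothetical; a real migration also has to handle failures between the two writes, deletes, and concurrent updates, none of which this sketch covers.

```python
SHARD_COUNT = 4  # hypothetical number of target shards

old_db = {}                                    # stand-in for the existing, unsharded store
new_shards = [{} for _ in range(SHARD_COUNT)]  # stand-ins for the new shards


def shard_for(key: int) -> dict:
    """Pick the target shard with a simple modulo rule."""
    return new_shards[key % SHARD_COUNT]


def dual_write(key: int, row: dict) -> None:
    """During the migration window every write goes to both sides."""
    old_db[key] = row
    shard_for(key)[key] = row


def backfill() -> None:
    """Copy rows written before dual-write started.

    setdefault keeps any row that dual_write has already placed in a shard.
    """
    for key, row in old_db.items():
        shard_for(key).setdefault(key, row)


def verify() -> bool:
    """Check that the shards together hold exactly the rows of the old store."""
    merged = {}
    for shard in new_shards:
        merged.update(shard)
    return merged == old_db


# Usage: rows 1 and 2 predate the migration, row 3 arrives during it.
old_db[1] = {"name": "a"}
old_db[2] = {"name": "b"}
dual_write(3, {"name": "c"})
backfill()
print(verify())
```

Reads would be switched to the new shards only after verification passes, and the old store retired after that.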
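Common challenges:
Non‑partition‑key queries – use a mapping table from the query field to the partition key, the gene method (embedding bits derived from the query field in the partition key), redundant copies of the data keyed by the query field, or a NoSQL store; avoid joins that increase CPU load.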
Cross‑shard pagination – often solved with external search engines like Elasticsearch.
Scaling (horizontal expansion) – either promote replicas to primaries, or use dual‑write migration: write to both the old and new databases, copy the historical data across, and verify consistency before switching over.
Conclusion: Identify the true bottleneck before deciding on a sharding strategy; choose keys that balance uniform distribution against query requirements; keep sharding rules simple, and split only when doing so actually solves a performance problem.
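For the gene approach to non‑partition‑key queries, one common reading is that a few bits derived from the secondary field are embedded in the low bits of the partition key when the key is generated. The sketch below assumes that reading; `user_name` as the secondary field, the 2‑bit gene, and the use of CRC32 are all illustrative choices. A stable hash is used because Python's built‑in `hash()` for strings changes between processes.

```python
import zlib

GENE_BITS = 2                  # hypothetical: 2 bits of gene, so 4 shards
SHARD_COUNT = 1 << GENE_BITS


def gene(user_name: str) -> int:
    """Stable low bits derived from the non-partition key."""
    return zlib.crc32(user_name.encode("utf-8")) & (SHARD_COUNT - 1)


def make_user_id(sequence: int, user_name: str) -> int:
    """Unique sequence number in the high bits, gene in the low bits."""
    return (sequence << GENE_BITS) | gene(user_name)


def shard_by_id(user_id: int) -> int:
    """Normal routing by the partition key."""
    return user_id % SHARD_COUNT


def shard_by_name(user_name: str) -> int:
    """Routing by the secondary field alone; no mapping table is needed."""
    return gene(user_name)


uid = make_user_id(1001, "alice")
print(shard_by_id(uid) == shard_by_name("alice"))
```

The cost is that the gene width fixes the shard count in advance, and the secondary field can no longer change without moving the row.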
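Expansion through replicas is usually paired with doubling the shard count under a modulo rule, and the following check illustrates why, assuming that pairing. When the rule changes from `key % N` to `key % 2N`, every key lands either on its old shard or on shard `old + N`. If shard `old + N` is the former replica of shard `old`, it already holds the row, so no data has to be copied; each side only needs to clean up the rows it no longer owns. The counts and function names are hypothetical.

```python
OLD_COUNT = 2   # hypothetical shard count before expansion
NEW_COUNT = 4   # doubled shard count after promoting one replica per shard


def old_shard(key: int) -> int:
    return key % OLD_COUNT


def new_shard(key: int) -> int:
    return key % NEW_COUNT


def needs_copy(key: int) -> bool:
    """True if the key's new shard is neither its old shard nor that shard's
    promoted replica (numbered old shard + OLD_COUNT)."""
    old = old_shard(key)
    return new_shard(key) not in (old, old + OLD_COUNT)


keys_to_copy = [k for k in range(10_000) if needs_copy(k)]
print(len(keys_to_copy))
```

The same check fails for a non‑doubling change such as 2 to 3 shards, where many keys would land on a shard that never held them.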
Architecture Digest
