When and How to Apply Database Sharding: Practical Guidelines and Pitfalls
This article explains why database performance bottlenecks arise, outlines criteria for deciding when to shard databases and tables, and provides detailed strategies for SQL tuning, table restructuring, horizontal and vertical splitting, single‑ versus multi‑database sharding, as well as handling cross‑database joins, distributed transactions, pagination, and ID generation.
At the start the article poses three key questions: when to consider sharding, how much data in a single table triggers it, and what data growth rate warrants sharding. It then answers the core reason for sharding – database performance bottlenecks such as request blocking, slow SQL queries, and storage pressure.
Database Performance Bottlenecks
Massive request blocking due to insufficient connections under high concurrency.
SQL slowdown when large tables cause full‑table scans (e.g., tables with billions of rows).
Storage strain as data volume grows.
While upgrading hardware can alleviate these issues, software‑level optimizations are usually more cost‑effective.
Software‑Level Optimization Options
SQL tuning
Table structure optimization
Read‑write separation
Database clustering
Sharding (splitting databases and tables)
Hardware upgrades are also mentioned but are costly and less scalable.
SQL Tuning Steps
Enable slow query logging in MySQL by setting slow_query_log=on, long_query_time=1, and specifying slow_query_log_file=/path/to/log.
Use EXPLAIN to inspect execution plans and ensure queries hit indexes.
Interpret the type column (ALL, index, range, ref, eq_ref, const, system, NULL) and aim for at least range or better.
Table Structure Optimization
Example: adding a frequently needed field (e.g., nickname) to the order table to avoid costly joins, while noting the trade‑off of update complexity for redundant fields.
Architecture Evolution
Three deployment patterns are described:
Single‑application, single‑database : early‑stage monolithic system with one shared database.
Multiple applications, single database : services share a database to minimize changes during rapid iteration.
Multiple applications, multiple databases : as traffic grows, each service gets its own database, necessitating sharding.
Table Sharding Strategies
Two dimensions are covered:
Horizontal vs. vertical splitting:
Vertical: separate rarely used columns into a different table (e.g., user details).
Horizontal: divide rows based on key ranges, parity, or time (daily, monthly, historical tables).
Single‑database vs. multi‑database splitting:
Single‑database: multiple sub‑tables within the same DB, which may hit storage limits.
Multi‑database: distribute sub‑tables across different databases to overcome storage and contention limits.
Complexities Introduced by Sharding
Sharding brings challenges such as cross‑database joins, distributed transactions, pagination, ordering, and ID generation.
Cross‑Database Joins
Field redundancy: duplicate join keys in the primary table.
Data abstraction: ETL to create aggregated tables.
Global tables: replicate small reference tables across databases.
Application‑level assembly: fetch data separately and combine in code.
Distributed Transactions
When multiple databases are involved, local transactions are insufficient; solutions include reliable‑message (MQ) patterns, two‑phase commit, and flexible transaction frameworks.
Sorting, Pagination, and Function Computation
Apply shard‑aware processing: execute functions on each shard, then merge and re‑compute results.
Distributed ID Generation
Auto‑increment IDs cannot be used across shards. Common alternatives listed are UUID, dedicated ID tables, segment allocation, Redis, Snowflake, Baidu uid‑generator, Meituan Leaf, and Didi TinyID.
Multiple Data Sources
To retrieve data from many shards, use client‑side or proxy‑layer adapters. Popular middleware includes ShardingSphere (formerly sharding‑jdbc) and Mycat.
Conclusion
Sharding should be considered only after conventional optimizations fail. It adds significant complexity, so avoid premature adoption and design with foresight to balance flexibility, scalability, and maintainability.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
