
When and How to Apply Database Sharding: Practical Guidelines and Pitfalls

This article explains why database performance bottlenecks arise, outlines criteria for deciding when to shard databases and tables, and provides detailed strategies for SQL tuning, table restructuring, horizontal and vertical splitting, single‑ versus multi‑database sharding, as well as handling cross‑database joins, distributed transactions, pagination, and ID generation.

dbaplus Community

Apr 5, 2021


The article opens with three questions: when should sharding be considered, how much data in a single table should trigger it, and what rate of data growth warrants it. It then identifies the core reason for sharding: database performance bottlenecks such as blocked requests, slow SQL queries, and storage pressure.

Database Performance Bottlenecks

Massive request blocking due to insufficient connections under high concurrency.

SQL slowdown when large tables cause full‑table scans (e.g., tables with billions of rows).

Storage strain as data volume grows.

While upgrading hardware can alleviate these issues, software‑level optimizations are usually more cost‑effective.

Software‑Level Optimization Options

SQL tuning

Table structure optimization

Read‑write separation

Database clustering

Sharding (splitting databases and tables)

Hardware upgrades remain an option, but they are costly and scale less well than the software measures listed here.

SQL Tuning Steps

Enable slow query logging in MySQL by setting slow_query_log=on, setting long_query_time=1 (the threshold, in seconds, above which a query is logged), and pointing slow_query_log_file at the log location, e.g. /path/to/log.

Use EXPLAIN to inspect execution plans and ensure queries hit indexes.

Interpret the type column of the EXPLAIN output. Its values, listed here from worst to best, are ALL, index, range, ref, eq_ref, const, system, NULL; aim for range or better.
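The "range or better" rule can be written down as a small check. The sketch below is only an illustration: the function name meets_target and the default target are assumptions, and the ordering is the worst-to-best order of access types given in this section.

```python
# Access types from the "type" column of MySQL EXPLAIN,
# ordered from worst to best as listed in this section.
ACCESS_TYPES = ["ALL", "index", "range", "ref", "eq_ref", "const", "system", "NULL"]


def meets_target(access_type: str, target: str = "range") -> bool:
    """Return True if access_type is at least as good as target."""
    return ACCESS_TYPES.index(access_type) >= ACCESS_TYPES.index(target)


if __name__ == "__main__":
    # A full table scan (ALL) falls below the target; an index lookup (ref) passes.
    for t in ("ALL", "index", "range", "ref"):
        print(t, meets_target(t))
```

A check like this could be run over the plans of queries that appear in the slow query log, flagging any whose access type is ALL or index for index review.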

Table Structure Optimization

Example: add a frequently read field (e.g., the user's nickname) to the order table so that order queries avoid a costly join. The trade‑off is that the redundant copy must be updated whenever the original value changes.

Architecture Evolution

Three deployment patterns are described:

Single‑application, single‑database : early‑stage monolithic system with one shared database.

Multiple applications, single database : services share a database to minimize changes during rapid iteration.

Multiple applications, multiple databases : as traffic grows, each service gets its own database, necessitating sharding.

Table Sharding Strategies

Two dimensions are covered:

Horizontal vs. vertical splitting:

Vertical: separate rarely used columns into a different table (e.g., user details).

Horizontal: divide rows based on key ranges, parity, or time (daily, monthly, historical tables).

Single‑database vs. multi‑database splitting:

Single‑database: multiple sub‑tables within the same DB, which may hit storage limits.

Multi‑database: distribute sub‑tables across different databases to overcome storage and contention limits.
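As a rough illustration of horizontal splitting across several databases, the sketch below routes a row by key modulo and by month. The database and table names (order_db_N, t_order_N), the shard counts, and both function names are assumptions made for the example, not taken from the article.

```python
from datetime import date
from typing import Tuple

DB_COUNT = 2        # assumed number of databases
TABLES_PER_DB = 4   # assumed number of sub-tables in each database


def route_by_key(user_id: int) -> Tuple[str, str]:
    """Map a numeric key to (database, table) with modulo arithmetic.

    With DB_COUNT = 1 and TABLES_PER_DB = 2 this reduces to the
    odd/even (parity) split mentioned in the text.
    """
    slot = user_id % (DB_COUNT * TABLES_PER_DB)
    db_index = slot // TABLES_PER_DB
    table_index = slot % TABLES_PER_DB
    return f"order_db_{db_index}", f"t_order_{table_index}"


def route_by_month(created: date) -> str:
    """Map a row to a monthly table, e.g. t_order_202104."""
    return f"t_order_{created:%Y%m}"


if __name__ == "__main__":
    print(route_by_key(13))
    print(route_by_month(date(2021, 4, 5)))
```

Modulo routing spreads load evenly but makes later changes to the shard count expensive, because most keys map to a new location. Time-based tables are easy to archive but concentrate writes on the newest table.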

Complexities Introduced by Sharding

Sharding brings challenges such as cross‑database joins, distributed transactions, pagination, ordering, and ID generation.

Cross‑Database Joins

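Field redundancy: copy the frequently needed columns of the joined table into the primary table so that the join is no longer required.

Data abstraction: use ETL jobs to build pre‑joined or aggregated tables that can be queried in one place.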

Global tables: replicate small reference tables across databases.

Application‑level assembly: fetch data separately and combine in code.
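Application-level assembly, the last option above, can be sketched as follows. Two in-memory SQLite databases stand in for two physical databases; the table names, columns, and sample rows are hypothetical and exist only to make the example runnable.

```python
import sqlite3

# Two independent connections stand in for two separate databases,
# so a SQL JOIN between them is not possible.
user_db = sqlite3.connect(":memory:")
order_db = sqlite3.connect(":memory:")

user_db.execute("CREATE TABLE t_user (id INTEGER PRIMARY KEY, nickname TEXT)")
user_db.executemany("INSERT INTO t_user VALUES (?, ?)", [(1, "alice"), (2, "bob")])

order_db.execute(
    "CREATE TABLE t_order (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL)"
)
order_db.executemany(
    "INSERT INTO t_order VALUES (?, ?, ?)",
    [(100, 1, 9.5), (101, 2, 20.0), (102, 1, 3.0)],
)


def orders_with_nickname():
    """Emulate a cross-database join in application code."""
    orders = order_db.execute(
        "SELECT id, user_id, amount FROM t_order ORDER BY id"
    ).fetchall()
    user_ids = sorted({user_id for _, user_id, _ in orders})
    if not user_ids:
        return []
    # One batched lookup instead of one query per order.
    placeholders = ",".join("?" for _ in user_ids)
    users = dict(
        user_db.execute(
            f"SELECT id, nickname FROM t_user WHERE id IN ({placeholders})",
            user_ids,
        ).fetchall()
    )
    return [(oid, users.get(uid), amount) for oid, uid, amount in orders]


if __name__ == "__main__":
    for row in orders_with_nickname():
        print(row)
```

Batching the second lookup with IN keeps the number of round trips constant, which matters more than the in-memory merge itself once the databases sit on different hosts.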

Distributed Transactions

When a single business operation writes to multiple databases, a local transaction can no longer guarantee atomicity. Solutions include reliable‑message (MQ) patterns, two‑phase commit, and flexible transactions, which accept eventual consistency in exchange for availability.
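The control flow of two-phase commit can be sketched with in-memory objects. This is a simplified illustration, not a production protocol: the Participant class and its will_vote_yes flag are hypothetical, and a real coordinator must also persist its decision and handle timeouts and crashes.

```python
class Participant:
    """In-memory stand-in for one database taking part in the transaction."""

    def __init__(self, name, will_vote_yes=True):
        self.name = name
        self.will_vote_yes = will_vote_yes
        self.state = "idle"

    def prepare(self):
        # Phase 1: do the work, keep it uncommitted, and vote.
        self.state = "prepared" if self.will_vote_yes else "refused"
        return self.will_vote_yes

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


def two_phase_commit(participants):
    """Commit everywhere only if every participant votes yes in phase 1."""
    votes = [p.prepare() for p in participants]
    if all(votes):
        # Phase 2a: unanimous yes, so tell everyone to commit.
        for p in participants:
            p.commit()
        return True
    # Phase 2b: at least one no, so tell everyone to roll back.
    for p in participants:
        p.rollback()
    return False


if __name__ == "__main__":
    ok = [Participant("order_db"), Participant("stock_db")]
    print(two_phase_commit(ok), [p.state for p in ok])

    bad = [Participant("order_db"), Participant("stock_db", will_vote_yes=False)]
    print(two_phase_commit(bad), [p.state for p in bad])
```

Participants hold their locks between the two phases, which is one reason message-based and eventually consistent approaches are often preferred when throughput matters.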

Sorting, Pagination, and Function Computation

Apply shard‑aware processing: run the sort, page, or aggregate function on each shard first, then merge the partial results and re‑apply the operation to the merged set.
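A minimal sketch of this merge step follows, assuming each shard can return its rows sorted ascending. The function names are hypothetical, and plain lists stand in for per-shard query results.

```python
import heapq


def page_across_shards(shards, page, size):
    """Return one globally sorted page (1-based) from several shards.

    Every shard must supply its first page * size rows, because the rows
    of the requested page could all live on a single shard.
    """
    needed = page * size
    # Stand-in for "ORDER BY key LIMIT needed" executed on each shard.
    partials = [sorted(rows)[:needed] for rows in shards]
    merged = list(heapq.merge(*partials))
    return merged[needed - size:needed]


def global_average(partials):
    """Combine per-shard (sum, count) pairs into one average.

    Averaging the per-shard averages would be wrong when shard sizes differ.
    """
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


if __name__ == "__main__":
    shards = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
    print(page_across_shards(shards, page=2, size=3))
    print(global_average([(10, 2), (20, 8)]))
```

The amount of data pulled from each shard grows with the page number, so deep pagination over shards becomes expensive. Cursor-style paging on the sort key is a common way to limit that cost.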

Distributed ID Generation

Per‑table auto‑increment IDs are no longer unique once rows are spread across shards, so a global ID scheme is needed. Common alternatives listed are UUID, dedicated ID tables, segment allocation, Redis, Snowflake, Baidu uid‑generator, Meituan Leaf, and Didi TinyID.
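Of the options listed, a Snowflake-style generator is the easiest to sketch. The version below follows the commonly described layout of a millisecond timestamp, 10 worker bits, and 12 sequence bits; the class name, the custom epoch, and the decision to raise on clock rollback are assumptions made for this example.

```python
import threading
import time


class SnowflakeLikeGenerator:
    EPOCH_MS = 1_609_459_200_000  # 2021-01-01T00:00:00Z, an arbitrary custom epoch
    WORKER_BITS = 10
    SEQUENCE_BITS = 12

    def __init__(self, worker_id):
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = int(time.time() * 1000)
            if now < self.last_ms:
                # Assumption: fail fast rather than risk duplicate IDs.
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                mask = (1 << self.SEQUENCE_BITS) - 1
                self.sequence = (self.sequence + 1) & mask
                if self.sequence == 0:
                    # Sequence exhausted in this millisecond: wait for the next.
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            timestamp = now - self.EPOCH_MS
            return (
                (timestamp << (self.WORKER_BITS + self.SEQUENCE_BITS))
                | (self.worker_id << self.SEQUENCE_BITS)
                | self.sequence
            )


if __name__ == "__main__":
    gen = SnowflakeLikeGenerator(worker_id=7)
    print(gen.next_id(), gen.next_id())
```

IDs from one generator increase over time, which keeps B-tree inserts mostly append-only. Uniqueness across machines depends on every process being given a distinct worker_id.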

Multiple Data Sources

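To read from and write to many shards, the application needs a routing layer, either embedded in the client or placed in a proxy between the application and the databases. Popular middleware includes ShardingSphere (which grew out of Sharding‑JDBC) and Mycat.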

Conclusion

Sharding should be considered only after conventional optimizations fail. It adds significant complexity, so avoid premature adoption and design with foresight to balance flexibility, scalability, and maintainability.
