When and How to Shard Databases: A Practical Guide to Splitting Tables and Schemas
This article explains why database sharding is needed and how to identify performance bottlenecks, then walks through SQL tuning, table redesign, architectural changes, and practical horizontal and vertical sharding techniques, using an e-commerce example.
Why Shard Databases and Tables?
When a database reaches a performance bottleneck—manifested as request blocking, slow SQL queries, or storage pressure—simply upgrading hardware becomes costly and inefficient, so software‑level solutions like sharding are preferred.
Database Optimization Options
Optimizations fall into two categories:
Software: SQL tuning, table structure redesign, read/write splitting, clustering, sharding.
Hardware: Adding CPU, memory, disk, or network resources.
SQL Tuning
Enable slow‑query logging in MySQL:
slow_query_log=on
long_query_time=1
slow_query_log_file=/path/to/log
Use EXPLAIN to view execution plans. Example:
explain select id, age, gender from user where name = '爱笑的架构师';
The type column ranges from ALL (full table scan, the worst) to system (the best). Aim for range or better.
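To see how an index changes an execution plan, the sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL. SQLite's EXPLAIN QUERY PLAN output differs from MySQL's EXPLAIN (there is no type column), and the index name idx_user_name and the sample row are assumptions made for illustration. The point is only the before/after comparison of a full scan against an index lookup.

```python
import sqlite3

def plan_details(conn, sql, params=()):
    """Return the human-readable detail strings from EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row has four columns; the last one is the detail text.
    return [row[3] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, gender TEXT)"
)
conn.execute("INSERT INTO user (name, age, gender) VALUES ('alice', 30, 'F')")

query = "SELECT id, age, gender FROM user WHERE name = ?"

# Without an index on name, the planner has to scan the whole table.
before = plan_details(conn, query, ("alice",))

# Hypothetical index; once it exists the planner can search by name.
conn.execute("CREATE INDEX idx_user_name ON user (name)")
after = plan_details(conn, query, ("alice",))

print(before)
print(after)
```

On MySQL the equivalent check would be to run EXPLAIN before and after adding the index and compare the type column.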
Table Structure Optimization
Redundant fields can reduce join overhead. For example, adding nickname to the order table avoids joining the user table when displaying order lists. Choose fields that change infrequently.
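A minimal sketch of the redundant-field idea, using sqlite3 as a stand-in database. The table name orders and all columns other than nickname are assumptions; the article only names the nickname field and the order table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, nickname TEXT);
    -- Hypothetical order table that carries a redundant copy of nickname.
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        nickname TEXT,
        amount INTEGER
    );
    INSERT INTO user VALUES (1, 'amy'), (2, 'bob');
    INSERT INTO orders VALUES (100, 1, 'amy', 25), (101, 2, 'bob', 40);
""")

# Normalized read: the order list needs a join to fetch the nickname.
joined = conn.execute(
    "SELECT o.id, u.nickname, o.amount "
    "FROM orders o JOIN user u ON u.id = o.user_id ORDER BY o.id"
).fetchall()

# Denormalized read: the redundant column makes it a single-table query.
redundant = conn.execute(
    "SELECT id, nickname, amount FROM orders ORDER BY id"
).fetchall()

print(joined == redundant)
```

The trade-off is that every nickname change must also be written to the copy in the order table, which is why infrequently changing fields are the better candidates.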
Architectural Optimization
When a single instance cannot handle load, consider:
Read/write splitting: primary handles writes, replicas handle reads.
Caching (e.g., Redis) to offload read traffic.
Database clustering for horizontal scaling.
If caching still leaves the database as a bottleneck, move to sharding.
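The read/write splitting option can be sketched as a small router. This is an illustration under stated assumptions, not how any particular middleware works: the class name ReadWriteRouter, the connection labels, and the rule "only SELECT goes to a replica" are all hypothetical.

```python
import itertools

class ReadWriteRouter:
    """Send SELECT statements to replicas and everything else to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary if no replicas are configured.
        self._replicas = itertools.cycle(replicas or [primary])

    def route(self, sql):
        # Assumes a non-empty statement; the first word decides the target.
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb == "SELECT":
            # Round-robin across replicas to spread read traffic.
            return next(self._replicas)
        return self.primary

router = ReadWriteRouter("primary-db", ["replica-1", "replica-2"])
targets = [
    router.route("INSERT INTO orders VALUES (1)"),
    router.route("SELECT * FROM orders"),
    router.route("select * from orders"),
    router.route("SELECT 1"),
]
print(targets)
```

Because replicas usually lag the primary slightly, a real router would likely need a way to force reads that must see a just-committed write back onto the primary.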
Hardware Optimization
Hardware upgrades are expensive and yield diminishing returns as traffic grows, so they are usually a complement to the software-level options above rather than a substitute.
Sharding Explained with an E‑Commerce Example
Single Application, Single Database
Early‑stage monolithic apps use one database for all modules (portal, user, order, inventory). This works while user volume is low.
Multiple Applications, Single Database
As features grow, services are split (portal, user, order, inventory) but still share one database to minimize impact.
Multiple Applications, Multiple Databases
When the shared database becomes a bottleneck, each service gets its own database—this is the "split‑database" step.
When to Split Tables
If a table grows rapidly (e.g., order table exceeding millions of rows), query performance degrades. A common rule of thumb is to consider sharding once a table exceeds about 5 million rows, though the exact threshold depends on workload.
Horizontal vs. Vertical Sharding
Vertical sharding separates columns into different tables (e.g., moving the rarely used nickname and description columns to a detail table).
Horizontal sharding distributes rows across multiple tables or databases based on a key (e.g., odd/even IDs or date ranges). A date-based split might use:
Daily tables store only the current day's data.
Monthly tables aggregate daily data, often moved by scheduled jobs.
History tables archive data older than a retention period.
Key characteristics:
Vertical: different schemas, field‑level split.
Horizontal: same schema, row‑level split.
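The two horizontal routing keys mentioned above, odd/even IDs and date ranges, can be sketched as functions that pick a table name. The function names and the base_suffix naming scheme are assumptions for illustration.

```python
from datetime import date

def table_by_id(base, row_id, shard_count=2):
    """Modulo routing: with shard_count=2 this is the odd/even split."""
    return f"{base}_{row_id % shard_count}"

def table_by_month(base, day):
    """Date-range routing: one table per calendar month."""
    return f"{base}_{day.year:04d}{day.month:02d}"

print(table_by_id("orders", 7))
print(table_by_id("orders", 10))
print(table_by_month("orders", date(2024, 3, 15)))
```

Every table produced this way has the same schema; only the rows differ, which is what distinguishes the horizontal split from the vertical one.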
Single‑DB vs. Multi‑DB Horizontal Sharding
Horizontal shards can reside in the same database, which avoids cross-DB joins but still hits the storage limits of a single instance, or be spread across multiple databases to overcome both query and storage bottlenecks.
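One possible way to route a row to a (database, table) pair is sketched below. The naming scheme (order_db_N, orders_N) and the default counts are assumptions; setting db_count=1 gives the single-database variant.

```python
def locate(order_id, db_count=2, tables_per_db=4):
    """Map an ID to a (database, table) pair.

    The database is chosen by modulo; the table index uses the quotient so
    that rows landing in one database still spread over all of its tables.
    """
    db_index = order_id % db_count
    table_index = (order_id // db_count) % tables_per_db
    return f"order_db_{db_index}", f"orders_{table_index}"

for order_id in range(8):
    print(order_id, locate(order_id))
```

With two databases of four tables each, eight consecutive IDs land in eight different slots, so both query load and storage are spread evenly.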
Complexities Introduced by Sharding
Sharding solves performance issues but adds challenges:
Cross‑DB joins: Use field redundancy, ETL aggregation, global tables, or application‑level assembly.
Distributed transactions: Rely on reliable‑message queues, two‑phase commit, or flexible transaction patterns.
Sorting, pagination, function calculations: Execute on each shard then merge results.
Distributed IDs: Avoid auto‑increment collisions; options include UUID, dedicated ID tables, segment allocation, Redis, Snowflake, Baidu uid‑generator, Meituan Leaf, Didi Tinyid, etc.
Multiple data sources: Client‑side or proxy‑layer adapters; common middleware includes ShardingSphere (formerly sharding‑jdbc) and Mycat.
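For the sorting and pagination item, "execute on each shard then merge" can be sketched as follows. Plain Python lists stand in for the per-shard query results, and the function name fetch_page and the (created_at, order_id) row shape are assumptions.

```python
import heapq
from itertools import islice

def fetch_page(shards, offset, limit):
    """Return rows [offset, offset + limit) of the globally sorted result.

    Each shard is a list of (created_at, order_id) rows already sorted
    ascending, standing in for a per-shard ORDER BY ... LIMIT offset + limit.
    """
    need = offset + limit
    # Every shard must contribute its first offset + limit rows, because
    # the global page could come entirely from a single shard.
    partials = [shard[:need] for shard in shards]
    merged = heapq.merge(*partials)
    return list(islice(merged, offset, need))

shard_a = [(1, 101), (4, 104), (7, 107)]
shard_b = [(2, 202), (3, 203), (9, 209)]
page = fetch_page([shard_a, shard_b], offset=2, limit=2)
print(page)
```

Note that each shard returns offset + limit rows, so deep pages become progressively more expensive as the offset grows.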
Conclusion
Before jumping to sharding, exhaust conventional optimization methods. Sharding brings significant complexity and should be adopted only when necessary, with careful foresight to avoid premature or over‑design.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
