Mastering Database Sharding: When and How to Split Databases and Tables
This article explains the three forms of database sharding (database-only, table-only, and both), detailing when to apply each, how to choose sharding keys, common algorithms, ID generation strategies, available open-source tools, and the new challenges introduced by sharding.
Database Sharding Overview
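Sharding, in the sense of splitting databases and splitting tables, is not a single technique but three: database sharding only, table sharding only, and both together. Each solves a different problem: database sharding targets high concurrency, table sharding targets large data volume, and doing both targets the two at once.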
When to shard databases?
When read/write QPS is high and the number of database connections becomes insufficient, adding more database instances (splitting databases) provides more connections and improves concurrency.
Typical scenarios include micro‑service decomposition, where business data is split into separate databases for orders, logistics, products, members, etc., and moving historical orders to an archive database.
When to shard tables?
Table sharding addresses large data volume. If a single table grows past a commonly cited rule of thumb (e.g., more than 5 000 000 rows or 2 GB) and causes storage or query performance bottlenecks, splitting the table reduces the row count per table and speeds up queries.
When to do both?
When both high QPS and large table size occur simultaneously, both database and table sharding are required.
Horizontal vs. Vertical Splitting
Horizontal (row‑based) splitting distributes different rows across multiple tables, reducing rows per table. Vertical (column‑based) splitting moves groups of columns into separate tables, reducing column count per table. Both can be combined, and vertical splitting also includes splitting by business domain into separate databases.
Choosing Sharding Keys
Common sharding keys are user ID, time, or region. For e‑commerce orders, using buyer ID avoids data skew that can occur with seller ID, because a large seller may generate many orders that would concentrate in a single shard.
When querying by buyer ID, the corresponding shard is directly accessed. For seller queries, a real‑time synchronized seller‑dimension table (e.g., via Binlog or Flink) can be used for read‑only access.
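As a rough illustration of routing by buyer ID when both databases and tables are split, the sketch below maps a buyer ID to a database and a table. The counts `DB_COUNT` and `TABLES_PER_DB` and the names `order_db_N` and `t_order_N` are hypothetical, chosen only for this example.

```python
DB_COUNT = 4        # hypothetical number of database instances
TABLES_PER_DB = 8   # hypothetical number of order tables in each database


def route(buyer_id: int) -> tuple[str, str]:
    """Map a buyer ID to a (database, table) pair."""
    # One global slot per physical table: 0 .. DB_COUNT * TABLES_PER_DB - 1.
    slot = buyer_id % (DB_COUNT * TABLES_PER_DB)
    db_index = slot // TABLES_PER_DB      # which database holds the slot
    table_index = slot % TABLES_PER_DB    # which table inside that database
    return f"order_db_{db_index}", f"t_order_{table_index}"


if __name__ == "__main__":
    print(route(13))   # slot 13 -> database 1, table 5
```

Because the result depends only on the buyer ID, all orders of one buyer land in the same physical table, which is what makes buyer-side queries a single-shard lookup.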
Order‑Based Queries
If an order number encodes the sharding result (the "gene method"), the system can parse the number to locate the correct shard without additional lookup.
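One way to picture the gene method is to reserve the low bits of the order number for the buyer's shard index. The sketch below assumes 32 shards and a bit layout invented for this example; real systems use their own order-number formats.

```python
GENE_BITS = 5                   # hypothetical: 2**5 = 32 shards
SHARD_COUNT = 1 << GENE_BITS


def shard_of_buyer(buyer_id: int) -> int:
    """Shard chosen by the sharding key (buyer ID)."""
    return buyer_id % SHARD_COUNT


def make_order_id(sequence: int, buyer_id: int) -> int:
    """Append the buyer's shard index (the 'gene') as the low bits."""
    return (sequence << GENE_BITS) | shard_of_buyer(buyer_id)


def shard_of_order(order_id: int) -> int:
    """Recover the shard from the order number alone."""
    return order_id & (SHARD_COUNT - 1)


if __name__ == "__main__":
    order_id = make_order_id(sequence=1001, buyer_id=77)
    print(order_id, shard_of_order(order_id), shard_of_buyer(77))
```

A lookup by order number and a lookup by buyer ID therefore resolve to the same shard, with no mapping table in between.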
Sharding Algorithms
Direct modulo: for an integer sharding field, take the value modulo the number of shards.
Hash modulo: for a string field, hash it to an integer first, then take that modulo the number of shards.
Consistent hashing: minimizes data movement when the number of shards changes.
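The three algorithms can be sketched side by side as follows. This is a minimal illustration rather than any particular framework's implementation: the helper `stable_hash`, the class `ConsistentHashRing`, and the choice of 100 virtual nodes per shard are assumptions made for the example.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic hash; Python's built-in hash() of str varies per process."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def direct_modulo(user_id: int, shards: int) -> int:
    return user_id % shards


def hash_modulo(key: str, shards: int) -> int:
    return stable_hash(key) % shards


class ConsistentHashRing:
    """Hash ring with virtual nodes (vnodes) to even out the distribution."""

    def __init__(self, nodes, vnodes=100):
        self._ring = sorted(
            (stable_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._points, stable_hash(key))
        return self._ring[idx % len(self._ring)][1]


if __name__ == "__main__":
    keys = [f"user-{i}" for i in range(10_000)]
    moved_mod = sum(hash_modulo(k, 4) != hash_modulo(k, 5) for k in keys)
    old = ConsistentHashRing([f"shard-{i}" for i in range(4)])
    new = ConsistentHashRing([f"shard-{i}" for i in range(5)])
    moved_ring = sum(old.node_for(k) != new.node_for(k) for k in keys)
    print(f"moved with modulo: {moved_mod}, with ring: {moved_ring}")
```

Growing from 4 to 5 shards, plain modulo is expected to relocate about four fifths of the keys, while the ring relocates only the keys that now belong to the new shard, about one fifth on average.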
Global ID Generation
Because auto‑increment IDs conflict across shards, several strategies are used:
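UUID (not recommended for performance: the values are long and, when random, unordered, which makes them a poor fit for a clustered primary-key index).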
Centralized auto‑increment table (single point of failure).
Multiple tables with step ranges (e.g., each instance gets a 1000‑ID block).
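Snowflake algorithm: a 64-bit ID made of a 1-bit sign, 41-bit timestamp, 10-bit worker ID, and 12-bit sequence. Each worker can issue up to 4 096 IDs per millisecond, so 1 024 workers together yield up to 4 194 304 IDs per millisecond.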
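A minimal Snowflake-style generator following the bit layout above might look like the sketch below. The custom epoch (2020-01-01 UTC) and the class name `SnowflakeGenerator` are assumptions for illustration, and production implementations usually handle clock rollback more gracefully than raising an error.

```python
import threading
import time

EPOCH_MS = 1_577_836_800_000   # assumed custom epoch: 2020-01-01T00:00:00Z
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_WORKER_ID = (1 << WORKER_BITS) - 1    # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1   # 4095


class SnowflakeGenerator:
    def __init__(self, worker_id: int):
        if not 0 <= worker_id <= MAX_WORKER_ID:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    @staticmethod
    def _now_ms() -> int:
        return time.time_ns() // 1_000_000

    def next_id(self) -> int:
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # 4096 IDs already issued this millisecond: wait for the next.
                    while now <= self._last_ms:
                        now = self._now_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self._sequence
            )


if __name__ == "__main__":
    gen = SnowflakeGenerator(worker_id=7)
    print(gen.next_id(), gen.next_id())
```

IDs from one worker increase over time, and IDs from different workers cannot collide as long as each worker ID is assigned to only one running instance.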
Sharding Tools
Sharding-JDBC (now ShardingSphere-JDBC, part of Apache ShardingSphere): lightweight Java framework that works at the JDBC layer.
TDDL: Alibaba’s middleware offering sharding, read/write splitting, and dynamic datasource configuration.
Mycat: distributed relational middleware supporting SQL sharding and MySQL protocol.
Problems Introduced by Sharding
Every read and write should carry the sharding key; a query without it has to be sent to every physical table and the results merged. A local database transaction cannot span databases, so cross-database writes need a distributed-transaction mechanism or eventual consistency, which makes consistency harder to guarantee. Pagination, sorting, and other operations that rely on a single table become difficult.
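To see why pagination gets harder, consider a simple scatter-gather approach, sketched below with in-memory lists standing in for shards. The function names and the `created_at` field are hypothetical; `query_shard` plays the role of an `ORDER BY created_at LIMIT n` query against one physical table.

```python
import heapq
from itertools import islice


def query_shard(rows, limit):
    """Stand-in for: SELECT ... ORDER BY created_at LIMIT :limit on one shard."""
    return sorted(rows, key=lambda r: r["created_at"])[:limit]


def paginate(shards, offset, limit):
    # Every shard must return its first offset + limit rows, because in the
    # worst case the whole requested page lives on a single shard.
    partials = [query_shard(rows, offset + limit) for rows in shards]
    # Merge the already-sorted partial results, then cut out the page.
    merged = heapq.merge(*partials, key=lambda r: r["created_at"])
    return list(islice(merged, offset, offset + limit))


if __name__ == "__main__":
    shards = [
        [{"id": 1, "created_at": 1}, {"id": 4, "created_at": 4}],
        [{"id": 2, "created_at": 2}, {"id": 3, "created_at": 3},
         {"id": 5, "created_at": 5}],
    ]
    print(paginate(shards, offset=2, limit=2))
```

The cost grows with the page depth: each shard returns `offset + limit` rows, so deep pages fetch far more data than they display.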
Therefore, sharding should be considered only after other optimization techniques have been exhausted.
Conclusion
The article covered the motivations for sharding, when and how to apply database and table sharding, key selection, algorithms, ID generation methods, popular open‑source frameworks, and the trade‑offs introduced by sharding.
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
