Database Sharding: Concepts, Global ID Strategies, Partitioning Schemes, and Expansion Solutions
This article explains database sharding fundamentals, including terminology, when to adopt sharding, various global ID generation methods such as auto‑increment, UUID, COMB, and Snowflake, different partitioning strategies, challenges like distributed transactions, and practical expansion and migration solutions.
1. Sharding Overview
When a single‑instance database cannot handle large data volume or high concurrency, sharding (splitting databases and tables) is considered to overcome the limitations.
1.1 Sharding Terminology
Read‑Write Separation: Writes go to a primary database while reads are served by one or more replicas.
Partition: Records are divided into different physical partitions on the same server.
Database Sharding: Multiple tables of a system are stored across several database instances.
Table Sharding: Vertical sharding stores different columns in separate tables; horizontal sharding (the most complex) splits records across tables according to a sharding algorithm.
1.2 Should You Adopt Sharding?
Sharding introduces complexity and performance overhead; adopt it only when the projected workload truly requires it. Before sharding, consider: If the current data size is below a few million rows, sharding is usually unnecessary. For data‑size problems, first increase storage or add more database instances. For performance problems, first upgrade CPU/memory, enable read‑write separation, tune the MySQL configuration, optimize tables, indexes, and SQL, and use partitioning or vertical table splitting. Only if these measures fail should you consider horizontal sharding.
2. Global ID Generation Strategies
2.1 Auto‑Increment Column
Advantages: built‑in, ordered, good performance. Drawbacks: in a sharded environment IDs may collide if not planned.
2.1.1 Set Auto‑Increment Offset and Step
### Assume 10 shard tables in total
### Scope options: SESSION (session level), GLOBAL (global)
SET @@SESSION.auto_increment_offset = 1; ### starting value; each shard takes one of 1–10
SET @@SESSION.auto_increment_increment = 10; ### step size
With this method, data must still be migrated to the new shards during expansion.
2.1.2 Global ID Mapping Table
Maintain a key in a shared Redis instance for each table that records the current maximum ID; each request atomically increments the key (INCR) and uses the returned value as the new ID. The Redis data should be persisted to a durable store.
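The idea above can be sketched as follows. This is a minimal illustration using an in‑memory dict as a stand‑in for Redis; a real deployment would call Redis's atomic `INCR` on a per‑table key and persist the counters durably. The class and method names are hypothetical.

```python
# Sketch of the global ID mapping table (one counter per table).
# A dict stands in for Redis here; in production, next_id would be
# a single atomic redis.incr(table_key) call.
class IdAllocator:
    def __init__(self):
        self.counters = {}  # table name -> current maximum ID

    def next_id(self, table):
        # Equivalent to Redis INCR: bump the counter and return the new value.
        self.counters[table] = self.counters.get(table, 0) + 1
        return self.counters[table]

alloc = IdAllocator()
first_order_id = alloc.next_id("orders")
second_order_id = alloc.next_id("orders")
first_user_id = alloc.next_id("users")  # counters are independent per table
```

Because the increment is atomic on the Redis side, concurrent writers on different shards never receive the same ID.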
2.2 UUID (128‑bit)
Universally unique identifiers generated by the platform. Format example: 550e8400‑e29b‑41d4‑a716‑446655440000. Advantages: simple and globally unique. Disadvantages: large storage/transmission size, unordered, performance impact.
2.3 COMB (Combined GUID)
Combines a 10‑byte GUID with a 6‑byte timestamp to produce an ordered identifier, improving index performance.
2.4 Snowflake Algorithm
Twitter's open‑source distributed ID generator producing a 64‑bit integer. Structure (excluding the sign bit): 41 bits: timestamp in milliseconds (covers ~69 years). 10 bits: node identifier (5‑bit data‑center + 5‑bit machine, supporting 1024 nodes). 12 bits: sequence number (supports 4096 IDs per millisecond per node).
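The bit layout above can be sketched in a minimal generator. The epoch constant and the spin‑wait policy when a millisecond's 4096 sequence numbers are exhausted are illustrative assumptions, not Twitter's exact implementation.

```python
import threading
import time

EPOCH_MS = 1288834974657  # illustrative epoch (Twitter's original); any fixed past instant works

class Snowflake:
    """Minimal Snowflake-style generator:
    41-bit ms timestamp | 10-bit node ID | 12-bit sequence."""

    def __init__(self, node_id):
        assert 0 <= node_id < 1024          # 10 bits: data-center + machine
        self.node_id = node_id
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF   # 12-bit sequence
                if self.seq == 0:                   # 4096 IDs used this millisecond
                    while now <= self.last_ms:      # spin to the next millisecond
                        now = int(time.time() * 1000)
            else:
                self.seq = 0
            self.last_ms = now
            return ((now - EPOCH_MS) << 22) | (self.node_id << 12) | self.seq

gen = Snowflake(node_id=5)
ids = [gen.next_id() for _ in range(3)]
# IDs are strictly increasing, and bits 12..21 recover the node ID.
```

Because the timestamp occupies the high bits, IDs sort by generation time, which keeps index inserts sequential.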
3. Sharding Strategies
3.1 Range Sharding
Data is divided based on a range of a specific field (e.g., user ID or order time). Advantage: after adding new nodes, only the new range needs to be assigned, avoiding data migration. Disadvantage: time‑based ranges can cause hotspot imbalance.
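Range routing can be sketched as a boundary lookup. The cut points on user ID below are illustrative assumptions.

```python
import bisect

# Shard boundaries on user ID (illustrative):
# shard 0: < 10M, shard 1: 10M-20M, shard 2: everything above.
BOUNDS = [10_000_000, 20_000_000]

def shard_for(user_id):
    # The number of boundaries the ID has passed is exactly the shard index.
    return bisect.bisect_right(BOUNDS, user_id)

low_shard = shard_for(5)             # shard 0
high_shard = shard_for(25_000_000)   # shard 2
# Adding a node only appends a new boundary for future IDs;
# existing rows keep their shard, so no migration is needed.
```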
3.2 Consistent Hashing
Uses a hash ring to map keys to nodes; adding or removing nodes requires minimal data movement.
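A toy hash ring with virtual nodes illustrates the minimal‑movement property. The hash function (MD5) and the virtual‑node count are illustrative choices.

```python
import bisect
import hashlib

def _h(key):
    # Map a string to a point on the ring.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each physical node is placed on the ring at many virtual positions
        # so keys spread evenly.
        self.ring = sorted((_h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))

    def node_for(self, key):
        # Walk clockwise to the first virtual node at or after the key's hash.
        idx = bisect.bisect(self.ring, (_h(key),)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["db0", "db1", "db2"])
owner = ring.node_for("user:42")   # deterministic: same key, same node
```

When a fourth node is added, only the keys falling into the new node's arcs move; with plain modulo routing, almost every key would be remapped.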
3.3 Modulo Sharding
Assigns records based on ID % N. Simple but requires data migration when N changes.
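A two‑line sketch makes the migration cost concrete: changing N remaps most IDs.

```python
# Modulo routing: simple, but changing the shard count N remaps most IDs,
# forcing a bulk migration.
def shard(record_id, n):
    return record_id % n

# Going from N=4 to N=5 moves 80% of a uniform ID range:
moved = sum(1 for i in range(10_000) if shard(i, 4) != shard(i, 5))
```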
3.4 Snowflake Sharding
Leverages Snowflake's node ID bits to route records, achieving expansion without data migration.
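Assuming the bit layout from section 2.4, routing by the embedded node ID is a shift and mask. Since every row's shard is baked into its ID at generation time, adding nodes changes nothing for existing rows.

```python
# Extract the 10 node-ID bits (bits 12..21) from a Snowflake ID
# and use them as the shard key.
def shard_of(snowflake_id):
    return (snowflake_id >> 12) & 0x3FF

# Example: an ID minted on node 5 (timestamp and sequence values are arbitrary).
example_id = (123456789 << 22) | (5 << 12) | 7
home_shard = shard_of(example_id)   # recovers node 5
```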
4. Problems Introduced by Sharding
4.1 Distributed Transactions
Two‑phase/three‑phase commit incurs high performance cost; compensation mechanisms are often preferred.
4.2 Cross‑Node JOIN
MySQL supports JOIN on a single instance; for multi‑node joins, avoid using native JOIN and consider alternatives such as global tables, field redundancy, or application‑side assembly.
4.3 Cross‑Node Aggregation
Must be performed in the application layer; large aggregations followed by pagination can be inefficient.
4.4 Node Expansion
Adding nodes changes shard ownership, requiring data migration.
5. Node Expansion Plans
5.1 Conventional Expansion
When adding nodes without careful planning, most data will need to be moved. Typical steps: estimate the migration time and announce a maintenance window; stop services and run the migration scripts; update the sharding rules; restart services.
5.2 Migration‑Free Expansion
Double‑capacity strategy: each existing node (A, B) gets a paired node (A2, B2) and the cluster switches to a new modulo rule. Steps: add A2 and B2 as read‑only replicas of A and B and let them sync. Change the sharding rule from ID%2=0 => A, ID%2=1 => B to ID%4=0 => A, ID%4=2 => A2, ID%4=1 => B, ID%4=3 => B2. Break replication and let all four nodes serve traffic; redundant rows can be cleaned up later.
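The rule change can be checked mechanically: because A2 replicates A (and B2 replicates B) before the switch, every ID's new home already holds its data, which is why no migration is needed.

```python
# Old routing: ID % 2 -> {A, B}.  New routing: ID % 4 -> {A, A2, B, B2}.
OLD = {0: "A", 1: "B"}
NEW = {0: "A", 2: "A2", 1: "B", 3: "B2"}
REPLICA_OF = {"A": "A", "A2": "A", "B": "B", "B2": "B"}

# Every ID stays inside its original replica group, so its data is
# already present on the node the new rule picks.
for i in range(1000):
    assert REPLICA_OF[NEW[i % 4]] == OLD[i % 2]
```

The key observation: IDs with ID%4 in {0, 2} all had ID%2 == 0, so they lived on A, and A2 has a full copy of A.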
6. Sharding Implementation Options
6.1 Proxy Layer
Deploy a proxy server (e.g., MyCAT) that masquerades as a MySQL instance, routing queries to real backend nodes. The proxy is transparent to applications and supports read‑write separation, sharding, multi‑tenant, and cloud‑native scenarios.
6.2 Application Layer
Integrate a library/JAR into the application code. Examples: Sharding‑JDBC (active) provides lightweight JDBC‑level sharding, supports Snowflake algorithm, flexible sharding keys, and full SQL parsing (joins, aggregates, limits, etc.). Other historic solutions (e.g., TDDL) are no longer maintained. Sharding‑JDBC works with any Java ORM (JPA, Hibernate, MyBatis) and connection pool (Druid, C3P0, etc.).
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.