Mastering Database Sharding: Strategies, Global ID Generation, and Seamless Scaling
Database bottlenecks often require sharding; this guide explains core concepts of database partitioning, global ID generation methods, shard allocation strategies, smooth scaling techniques, and popular solutions, helping engineers decide when and how to implement effective sharding without premature over‑design.
1. Overview of Database Sharding
Databases can become performance bottlenecks in high‑concurrency distributed systems. When a single instance cannot handle the load, sharding—splitting data across multiple databases and tables—breaks the single‑machine limitation.
Read‑Write Separation: Separate instances for reads and writes.
Partitioning: Divide records into different physical areas while keeping a logical single table.
Database Sharding: Store different tables on different database instances.
Table Sharding:
Vertical sharding: split columns across tables.
Horizontal sharding: split rows across tables using a sharding algorithm.
2. When to Adopt Sharding
Sharding adds complexity and performance overhead. It should be considered only when projected traffic is large enough to justify the effort.
If current data volume is under a few million rows, sharding is usually unnecessary.
For storage limits, add disks or additional databases.
For performance limits, upgrade CPU/memory, enable read‑write separation, tune DB configuration, optimize tables/indexes/SQL, use partitioning or vertical table splitting.
If these measures fail, consider horizontal table sharding.
3. Global ID Generation Strategies
3.1 Auto‑Increment Column
Pros: Built‑in, ordered, high performance.
Cons: May produce duplicate IDs across shards if not planned.
Solution: set offset and step size per shard.
SET @@SESSION.auto_increment_offset=1; /* start value 1‑10 */
SET @@SESSION.auto_increment_increment=10; /* step */During expansion, migrate existing data to the new shard.
3.2 Global ID Mapping Table (Redis)
Create a Redis key for each table that stores the current maximum ID. Each request increments the key atomically and returns the new ID. The Redis data should be persisted to a durable store.
3.3 UUID (128‑bit)
Universally unique identifiers generated by platform APIs. Example: 550e8400-e29b-41d4-a716-446655440000 Pros: Simple, globally unique.
Cons: Large storage/transmission size, unordered, performance impact.
3.4 COMB (Combined GUID)
Combines a 10‑byte GUID with a 6‑byte timestamp to produce ordered IDs, improving index performance.
3.5 Snowflake Algorithm
Twitter’s open‑source 64‑bit distributed ID generator. Bits allocation (default):
1 bit sign (always 0)
41 bits timestamp (milliseconds, ~69 years)
10 bits node ID (5‑bit datacenter + 5‑bit machine, up to 1024 nodes)
12 bits sequence per millisecond (up to 4096 IDs per node per ms)
Diagram of Snowflake structure:
4. Sharding Strategies
4.1 Range Sharding
Assign records to nodes based on a field range (e.g., user ID, order date).
Pros: Adding a new range for a new node requires no data migration.
Cons: Hot‑spot risk if recent data concentrates on a few nodes.
4.2 Modulo Sharding
Distribute rows by computing id % N. Expansion requires data migration.
4.3 Consistent Hashing
Maps keys to a ring; adding nodes does not require moving existing data.
Pros: No data migration on expansion.
4.4 Snowflake‑Based Sharding
Uses Snowflake IDs as shard keys; also supports expansion without migration.
Pros: No migration needed when adding nodes.
5. Problems Introduced by Sharding
Distributed Transactions: Two‑phase/three‑phase commits are costly; compensation mechanisms are preferred.
Cross‑Node JOIN: Avoid MySQL’s multi‑node JOIN; use global tables, field redundancy, or application‑side assembly.
Cross‑Node Aggregation: Must be performed in the application; pagination after large aggregations hurts performance.
Node Expansion: Changing shard rules often requires data migration.
6. Node Expansion Solutions
6.1 Conventional Expansion
When adding nodes without prior planning, most data will need to move.
Estimate migration time and publish downtime notice.
Stop service, run migration scripts, update shard rules.
Restart services.
6.2 Migration‑Free Expansion (Double‑Capacity Strategy)
Assume two existing nodes A and B; expand to A, A2, B, B2.
Keep services running.
Add two replica databases A2 and B2; set up master‑slave sync (A→A2, B→B2).
Adjust shard rule:
Old id % 2 = 0 → A becomes id % 4 = 0, new A2 handles id % 4 = 2.
Old id % 2 = 1 → B becomes id % 4 = 1, new B2 handles id % 4 = 3.
Break master‑slave links after sync completes.
Redundant data can be cleaned later without affecting service.
7. Sharding Implementation Approaches
7.1 Proxy Layer (e.g., MyCAT)
Deploy a proxy server that masquerades as a MySQL instance; the proxy forwards queries to real backend nodes. MyCAT supports many databases (MySQL, SQLServer, Oracle, PostgreSQL, MongoDB) and offers read‑write separation, sharding, disaster recovery, multi‑tenant support, and cloud‑native features.
7.2 Application Layer
Integrate a Java library that intercepts JDBC calls.
Sharding‑JDBC: Lightweight, client‑side driver, no extra deployment, compatible with any Java ORM (JPA, Hibernate, MyBatis, Spring JDBC). Supports equal, BETWEEN, IN, and multi‑key sharding, as well as SQL aggregation, grouping, ordering, and limit.
Historical TDDL (Taobao): No longer maintained; not recommended.
8. References
沈剑, 《数据库秒级平滑扩容架构方案》 (WeChat article)
Distributed transaction solutions: https://kefeng.wang/2018/03/01/distributed-transaction/
The Cost of GUIDs as Primary Keys: http://www.informit.com/articles/article.aspx?p=25862
Twitter Snowflake source: https://github.com/twitter-archive/snowflake/tree/snowflake-2010
Snowflake algorithm details: https://www.lanindex.com/twitter-snowflake%EF%BC%8C64%E4%BD%8D%E8%87%AA%E5%A2%9Eid%E7%AE%97%E6%B3%95%E8%AF%A6%E8%A7%A3/
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
