
Mastering MySQL Sharding: Strategies, ID Generation, and Seamless Scaling

This article explains why and how to apply database sharding, introduces key terminology, compares global ID generation methods such as auto‑increment, UUID, COMB and Snowflake, outlines sharding algorithms, discusses challenges like distributed transactions, and presents practical expansion and implementation solutions.

Programmer DD

Jul 20, 2022

1. Overview of Database Sharding

When a single database can no longer handle data volume or concurrency, splitting data across multiple databases and tables (sharding) becomes necessary.

1.1 Sharding Terminology

Read‑write separation: writes go to a primary database while reads are served by one or more replicas.

Partitioning: split a table's records into different physical partitions on the same server.

Database sharding: distribute tables across multiple database instances.

Table sharding: either vertical (different fields stored in separate tables) or horizontal (different rows stored in separate tables).

1.2 Should You Shard?

Sharding adds complexity and performance overhead, so adopt it only when data volume or traffic genuinely exceeds what a single instance can handle. Before sharding, try the cheaper options first: adding disk capacity, adding databases, upgrading hardware, read‑write separation, optimizing the schema, indexes, and SQL, partitioning, or vertical splitting.

2. Global ID Generation Strategies

2.1 Auto‑Increment Columns

Pros: built‑in, ordered, fast. Cons: each shard counts independently, so IDs will collide across shards unless the sequences are planned in advance.

2.1.1 Set Auto‑Increment Offset and Step

# Assume 10 shards
# Scope options: SESSION (current connection only) or GLOBAL (server-wide)
SET @@SESSION.auto_increment_offset = 1;     # start value: 1 to 10, a different one on each shard
SET @@SESSION.auto_increment_increment = 10; # step size, equal to the number of shards

Because the step is tied to the shard count, adding shards later means reconfiguring every shard and migrating data.
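
To see why the offset and step settings keep IDs from colliding, here is a minimal Python sketch that simulates the values each shard would hand out. The helper name `shard_ids` is made up for illustration; it only mirrors the arithmetic and does not talk to MySQL.

```python
def shard_ids(offset: int, increment: int, count: int) -> list[int]:
    """Return the first `count` auto-increment values a shard would produce.

    Mirrors auto_increment_offset / auto_increment_increment:
    offset, offset + increment, offset + 2 * increment, ...
    """
    return [offset + i * increment for i in range(count)]


SHARDS = 10  # same assumption as the SQL above

# Shard k (1-based) uses offset k; every shard shares the step of 10.
sequences = {
    k: shard_ids(offset=k, increment=SHARDS, count=3)
    for k in range(1, SHARDS + 1)
}

print(sequences[1])   # [1, 11, 21]
print(sequences[10])  # [10, 20, 30]
```

Each shard owns one residue class modulo 10, so the sequences never overlap. It also shows the limitation: with a step of 10 there is no free offset left for an eleventh shard.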

2.1.2 Global ID Mapping Table

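Keep one key per table in a shared (global) Redis instance that tracks the table's current maximum ID. Each request atomically increments the key and uses the returned value as the new ID. Persist the counter to a database as well, so it survives a Redis restart.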
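
The per-table counter can be sketched as follows. To keep the example self-contained, a hypothetical in-memory class stands in for Redis; its `incr` method imitates the semantics of the Redis `INCR` command (atomic, and a missing key starts from 0). The key format `id:<table>` is also just an assumption.

```python
import threading


class InMemoryCounterStore:
    """Hypothetical stand-in for Redis; incr() imitates the atomic INCR command."""

    def __init__(self) -> None:
        self._values: dict[str, int] = {}
        self._lock = threading.Lock()

    def incr(self, key: str) -> int:
        # The lock plays the role of Redis's single-threaded command execution.
        with self._lock:
            self._values[key] = self._values.get(key, 0) + 1
            return self._values[key]


def next_id(store: InMemoryCounterStore, table: str) -> int:
    """Allocate the next global ID for `table`, using one counter key per table."""
    return store.incr(f"id:{table}")


store = InMemoryCounterStore()
print(next_id(store, "orders"))  # 1
print(next_id(store, "orders"))  # 2
print(next_id(store, "users"))   # 1
```

In a real deployment every allocation is a network round trip to the shared store, which is the main cost of this approach.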

2.2 UUID (128‑bit)

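Universally unique identifiers that most platforms can generate locally. They are easy to use, but at 128 bits they are large, and random UUIDs are unordered, which makes index inserts and lookups slower than with sequential integers.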

2.3 COMB (Combined GUID)

Combines a GUID with a timestamp to achieve ordering and improve index performance.
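
One possible way to build a COMB-style value in Python is sketched below. The layout is an assumption made for illustration: a 48-bit millisecond timestamp in the leading bytes, followed by 10 bytes taken from a random UUID. Other COMB variants place the timestamp at the end instead, depending on how the database sorts the column.

```python
import time
import uuid
from typing import Optional


def comb_uuid(now_ms: Optional[int] = None) -> uuid.UUID:
    """Build a COMB-style UUID: 6 timestamp bytes followed by 10 bytes of a random UUID."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    # Keep the low 48 bits of the millisecond timestamp, big-endian so bytes sort by time.
    timestamp = (now_ms & ((1 << 48) - 1)).to_bytes(6, "big")
    random_tail = uuid.uuid4().bytes[6:]  # last 10 bytes of a random UUID
    return uuid.UUID(bytes=timestamp + random_tail)


earlier = comb_uuid(now_ms=1_000)
later = comb_uuid(now_ms=2_000)
print(earlier.bytes < later.bytes)  # True
```

Values created in different milliseconds sort by creation time when compared as raw bytes; values created within the same millisecond fall back to random order.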

2.4 Snowflake Algorithm

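Twitter’s distributed ID generator produces 64‑bit numbers that are roughly time‑ordered and unique without runtime coordination between nodes, provided every node is assigned a distinct node ID. The 64 bits are laid out as follows: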

1 bit: sign (always 0).

41 bits: timestamp in milliseconds (≈69 years).

10 bits: node ID (5‑bit data center + 5‑bit machine, up to 1024 nodes).

12 bits: sequence number (4096 IDs per millisecond per node).
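
The bit layout above can be sketched in Python as follows. This is a minimal illustration, not Twitter's implementation: the class name, the custom epoch (2020-01-01 UTC), and the decision to raise an error when the clock moves backwards are all assumptions.

```python
import threading
import time

EPOCH_MS = 1_577_836_800_000  # assumed custom epoch: 2020-01-01 00:00:00 UTC


class SnowflakeGenerator:
    """Sketch of a Snowflake-style generator: 41-bit time, 10-bit node, 12-bit sequence."""

    def __init__(self, datacenter_id: int, worker_id: int) -> None:
        if not 0 <= datacenter_id < 32 or not 0 <= worker_id < 32:
            raise ValueError("datacenter_id and worker_id must each fit in 5 bits")
        self._node = (datacenter_id << 5) | worker_id  # 10-bit node ID
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    @staticmethod
    def _now_ms() -> int:
        return time.time_ns() // 1_000_000

    def next_id(self) -> int:
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards; refusing to generate IDs")
            if now == self._last_ms:
                # Same millisecond: bump the 12-bit sequence.
                self._sequence = (self._sequence + 1) & 0xFFF
                if self._sequence == 0:
                    # 4096 IDs already issued this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._now_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            return ((now - EPOCH_MS) << 22) | (self._node << 12) | self._sequence


gen = SnowflakeGenerator(datacenter_id=1, worker_id=2)
first, second = gen.next_id(), gen.next_id()
print(first < second)  # True
```

Uniqueness across machines rests entirely on the node ID, so two processes must never run with the same `datacenter_id` and `worker_id` pair.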

3. Sharding Strategies

3.1 Range Sharding

Assign records to nodes based on a field range (e.g., user ID, order time). Easy to add new ranges without data migration, but may cause hotspot imbalance.
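
A range lookup is easy to sketch with a sorted list of range starts. The boundaries and shard names below are hypothetical; the point is that adding a new range only appends an entry and leaves existing rows where they are.

```python
from bisect import bisect_right

# Hypothetical layout: each shard owns user IDs from its start up to the next start.
RANGE_STARTS = [0, 1_000_000, 2_000_000]
SHARD_NAMES = ["db0", "db1", "db2"]


def shard_for(user_id: int) -> str:
    """Return the shard whose range contains `user_id`."""
    if user_id < RANGE_STARTS[0]:
        raise ValueError("user_id is below the first range")
    # bisect_right finds the first start greater than user_id; step back one slot.
    return SHARD_NAMES[bisect_right(RANGE_STARTS, user_id) - 1]


print(shard_for(999_999))    # db0
print(shard_for(1_000_000))  # db1
print(shard_for(5_000_000))  # db2 (the last range is open-ended)
```

The hotspot problem is visible here too: if new users always receive the highest IDs, almost all fresh writes land on the last shard.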

3.2 Consistent Hashing

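Maps both keys and nodes onto a hash ring. With enough virtual nodes per server the data is spread roughly uniformly, and adding a node moves only the keys that fall into the new node's segments (about 1/N of the data) rather than reshuffling everything.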
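
Here is a minimal consistent-hash ring in Python, under a few assumptions: SHA-256 as the hash function, 100 virtual nodes per server, and made-up node names. It is meant to show how little data changes owner when a node joins, not to serve as a production router.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a 64-bit point on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    """Sketch of a consistent-hash ring with virtual nodes."""

    def __init__(self, nodes: list[str], vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._ring: list[tuple[int, str]] = []  # sorted (point, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each server appears at many points so load evens out.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        # First ring point at or after the key's hash, wrapping around at the end.
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["db0", "db1", "db2", "db3"])
keys = [f"user:{i}" for i in range(10_000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("db4")
moved = [k for k in keys if ring.node_for(k) != before[k]]
print(f"{len(moved) / len(keys):.0%} of keys changed owner")
```

Every key that changes owner moves to the new node; no data is shuffled between the existing nodes.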

3.3 Modulo Sharding

Uses ID % N to select a shard; simple but requires data migration when N changes.
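
The migration cost of changing N is easy to quantify. The sketch below (function names are illustrative) counts how many IDs map to a different shard after the shard count changes.

```python
def shard_for(record_id: int, shard_count: int) -> int:
    """Modulo sharding: the shard index is ID % N."""
    return record_id % shard_count


def moved_fraction(ids: range, old_n: int, new_n: int) -> float:
    """Fraction of IDs whose shard changes when N goes from old_n to new_n."""
    moved = sum(1 for i in ids if shard_for(i, old_n) != shard_for(i, new_n))
    return moved / len(ids)


print(moved_fraction(range(1000), 4, 5))  # 0.8
print(moved_fraction(range(1000), 4, 8))  # 0.5
```

Going from 4 to 5 shards relocates 80% of the rows, while doubling from 4 to 8 relocates exactly half, which is why expansions under modulo sharding are usually done by doubling.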

3.4 Snowflake‑Based Sharding

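Uses Snowflake IDs as the shard key so that rows spread evenly across shards. Whether migration is avoided depends on how the shard is derived from the ID; a plain ID % N over Snowflake IDs has the same resharding cost as modulo sharding.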

4. Issues Introduced by Sharding

4.1 Distributed Transactions

Two‑phase and three‑phase commit protocols incur high overhead; compensation mechanisms, which undo already completed steps when a later step fails, are often preferred.
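
A compensation mechanism can be sketched as a list of (action, compensation) pairs that are rolled back in reverse order on failure. This is a simplified illustration with hypothetical names; a real system would also need to persist progress and make every compensation idempotent.

```python
from typing import Callable

# Each step pairs an action with the compensation that undoes it.
Step = tuple[Callable[[], None], Callable[[], None]]


def run_with_compensation(steps: list[Step]) -> bool:
    """Run steps in order; if one fails, undo the completed ones in reverse order."""
    completed: list[Callable[[], None]] = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True


log: list[str] = []


def failing_credit() -> None:
    raise RuntimeError("shard B is unavailable")


ok = run_with_compensation([
    (lambda: log.append("debit on shard A"), lambda: log.append("refund on shard A")),
    (failing_credit, lambda: log.append("reverse credit on shard B")),
])
print(ok)   # False
print(log)  # ['debit on shard A', 'refund on shard A']
```

Unlike two‑phase commit, no shard holds locks while waiting for the others; the price is that intermediate states are briefly visible.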

4.2 Cross‑Node JOIN

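Avoid relying on MySQL’s native cross‑database JOIN, which only works within a single instance. Use global tables (small reference tables copied to every shard), field duplication, or application‑side assembly instead.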

4.3 Cross‑Node Aggregation

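Aggregations such as COUNT, SUM, and ORDER BY must be merged in the application from per‑shard results. Paginating deep into a large merged result is inefficient, because every shard has to return all rows up to the requested page.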
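
Application-side merging might look like the sketch below. It assumes each shard has already returned its rows sorted ascending, and the function names are invented for the example.

```python
import heapq
from itertools import islice


def rows_needed_per_shard(page: int, page_size: int) -> int:
    """Each shard must return its first page * page_size rows for a correct merge."""
    return page * page_size


def merged_page(shard_results: list[list[int]], page: int, page_size: int) -> list[int]:
    """Merge per-shard results (each sorted ascending) and cut out one page (1-based)."""
    merged = heapq.merge(*shard_results)
    start = (page - 1) * page_size
    return list(islice(merged, start, start + page_size))


def total_count(shard_counts: list[int]) -> int:
    """A global COUNT is the sum of the per-shard counts."""
    return sum(shard_counts)


shards = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
print(merged_page(shards, page=2, page_size=3))     # [4, 5, 6]
print(total_count([len(s) for s in shards]))        # 9
print(rows_needed_per_shard(page=100, page_size=20))  # 2000
```

The last line shows where the inefficiency comes from: fetching page 100 with 20 rows per page requires 2000 rows from every shard, even though only 20 are shown.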

4.4 Node Expansion

Changing shard rules after adding nodes requires data migration.

5. Node Expansion Solutions

5.1 Conventional Expansion

Estimate migration time and announce downtime.

Stop service, run migration scripts.

Apply new sharding rules.

Restart servers.

5.2 Migration‑Free Expansion

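Double the number of nodes and adjust the sharding formula (e.g., from ID%2 to ID%4). Each new node starts as a full copy of an existing node, typically its replica, so every row is already present on the node that the new formula selects. Keep both old and new nodes active during the switch, then gradually clean up the rows that no longer belong on each node.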
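
The reason doubling works without moving data can be checked with a few lines of Python. The mapping below assumes new shard s is created as a copy of old shard s % 2; the function names are illustrative.

```python
def old_shard(record_id: int) -> int:
    return record_id % 2


def new_shard(record_id: int) -> int:
    return record_id % 4


# Assumed setup: new shards 0 and 2 are copies of old shard 0, new shards 1 and 3 of old shard 1.
PARENT = {s: s % 2 for s in range(4)}


def redundant_rows(shard: int, ids_on_node: list[int]) -> list[int]:
    """Rows copied from the parent that the new formula routes elsewhere; safe to delete later."""
    return [i for i in ids_on_node if new_shard(i) != shard]


# Every ID's new shard was copied from the old shard that already held the row.
print(all(PARENT[new_shard(i)] == old_shard(i) for i in range(1000)))  # True

# New shard 2 starts with all of old shard 0's rows; half of them are now redundant.
print(redundant_rows(2, [0, 2, 4, 6, 8, 10]))  # [0, 4, 8]
```

Because (ID % 4) % 2 always equals ID % 2, the new routing never points at a node that lacks the row, and the cleanup of redundant rows can run in the background.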

6. Sharding Implementation Options

6.1 Proxy Layer

Deploy a proxy (e.g., MyCAT) that masquerades as MySQL, handling routing, read‑write separation, sharding, and multi‑tenant features.

6.2 Application Layer

Integrate a library/JAR (e.g., Sharding‑JDBC, now ShardingSphere‑JDBC under Apache ShardingSphere) that intercepts JDBC calls inside the application, providing transparent sharding, support for multiple databases, and flexible sharding algorithms.

