How to Choose the Right Distributed ID Generation Strategy for Scalable Systems

This article examines common distributed ID generation methods (UUID, MySQL auto‑increment, multi‑instance auto‑increment, Snowflake, and Redis) and details their advantages, drawbacks, and practical use cases. It then presents advanced designs for high‑traffic systems: database‑driven ID blocks, concurrency handling, and double‑buffer techniques.

Programmer DD

Mar 31, 2021

1. Introduction

In distributed systems, large tables (e.g., user or order tables) are often sharded across multiple databases and tables. Per‑table auto‑increment keys would then collide across shards, so primary‑key IDs have to come from a scheme that keeps them globally unique. For efficient storage and queries, the IDs should also be numeric, trend‑increasing, and short.

What is Increment?

Increment means each newly generated ID is larger than the previous one (e.g., 12 → 13 → 14).

What is Trend Increment?

Trend increment means IDs increase across successive time intervals (e.g., IDs first fall in [0, 1000], and later IDs fall in (1000, 2000]), though within an interval they may not be issued in strictly sequential order.

2. Common Distributed ID Generation Schemes

2.1 UUID

Pros:

Simple to implement.

Generated locally, with no network round trip or shared state.

Globally unique, easing data migration.

Cons:

IDs are unordered, so trend increment cannot be guaranteed.

Usually stored as strings, which makes index lookups slower.

Large storage footprint: 128 bits, or 36 characters in the canonical text form.

IDs lack business meaning and are unreadable.

Typical use cases: token generation; not suitable where trend‑increment is required.
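The trade‑offs above are easy to see with Python's standard `uuid` module. This is only an illustrative sketch using random (version 4) UUIDs; the variable names are made up for the example.

```python
import uuid

# Generate three random (version 4) UUIDs locally; no coordination is needed.
ids = [uuid.uuid4() for _ in range(3)]

for u in ids:
    # Canonical text form is 36 characters; the raw value is 16 bytes (128 bits).
    print(str(u), len(str(u)), len(u.bytes))

# Creation order and sort order are unrelated for random UUIDs,
# so this is True only by chance.
print(sorted(ids) == ids)
```

Because the values are random, inserting them into an ordered index scatters writes across the index instead of appending at the end.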

2.2 MySQL Auto‑Increment Primary Key

Pros:

Numeric and naturally incrementing.

High query efficiency.

IDs are somewhat readable from a business perspective.

Cons:

Single‑point failure: if MySQL goes down, ID generation stops.

Database becomes a bottleneck under high concurrency.

2.3 MySQL Multi‑Instance Auto‑Increment

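Runs N MySQL instances, each with a different auto‑increment start value and the same step size N (e.g., start values 1, 2, 3, … with step = N), so the instances never produce the same ID and no single instance is a single point of failure. In MySQL these settings correspond to the auto_increment_offset and auto_increment_increment system variables.

Pros:

Solves the single‑point problem.

Cons:

The fixed step limits scalability: adding an instance means changing the step on every existing instance.

Each database still faces high load.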
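A small simulation can make the offset‑and‑step idea concrete. It does not talk to MySQL; `instance_ids` and the choice of N = 3 are assumptions made purely for illustration.

```python
from itertools import count, islice


def instance_ids(offset: int, step: int):
    """Yield the IDs one instance would hand out: offset, offset + step, ..."""
    return count(start=offset, step=step)


N = 3  # number of database instances (assumed for this example)

# Instance k starts at k and steps by N.
per_instance = {k: list(islice(instance_ids(k, N), 4)) for k in range(1, N + 1)}
for k, ids in per_instance.items():
    print(f"instance {k}: {ids}")
# instance 1: [1, 4, 7, 10]
# instance 2: [2, 5, 8, 11]
# instance 3: [3, 6, 9, 12]

all_ids = [i for ids in per_instance.values() for i in ids]
print(len(all_ids) == len(set(all_ids)))  # True: no collisions
```

The sequences interleave without overlapping only as long as every instance agrees on the same step, which is why changing the number of instances later is awkward.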

2.4 Snowflake Algorithm

Generates 64‑bit integers composed of:

1‑bit sign (always 0).

41‑bit timestamp (difference from a custom epoch).

10‑bit machine identifier (e.g., 5‑bit datacenter + 5‑bit machine).

12‑bit sequence within the same millisecond (supports 4096 IDs per ms per node).

Pros:

Can produce up to about 4.096 million IDs per second per node (4096 per millisecond × 1000); high performance.

High‑order bits are timestamps, yielding trend‑increment IDs.

Bit allocation is configurable for different business needs.

Cons:

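Relies on accurate system clocks; if a node's clock is rolled back, it can reissue a timestamp it has already used and generate duplicate IDs.

Small rollbacks (on the order of 10 ms, for example after a time synchronization adjustment) do occur in distributed environments, so this risk is practical rather than theoretical.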
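The bit layout and the clock‑rollback risk described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the epoch value (2020‑01‑01 UTC), the class and function names, and the decision to raise an error on rollback are all assumptions, and the 5‑bit datacenter and 5‑bit machine fields are folded into one 10‑bit worker ID.

```python
import threading
import time

EPOCH_MS = 1_577_836_800_000  # assumed custom epoch: 2020-01-01 00:00:00 UTC
WORKER_BITS = 10
SEQ_BITS = 12
MAX_WORKER = (1 << WORKER_BITS) - 1  # 1023
MAX_SEQ = (1 << SEQ_BITS) - 1        # 4095


class Snowflake:
    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id <= MAX_WORKER:
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.clock = clock  # injectable so the rollback case can be tested
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                # Clock moved backwards: refuse rather than risk a duplicate.
                raise RuntimeError(f"clock moved back {self.last_ms - now} ms")
            if now == self.last_ms:
                self.seq = (self.seq + 1) & MAX_SEQ
                if self.seq == 0:
                    # 4096 IDs already issued this millisecond; wait for the next.
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.seq = 0
            self.last_ms = now
            timestamp = now - EPOCH_MS
            return ((timestamp << (WORKER_BITS + SEQ_BITS))
                    | (self.worker_id << SEQ_BITS)
                    | self.seq)


def decompose(snowflake_id: int):
    """Split an ID back into (timestamp_ms, worker_id, sequence)."""
    seq = snowflake_id & MAX_SEQ
    worker = (snowflake_id >> SEQ_BITS) & MAX_WORKER
    ts = (snowflake_id >> (WORKER_BITS + SEQ_BITS)) + EPOCH_MS
    return ts, worker, seq


gen = Snowflake(worker_id=7)
a, b = gen.next_id(), gen.next_id()
print(a < b)            # True
print(decompose(a)[1])  # 7
```

Raising on rollback is only one possible response. Waiting until the clock catches up is another common choice when the rollback is small.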

2.5 Redis‑Based Generation

Uses Redis's atomic INCR command as a shared counter. Typical format: year + dayOfYear + hour + redisIncrement.

Pros:

Ordered, readable IDs.

Cons:

Every request incurs a network round trip, and ID generation depends heavily on Redis availability.

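Test scenario: 100,000 simultaneous requests for an ID.

1. Total time for the concurrent run: about 9 s

2. Average time per task: 74 ms

3. Minimum time for a single thread: under 1 ms

4. Maximum time for a single thread: 4.1 s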

Performance is acceptable for moderate loads, but without adjustments the IDs neither start from 1 nor guarantee strict trend increment.
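The year + dayOfYear + hour + redisIncrement format from this section can be sketched without a Redis server. In the example, `fake_incr` stands in for the value Redis INCR would return, and the function names, key naming, and six‑digit counter width are assumptions for illustration.

```python
from datetime import datetime
from itertools import count


def counter_key(now: datetime) -> str:
    """Hypothetical Redis key: one counter per hour, so it restarts hourly."""
    return now.strftime("id:%Y:%j:%H")


def make_id(now: datetime, increment: int, width: int = 6) -> int:
    """Build year + dayOfYear + hour + zero-padded counter as one integer."""
    if not 0 < increment < 10 ** width:
        # A counter wider than `width` digits would break ordering and length.
        raise ValueError("increment does not fit in the counter width")
    return int(now.strftime("%Y%j%H") + str(increment).zfill(width))


fake_incr = count(1)  # stands in for the result of Redis INCR on counter_key(now)
now = datetime(2021, 3, 31, 14, 0)

print(counter_key(now))               # id:2021:090:14
print(make_id(now, next(fake_incr)))  # 202109014000001
```

With a real client, the counter value would come from an INCR call on the key, and an expiry on each hourly key would keep old counters from accumulating.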

3. Advanced Designs Used by Large‑Scale Companies

3.1 Database Primary‑Key Block Allocation

Instead of fetching a single ID, the service requests a block (e.g., step = 1000) from the database. The database returns its current max_id and advances the stored value by step. The service then generates IDs locally from max_id+1 to max_id+step, which reduces DB load to one request per block and allows custom start points and step sizes.
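A minimal sketch of block allocation, assuming an in‑memory object in place of the database row that stores max_id. `FakeCounterTable` and `BlockIdGenerator` are hypothetical names invented for this example.

```python
import threading


class FakeCounterTable:
    """In-memory stand-in for the database row that stores max_id."""

    def __init__(self, start: int = 0):
        self.max_id = start
        self._lock = threading.Lock()

    def reserve(self, step: int) -> int:
        """Advance max_id by step and return the value it had before."""
        with self._lock:
            previous = self.max_id
            self.max_id += step
            return previous


class BlockIdGenerator:
    def __init__(self, table: FakeCounterTable, step: int = 1000):
        self.table = table
        self.step = step
        self._next = 0
        self._end = -1  # empty block, so the first call fetches one
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            if self._next > self._end:
                # Block used up: one "database" round trip reserves a new one.
                base = self.table.reserve(self.step)
                self._next = base + 1
                self._end = base + self.step
            value = self._next
            self._next += 1
            return value


table = FakeCounterTable()
service_a = BlockIdGenerator(table, step=3)
service_b = BlockIdGenerator(table, step=3)
print([service_a.next_id(), service_a.next_id(), service_b.next_id(),
       service_a.next_id(), service_a.next_id()])
# [1, 2, 4, 3, 7]
```

The output shows why the result is trend increment rather than strict increment: service B hands out 4 before service A hands out 3, yet each new block starts above all earlier ones.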

3.2 Concurrency Control

When multiple services request blocks simultaneously, race conditions can cause duplicate max_id. Solutions include distributed locks or leveraging the database’s row‑level locking to ensure only one service updates the counter at a time.
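The database‑locking option can be sketched with SQLite from the Python standard library standing in for MySQL. SQLite locks the whole database rather than a single row, so this only illustrates the pattern: update and read the counter inside one transaction so that concurrent callers are serialized. The table and column names (id_alloc, biz_tag) are hypothetical.

```python
import os
import sqlite3
import tempfile
import threading

DB_PATH = os.path.join(tempfile.mkdtemp(), "ids.db")  # throwaway database file


def connect():
    # isolation_level=None lets us issue BEGIN/COMMIT explicitly.
    return sqlite3.connect(DB_PATH, timeout=30, isolation_level=None)


def init_db():
    conn = connect()
    conn.execute("CREATE TABLE id_alloc ("
                 "biz_tag TEXT PRIMARY KEY, max_id INTEGER, step INTEGER)")
    conn.execute("INSERT INTO id_alloc VALUES ('order', 0, 1000)")
    conn.close()


def reserve_block(conn, biz_tag):
    """Advance max_id inside one transaction; return the (first, last) range."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before touching the row
    try:
        conn.execute("UPDATE id_alloc SET max_id = max_id + step "
                     "WHERE biz_tag = ?", (biz_tag,))
        max_id, step = conn.execute(
            "SELECT max_id, step FROM id_alloc WHERE biz_tag = ?",
            (biz_tag,)).fetchone()
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return max_id - step + 1, max_id


def worker(results):
    conn = connect()  # each simulated service uses its own connection
    for _ in range(5):
        results.append(reserve_block(conn, "order"))
    conn.close()


init_db()
results = []
threads = [threading.Thread(target=worker, args=(results,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

firsts = sorted(first for first, _ in results)
print(firsts == [i * 1000 + 1 for i in range(20)])  # True: no block issued twice
```

In MySQL, the comparable pattern is to run the UPDATE and the SELECT in one transaction, so the row lock taken by the UPDATE is held until commit.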

3.3 Burst Blocking Issue

If many services exhaust their blocks concurrently, only one can acquire a new block while others wait, causing occasional latency spikes.

3.4 Double‑Buffer Scheme

Two buffers (buffer1 and buffer2) hold ID blocks. When the current buffer reaches a low‑water mark (e.g., 10% remaining), a background thread fetches the next block into the idle buffer. Once the active buffer is depleted, the system switches to the prepared buffer, smoothing out burst requests.

This approach keeps ID generation in JVM memory, tolerates database outages longer, and mitigates sudden latency spikes.
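The double‑buffer idea can be sketched as below. This is a simplified illustration under stated assumptions: `fetch_block` is a hypothetical callable that returns an inclusive (first, last) range from whatever block allocator is in use, and error handling is reduced to retrying on a later call.

```python
import itertools
import threading


class DoubleBufferGenerator:
    def __init__(self, fetch_block, low_water=0.1):
        self._fetch = fetch_block           # returns (first, last), inclusive
        self._low_water = low_water         # prefetch when this fraction remains
        self._lock = threading.Lock()
        self._first, self._last = fetch_block()  # active buffer
        self._next = self._first
        self._standby = None                # prefetched block, if ready
        self._loader = None                 # background thread, if running

    def _load_standby(self):
        try:
            block = self._fetch()           # slow call runs outside the lock
        except Exception:
            block = None                    # source unavailable; retried later
        with self._lock:
            self._standby = block
            self._loader = None

    def next_id(self) -> int:
        while True:
            with self._lock:
                if self._next <= self._last:
                    value = self._next
                    self._next += 1
                    remaining = self._last - self._next + 1
                    size = self._last - self._first + 1
                    if (remaining <= size * self._low_water
                            and self._standby is None and self._loader is None):
                        self._loader = threading.Thread(
                            target=self._load_standby, daemon=True)
                        self._loader.start()
                    return value
                if self._standby is not None:
                    # Active buffer is empty: switch to the prepared one.
                    self._first, self._last = self._standby
                    self._next = self._first
                    self._standby = None
                    continue
                loader = self._loader
            if loader is not None:
                loader.join()               # prefetch in flight: wait for it
            else:
                block = self._fetch()       # no prefetch ready: fetch directly
                with self._lock:
                    if self._standby is None:
                        self._standby = block


blocks = itertools.count(0)


def fetch_block(step=10):
    """Hypothetical block source: returns 1..10, then 11..20, and so on."""
    base = next(blocks) * step
    return base + 1, base + step


gen = DoubleBufferGenerator(fetch_block)
ids = [gen.next_id() for _ in range(35)]
print(ids == list(range(1, 36)))  # True
```

In normal operation callers only touch memory, and the slow fetch happens in the background before the active buffer runs out. A caller waits only when the prefetch has not finished by the time the buffer is empty.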

Conclusion

The article presented several distributed ID generation techniques, from simple UUIDs to high‑performance Snowflake and Redis solutions, and discussed how large enterprises enhance reliability and scalability through block allocation, concurrency safeguards, and double‑buffer mechanisms.

Written by

Programmer DD

A tinkering programmer and author of "Spring Cloud Microservices in Action"
