Why UUID Falls Short and How Snowflake Solves Distributed ID Generation
The article examines the limitations of using UUIDs for distributed unique identifiers, compares common alternatives such as database auto‑increment and Redis, and then details the Snowflake algorithm’s structure, implementation, advantages, and drawbacks for high‑performance ID generation.
Problem
In complex distributed systems, large volumes of data and messages need globally unique identifiers. Typical scenarios include finance, payments, restaurants, hotels, and movie platforms, along with order, rider, and coupon systems. A service that can generate globally unique IDs is therefore essential.
ID Generation Hard Requirements
Globally unique : No duplicate IDs.
Trend‑increasing : InnoDB stores rows in a clustered index on the primary key, so roughly ordered keys keep B+‑tree insert performance high.
Monotonically increasing : The next ID must be larger than the previous one for versioning, sorting, etc.
Security : IDs should look irregular; strictly consecutive IDs let attackers estimate order volumes by comparing two IDs.
Timestamp embedded : Allows developers to quickly infer when an ID was generated.
ID Service Availability Requirements
High availability : 99.999% of requests must return an ID.
Low latency : ID generation must be fast.
High QPS : The service should sustain on the order of 100,000 ID requests per second.
Common Solutions
(1) UUID
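Available in the JDK as java.util.UUID. The standard text form is a 36‑character string laid out as 8‑4‑4‑4‑12 (32 hexadecimal digits plus 4 hyphens). Each node generates IDs locally, without coordinating with any other node.
Pros: high performance, generated locally, no network cost.
Cons: unordered and long. Storing the string form increases DB storage and degrades insert performance; MySQL recommends keeping primary keys as short as possible.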
Because UUIDs are unordered, each insert causes large B+‑tree modifications, node splits, and many under‑filled nodes, dramatically reducing database insert throughput.
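The two properties discussed above, the fixed 36‑character layout and the lack of ordering, can be observed directly. The following is a minimal sketch using java.util.UUID; the class name UuidOrderDemo and its helper methods are illustrative and not part of the original article.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.UUID;

final class UuidOrderDemo {

    /** Generates n random (version 4) UUID strings, kept in generation order. */
    static List<String> generate(int n) {
        List<String> ids = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            ids.add(UUID.randomUUID().toString());
        }
        return ids;
    }

    /** Returns true if the list is already in ascending lexicographic order. */
    static boolean isSorted(List<String> ids) {
        List<String> sorted = new ArrayList<>(ids);
        Collections.sort(sorted);
        return sorted.equals(ids);
    }

    public static void main(String[] args) {
        List<String> ids = generate(5);
        for (String id : ids) {
            // Each value is 36 characters long in the 8-4-4-4-12 layout.
            System.out.println(id + "  length=" + id.length());
        }
        // Generation order and key order are unrelated, so this is
        // almost never true for more than a handful of values.
        System.out.println("already sorted? " + isSorted(ids));
    }
}
```

Because generation order and sort order are unrelated, consecutive inserts land at arbitrary positions in the primary‑key index rather than at its right edge.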
(2) Database Auto‑Increment Primary Key
Commonly implemented with an auto‑increment column and REPLACE INTO against a dedicated ID table. This approach is not suitable for distributed ID generation because:
Horizontal scaling is difficult; adding a new machine requires redefining step sizes and initial values, which becomes a nightmare with dozens or hundreds of nodes.
The database becomes a bottleneck: every ID request incurs a read‑write round‑trip, violating low‑latency and high‑QPS requirements.
(3) Redis Global ID Strategy
Redis executes commands on a single thread, so INCR and INCRBY are atomic and can be used to generate IDs.
In a Redis cluster, different step sizes must be configured for each shard, and keys should have an expiration.
Using a 5‑node Redis cluster, initialize each node with values 1, 2, 3, 4, 5 and a step size of 5. The generated IDs are:
A: 1, 6, 11, 16, 21
B: 2, 7, 12, 17, 22
C: 3, 8, 13, 18, 23
D: 4, 9, 14, 19, 24
E: 5, 10, 15, 20, 25
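The step‑size scheme can be simulated locally to check that the five sequences never overlap. The sketch below stands in for Redis with an in‑memory AtomicLong per node; it does not talk to a real Redis server, and the class SteppedCounter is a hypothetical name introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** In-memory stand-in for one Redis node that hands out IDs with a fixed step. */
final class SteppedCounter {
    private final AtomicLong current;
    private final long step;

    SteppedCounter(long initialValue, long step) {
        // Start one step below the initial value so that the first
        // increment returns the initial value itself.
        this.current = new AtomicLong(initialValue - step);
        this.step = step;
    }

    /** Atomically adds the step and returns the new value, like INCRBY. */
    long nextId() {
        return current.addAndGet(step);
    }

    /** Convenience helper: the next n IDs from this node. */
    List<Long> take(int n) {
        List<Long> ids = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            ids.add(nextId());
        }
        return ids;
    }

    public static void main(String[] args) {
        String[] names = {"A", "B", "C", "D", "E"};
        for (int i = 0; i < names.length; i++) {
            // Node i starts at i + 1 and always steps by the node count.
            SteppedCounter node = new SteppedCounter(i + 1, names.length);
            System.out.println(names[i] + ": " + node.take(5));
        }
    }
}
```

The step must equal the number of nodes, which is why adding a sixth node later requires re‑planning every node's start value and step.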
Snowflake
(1) Overview
Twitter’s distributed auto‑increment ID algorithm. Repository: https://github.com/twitter-archive/snowflake
Generates time‑ordered IDs.
Result is a 64‑bit integer (max 19‑digit decimal string).
No collisions across the distributed system, provided every node is assigned a unique datacenter ID and worker ID pair, and generation is efficient because it happens locally.
(2) Structure
1 sign bit (always 0 for positive IDs).
41 bits timestamp (millisecond offset from a custom epoch, supports ~69 years).
5 bits datacenter ID (max 31).
5 bits worker ID (max 31).
12 bits sequence number (max 4095) for IDs generated within the same millisecond.
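The bit layout above can be made concrete by packing four fields into a long and unpacking them again, and by working out the capacity each field width implies. This is a minimal sketch: the class SnowflakeLayout and its method names are assumptions made for illustration, and only the field widths come from the description above.

```java
/** Packs and unpacks the 41/5/5/12-bit Snowflake fields described above. */
final class SnowflakeLayout {
    static final int SEQUENCE_BITS = 12;
    static final int WORKER_BITS = 5;
    static final int DATACENTER_BITS = 5;
    static final int TIMESTAMP_BITS = 41;

    static final int WORKER_SHIFT = SEQUENCE_BITS;                          // 12
    static final int DATACENTER_SHIFT = WORKER_SHIFT + WORKER_BITS;         // 17
    static final int TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS;  // 22

    /** Largest value that fits in the given number of bits. */
    static long mask(int bits) {
        return (1L << bits) - 1;
    }

    /** Combines the four fields; callers must keep each value within its width. */
    static long compose(long timestampOffset, long datacenterId, long workerId, long sequence) {
        return (timestampOffset << TIMESTAMP_SHIFT)
                | (datacenterId << DATACENTER_SHIFT)
                | (workerId << WORKER_SHIFT)
                | sequence;
    }

    static long timestampOffset(long id) { return (id >>> TIMESTAMP_SHIFT) & mask(TIMESTAMP_BITS); }
    static long datacenterId(long id)    { return (id >>> DATACENTER_SHIFT) & mask(DATACENTER_BITS); }
    static long workerId(long id)        { return (id >>> WORKER_SHIFT) & mask(WORKER_BITS); }
    static long sequence(long id)        { return id & mask(SEQUENCE_BITS); }

    /** Whole 365-day years that a 41-bit millisecond counter can cover. */
    static long timestampYears() {
        long millisPerYear = 1000L * 60 * 60 * 24 * 365;
        return mask(TIMESTAMP_BITS) / millisPerYear;
    }

    public static void main(String[] args) {
        long id = compose(1000L, 3L, 7L, 42L);
        System.out.println("id         = " + id);
        System.out.println("timestamp  = " + timestampOffset(id));
        System.out.println("datacenter = " + datacenterId(id));
        System.out.println("worker     = " + workerId(id));
        System.out.println("sequence   = " + sequence(id));

        System.out.println("years covered    = " + timestampYears());                       // 69
        System.out.println("distinct nodes   = " + (mask(DATACENTER_BITS) + 1) * (mask(WORKER_BITS) + 1)); // 1024
        System.out.println("ids per node/sec = " + (mask(SEQUENCE_BITS) + 1) * 1000);        // 4096000
    }
}
```

Because the timestamp sits in the highest bits, comparing two IDs as plain integers compares their timestamps first, which is what makes the IDs time‑ordered.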
(3) Code
/**
 * Twitter_Snowflake
 * SnowFlake structure (each part separated by '-'):
 * 0 - 0000000000 0000000000 0000000000 0000000000 0 - 00000 - 00000 - 000000000000
 * 1 sign bit (0 for positive numbers).
 * 41-bit timestamp (millisecond offset from a custom epoch).
 * 10-bit machine identifier (5-bit datacenter + 5-bit worker).
 * 12-bit sequence within the same millisecond (supports 4096 IDs per ms).
 * Total 64-bit Long value.
 * SnowFlake can generate ~260k IDs per second in tests.
 */
public class SnowflakeIdWorker {

    // ==============================Fields===========================================
    /** start epoch (2020-08-28) */
    private final long twepoch = 1598598185157L;
    /** number of bits for worker id */
    private final long workerIdBits = 5L;
    /** number of bits for datacenter id */
    private final long datacenterIdBits = 5L;
    /** max worker id (31) */
    private final long maxWorkerId = -1L ^ (-1L << workerIdBits);
    /** max datacenter id (31) */
    private final long maxDatacenterId = -1L ^ (-1L << datacenterIdBits);
    /** bits for sequence */
    private final long sequenceBits = 12L;
    /** left shift for worker id */
    private final long workerIdShift = sequenceBits;
    /** left shift for datacenter id */
    private final long datacenterIdShift = sequenceBits + workerIdBits;
    /** left shift for timestamp */
    private final long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits;
    /** mask for sequence (4095) */
    private final long sequenceMask = -1L ^ (-1L << sequenceBits);
    /** worker id (0~31) */
    private long workerId;
    /** datacenter id (0~31) */
    private long datacenterId;
    /** current sequence (0~4095) */
    private long sequence = 0L;
    /** last timestamp */
    private long lastTimestamp = -1L;

    //==============================Constructors=====================================
    public SnowflakeIdWorker(long workerId, long datacenterId) {
        if (workerId > maxWorkerId || workerId < 0) {
            throw new IllegalArgumentException(String.format("worker Id can't be greater than %d or less than 0", maxWorkerId));
        }
        if (datacenterId > maxDatacenterId || datacenterId < 0) {
            throw new IllegalArgumentException(String.format("datacenter Id can't be greater than %d or less than 0", maxDatacenterId));
        }
        this.workerId = workerId;
        this.datacenterId = datacenterId;
    }

    // ==============================Methods==========================================
    /**
     * Core method: get next ID (thread-safe)
     * @return Snowflake ID
     */
    public synchronized long nextId() {
        // 1. get current timestamp
        long timestamp = timeGen();
        // clock moved backwards?
        if (timestamp < lastTimestamp) {
            throw new RuntimeException(String.format("Clock moved backwards. Refusing to generate id for %d milliseconds", lastTimestamp - timestamp));
        }
        // same millisecond?
        if (lastTimestamp == timestamp) {
            // increment sequence and mask overflow
            sequence = (sequence + 1) & sequenceMask;
            // sequence overflow: wait for next millisecond
            if (sequence == 0) {
                timestamp = tilNextMillis(lastTimestamp);
            }
        } else {
            // new millisecond: reset sequence
            sequence = 0L;
        }
        // update last timestamp
        lastTimestamp = timestamp;
        // assemble 64-bit ID
        long id = ((timestamp - twepoch) << timestampLeftShift)
                | (datacenterId << datacenterIdShift)
                | (workerId << workerIdShift)
                | sequence;
        return id;
    }

    /** Block until next millisecond */
    protected long tilNextMillis(long lastTimestamp) {
        long timestamp = timeGen();
        while (timestamp <= lastTimestamp) {
            timestamp = timeGen();
        }
        return timestamp;
    }

    /** Current time in milliseconds */
    protected long timeGen() {
        return System.currentTimeMillis();
    }

    //==============================Test=============================================
    public static void main(String[] args) {
        SnowflakeIdWorker idWorker = new SnowflakeIdWorker(0, 0);
        for (int i = 0; i < 1000; i++) {
            long id = idWorker.nextId();
            System.out.println(id);
        }
    }
}
(4) Pros and Cons
Advantages : The timestamp occupies the high bits and the sequence occupies the low bits, making the ID trend‑increasing. There is no reliance on databases or third‑party systems, so the generator can be embedded in an application or deployed as a service with high stability and performance. Bit allocation is flexible and can be adjusted to match business needs.
Disadvantages : The algorithm depends on machine clocks. If a clock moves backward, a naive implementation may produce duplicate IDs; the code above refuses to generate IDs and throws an exception instead. In distributed environments clocks are not perfectly synchronized, so global monotonicity is not guaranteed, though most use cases only require trend‑increasing IDs.
Mitigation : Keep machine clocks synchronized, or adopt an open‑source generator built on Snowflake that addresses clock problems, such as Baidu's UidGenerator or Meituan‑Dianping's Leaf distributed ID generator.
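One simple way to soften the clock dependency is to wait out small backward jumps and fail only on large ones. The sketch below illustrates that idea under stated assumptions: the class TolerantSnowflake, the maxBackwardMs threshold, and the injectable clock are all hypothetical names introduced here, and this is not a description of how UidGenerator or Leaf work internally. Range checks on the constructor arguments are omitted for brevity.

```java
import java.util.function.LongSupplier;

/** Snowflake-style generator that waits out small clock rollbacks (illustrative sketch). */
final class TolerantSnowflake {
    private static final long EPOCH = 1598598185157L; // same custom epoch as the listing above
    private static final long SEQUENCE_MASK = 4095L;  // 12-bit sequence

    private final long datacenterId;   // assumed to be in 0..31
    private final long workerId;       // assumed to be in 0..31
    private final long maxBackwardMs;  // largest rollback we are willing to wait out
    private final LongSupplier clock;  // injectable so the behavior can be tested

    private long lastTimestamp = -1L;
    private long sequence = 0L;

    TolerantSnowflake(long datacenterId, long workerId, long maxBackwardMs, LongSupplier clock) {
        this.datacenterId = datacenterId;
        this.workerId = workerId;
        this.maxBackwardMs = maxBackwardMs;
        this.clock = clock;
    }

    synchronized long nextId() {
        long now = clock.getAsLong();
        if (now < lastTimestamp) {
            long drift = lastTimestamp - now;
            if (drift > maxBackwardMs) {
                throw new IllegalStateException("Clock moved backwards by " + drift + " ms");
            }
            // Small rollback: spin until the clock reaches the last issued timestamp.
            now = waitUntil(lastTimestamp);
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // Sequence exhausted in this millisecond: move to the next one.
                now = waitUntil(lastTimestamp + 1);
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = now;
        return ((now - EPOCH) << 22) | (datacenterId << 17) | (workerId << 12) | sequence;
    }

    /** Busy-waits until the clock reports at least the target time. */
    private long waitUntil(long target) {
        long now = clock.getAsLong();
        while (now < target) {
            now = clock.getAsLong();
        }
        return now;
    }

    public static void main(String[] args) {
        TolerantSnowflake generator = new TolerantSnowflake(0, 0, 5, System::currentTimeMillis);
        for (int i = 0; i < 5; i++) {
            System.out.println(generator.nextId());
        }
    }
}
```

Waiting trades a short pause for continued uniqueness; the threshold keeps a badly misconfigured clock from stalling callers indefinitely.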
IoT Full-Stack Technology
Dedicated to sharing IoT cloud services, embedded systems, and mobile client technology, with no spam ads.
