How to Choose the Right Distributed Unique ID Strategy for High‑Performance Systems
This article examines why distributed systems need globally unique, trend‑increasing IDs, outlines strict generation requirements and availability goals, compares common approaches such as UUID, database auto‑increment, Redis, and the Snowflake algorithm, and provides practical implementation details and trade‑offs.
Problem
In large distributed systems a massive amount of data and messages need globally unique identifiers (e.g., orders, riders, coupons, sharded tables).
Hard requirements for ID generation
Global uniqueness : no duplicate IDs.
Trend‑increasing : IDs should be roughly ordered to keep B‑Tree index inserts efficient.
Monotonic increase : each new ID larger than the previous one for versioning or sorting.
Security : non‑sequential IDs make it harder for attackers to guess volumes.
Timestamp inclusion : embedding a timestamp allows quick determination of generation time.
Availability requirements for an ID service
High availability : 99.999% success rate.
Low latency : fast response.
High QPS : e.g., 100 000 IDs per second.
Common solutions
1. UUID
JDK provides a 36‑character string (8‑4‑4‑4‑12). Advantages: high performance, local generation, no network cost. Disadvantages: unordered, large storage footprint, hurts B‑Tree index performance; MySQL recommends short primary keys.
2. Database auto‑increment primary key
Implemented via REPLACE INTO. Not suitable for distributed ID generation because horizontal scaling is difficult and database load becomes a bottleneck, violating low‑latency and high‑QPS requirements.
3. Redis‑based global ID
Redis guarantees atomicity with INCR and INCRBY. In a Redis cluster each node can be assigned a different step size and a TTL. Example with a 5‑node cluster, initial values 1‑5 and step 5 yields IDs:
A: 1, 6, 11, …
B: 2, 7, 12, …
C: 3, 8, 13, …
D: 4, 9, 14, …
E: 5, 10, 15, …
4. Snowflake algorithm
Twitter’s distributed auto‑increment ID algorithm. Repository: https://github.com/twitter-archive/snowflake
Features:
Time‑ordered generation.
64‑bit integer (max 19‑digit decimal).
No collisions across data centers and workers; high efficiency.
Structure
Bit allocation (most significant to least):
1 sign bit (always 0).
41 bits for timestamp (allows ~69 years from a custom epoch).
5 bits for datacenter ID and 5 bits for worker ID (max 1024 nodes).
12 bits for sequence number (max 4095 IDs per millisecond per node).
Reference implementation (Java)
/**
* Twitter_Snowflake
* 0 - 0000000000 0000000000 0000000000 0000000000 0 - 00000 - 00000 - 000000000000
* 1 bit sign, 41 bits timestamp, 5 bits datacenter, 5 bits worker, 12 bits sequence.
*/
public class SnowflakeIdWorker {
private final long twepoch = 1598598185157L; // custom epoch
private final long workerIdBits = 5L;
private final long datacenterIdBits = 5L;
private final long maxWorkerId = -1L ^ (-1L << workerIdBits);
private final long maxDatacenterId = -1L ^ (-1L << datacenterIdBits);
private final long sequenceBits = 12L;
private final long workerIdShift = sequenceBits;
private final long datacenterIdShift = sequenceBits + workerIdBits;
private final long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits;
private final long sequenceMask = -1L ^ (-1L << sequenceBits);
private long workerId;
private long datacenterId;
private long sequence = 0L;
private long lastTimestamp = -1L;
public SnowflakeIdWorker(long workerId, long datacenterId) {
if (workerId > maxWorkerId || workerId < 0) {
throw new IllegalArgumentException(String.format("worker Id can't be greater than %d or less than 0", maxWorkerId));
}
if (datacenterId > maxDatacenterId || datacenterId < 0) {
throw new IllegalArgumentException(String.format("datacenter Id can't be greater than %d or less than 0", maxDatacenterId));
}
this.workerId = workerId;
this.datacenterId = datacenterId;
}
public synchronized long nextId() {
long timestamp = timeGen();
if (timestamp < lastTimestamp) {
throw new RuntimeException(String.format("Clock moved backwards. Refusing to generate id for %d milliseconds", lastTimestamp - timestamp));
}
if (lastTimestamp == timestamp) {
sequence = (sequence + 1) & sequenceMask;
if (sequence == 0) {
timestamp = tilNextMillis(lastTimestamp);
}
} else {
sequence = 0L;
}
lastTimestamp = timestamp;
long id = ((timestamp - twepoch) << timestampLeftShift)
| (datacenterId << datacenterIdShift)
| (workerId << workerIdShift)
| sequence;
return id;
}
protected long tilNextMillis(long lastTimestamp) {
long timestamp = timeGen();
while (timestamp <= lastTimestamp) {
timestamp = timeGen();
}
return timestamp;
}
protected long timeGen() {
return System.currentTimeMillis();
}
public static void main(String[] args) {
SnowflakeIdWorker idWorker = new SnowflakeIdWorker(0, 0);
for (int i = 0; i < 1000; i++) {
long id = idWorker.nextId();
System.out.println(id);
}
}
}Pros and cons
Advantages: Timestamp in high bits ensures trend‑increasing IDs; no reliance on external databases; high performance; flexible bit allocation.
Disadvantages: Relies on synchronized machine clocks; clock rollback can cause duplicate IDs; global monotonicity is not guaranteed across nodes, though most use‑cases only need trend‑increase.
Mitigating clock issues
Open‑source solutions for clock synchronization:
UidGenerator from Baidu.
Leaf – Meituan‑Dianping's distributed ID system.
Architect's Guide
Dedicated to sharing programmer-architect skills—Java backend, system, microservice, and distributed architectures—to help you become a senior architect.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
