Distributed ID Generation: Requirements, Schemes, and Implementations
This article explains why distributed systems need special ID generation, outlines the key requirements such as global uniqueness and monotonicity, and compares various solutions including UUID, database auto‑increment, segment mode, Redis, Snowflake, Baidu UidGenerator, Meituan Leaf, and Didi TinyID.
In monolithic applications primary‑key auto‑increment is sufficient, but once a system is sharded across multiple databases the same approach can produce duplicate IDs, which is unacceptable for business logic.
The main requirements for a distributed ID are:
Global uniqueness – no two IDs should ever collide.
Trend‑increasing – IDs should be roughly ordered to benefit MySQL’s clustered index.
Monotonic increasing – each new ID must be larger than the previous one for scenarios such as versioning.
Security – IDs should not be perfectly sequential to avoid easy data scraping.
Several generation schemes are commonly used:
UUID
Database auto‑increment with custom step/offset
Segment (range) mode
Redis INCR/INCRBY
Snowflake algorithm
Baidu UidGenerator
Meituan Leaf
Didi TinyID
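UUID (Universally Unique Identifier) provides 128-bit identifiers, written as 32 hexadecimal digits in five hyphen-separated groups (8-4-4-4-12), 36 characters in total. A UUID is generated locally with no network call, so availability is high. The drawbacks are that the 36-character string is large for a key, that time-based version 1 UUIDs can expose the host's MAC address, and that the randomness of version 4 UUIDs scatters inserts across a clustered index, which may cause index fragmentation.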
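As a minimal sketch of the format described above, the following uses Python's standard uuid module to generate a random (version 4) UUID and inspect its size. The variable names are illustrative only.

```python
import uuid

# Version 4 UUIDs are built from random bits, so no coordination
# between nodes (and no network call) is needed.
uid = uuid.uuid4()

text = str(uid)                   # canonical 8-4-4-4-12 form
group_lengths = [len(part) for part in text.split("-")]

print(text)
print(len(text))                  # 36 characters, including four hyphens
print(len(uid.bytes) * 8)         # 128 bits of underlying data
print(group_lengths)              # [8, 4, 4, 4, 12]
```

Storing the 16 raw bytes instead of the 36-character string reduces the key size, but it does not address the random insert order.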
Database auto-increment can be adapted for distributed use by giving each node its own auto_increment_offset (the starting value) and a shared auto_increment_increment (the step). For example, with two nodes and a step of 2, one node generates odd IDs (1,3,5…) and the other even IDs (2,4,6…). This method preserves uniqueness, but because the nodes advance independently the combined sequence is no longer truly monotonic. Every ID also costs a database write, and adding a node means changing the step and offsets on all existing nodes.
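The odd/even split can be illustrated without a database. The sketch below only mimics the arithmetic of the two settings; auto_increment_ids is a hypothetical helper, not a MySQL API.

```python
from itertools import count, islice

def auto_increment_ids(offset, increment):
    """Mimic one node: IDs start at `offset` and advance by `increment`."""
    return count(start=offset, step=increment)

# Two nodes sharing a step of 2, with offsets 1 and 2.
node_a = auto_increment_ids(offset=1, increment=2)
node_b = auto_increment_ids(offset=2, increment=2)

ids_a = list(islice(node_a, 5))
ids_b = list(islice(node_b, 5))

print(ids_a)  # [1, 3, 5, 7, 9]
print(ids_b)  # [2, 4, 6, 8, 10]
```

The two sequences never overlap, but if node A is busier than node B, a later insert on B can still receive a smaller ID than an earlier insert on A.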
Segment mode obtains a range of IDs from a central table and serves them from memory. The table schema is:
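CREATE TABLE id_generator (
  id int(10) NOT NULL,
  max_id bigint(20) NOT NULL COMMENT 'current maximum ID',
  step int(20) NOT NULL COMMENT 'segment step size',
  biz_type int(20) NOT NULL COMMENT 'business type',
  version int(20) NOT NULL COMMENT 'version number',
  PRIMARY KEY (`id`)
);
When the range is exhausted, the service updates the table:
update id_generator set max_id = #{max_id+step}, version = version + 1 where version = #{version} and biz_type = XXX
The version column acts as an optimistic lock: if another instance has already claimed the range, the version no longer matches, the update affects zero rows, and the service must re-read the row and retry.
Advantages: a mature approach; Meituan Leaf and Didi TinyID both build on it. Disadvantages: it depends on the database and requires careful version handling.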
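The claim-and-serve cycle can be sketched as follows. This is a single-process illustration under stated assumptions: it uses an in-memory SQLite database in place of MySQL, and claim_segment and SegmentIdGenerator are hypothetical names, not part of any of the systems discussed here. It is not thread-safe.

```python
import sqlite3

def claim_segment(conn, biz_type):
    """Claim the next ID range for biz_type; returns (first_id, last_id)."""
    while True:
        max_id, step, version = conn.execute(
            "SELECT max_id, step, version FROM id_generator WHERE biz_type = ?",
            (biz_type,),
        ).fetchone()
        # Optimistic lock: the update only succeeds if nobody else
        # bumped the version since we read the row.
        cur = conn.execute(
            "UPDATE id_generator SET max_id = ?, version = version + 1 "
            "WHERE version = ? AND biz_type = ?",
            (max_id + step, version, biz_type),
        )
        conn.commit()
        if cur.rowcount == 1:
            return max_id + 1, max_id + step
        # Zero rows updated: another instance won the race, so retry.

class SegmentIdGenerator:
    """Serves IDs from memory and touches the table once per segment."""

    def __init__(self, conn, biz_type):
        self.conn = conn
        self.biz_type = biz_type
        self.next_value = 1
        self.last_value = 0  # empty range forces a claim on first use

    def next_id(self):
        if self.next_value > self.last_value:
            self.next_value, self.last_value = claim_segment(
                self.conn, self.biz_type
            )
        value = self.next_value
        self.next_value += 1
        return value

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE id_generator (id INTEGER PRIMARY KEY, max_id INTEGER NOT NULL, "
    "step INTEGER NOT NULL, biz_type INTEGER NOT NULL, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO id_generator VALUES (1, 0, 3, 101, 0)")  # step of 3
conn.commit()

gen_a = SegmentIdGenerator(conn, biz_type=101)
gen_b = SegmentIdGenerator(conn, biz_type=101)
ids = [gen_a.next_id(), gen_b.next_id(), gen_a.next_id(),
       gen_a.next_id(), gen_a.next_id()]
print(ids)  # [1, 4, 2, 3, 7]
```

Generator A holds the range 1 to 3 and generator B holds 4 to 6, so the IDs are unique but only trend-increasing across instances. A larger step means fewer database round trips, at the cost of more IDs lost when an instance restarts with an unused part of its range.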
The Redis approach relies on the atomic INCR and INCRBY commands. Because Redis executes commands on a single thread, every call returns a distinct, increasing value, and throughput is high. The costs are an extra Redis dependency and, under very high concurrency, the possible need for a cluster.
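A minimal sketch of both commands follows. To stay runnable without a server, it uses FakeRedis, a hypothetical in-process stand-in that imitates only the incr and incrby methods of a Redis client. The key name order:id and the helper names are likewise assumptions.

```python
class FakeRedis:
    """In-process stand-in so the sketch runs without a Redis server."""

    def __init__(self):
        self.data = {}

    def incr(self, name, amount=1):
        self.data[name] = self.data.get(name, 0) + amount
        return self.data[name]

    def incrby(self, name, amount=1):
        return self.incr(name, amount)

def next_order_id(client, key="order:id"):
    """One round trip per ID: INCR returns the new counter value."""
    return client.incr(key)

def reserve_batch(client, size, key="order:id"):
    """One round trip per batch: INCRBY moves the counter past `size` IDs."""
    last = client.incrby(key, size)
    return range(last - size + 1, last + 1)

client = FakeRedis()
first = next_order_id(client)
batch = list(reserve_batch(client, 5))
after = next_order_id(client)

print(first)  # 1
print(batch)  # [2, 3, 4, 5, 6]
print(after)  # 7
```

With the redis-py package, a real client object could be passed in place of FakeRedis, since it exposes methods with the same names. Reserving batches with INCRBY trades strict per-ID ordering across callers for fewer network round trips.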
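The Snowflake algorithm (originated at Twitter) splits a 64-bit number into a sign bit (1 bit, always 0), a millisecond timestamp (41 bits, roughly 69 years from a chosen epoch), a machine ID (10 bits, up to 1,024 nodes), and a sequence number (12 bits, up to 4,096 IDs per millisecond per node). It yields trend-increasing IDs without external services, but it depends on the local clock: if the clock is rolled back, a node can reissue a timestamp it has already used and generate duplicate IDs.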
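The bit layout can be sketched as below. This is an illustrative implementation, not Twitter's code: the class name, the injectable clock parameter, and the epoch (2020-01-01 UTC, chosen arbitrarily) are assumptions. On clock rollback it simply raises an error; other strategies, such as waiting, are possible.

```python
import threading
import time

EPOCH_MS = 1_577_836_800_000  # 2020-01-01T00:00:00Z, an arbitrary choice
MACHINE_BITS = 10
SEQUENCE_BITS = 12
MAX_MACHINE_ID = (1 << MACHINE_BITS) - 1   # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1    # 4095

class SnowflakeGenerator:
    def __init__(self, machine_id, clock=None):
        if not 0 <= machine_id <= MAX_MACHINE_ID:
            raise ValueError("machine_id must fit in 10 bits")
        self.machine_id = machine_id
        # The clock is injectable so the behaviour can be exercised in tests.
        self.clock = clock or (lambda: int(time.time() * 1000))
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                # Clock rollback: refuse rather than risk a duplicate.
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # 4096 IDs used in this millisecond: wait for the next.
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.sequence = 0
            self.last_ms = now
            return (
                ((now - EPOCH_MS) << (MACHINE_BITS + SEQUENCE_BITS))
                | (self.machine_id << SEQUENCE_BITS)
                | self.sequence
            )

def decode(snowflake_id):
    """Split an ID back into (ms since epoch, machine_id, sequence)."""
    return (
        snowflake_id >> (MACHINE_BITS + SEQUENCE_BITS),
        (snowflake_id >> SEQUENCE_BITS) & MAX_MACHINE_ID,
        snowflake_id & MAX_SEQUENCE,
    )

gen = SnowflakeGenerator(machine_id=5)
a = gen.next_id()
b = gen.next_id()
print(a < b)          # True
print(decode(a)[1])   # 5
```

Because the timestamp occupies the highest bits, IDs from one node are strictly increasing, and IDs across nodes are ordered by time to within clock skew.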
Baidu UidGenerator builds on Snowflake with enhancements such as future‑time borrowing, RingBuffer caching, and configurable worker‑ID bits, achieving up to 6 million QPS per node while handling Docker‑style restarts.
Meituan Leaf offers two modes: segment and Snowflake. The segment variant improves the original approach by batch‑fetching ranges via a proxy server, reducing database pressure. Its segment table definition is:
+-------------+--------------+------+-----+-------------------+-----------------------------+
| Field       | Type         | Null | Key | Default           | Extra                       |
+-------------+--------------+------+-----+-------------------+-----------------------------+
| biz_tag     | varchar(128) | NO   | PRI |                   |                             |
| max_id      | bigint(20)   | NO   |     | 1                 |                             |
| step        | int(11)      | NO   |     | NULL              |                             |
| desc        | varchar(256) | YES  |     | NULL              |                             |
| update_time | timestamp    | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+-------------+--------------+------+-----+-------------------+-----------------------------+
Leaf’s Snowflake variant retains the 1+41+10+12 bit layout but assigns worker IDs automatically via ZooKeeper sequential nodes, solving the manual configuration problem.
Didi TinyID is a Java‑based system that extends the segment algorithm, supporting multiple databases and a dedicated client library. It is easy to integrate but relies on database availability and typically requires master‑slave clustering.
Overall, the choice of a distributed ID solution depends on factors such as performance requirements, operational complexity, scalability, and security considerations.