Designing a High‑Availability Distributed ID Generator: From UUID to Snowflake
This article examines the requirements for globally unique IDs in distributed systems, compares classic generation schemes such as UUID, Flickr, Snowflake and TDDL, and details a customized Snowflake‑based implementation with ZooKeeper‑managed worker IDs, clock‑rollback handling, deployment optimizations, and JVM tuning to achieve high performance and reliability.
Background
Distributed systems require globally unique identifiers for data records, messages, and HTTP requests to support tracing, correlation, and routing. Auto‑increment primary keys from a single database are insufficient once data is spread across multiple instances, so an ID generation service must guarantee uniqueness and high availability.
Classic Generation Schemes
1. UUID
UUID is a 128‑bit value usually rendered as a 36‑character string. It can be generated locally without remote calls, giving low latency, but its size and lack of ordering make it unsuitable for database indexes and for scenarios that need monotonically increasing numeric IDs.
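As a small illustration of the points above, the following sketch generates a random (version 4) UUID with the JDK's java.util.UUID and prints its 36‑character form. The class name UuidDemo is only a placeholder for this example.

```java
import java.util.UUID;

final class UuidDemo {
    /** Generates a random (version 4) UUID locally, with no remote call. */
    static UUID newId() {
        return UUID.randomUUID();
    }

    public static void main(String[] args) {
        UUID id = newId();
        String text = id.toString();
        // The canonical text form is 32 hex digits plus 4 hyphens.
        System.out.println(text + " length=" + text.length());
        // The 128 bits are exposed as two 64-bit halves, so the value does not fit one long.
        System.out.println(id.getMostSignificantBits() + " " + id.getLeastSignificantBits());
    }
}
```

Because the value is random, two UUIDs generated one after another have no useful ordering, which is the index‑locality problem described above.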
2. Flickr Scheme
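This approach uses MySQL auto‑increment IDs together with REPLACE INTO. At least two MySQL instances are deployed with different start offsets and a shared step size equal to the number of instances (for example, offsets 1 and 2 with step 2), so one instance produces odd IDs and the other even IDs. The IDs are numeric and roughly ordered, but availability and performance are tied to the database nodes.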
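The odd/even split can be illustrated without a database. The sketch below is an in‑memory simulation only: TicketServer is a hypothetical stand‑in for one MySQL instance, and its start and step arguments play the role of the instance's auto‑increment offset and increment.

```java
import java.util.concurrent.atomic.AtomicLong;

/** In-memory stand-in for one MySQL ticket server (hypothetical; no database involved). */
final class TicketServer {
    private final AtomicLong next;
    private final long step;

    TicketServer(long start, long step) {
        this.next = new AtomicLong(start);
        this.step = step;
    }

    /** Returns the current value and advances by the step, like an auto-increment column. */
    long nextId() {
        return next.getAndAdd(step);
    }
}

final class FlickrStyleDemo {
    public static void main(String[] args) {
        TicketServer odd = new TicketServer(1, 2);   // yields 1, 3, 5, ...
        TicketServer even = new TicketServer(2, 2);  // yields 2, 4, 6, ...
        for (int i = 0; i < 3; i++) {
            System.out.println(odd.nextId() + " " + even.nextId());
        }
    }
}
```

The two servers never collide, but if clients favor one of them the combined stream is no longer strictly increasing, which is why the IDs are only roughly ordered.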
3. Snowflake‑like Scheme
A 64‑bit integer is composed of an unused sign bit followed by three fields: 41 bits for a millisecond‑precision timestamp (≈69 years), 10 bits for a machine identifier (up to 1024 nodes), and 12 bits for a per‑node sequence (up to 4096 IDs per millisecond per node). Advantages are roughly time‑ordered IDs, high throughput (about 4 million IDs per second per node), and flexible bit allocation. Drawbacks include reliance on synchronized clocks and the lack of strict global monotonicity across nodes.
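The bit layout can be made concrete with a few shifts and masks. This is a minimal sketch of the classic 41/10/12 split, assuming the timestamp occupies the high bits and is counted in milliseconds since some custom epoch; the class and method names are illustrative, not taken from Twitter's code.

```java
final class ClassicSnowflakeLayout {
    static final int TIMESTAMP_BITS = 41;
    static final int WORKER_BITS = 10;
    static final int SEQUENCE_BITS = 12;

    static final long MAX_TIMESTAMP = (1L << TIMESTAMP_BITS) - 1;
    static final long MAX_WORKER = (1L << WORKER_BITS) - 1;     // 1023
    static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1; // 4095

    /** Packs the three fields into one positive long: timestamp | worker | sequence. */
    static long pack(long millisSinceEpoch, long worker, long sequence) {
        requireRange("timestamp", millisSinceEpoch, MAX_TIMESTAMP);
        requireRange("worker", worker, MAX_WORKER);
        requireRange("sequence", sequence, MAX_SEQUENCE);
        return (millisSinceEpoch << (WORKER_BITS + SEQUENCE_BITS))
                | (worker << SEQUENCE_BITS)
                | sequence;
    }

    static long timestampOf(long id) { return id >>> (WORKER_BITS + SEQUENCE_BITS); }
    static long workerOf(long id) { return (id >>> SEQUENCE_BITS) & MAX_WORKER; }
    static long sequenceOf(long id) { return id & MAX_SEQUENCE; }

    private static void requireRange(String field, long value, long max) {
        if (value < 0 || value > max) {
            throw new IllegalArgumentException(field + " out of range: " + value);
        }
    }

    public static void main(String[] args) {
        long id = pack(1_000L, 7, 42);
        System.out.println(id + " -> " + timestampOf(id) + "/" + workerOf(id) + "/" + sequenceOf(id));
    }
}
```

Since the timestamp sits in the most significant bits, comparing two IDs as plain numbers compares their timestamps first, which is what makes the IDs time‑ordered.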
4. TDDL Sequence
Alibaba’s TDDL sharding middleware stores ID state in a database and allocates ID blocks to memory. Each business defines a named sequence, reducing write pressure compared to the Flickr scheme, but the service still depends heavily on database availability.
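The block‑allocation idea can be sketched as follows. This is not TDDL's actual code: SequenceTable is a hypothetical in‑memory stand‑in for the database table that stores each named sequence's state, and BlockSequence is an illustrative client that only goes back to the table when its current block is used up.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for the sequence table (hypothetical; a real setup would use a database row per name). */
final class SequenceTable {
    private final Map<String, Long> nextStart = new HashMap<>();

    /** Reserves the range [start, start + blockSize) for the named sequence and returns start. */
    synchronized long reserveBlock(String name, long blockSize) {
        long start = nextStart.getOrDefault(name, 1L);
        nextStart.put(name, start + blockSize);
        return start;
    }
}

/** Hands out IDs from an in-memory block and refills from the table when the block is exhausted. */
final class BlockSequence {
    private final SequenceTable table;
    private final String name;
    private final long blockSize;
    private long next = 0;
    private long limit = 0; // exclusive upper bound of the current block

    BlockSequence(SequenceTable table, String name, long blockSize) {
        this.table = table;
        this.name = name;
        this.blockSize = blockSize;
    }

    synchronized long nextId() {
        if (next >= limit) {                       // block used up (or never fetched)
            next = table.reserveBlock(name, blockSize);
            limit = next + blockSize;
        }
        return next++;
    }
}
```

With a block size of 1000, the table is written once per 1000 IDs instead of once per ID. The cost is that IDs from different service instances interleave by block, and a restart discards the unused remainder of a block.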
Chosen Implementation
To meet the requirement for numeric, globally unique IDs that increase over time (strictly on each node, and approximately across nodes), a customized Snowflake design was selected.
Custom Snowflake Bit Allocation
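36‑bit timestamp (seconds precision, about 2,177 years of range from the chosen epoch)
5‑bit machine code (up to 32 nodes)
22‑bit sequence number (up to 4,194,304 IDs per second per node)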
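The capacities implied by this allocation follow directly from the bit widths. The sketch below works them out and packs the fields, under two assumptions that the text does not state: the fields are laid out from high to low bits in the order listed, and the remaining bit of a 64‑bit long is an unused sign bit. The class name CustomLayout is a placeholder.

```java
final class CustomLayout {
    static final int TIMESTAMP_BITS = 36; // seconds since a chosen epoch
    static final int MACHINE_BITS = 5;
    static final int SEQUENCE_BITS = 22;

    static final long MAX_SECONDS = (1L << TIMESTAMP_BITS) - 1;
    static final long MAX_MACHINE = (1L << MACHINE_BITS) - 1;   // 31, so 32 machine codes
    static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1; // 4,194,303, so 4,194,304 IDs per second

    /** Assumed layout (high to low): timestamp, machine code, sequence. */
    static long pack(long seconds, long machine, long sequence) {
        if (seconds < 0 || seconds > MAX_SECONDS) throw new IllegalArgumentException("seconds: " + seconds);
        if (machine < 0 || machine > MAX_MACHINE) throw new IllegalArgumentException("machine: " + machine);
        if (sequence < 0 || sequence > MAX_SEQUENCE) throw new IllegalArgumentException("sequence: " + sequence);
        return (seconds << (MACHINE_BITS + SEQUENCE_BITS)) | (machine << SEQUENCE_BITS) | sequence;
    }

    /** Years covered by the timestamp field, using 365.25-day years. */
    static double rangeYears() {
        return (double) (1L << TIMESTAMP_BITS) / (365.25 * 24 * 3600);
    }

    public static void main(String[] args) {
        System.out.println("total bits   = " + (TIMESTAMP_BITS + MACHINE_BITS + SEQUENCE_BITS));
        System.out.println("machines     = " + (MAX_MACHINE + 1));
        System.out.println("ids per sec  = " + (MAX_SEQUENCE + 1));
        System.out.println("range (yrs)  = " + rangeYears());
    }
}
```

Compared with the classic layout, this trades node count (32 instead of 1024) for a much longer timestamp range and a larger per‑node burst budget within each second.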
Machine Code Management
Machine codes must be unique across nodes. For small clusters they can be set manually; for larger deployments ZooKeeper persistent sequential nodes are used to allocate a WORKID automatically. Startup flow:
Start the ID service and connect to ZooKeeper; ensure the root node /id_generator exists.
Check whether a sequential child node for the current host already exists.
If it exists, read the assigned WORKID; otherwise create a new sequential node and use its sequence number as WORKID.
The obtained WORKID is cached locally, eliminating further ZooKeeper calls.
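The startup flow above can be sketched without a running ZooKeeper ensemble. In this example NodeStore is a hypothetical interface covering only the three operations the flow needs, and InMemoryNodeStore imitates ZooKeeper's behavior of appending a 10‑digit, zero‑padded counter to sequential node names. The node naming scheme (host name plus "-") and the range check against the 5‑bit machine code are assumptions of this sketch, not details from the original service.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal stand-in for the ZooKeeper operations the flow needs (hypothetical interface). */
interface NodeStore {
    void ensureExists(String path);
    List<String> children(String path);
    /** Creates a persistent sequential child and returns its name, e.g. "host-a-0000000003". */
    String createSequential(String parent, String prefix);
}

/** In-memory fake with a single parent node, for illustration and testing only. */
final class InMemoryNodeStore implements NodeStore {
    private final List<String> names = new ArrayList<>();
    private int counter = 0;

    public void ensureExists(String path) { /* the single root always exists in this fake */ }
    public List<String> children(String path) { return new ArrayList<>(names); }
    public String createSequential(String parent, String prefix) {
        String name = prefix + String.format("%010d", counter++);
        names.add(name);
        return name;
    }
}

final class WorkerIdAllocator {
    static final String ROOT = "/id_generator";
    static final int SUFFIX_LENGTH = 10;  // digits appended to sequential node names
    static final int MAX_WORKER_ID = 31;  // assumes the 5-bit machine code

    private Integer cached; // set once, so later calls never touch the store

    synchronized int workerId(NodeStore store, String host) {
        if (cached != null) return cached;
        store.ensureExists(ROOT);
        String prefix = host + "-";
        String node = null;
        for (String child : store.children(ROOT)) {
            boolean sameHost = child.startsWith(prefix)
                    && child.length() == prefix.length() + SUFFIX_LENGTH;
            if (sameHost) { node = child; break; }
        }
        if (node == null) node = store.createSequential(ROOT, prefix);
        int id = Integer.parseInt(node.substring(node.length() - SUFFIX_LENGTH));
        if (id > MAX_WORKER_ID) {
            throw new IllegalStateException("sequence " + id + " does not fit the machine-code field");
        }
        cached = id;
        return id;
    }
}
```

The range check matters because a sequential counter only grows: once more hosts have registered than the machine‑code field can represent, new nodes need a different assignment policy, such as reclaiming codes of decommissioned hosts.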
Clock‑Rollback Handling
Because the algorithm depends on the system clock, a backward time shift can cause duplicate IDs. The handling strategy includes:
Disable automatic NTP synchronization to prevent abrupt clock adjustments.
Detect clock rollback and refuse ID generation, returning an error code until the clock catches up.
If the rollback exceeds a configured tolerance, raise an alarm and remove the node from the cluster.
Cache recent second‑level timestamps and sequence numbers to mitigate short‑term rollbacks.
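The detect‑and‑refuse part of this strategy can be sketched as a small generator with an injectable clock. The sketch covers only two of the measures above: refusing while the clock is behind, and flagging rollbacks that exceed a tolerance so the caller can raise an alarm and remove the node. It does not implement the cache of recent timestamps. It reuses the 36/5/22 layout; all class names, the exception type, and the tolerance parameter are assumptions made for illustration.

```java
import java.util.function.LongSupplier;

/** Thrown while the clock reads earlier than the last issued timestamp (illustrative type). */
final class ClockRollbackException extends RuntimeException {
    final long behindSeconds;
    final boolean exceedsTolerance; // true: caller should alarm and remove the node

    ClockRollbackException(long behindSeconds, boolean exceedsTolerance) {
        super("clock is " + behindSeconds + "s behind the last issued timestamp");
        this.behindSeconds = behindSeconds;
        this.exceedsTolerance = exceedsTolerance;
    }
}

final class RollbackAwareGenerator {
    private static final int MACHINE_BITS = 5;
    private static final int SEQUENCE_BITS = 22;
    private static final long MAX_MACHINE = (1L << MACHINE_BITS) - 1;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;

    private final LongSupplier clockSeconds; // seconds since the chosen epoch
    private final long machine;
    private final long toleranceSeconds;
    private long lastSecond = -1;
    private long sequence = 0;

    RollbackAwareGenerator(LongSupplier clockSeconds, long machine, long toleranceSeconds) {
        if (machine < 0 || machine > MAX_MACHINE) throw new IllegalArgumentException("machine: " + machine);
        this.clockSeconds = clockSeconds;
        this.machine = machine;
        this.toleranceSeconds = toleranceSeconds;
    }

    synchronized long nextId() {
        long now = clockSeconds.getAsLong();
        if (now < lastSecond) {
            // Refuse rather than risk re-issuing an ID from an already used second.
            long behind = lastSecond - now;
            throw new ClockRollbackException(behind, behind > toleranceSeconds);
        }
        if (now == lastSecond) {
            if (sequence == MAX_SEQUENCE) throw new IllegalStateException("sequence exhausted for this second");
            sequence++;
        } else {
            lastSecond = now;
            sequence = 0;
        }
        return (now << (MACHINE_BITS + SEQUENCE_BITS)) | (machine << SEQUENCE_BITS) | sequence;
    }
}
```

In production the clock argument could be something like () -> System.currentTimeMillis() / 1000 - EPOCH_SECONDS, where EPOCH_SECONDS is a hypothetical constant for the service's custom epoch; injecting it keeps the rollback path testable.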
Leap‑Second Considerations
Negative leap seconds (23:59:58 → 00:00:00) do not affect ID generation. Positive leap seconds (23:59:59 → 23:59:60) are safe as long as the timestamp value remains unique; any subsequent NTP correction is handled by the same rollback logic.
Service Deployment Optimization
Cluster Architecture
The service runs in a horizontally scaled cluster behind an Nginx load balancer. Each instance is a Spring Boot application with an embedded Tomcat server; health checks are performed via heartbeat signals.
Tomcat Tuning
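The Tomcat connector is switched from BIO/NIO to APR (Apache Portable Runtime), which performs I/O through native libraries and is intended to improve throughput under high concurrency. The native APR and Tomcat Native libraries must be installed, and the Spring Boot configuration adjusted to activate the APR connector.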
Development Issues and JVM Tuning
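During testing, occasional Full GC events were observed and traced to Metaspace. -XX:MetaspaceSize is not a capacity limit but the initial high‑water mark: when class metadata usage first reaches it, the JVM triggers a garbage collection (seen here as a Full GC) and then raises the threshold. The default value was too low for this application, so Full GCs occurred even though overall Metaspace capacity was ample. Raising the initial threshold, e.g., -XX:MetaspaceSize=128m, eliminated the Full GC spikes.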
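One way to choose a sensible initial value is to check how much Metaspace the service actually uses once it has warmed up. The sketch below reads that figure through the standard platform MXBeans; it assumes a HotSpot‑style JVM that exposes a memory pool named "Metaspace" and returns -1 otherwise. The class name MetaspaceProbe is a placeholder.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

final class MetaspaceProbe {
    /** Returns current Metaspace usage in bytes, or -1 if the JVM exposes no pool with that name. */
    static long usedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                return pool.getUsage().getUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long used = usedBytes();
        if (used < 0) {
            System.out.println("No Metaspace pool reported by this JVM");
        } else {
            System.out.println("Metaspace used: " + (used / (1024 * 1024)) + " MiB");
        }
    }
}
```

If the steady‑state figure is well above the default threshold, setting -XX:MetaspaceSize somewhat above it avoids the collections triggered while class metadata grows during startup.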
Summary
The distributed ID generation system implements a customized Snowflake algorithm that provides globally unique, time‑ordered numeric IDs with high availability through cluster deployment, ZooKeeper‑based worker ID allocation, and robust clock‑rollback handling.
References
Snowflake – https://github.com/twitter/snowflake
TDDL documentation
Dada Group Technology
Sharing insights and experiences from Dada Group's R&D department on product refinement and technology advancement, connecting with fellow geeks to exchange ideas and grow together.
