Designing a High‑Availability Distributed ID Generator: From UUID to Snowflake

This article examines the requirements for globally unique IDs in distributed systems, compares classic generation schemes such as UUID, Flickr, Snowflake and TDDL, and details a customized Snowflake‑based implementation with ZooKeeper‑managed worker IDs, clock‑rollback handling, deployment optimizations, and JVM tuning to achieve high performance and reliability.

Dada Group Technology

Background

Distributed systems require globally unique identifiers for data records, messages, and HTTP requests to support tracing, correlation, and routing. Auto‑increment primary keys are insufficient once data is sharded across multiple databases, because each shard would hand out the same values. A dedicated ID generation service must therefore guarantee uniqueness and high availability.

Classic Generation Schemes

1. UUID

UUID is a 128‑bit value usually rendered as a 36‑character string. It can be generated locally without remote calls, giving low latency, but it is large and, in its common random form, unordered. That makes it a poor fit for primary‑key indexes and unusable where monotonically increasing numeric IDs are required.
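As a quick illustration of the points above, the following minimal sketch generates a random (version 4) UUID with the JDK's `java.util.UUID` and prints its textual length and its two 64‑bit halves. The class name `UuidDemo` is only illustrative.

```java
import java.util.UUID;

class UuidDemo {
    /** Generates a random (version 4) UUID locally, with no remote call. */
    static UUID newId() {
        return UUID.randomUUID();
    }

    public static void main(String[] args) {
        UUID id = newId();
        // Canonical text form: 32 hex digits plus 4 hyphens = 36 characters.
        System.out.println(id + " length=" + id.toString().length());
        // The 128 bits are exposed as two longs; successive values carry no ordering.
        System.out.println(Long.toHexString(id.getMostSignificantBits())
                + " / " + Long.toHexString(id.getLeastSignificantBits()));
    }
}
```

Because the value does not fit in a single `long`, it cannot serve directly as a numeric 64‑bit key.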

2. Flickr Scheme

This approach uses MySQL auto‑increment IDs together with REPLACE INTO. At least two MySQL instances are deployed with the same step size but different start offsets, so that, for example, one produces odd IDs and the other even IDs. IDs are ordered within each instance, but availability and performance are tied to the database nodes.
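The odd/even split can be sketched without a database. In the simulation below, `TicketServer` is a hypothetical in‑memory stand‑in for one MySQL instance configured with its own start offset and a shared step, and `TicketPool` alternates between the servers; neither class comes from the original article.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** In-memory stand-in for one MySQL instance with an auto-increment offset and step. */
class TicketServer {
    private final AtomicLong next;
    private final long step;

    TicketServer(long offset, long step) {
        this.next = new AtomicLong(offset);
        this.step = step;
    }

    long nextId() {
        return next.getAndAdd(step);
    }
}

/** Spreads requests over the servers in round-robin order. */
class TicketPool {
    private final List<TicketServer> servers;
    private final AtomicLong calls = new AtomicLong();

    TicketPool(List<TicketServer> servers) {
        this.servers = servers;
    }

    long nextId() {
        int index = (int) (calls.getAndIncrement() % servers.size());
        return servers.get(index).nextId();
    }

    public static void main(String[] args) {
        // Server A yields 1, 3, 5, ...; server B yields 2, 4, 6, ...
        TicketPool pool = new TicketPool(List.of(new TicketServer(1, 2), new TicketServer(2, 2)));
        for (int i = 0; i < 6; i++) {
            System.out.println(pool.nextId());
        }
    }
}
```

If one server becomes unavailable, the other keeps issuing IDs of its own parity, so uniqueness holds but the combined stream is no longer dense or strictly ordered.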

3. Snowflake‑like Scheme

A 64‑bit integer is composed of an unused sign bit plus three fields: 41 bits for a millisecond‑precision timestamp (≈69 years), 10 bits for a machine identifier (up to 1024 nodes), and 12 bits for a per‑node sequence (up to 4096 IDs per millisecond). Advantages are time‑ordered IDs, high throughput (about 4 million IDs per second per node), and flexible bit allocation. Drawbacks are the reliance on synchronized clocks and the fact that IDs are monotonic only within a node, not globally across nodes.
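The classic 41/10/12 layout can be made concrete with a few shifts and masks. This is a minimal sketch, not Twitter's code: the class name `SnowflakeLayout` and its method names are illustrative, and the timestamp argument is assumed to be milliseconds since whatever custom epoch a deployment chooses.

```java
class SnowflakeLayout {
    static final int SEQUENCE_BITS = 12;
    static final int WORKER_BITS = 10;
    static final int TIMESTAMP_BITS = 41;

    static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;   // 4095
    static final long MAX_WORKER = (1L << WORKER_BITS) - 1;       // 1023
    static final long MAX_TIMESTAMP = (1L << TIMESTAMP_BITS) - 1; // about 69 years of milliseconds

    /** Packs the three fields into one positive long; the top (sign) bit stays 0. */
    static long pack(long millisSinceEpoch, long workerId, long sequence) {
        if (millisSinceEpoch < 0 || millisSinceEpoch > MAX_TIMESTAMP) {
            throw new IllegalArgumentException("timestamp out of range");
        }
        if (workerId < 0 || workerId > MAX_WORKER) {
            throw new IllegalArgumentException("worker id out of range");
        }
        if (sequence < 0 || sequence > MAX_SEQUENCE) {
            throw new IllegalArgumentException("sequence out of range");
        }
        return (millisSinceEpoch << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }

    static long timestampOf(long id) {
        return id >>> (WORKER_BITS + SEQUENCE_BITS);
    }

    static long workerOf(long id) {
        return (id >>> SEQUENCE_BITS) & MAX_WORKER;
    }

    static long sequenceOf(long id) {
        return id & MAX_SEQUENCE;
    }

    public static void main(String[] args) {
        long id = pack(1, 1, 1);
        System.out.println(id + " -> ts=" + timestampOf(id)
                + " worker=" + workerOf(id) + " seq=" + sequenceOf(id));
    }
}
```

Because the timestamp occupies the high bits, comparing two IDs numerically compares their timestamps first, which is what makes the IDs time‑ordered.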

4. TDDL Sequence

Alibaba’s TDDL sharding middleware stores ID state in a database and allocates ID blocks to memory. Each business defines a named sequence, reducing write pressure compared to the Flickr scheme, but the service still depends heavily on database availability.
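The block‑allocation idea can be sketched as follows. This is not TDDL's actual API: `SequenceStore` is a hypothetical in‑memory stand‑in for the database table that persists each named sequence, and `BlockSequence` is an illustrative client that serves IDs from memory until its block runs out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the database table holding each sequence's next block start. */
class SequenceStore {
    private final Map<String, AtomicLong> rows = new ConcurrentHashMap<>();
    final AtomicLong writes = new AtomicLong();

    /** Reserves [start, start + blockSize) for the named sequence and returns start. */
    long reserveBlock(String name, int blockSize) {
        writes.incrementAndGet(); // one simulated database write per block, not per ID
        return rows.computeIfAbsent(name, n -> new AtomicLong(1)).getAndAdd(blockSize);
    }
}

/** Hands out IDs from an in-memory block and refills from the store when it is exhausted. */
class BlockSequence {
    private final SequenceStore store;
    private final String name;
    private final int blockSize;
    private long next;  // next ID to hand out
    private long limit; // exclusive end of the current block

    BlockSequence(SequenceStore store, String name, int blockSize) {
        this.store = store;
        this.name = name;
        this.blockSize = blockSize;
    }

    synchronized long nextId() {
        if (next >= limit) {
            next = store.reserveBlock(name, blockSize);
            limit = next + blockSize;
        }
        return next++;
    }

    public static void main(String[] args) {
        SequenceStore store = new SequenceStore();
        BlockSequence orders = new BlockSequence(store, "order", 1000);
        for (int i = 0; i < 2500; i++) {
            orders.nextId();
        }
        System.out.println("store writes: " + store.writes.get());
    }
}
```

With a block size of 1000, generating 2500 IDs touches the store three times instead of 2500. The trade‑off is that IDs left in an unfinished block are lost when an instance restarts, and the store must still be reachable whenever a block runs out.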

Chosen Implementation

To obtain numeric, globally unique IDs that increase over time (strictly within a node and approximately across nodes), a customized Snowflake design was selected.

Custom Snowflake Bit Allocation

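36‑bit timestamp (seconds precision, about 2,177 years from the chosen epoch)

5‑bit machine code (up to 32 nodes)

22‑bit sequence number (up to 4,194,304 IDs per node per second)

These fields total 63 bits; the remaining top bit of a signed 64‑bit integer stays 0, as in the original Snowflake layout, so IDs remain positive.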
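A generator for this 36/5/22 layout might look like the sketch below. It is an illustration under stated assumptions rather than the team's implementation: the class name `CustomSnowflake`, the epoch of 2020‑01‑01 UTC, the injected `LongSupplier` clock, and the choice to spin until the next second when the sequence is exhausted are all assumptions made here.

```java
import java.util.function.LongSupplier;

class CustomSnowflake {
    static final int SEQUENCE_BITS = 22;
    static final int WORKER_BITS = 5;
    static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1; // 4,194,303
    static final long MAX_WORKER = (1L << WORKER_BITS) - 1;     // 31
    static final long EPOCH_SECONDS = 1_577_836_800L;           // 2020-01-01T00:00:00Z (assumed epoch)

    private final long workerId;
    private final LongSupplier clockSeconds; // Unix time in seconds; injectable for tests
    private long lastSecond = -1;
    private long sequence = 0;

    CustomSnowflake(long workerId, LongSupplier clockSeconds) {
        if (workerId < 0 || workerId > MAX_WORKER) {
            throw new IllegalArgumentException("worker id must be in 0..31");
        }
        this.workerId = workerId;
        this.clockSeconds = clockSeconds;
    }

    synchronized long nextId() {
        long now = clockSeconds.getAsLong() - EPOCH_SECONDS;
        if (now < lastSecond) {
            // Simplest possible policy: refuse to generate while the clock is behind.
            throw new IllegalStateException("clock moved backwards by " + (lastSecond - now) + "s");
        }
        if (now == lastSecond) {
            if (sequence == MAX_SEQUENCE) {
                // Sequence exhausted for this second: wait for the next one.
                while ((now = clockSeconds.getAsLong() - EPOCH_SECONDS) <= lastSecond) {
                    Thread.onSpinWait();
                }
                sequence = 0;
            } else {
                sequence++;
            }
        } else {
            sequence = 0;
        }
        lastSecond = now;
        return (now << (WORKER_BITS + SEQUENCE_BITS)) | (workerId << SEQUENCE_BITS) | sequence;
    }

    public static void main(String[] args) {
        CustomSnowflake generator = new CustomSnowflake(3, () -> System.currentTimeMillis() / 1000);
        System.out.println(generator.nextId());
        System.out.println(generator.nextId());
    }
}
```

Moving from milliseconds to seconds trades timestamp resolution for a much larger per‑tick sequence, so a burst of several million IDs within one second on a single node does not have to wait for the clock.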

Machine Code Management

Machine codes must be unique across nodes. For small clusters they can be set manually; for larger deployments (up to the 32 nodes that a 5‑bit code allows), ZooKeeper persistent sequential nodes are used to allocate a WORKID automatically. Startup flow:

Start the ID service and connect to ZooKeeper; ensure the root node /id_generator exists.

Check whether a sequential child node for the current host already exists.

If it exists, read the assigned WORKID; otherwise create a new sequential node and use its sequence number as WORKID.

The obtained WORKID is cached locally, eliminating further ZooKeeper calls.
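The startup flow above can be sketched without a running ZooKeeper ensemble. In this illustration `FakeCoordinator` is a hypothetical in‑memory stand‑in for the children of /id_generator; it mimics only the behaviour the flow relies on, namely that a persistent sequential create appends a 10‑digit, zero‑padded counter to the node name. The `host-` naming convention and the class `WorkerIdAllocator` are assumptions, not details from the original article.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in for the children of /id_generator in ZooKeeper. */
class FakeCoordinator {
    private final List<String> children = new ArrayList<>();
    private int counter = 0;

    synchronized List<String> getChildren() {
        return new ArrayList<>(children);
    }

    /** Mimics a persistent sequential create: appends a 10-digit, zero-padded counter. */
    synchronized String createSequential(String prefix) {
        String name = prefix + String.format("%010d", counter++);
        children.add(name);
        return name;
    }
}

class WorkerIdAllocator {
    static final int MAX_WORKER_ID = 31; // largest value a 5-bit machine code can hold
    private static final int SUFFIX_LENGTH = 10;

    /** Reuses the host's existing node if present, otherwise creates one; returns the WORKID. */
    static int allocate(FakeCoordinator zk, String host) {
        String prefix = host + "-";
        String node = null;
        for (String child : zk.getChildren()) {
            // Exact-length check so that host "app" does not match the node of host "app-1".
            if (child.startsWith(prefix) && child.length() == prefix.length() + SUFFIX_LENGTH) {
                node = child;
                break;
            }
        }
        if (node == null) {
            node = zk.createSequential(prefix);
        }
        int workerId = Integer.parseInt(node.substring(node.length() - SUFFIX_LENGTH));
        if (workerId > MAX_WORKER_ID) {
            throw new IllegalStateException("sequence " + workerId + " does not fit in 5 bits");
        }
        return workerId; // the caller would cache this locally
    }

    public static void main(String[] args) {
        FakeCoordinator zk = new FakeCoordinator();
        System.out.println(allocate(zk, "host-a"));
        System.out.println(allocate(zk, "host-b"));
        System.out.println(allocate(zk, "host-a")); // restart: same WORKID as before
    }
}
```

Because the nodes are persistent, a restarted host finds its earlier node and keeps its WORKID. The sketch rejects sequence numbers above 31 rather than wrapping them, since wrapping could hand the same machine code to two live hosts.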

Clock‑Rollback Handling

Because the algorithm depends on the system clock, a backward time shift can cause duplicate IDs. The handling strategy includes:

Disable automatic NTP synchronization to prevent abrupt clock adjustments.

Detect clock rollback and refuse ID generation, returning an error code until the clock catches up.

If the rollback exceeds a configured tolerance, raise an alarm and remove the node from the cluster.

Cache recent second‑level timestamps and sequence numbers to mitigate short‑term rollbacks.
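The detect, refuse, and remove decisions in this strategy can be sketched as a small guard that the generator consults before each ID. The names `RollbackGuard`, `Decision`, and `toleranceSeconds` are hypothetical, the clock is injected so the behaviour can be exercised deterministically, and the caching of recent timestamps and sequence numbers is not modelled here.

```java
import java.util.function.LongSupplier;

class RollbackGuard {
    enum Decision {
        GENERATE, // clock is at or ahead of the last timestamp used
        REFUSE,   // small rollback: return an error code until the clock catches up
        EVICT     // rollback beyond tolerance: raise an alarm and take the node out of the cluster
    }

    private final LongSupplier clockSeconds;
    private final long toleranceSeconds;
    private long lastSecond = Long.MIN_VALUE;

    RollbackGuard(LongSupplier clockSeconds, long toleranceSeconds) {
        this.clockSeconds = clockSeconds;
        this.toleranceSeconds = toleranceSeconds;
    }

    synchronized Decision check() {
        long now = clockSeconds.getAsLong();
        if (now >= lastSecond) {
            lastSecond = now; // remember the newest second for which IDs were issued
            return Decision.GENERATE;
        }
        long rollback = lastSecond - now;
        // lastSecond is deliberately left unchanged so generation resumes once the clock catches up.
        return rollback <= toleranceSeconds ? Decision.REFUSE : Decision.EVICT;
    }

    public static void main(String[] args) {
        long[] clock = { 1_000 };
        RollbackGuard guard = new RollbackGuard(() -> clock[0], 3);
        System.out.println(guard.check()); // GENERATE
        clock[0] = 998;
        System.out.println(guard.check()); // REFUSE
        clock[0] = 900;
        System.out.println(guard.check()); // EVICT
    }
}
```

When the clock returns to the last recorded second, the generator must continue from the sequence number it last used for that second rather than restart at zero; otherwise the IDs issued before the rollback would be repeated.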

Leap‑Second Considerations

Negative leap seconds (23:59:58 → 00:00:00) skip a second, so time only moves forward and ID generation is unaffected. Positive leap seconds (23:59:59 → 23:59:60) insert a second; they are safe as long as the timestamp values reported by the clock do not repeat, and any subsequent NTP correction that steps the clock back is handled by the same rollback logic.

Service Deployment Optimization

Cluster Architecture

The service runs in a horizontally scaled cluster behind an Nginx load balancer. Each instance is a Spring Boot application with an embedded Tomcat server; health checks are performed via heartbeat signals.

Tomcat Tuning

APR (Apache Portable Runtime) is enabled in place of the BIO/NIO connectors to improve I/O performance. The native APR and Tomcat Native libraries must be installed on each host, and the Spring Boot configuration adjusted to select the APR connector.

Development Issues and JVM Tuning

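During testing, occasional Full GC events were observed and traced to Metaspace. -XX:MetaspaceSize is not a capacity limit but the initial threshold at which Metaspace growth triggers a collection. Its default (roughly 20 MB on 64‑bit HotSpot) was too low for the service, so class loading triggered Full GCs even though overall Metaspace capacity was ample. Raising the threshold, e.g., -XX:MetaspaceSize=128m, eliminated the Full GC spikes.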
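One way to choose a sensible value is to look at how much Metaspace the service actually uses once it has warmed up. The sketch below reads the figure through the standard `java.lang.management` API; the class name `MetaspaceReport` is illustrative, and the pool name "Metaspace" is what HotSpot reports, so other JVMs may expose nothing under that name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

class MetaspaceReport {
    /** Returns the HotSpot "Metaspace" memory pool, or null on JVMs that do not expose one. */
    static MemoryPoolMXBean metaspacePool() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                return pool;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        MemoryPoolMXBean pool = metaspacePool();
        if (pool == null) {
            System.out.println("No Metaspace pool reported by this JVM");
            return;
        }
        MemoryUsage usage = pool.getUsage();
        long mb = 1024 * 1024;
        System.out.println("used      = " + usage.getUsed() / mb + " MB");
        System.out.println("committed = " + usage.getCommitted() / mb + " MB");
        // getMax() returns -1 when no upper limit (-XX:MaxMetaspaceSize) is set.
        System.out.println("max       = " + usage.getMax());
    }
}
```

If the steady‑state usage sits well above the default threshold, setting -XX:MetaspaceSize somewhat above that figure avoids the collections that would otherwise be triggered while the threshold ratchets upward.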

Summary

The distributed ID generation system implements a customized Snowflake algorithm that provides globally unique, time‑ordered numeric IDs with high availability through cluster deployment, ZooKeeper‑based worker ID allocation, and robust clock‑rollback handling.

References

Snowflake – https://github.com/twitter/snowflake

TDDL documentation
