How to Generate Globally Unique IDs in Distributed Systems: Snowflake and Its Variants
This article explains the challenges of generating globally unique IDs across distributed shards, outlines the requirements for such IDs, and details Twitter's Snowflake algorithm—including its structure, generation process, and clock handling—before exploring three notable Snowflake variants and their trade‑offs.
Problem Description
In a distributed system with multiple shards, each insert must receive a globally unique identifier. An auto-increment column works on a single MySQL instance, but independent counters on different shards would hand out the same values, so a cross-shard ID must satisfy several constraints:
Globally unique across all shards.
Compatible with future data migration between shards.
Contain a timestamp component so that IDs can be sorted chronologically.
Fit within 64 bits.
Support high generation rates (tens of thousands per second).
Operate without a single point of failure.
Naïve approaches such as using UUID.randomUUID() (128 bits, no timestamp), a central ID server (single‑point risk), or Flickr’s ticket server (no timestamp) do not meet all these requirements.
Twitter Snowflake
GitHub repository: https://github.com/twitter/snowflake
Snowflake generates a 64-bit ID whose highest (sign) bit is always 0, leaving 63 usable bits composed of:
41 bits: millisecond‑precision timestamp.
10 bits: node identifier (5 bits datacenter ID + 5 bits worker ID).
12 bits: sequence number.
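The layout above can be expressed with shifts and masks. The sketch below packs the four fields into one integer and splits them apart again. The function names are hypothetical, and the timestamp is assumed to already be milliseconds since whatever custom epoch a deployment chooses.

```python
# Field widths of a Snowflake ID (the top sign bit is always 0).
TIMESTAMP_BITS = 41
DATACENTER_BITS = 5
WORKER_BITS = 5
SEQUENCE_BITS = 12

# The sequence occupies the lowest bits; each field above it is shifted past the ones below.
WORKER_SHIFT = SEQUENCE_BITS                          # 12
DATACENTER_SHIFT = WORKER_SHIFT + WORKER_BITS         # 17
TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS  # 22


def _mask(bits: int) -> int:
    return (1 << bits) - 1


def pack_snowflake(timestamp_ms: int, datacenter_id: int, worker_id: int, sequence: int) -> int:
    """Combine the fields into one non-negative integer below 2**63."""
    for value, bits, name in (
        (timestamp_ms, TIMESTAMP_BITS, "timestamp_ms"),
        (datacenter_id, DATACENTER_BITS, "datacenter_id"),
        (worker_id, WORKER_BITS, "worker_id"),
        (sequence, SEQUENCE_BITS, "sequence"),
    ):
        if not 0 <= value <= _mask(bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
    return (
        (timestamp_ms << TIMESTAMP_SHIFT)
        | (datacenter_id << DATACENTER_SHIFT)
        | (worker_id << WORKER_SHIFT)
        | sequence
    )


def unpack_snowflake(snowflake_id: int) -> tuple:
    """Split an ID back into (timestamp_ms, datacenter_id, worker_id, sequence)."""
    return (
        snowflake_id >> TIMESTAMP_SHIFT,
        (snowflake_id >> DATACENTER_SHIFT) & _mask(DATACENTER_BITS),
        (snowflake_id >> WORKER_SHIFT) & _mask(WORKER_BITS),
        snowflake_id & _mask(SEQUENCE_BITS),
    )
```

Because the timestamp occupies the most significant bits, comparing two IDs as plain integers orders them by millisecond first, which is what makes Snowflake IDs roughly time-sortable.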
Generation process:
At worker startup, obtain a unique 10-bit worker ID from a ZooKeeper cluster.
Read the current timestamp in milliseconds.
If the timestamp equals the previous timestamp (same millisecond), increment the 12-bit sequence; if the sequence overflows, wait until the next millisecond before issuing a new ID.
If the timestamp is greater than the previous one, start a new sequence for this millisecond. Twitter's reference implementation resets it to 0; some implementations start from a random 12-bit value instead.
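The generation steps can be sketched as a small thread-safe class. This is a hypothetical Python illustration, not Twitter's code: the ZooKeeper lookup is replaced by a `node_id` passed to the constructor, the clock is injectable so the logic can be exercised deterministically, and the sequence is reset to 0 on each new millisecond (a random-start variant would draw a random value at that point instead).

```python
import threading
import time
from typing import Callable, Optional

SEQUENCE_BITS = 12
NODE_BITS = 10        # 5-bit datacenter ID + 5-bit worker ID, treated here as one field
TIMESTAMP_BITS = 41
SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1
NODE_SHIFT = SEQUENCE_BITS
TIMESTAMP_SHIFT = SEQUENCE_BITS + NODE_BITS


class SnowflakeGenerator:
    """Hypothetical single-process Snowflake-style generator (illustrative sketch)."""

    def __init__(self, node_id: int, epoch_ms: int = 0,
                 clock: Optional[Callable[[], int]] = None) -> None:
        if not 0 <= node_id < (1 << NODE_BITS):
            raise ValueError("node_id must fit in 10 bits")
        self.node_id = node_id      # in Snowflake this would come from ZooKeeper at startup
        self.epoch_ms = epoch_ms    # custom epoch; 0 means the Unix epoch
        # Millisecond clock; injectable so tests can control time.
        self.clock = clock or (lambda: time.time_ns() // 1_000_000)
        self.last_ts = -1
        self.sequence = 0
        self._lock = threading.Lock()

    def _wait_until_after(self, last_ts: int) -> int:
        # Busy-wait until the clock moves past last_ts (used after sequence overflow).
        ts = self.clock()
        while ts <= last_ts:
            ts = self.clock()
        return ts

    def next_id(self) -> int:
        with self._lock:
            ts = self.clock()
            if ts < self.last_ts:
                # Refuse to issue IDs while the clock is behind the last used timestamp.
                raise RuntimeError(f"clock moved backwards by {self.last_ts - ts} ms")
            if ts == self.last_ts:
                # Same millisecond: bump the 12-bit sequence, wrapping to 0 on overflow.
                self.sequence = (self.sequence + 1) & SEQUENCE_MASK
                if self.sequence == 0:
                    ts = self._wait_until_after(self.last_ts)
            else:
                # New millisecond: restart the sequence at 0.
                self.sequence = 0
            elapsed = ts - self.epoch_ms
            if not 0 <= elapsed < (1 << TIMESTAMP_BITS):
                raise ValueError("timestamp outside the 41-bit range for this epoch")
            self.last_ts = ts
            return (elapsed << TIMESTAMP_SHIFT) | (self.node_id << NODE_SHIFT) | self.sequence
```

Typical use is `SnowflakeGenerator(node_id=...).next_id()`. The lock serialises callers within one process, which keeps the same-millisecond sequence unique; uniqueness across processes relies entirely on each one holding a distinct `node_id`.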
The system depends on ZooKeeper only at startup; after that, each worker generates IDs independently, so no central coordinator sits on the request path.
Clock rollback handling: if the current timestamp is smaller than the last used timestamp, Snowflake refuses to generate IDs (the reference implementation rejects each request with an error) until the clock passes the last used timestamp again. The official documentation recommends running NTP in a mode that never moves the clock backwards.
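A minimal sketch of that guard, assuming a millisecond clock function is passed in. With the default `max_wait_ms=0` it rejects immediately, matching the behaviour described above; a positive `max_wait_ms` is a hypothetical policy that briefly waits out small rollbacks before giving up. `ClockMovedBackwards` and `current_timestamp` are illustrative names.

```python
import time
from typing import Callable


class ClockMovedBackwards(Exception):
    """Raised when the clock reads earlier than the last timestamp used for an ID."""

    def __init__(self, behind_ms: int) -> None:
        super().__init__(f"clock moved backwards; refusing IDs for {behind_ms} ms")
        self.behind_ms = behind_ms


def current_timestamp(last_ts: int, clock: Callable[[], int], max_wait_ms: int = 0,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Return a millisecond timestamp >= last_ts, or raise ClockMovedBackwards."""
    now = clock()
    if now >= last_ts:
        return now
    behind = last_ts - now
    if behind > max_wait_ms:
        # Default path (max_wait_ms=0): reject instead of risking duplicate IDs.
        raise ClockMovedBackwards(behind)
    # Hypothetical tolerance: poll roughly once per millisecond, at most max_wait_ms + 1 times.
    for _ in range(max_wait_ms + 1):
        sleep(0.001)
        now = clock()
        if now >= last_ts:
            return now
    raise ClockMovedBackwards(last_ts - now)
```

Rejecting is the safer default: reusing a millisecond that has already been handed out could repeat a (timestamp, node, sequence) triple, whereas an error simply pushes the retry decision to the caller.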
Snowflake Variants
Boundary Flake
Extends the ID to 128 bits: a 64-bit timestamp, a 48-bit worker identifier (the size of a MAC address, from which it is derived), and a 16-bit sequence. Because the worker ID is derived locally, no ZooKeeper coordination is needed, giving full decentralisation.
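The 128-bit layout can be sketched the same way. In this hypothetical helper the worker ID comes from `uuid.getnode()`, which returns the 48-bit hardware address; per the Python documentation it falls back to a random 48-bit number if no MAC address can be found, which would weaken the uniqueness guarantee.

```python
import uuid

TIMESTAMP_BITS = 64
WORKER_BITS = 48
SEQUENCE_BITS = 16


def local_worker_id() -> int:
    # uuid.getnode() returns the 48-bit hardware (MAC) address; per the Python docs it
    # falls back to a random 48-bit number if no address can be obtained.
    return uuid.getnode()


def pack_boundary_flake(timestamp_ms: int, worker_id: int, sequence: int) -> int:
    """Lay out 64-bit timestamp | 48-bit worker ID | 16-bit sequence in a 128-bit integer."""
    if not 0 <= timestamp_ms < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp does not fit in 64 bits")
    if not 0 <= worker_id < (1 << WORKER_BITS):
        raise ValueError("worker_id does not fit in 48 bits")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence does not fit in 16 bits")
    return (timestamp_ms << (WORKER_BITS + SEQUENCE_BITS)) | (worker_id << SEQUENCE_BITS) | sequence


def unpack_boundary_flake(flake_id: int) -> tuple:
    """Split a 128-bit ID back into (timestamp_ms, worker_id, sequence)."""
    return (
        flake_id >> (WORKER_BITS + SEQUENCE_BITS),
        (flake_id >> SEQUENCE_BITS) & ((1 << WORKER_BITS) - 1),
        flake_id & ((1 << SEQUENCE_BITS) - 1),
    )
```

Serialised with `pack_boundary_flake(...).to_bytes(16, "big")`, the fixed-length big-endian bytes compare in the same order as the integers, so the IDs stay time-sortable as binary keys.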
Simpleflake
Eliminates the worker ID, keeping the 41-bit timestamp and filling the remaining 22 bits with a random value instead of a per-worker sequence. Because random values can collide when several IDs fall in the same millisecond, Simpleflake is suitable only for low-throughput scenarios (recommended < 100 IDs/second).
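To make the collision risk concrete, this sketch builds a Simpleflake-style ID from the 41-bit timestamp and 22 random bits described above, and computes the birthday-bound probability that k IDs generated within the same millisecond share a random value. The function names are hypothetical.

```python
import random
import time
from typing import Optional

TIMESTAMP_BITS = 41
RANDOM_BITS = 22              # width used in the description above
_rng = random.SystemRandom()  # OS-backed randomness


def simpleflake(timestamp_ms: Optional[int] = None, epoch_ms: int = 0) -> int:
    """41-bit millisecond timestamp followed by 22 random bits; no worker ID."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    elapsed = timestamp_ms - epoch_ms
    if not 0 <= elapsed < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp outside the 41-bit range for this epoch")
    return (elapsed << RANDOM_BITS) | _rng.getrandbits(RANDOM_BITS)


def same_ms_collision_probability(k: int) -> float:
    """Probability that at least two of k IDs issued in one millisecond share random bits."""
    space = 1 << RANDOM_BITS
    p_all_distinct = 1.0
    for i in range(k):
        p_all_distinct *= (space - i) / space
    return 1.0 - p_all_distinct
```

If 100 IDs landed in a single millisecond, the collision probability would be on the order of 0.1%, which is why the throughput limit matters. IDs from different milliseconds cannot collide, because their timestamp bits differ.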
Instagram Approach
Uses logical sharding to replace the worker ID. An ID consists of:
41 bits: timestamp (milliseconds).
13 bits: logical shard identifier (supports up to 8 × 1024 shards).
10 bits: sequence number derived from the table’s auto‑increment value modulo 1024 (each shard can generate up to 1024 IDs per millisecond).
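The 41/13/10 layout can be sketched as follows. Mapping the sharding column to a logical shard with a modulo is an assumption made for illustration, and `itertools.count` stands in for the per-shard table's auto-increment value.

```python
import itertools

TIMESTAMP_BITS = 41
SHARD_BITS = 13
SEQUENCE_BITS = 10
SHARD_SHIFT = SEQUENCE_BITS                   # shard ID sits just above the sequence
TIMESTAMP_SHIFT = SEQUENCE_BITS + SHARD_BITS  # timestamp sits above both (shift of 23)


def logical_shard_for(user_id: int, num_logical_shards: int = 1 << SHARD_BITS) -> int:
    """Hypothetical mapping from the sharding column (a user ID here) to a logical shard."""
    return user_id % num_logical_shards


def instagram_style_id(now_ms: int, epoch_ms: int, shard_id: int, auto_increment_value: int) -> int:
    elapsed = now_ms - epoch_ms
    if not 0 <= elapsed < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp outside the 41-bit range for this epoch")
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard_id does not fit in 13 bits")
    # Only the low 10 bits of the per-shard auto-increment value are kept (value mod 1024).
    sequence = auto_increment_value % (1 << SEQUENCE_BITS)
    return (elapsed << TIMESTAMP_SHIFT) | (shard_id << SHARD_SHIFT) | sequence


# Stand-in for one shard table's auto-increment counter (assumption for this sketch).
table_sequence = itertools.count(1)
```

A call would look like `instagram_style_id(now_ms, epoch_ms, logical_shard_for(user_id), next(table_sequence))`. Uniqueness holds as long as a single shard does not issue more than 1024 IDs within the same millisecond, since the modulo would then wrap onto an earlier value.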
The logical shard ID is determined from a designated sharding column (e.g., user ID). This design removes the need for a central coordinator, makes the ID self‑describing (it reveals the shard where the row resides), and simplifies data migration by moving whole logical shards.
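Because the shard number is embedded in the ID, a reader can locate a row from its ID alone. The sketch below decodes the fields and routes through a hypothetical logical-shard-to-host map; the host names are invented for illustration.

```python
SEQUENCE_BITS = 10
SHARD_BITS = 13
TIMESTAMP_SHIFT = SEQUENCE_BITS + SHARD_BITS


def decode_instagram_style_id(row_id: int) -> tuple:
    """Return (elapsed_ms, logical_shard, sequence) from an ID with the 41/13/10 layout."""
    return (
        row_id >> TIMESTAMP_SHIFT,
        (row_id >> SEQUENCE_BITS) & ((1 << SHARD_BITS) - 1),
        row_id & ((1 << SEQUENCE_BITS) - 1),
    )


def host_for(row_id: int, shard_to_host: dict) -> str:
    """Route a lookup by ID alone, via a hypothetical logical-shard -> host map."""
    _, shard, _ = decode_instagram_style_id(row_id)
    return shard_to_host[shard]
```

Migrating logical shard 4100 to another machine then amounts to updating one map entry (for example `shard_to_host[4100] = "db-07"`); every existing ID still decodes to shard 4100 and is routed to the new host without being rewritten.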
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
