How Snowflake Generates Globally Unique IDs: Deep Dive and Java Implementation

This article explains Twitter's Snowflake algorithm, detailing its 64‑bit structure, the role of each bit segment, a step‑by‑step Java implementation, and the algorithm's performance advantages and practical limitations in distributed backend systems.

Code Ape Tech Column

Feb 1, 2021

Overview of Snowflake

Snowflake is a distributed ID generation algorithm originally open‑sourced by Twitter. It creates a 64‑bit long integer by embedding a timestamp, a datacenter ID, a machine ID, and a per‑millisecond sequence number. The result is globally unique across a distributed system as long as every generator is assigned its own datacenter ID and machine ID pair.

Bit Allocation

The 64 bits are divided as follows:

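1 unused bit (the sign bit, always 0) so that the ID is a positive value in a signed 64‑bit long.

41 bits for the timestamp in milliseconds, measured from a custom epoch rather than from 1970. 2^41 milliseconds is roughly 69 years, after which the field overflows.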

5 bits for the datacenter (or “machine room”) ID, supporting up to 32 datacenters.

5 bits for the machine ID within a datacenter, supporting up to 32 machines per datacenter.

12 bits for a sequence number, enabling up to 4096 IDs to be generated within the same millisecond on a single machine.
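The capacities in the list above follow directly from the bit widths. As a rough sketch (the class name SnowflakeCapacity and its helper methods are illustrative and not part of the original algorithm), they can be derived like this:

```java
import java.time.Duration;

final class SnowflakeCapacity {
    static final int TIMESTAMP_BITS = 41;
    static final int DATACENTER_BITS = 5;
    static final int WORKER_BITS = 5;
    static final int SEQUENCE_BITS = 12;

    /** Largest value that fits in the given number of bits: 2^bits - 1. */
    static long maxValue(int bits) {
        return (1L << bits) - 1;
    }

    /** Whole 365-day years that a millisecond counter of the given width can cover. */
    static long yearsCovered(int bits) {
        return Duration.ofMillis(1L << bits).toDays() / 365;
    }

    public static void main(String[] args) {
        // The count of distinct values is the maximum value plus one, because 0 is valid.
        System.out.println("datacenters:       " + (maxValue(DATACENTER_BITS) + 1));
        System.out.println("machines per DC:   " + (maxValue(WORKER_BITS) + 1));
        System.out.println("IDs per ms:        " + (maxValue(SEQUENCE_BITS) + 1));
        System.out.println("years of lifetime: " + yearsCovered(TIMESTAMP_BITS));
        // One sign bit plus the four fields should account for all 64 bits.
        System.out.println("total bits:        "
                + (1 + TIMESTAMP_BITS + DATACENTER_BITS + WORKER_BITS + SEQUENCE_BITS));
    }
}
```

The lifetime is counted from the custom epoch the generator is configured with, so choosing a recent epoch gives the full range.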

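Because the timestamp occupies the most significant bits, the combined ID increases over time on each generator and can be sorted roughly by creation time. Across machines, the ordering is only as accurate as the clocks are synchronized.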
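To make the layout concrete, here is a minimal sketch that packs the four fields into a long and extracts them again with shifts and masks. The class SnowflakeCodec and its method names are hypothetical helpers written for this illustration; they perform no range validation and assume the 41/5/5/12 split described above.

```java
final class SnowflakeCodec {
    static final int SEQUENCE_BITS = 12;
    static final int WORKER_BITS = 5;
    static final int DATACENTER_BITS = 5;

    static final int WORKER_SHIFT = SEQUENCE_BITS;                          // 12
    static final int DATACENTER_SHIFT = SEQUENCE_BITS + WORKER_BITS;        // 17
    static final int TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS;  // 22

    /** A mask with the lowest `bits` bits set, e.g. mask(5) == 31. */
    static long mask(int bits) {
        return ~(-1L << bits);
    }

    /** Packs the fields; callers must pass values that fit their bit widths. */
    static long compose(long millisSinceEpoch, long datacenterId, long workerId, long sequence) {
        return (millisSinceEpoch << TIMESTAMP_SHIFT)
                | (datacenterId << DATACENTER_SHIFT)
                | (workerId << WORKER_SHIFT)
                | sequence;
    }

    static long timestampOf(long id)  { return id >>> TIMESTAMP_SHIFT; }
    static long datacenterOf(long id) { return (id >>> DATACENTER_SHIFT) & mask(DATACENTER_BITS); }
    static long workerOf(long id)     { return (id >>> WORKER_SHIFT) & mask(WORKER_BITS); }
    static long sequenceOf(long id)   { return id & mask(SEQUENCE_BITS); }

    public static void main(String[] args) {
        long id = compose(1000L, 3L, 7L, 42L);
        System.out.printf("id=%d ts=%d dc=%d worker=%d seq=%d%n",
                id, timestampOf(id), datacenterOf(id), workerOf(id), sequenceOf(id));

        // A later millisecond always wins the comparison, whatever the lower fields hold.
        long earlierMaxedOut = compose(1000L, 31L, 31L, 4095L);
        long laterAllZero = compose(1001L, 0L, 0L, 0L);
        System.out.println(laterAllZero > earlierMaxedOut);
    }
}
```

The timestamp returned by timestampOf is the offset from the custom epoch, so the epoch has to be added back to recover wall-clock time.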

Java Implementation

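The following Java class implements the Snowflake algorithm. It validates the datacenter and machine IDs, refuses to generate IDs while the system clock is behind the last timestamp it used, and waits for the next millisecond when the 12‑bit sequence is exhausted, so IDs stay unique even when many are requested in the same millisecond.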

public class IdWorker {
    // Layout, high to low: 1 sign bit (always 0) | 41-bit timestamp | 5-bit datacenter | 5-bit worker | 12-bit sequence
    private final long workerId;          // 5 bits
    private final long datacenterId;      // 5 bits
    private long sequence;                // 12 bits, counter within the current millisecond
    private final long twepoch = 1585644268888L; // custom epoch in ms since 1970 (2020-03-31 UTC)
    private final long workerIdBits = 5L;
    private final long datacenterIdBits = 5L;
    private final long sequenceBits = 12L;
    private final long maxWorkerId = -1L ^ (-1L << workerIdBits);         // 31
    private final long maxDatacenterId = -1L ^ (-1L << datacenterIdBits); // 31
    private final long workerIdShift = sequenceBits;
    private final long datacenterIdShift = sequenceBits + workerIdBits;
    private final long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits;
    private final long sequenceMask = -1L ^ (-1L << sequenceBits);        // 4095
    private long lastTimestamp = -1L;     // timestamp of the most recently issued ID

    public IdWorker(long workerId, long datacenterId, long sequence) {
        if (workerId > maxWorkerId || workerId < 0) {
            throw new IllegalArgumentException(String.format("worker Id can't be greater than %d or less than 0", maxWorkerId));
        }
        if (datacenterId > maxDatacenterId || datacenterId < 0) {
            throw new IllegalArgumentException(String.format("datacenter Id can't be greater than %d or less than 0", maxDatacenterId));
        }
        this.workerId = workerId;
        this.datacenterId = datacenterId;
        this.sequence = sequence;
    }

    public synchronized long nextId() {
        long timestamp = timeGen();
        if (timestamp < lastTimestamp) {
            System.err.printf("clock is moving backwards. Rejecting requests until %d.%n", lastTimestamp);
            throw new RuntimeException(String.format("Clock moved backwards. Refusing to generate id for %d milliseconds", lastTimestamp - timestamp));
        }
        if (lastTimestamp == timestamp) {
            sequence = (sequence + 1) & sequenceMask;
            if (sequence == 0) {
                timestamp = tilNextMillis(lastTimestamp);
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = timestamp;
        return ((timestamp - twepoch) << timestampLeftShift) |
               (datacenterId << datacenterIdShift) |
               (workerId << workerIdShift) | sequence;
    }

    private long tilNextMillis(long lastTimestamp) {
        long timestamp = timeGen();
        while (timestamp <= lastTimestamp) {
            timestamp = timeGen();
        }
        return timestamp;
    }

    private long timeGen() {
        return System.currentTimeMillis();
    }

    public static void main(String[] args) {
        // Example usage: worker 1 in datacenter 1, initial sequence 0
        IdWorker worker = new IdWorker(1, 1, 0);
        System.out.println(worker.nextId());
    }
}
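One way to gain confidence in a generator like this is to call it from several threads and count the distinct IDs. The harness below is only a sketch: UniquenessCheck and countDistinct are hypothetical names, and the generator is passed in as a LongSupplier so the file compiles on its own. The main method uses an AtomicLong as a stand-in; with the IdWorker class above available, the supplier could be `new IdWorker(1, 1, 0)::nextId` instead.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

final class UniquenessCheck {
    /** Calls the generator from several threads and returns how many distinct IDs it produced. */
    static int countDistinct(LongSupplier generator, int threads, int idsPerThread) {
        Set<Long> seen = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < idsPerThread; i++) {
                    seen.add(generator.getAsLong());
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Stand-in generator (assumption) so this example is self-contained.
        AtomicLong counter = new AtomicLong();
        LongSupplier generator = counter::incrementAndGet;

        int threads = 8;
        int idsPerThread = 10_000;
        int distinct = countDistinct(generator, threads, idsPerThread);
        System.out.println(distinct == threads * idsPerThread ? "all IDs unique" : "duplicates found");
    }
}
```

A passing run does not prove correctness, but a failing one quickly exposes a missing synchronized keyword or a sequence that wraps without waiting.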

Advantages

High performance and availability: IDs are generated entirely in memory without database calls.

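Large capacity: Each node can produce up to 4096 IDs per millisecond, which is about 4.1 million IDs per second.

Monotonic and sortable: IDs from a given node increase over time, so new rows land near the end of a database index instead of at random positions.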

Limitations

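The algorithm depends on the system clock. If a server's clock moves backward, the generator could reissue timestamps it has already used and produce duplicate IDs. The implementation above guards against this by throwing an exception while the clock is behind lastTimestamp, which means ID generation is unavailable during that period, and the guard is lost on restart because lastTimestamp is held only in memory. In practice, the number of datacenters and machines is often far less than the theoretical limits, so the bit allocation can be adjusted to better fit specific business needs.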
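Both limitations can be softened in a variant of the generator. The sketch below is one possible approach, not the original Twitter design: it merges the datacenter and machine fields into a single 10‑bit node ID (an assumed layout), and it tolerates small clock rollbacks by continuing to issue IDs against the last timestamp it used, failing only when the rollback exceeds a configurable limit. The class name, the maxBackwardMs parameter, and the injectable clock are all assumptions introduced for this illustration.

```java
import java.util.function.LongSupplier;

final class RollbackTolerantIdWorker {
    static final long EPOCH = 1585644268888L;   // same custom epoch as IdWorker
    static final int SEQUENCE_BITS = 12;
    static final int NODE_BITS = 10;            // datacenter + machine bits merged (assumption)
    static final long SEQUENCE_MASK = ~(-1L << SEQUENCE_BITS);
    static final long MAX_NODE_ID = ~(-1L << NODE_BITS);

    private final long nodeId;
    private final long maxBackwardMs;           // largest rollback that is tolerated
    private final LongSupplier clock;           // injectable so tests can control time
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    RollbackTolerantIdWorker(long nodeId, long maxBackwardMs, LongSupplier clock) {
        if (nodeId < 0 || nodeId > MAX_NODE_ID) {
            throw new IllegalArgumentException("nodeId must be between 0 and " + MAX_NODE_ID);
        }
        this.nodeId = nodeId;
        this.maxBackwardMs = maxBackwardMs;
        this.clock = clock;
    }

    synchronized long nextId() {
        long now = clock.getAsLong();
        if (now < lastTimestamp) {
            long drift = lastTimestamp - now;
            if (drift > maxBackwardMs) {
                throw new IllegalStateException("Clock moved backwards by " + drift + " ms");
            }
            now = lastTimestamp; // small rollback: keep using the last timestamp
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // Sequence exhausted: spin until the clock passes lastTimestamp.
                do {
                    now = clock.getAsLong();
                } while (now <= lastTimestamp);
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = now;
        return ((now - EPOCH) << (NODE_BITS + SEQUENCE_BITS)) | (nodeId << SEQUENCE_BITS) | sequence;
    }

    public static void main(String[] args) {
        long[] fakeNow = { EPOCH + 1_000L };    // controllable clock for the demo
        RollbackTolerantIdWorker worker = new RollbackTolerantIdWorker(1L, 5L, () -> fakeNow[0]);
        long first = worker.nextId();
        fakeNow[0] -= 3;                        // 3 ms rollback, within the limit
        long second = worker.nextId();
        System.out.println("still increasing: " + (second > first));
        fakeNow[0] -= 100;                      // now far behind, beyond the limit
        try {
            worker.nextId();
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

This still does not cover a rollback that happens while the process is down. Handling that case would require persisting the last timestamp somewhere outside the process, which is beyond this sketch.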

