Mastering Distributed ID Generation: Snowflake, Custom ID Generators, and Base62 Conversion
This article explores the challenges of generating globally unique, trend‑ordered IDs in distributed systems, compares database auto‑increment, UUID and ID‑grouping approaches, explains Twitter's Snowflake algorithm, provides a full Java implementation with Base62 conversion utilities, and introduces the Vesta ID‑generator framework.
Introduction
In the previous article we discussed converting a long URL to a short one by using an ID generator that produces a unique integer and then converting it to base‑62. This article dives deeper into what an ID generator is, its principles, and how to implement one.
1. From Database Primary Key
1.1 Single‑node database
When traffic is low, a single database server can satisfy the demand. The primary key is usually a BIGINT UNSIGNED column with AUTO_INCREMENT. Within that one server this guarantees uniqueness, monotonic increase, and a fixed step size.
However, when the system scales and we need sharding or multiple databases, this approach fails because each shard would generate overlapping IDs.
Imagine each province maintains its own database with a User table using auto‑increment IDs starting from 1. Merging all provinces into a central database would cause primary‑key collisions.
1.2 Database cluster / sharding
When a table is split across multiple machines, the auto‑increment feature can no longer guarantee global uniqueness. Consider a User table with 1 million rows distributed over two databases: each database runs its own auto‑increment sequence, so an ID is unique only within its database and there is no global ordering.
To solve this we consider several alternatives:
Use UUID – globally unique, but unordered and large (128 bits), so indexes built on it become inefficient.
ID grouping – assign each database a distinct auto_increment start value and step, preserving uniqueness but losing absolute monotonicity and requiring manual updates when adding nodes.
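The ID‑grouping idea can be illustrated with a small in‑memory sketch. The class name SteppedIdAllocator is hypothetical and only stands in for one database's counter; in MySQL the same effect is usually configured through the auto_increment_offset and auto_increment_increment system variables rather than written in application code.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical in-memory stand-in for one database's auto-increment counter. */
final class SteppedIdAllocator {
    private final long step;
    private final AtomicLong next;

    /**
     * @param start first ID this node hands out (its 1-based offset)
     * @param step  number of nodes in the group; every node must use the same step
     */
    SteppedIdAllocator(long start, long step) {
        if (step < 1 || start < 1 || start > step) {
            throw new IllegalArgumentException("need 1 <= start <= step");
        }
        this.step = step;
        this.next = new AtomicLong(start);
    }

    /** Returns the current value, then advances the counter by the step. */
    long nextId() {
        return next.getAndAdd(step);
    }

    public static void main(String[] args) {
        SteppedIdAllocator db1 = new SteppedIdAllocator(1, 2); // 1, 3, 5, ...
        SteppedIdAllocator db2 = new SteppedIdAllocator(2, 2); // 2, 4, 6, ...
        for (int i = 0; i < 3; i++) {
            System.out.println("db1=" + db1.nextId() + " db2=" + db2.nextId());
        }
    }
}
```

With two nodes the ID ranges never overlap, but adding a third node means changing the step on every existing node, which is the manual maintenance cost mentioned above.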
2. Snowflake Overview
Twitter's Snowflake algorithm generates 64‑bit IDs composed of:
1 bit sign (always 0 for positive IDs)
41 bits timestamp (millisecond precision, offset from a custom epoch, lasting ~69 years)
10 bits node identifier (5 bits data‑center, 5 bits machine, supporting up to 1024 nodes)
12 bits sequence number (up to 4096 IDs per millisecond per node)
The IDs are roughly time‑ordered, and uniqueness is ensured by the combination of timestamp, data‑center, machine, and sequence.
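To make the bit layout concrete, here is a minimal sketch that packs the four fields into a long and extracts them again. The class and method names (SnowflakeLayout, pack, timestampOf and so on) are illustrative assumptions, not part of Twitter's code.

```java
/** Sketch of the 1 + 41 + 5 + 5 + 12 bit layout; all names are illustrative. */
final class SnowflakeLayout {
    static final long SEQUENCE_BITS = 12L;
    static final long MACHINE_BITS = 5L;
    static final long DATA_CENTER_BITS = 5L;
    static final long TIMESTAMP_BITS = 41L;

    // Each field is shifted left past all the fields that sit to its right.
    static final long MACHINE_SHIFT = SEQUENCE_BITS;                          // 12
    static final long DATA_CENTER_SHIFT = MACHINE_SHIFT + MACHINE_BITS;       // 17
    static final long TIMESTAMP_SHIFT = DATA_CENTER_SHIFT + DATA_CENTER_BITS; // 22

    // Bit masks that are also the largest value each field can hold.
    static final long MAX_SEQUENCE = ~(-1L << SEQUENCE_BITS);       // 4095
    static final long MAX_MACHINE = ~(-1L << MACHINE_BITS);         // 31
    static final long MAX_DATA_CENTER = ~(-1L << DATA_CENTER_BITS); // 31

    static long pack(long millisSinceEpoch, long dataCenterId, long machineId, long sequence) {
        return (millisSinceEpoch << TIMESTAMP_SHIFT)
                | (dataCenterId << DATA_CENTER_SHIFT)
                | (machineId << MACHINE_SHIFT)
                | sequence;
    }

    static long timestampOf(long id)  { return id >>> TIMESTAMP_SHIFT; }
    static long dataCenterOf(long id) { return (id >>> DATA_CENTER_SHIFT) & MAX_DATA_CENTER; }
    static long machineOf(long id)    { return (id >>> MACHINE_SHIFT) & MAX_MACHINE; }
    static long sequenceOf(long id)   { return id & MAX_SEQUENCE; }

    public static void main(String[] args) {
        long id = pack(123_456_789L, 3, 7, 42);
        System.out.println(id + " -> ts=" + timestampOf(id) + " dc=" + dataCenterOf(id)
                + " machine=" + machineOf(id) + " seq=" + sequenceOf(id));
        // 2^41 milliseconds expressed in 365-day years, roughly 69.7
        double years = (1L << TIMESTAMP_BITS) / (1000.0 * 60 * 60 * 24 * 365);
        System.out.println("timestamp range in years: " + years);
    }
}
```

Because the timestamp occupies the most significant bits below the sign bit, comparing two IDs as plain numbers compares their timestamps first, which is where the rough time ordering comes from.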
2.1 Snowflake Implementation
public class SnowFlake {
    private static final long START_TIMESTAMP = 1480166465631L;
    private static final long SEQUENCE_BIT = 12L;
    private static final long MACHINE_BIT = 5L;
    private static final long DATA_CENTER_BIT = 5L;
    // max values, left shifts, etc.
    // constructor validates dataCenterId and machineId
    // nextId() generates the 64‑bit ID
    // getNextMill() and getNewTimeStamp() handle clock moves
}
The implementation can be adapted: if you do not need a data‑center identifier, you can allocate it fewer bits or give all 10 bits to the machine identifier.
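Since the listing above is only a skeleton, the following is one possible way to fill it in, not the original class. It reuses the START_TIMESTAMP value from the skeleton; the class name SnowflakeSketch, the helper waitForNextMillis, and the choice to throw when the clock moves backwards are assumptions made for this sketch.

```java
/** Minimal Snowflake-style generator sketch; an assumption-laden illustration. */
final class SnowflakeSketch {
    private static final long START_TIMESTAMP = 1480166465631L; // custom epoch from the skeleton
    private static final long SEQUENCE_BITS = 12L;
    private static final long MACHINE_BITS = 5L;
    private static final long DATA_CENTER_BITS = 5L;
    private static final long MAX_SEQUENCE = ~(-1L << SEQUENCE_BITS);
    private static final long MAX_MACHINE = ~(-1L << MACHINE_BITS);
    private static final long MAX_DATA_CENTER = ~(-1L << DATA_CENTER_BITS);

    private final long dataCenterId;
    private final long machineId;
    private long sequence = 0L;
    private long lastTimestamp = -1L;

    SnowflakeSketch(long dataCenterId, long machineId) {
        if (dataCenterId < 0 || dataCenterId > MAX_DATA_CENTER) {
            throw new IllegalArgumentException("dataCenterId out of range");
        }
        if (machineId < 0 || machineId > MAX_MACHINE) {
            throw new IllegalArgumentException("machineId out of range");
        }
        this.dataCenterId = dataCenterId;
        this.machineId = machineId;
    }

    /** Synchronized so that sequence and lastTimestamp are updated atomically. */
    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            // Assumed policy: fail fast rather than risk issuing a duplicate ID.
            throw new IllegalStateException("clock moved backwards");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0L) {
                // All 4096 values used in this millisecond: spin until the next one.
                now = waitForNextMillis(lastTimestamp);
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = now;
        return ((now - START_TIMESTAMP) << (SEQUENCE_BITS + MACHINE_BITS + DATA_CENTER_BITS))
                | (dataCenterId << (SEQUENCE_BITS + MACHINE_BITS))
                | (machineId << SEQUENCE_BITS)
                | sequence;
    }

    private static long waitForNextMillis(long last) {
        long now = System.currentTimeMillis();
        while (now <= last) {
            now = System.currentTimeMillis();
        }
        return now;
    }

    public static void main(String[] args) {
        SnowflakeSketch generator = new SnowflakeSketch(2, 3);
        for (int i = 0; i < 5; i++) {
            System.out.println(generator.nextId());
        }
    }
}
```

Throwing on a backwards clock is the simplest policy; waiting until the clock catches up is another option, at the cost of blocking callers.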
3. Implementing a Custom ID Generator
The following class combines Snowflake ID generation with a Base‑62 conversion to produce short URLs.
public class SnowFlakeShortUrl {
    // same constants as SnowFlake
    // constructor takes dataCenterId and machineId
    // nextId() returns a long ID
    // main() demonstrates generating IDs and converting them
}
Sample output (decimal → base‑62):
Decimal: 185894506410029056 Base‑62 short address: dJoJ1Xyo3C
Base‑62 short address: dJoJ1Xyo3C Decimal: 185894506410029056
...
4. Base‑62 Conversion Utility
The NumericConvertUtils class provides two static methods:
toOtherNumberSystem(long number, int seed) – converts a decimal number to the specified base (up to 62).
toDecimalNumber(String number, int seed) – converts a string in the given base back to decimal.
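The two methods described here could be written along the following lines. This is a sketch under assumptions, not the original NumericConvertUtils: the class name Base62Sketch is hypothetical, the digit order (0‑9, a‑z, A‑Z) follows the article's digits table, and the input validation and overflow checks are additions.

```java
/** Hypothetical sketch of base conversion with a 62-character alphabet. */
final class Base62Sketch {
    // Digit order matches the article's table: 0-9, then a-z, then A-Z.
    private static final String DIGITS =
            "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    private static void checkSeed(int seed) {
        if (seed < 2 || seed > DIGITS.length()) {
            throw new IllegalArgumentException("seed must be between 2 and 62");
        }
    }

    /** Converts a non-negative decimal number to the given base. */
    static String toOtherNumberSystem(long number, int seed) {
        checkSeed(seed);
        if (number < 0) {
            throw new IllegalArgumentException("number must be non-negative");
        }
        if (number == 0) {
            return "0";
        }
        StringBuilder sb = new StringBuilder();
        while (number > 0) {
            // The remainder is the next least-significant digit.
            sb.append(DIGITS.charAt((int) (number % seed)));
            number /= seed;
        }
        return sb.reverse().toString();
    }

    /** Converts a string in the given base back to a decimal number. */
    static long toDecimalNumber(String number, int seed) {
        checkSeed(seed);
        long result = 0L;
        for (int i = 0; i < number.length(); i++) {
            int value = DIGITS.indexOf(number.charAt(i));
            if (value < 0 || value >= seed) {
                throw new IllegalArgumentException("invalid digit: " + number.charAt(i));
            }
            // multiplyExact/addExact throw ArithmeticException instead of silently overflowing.
            result = Math.addExact(Math.multiplyExact(result, (long) seed), (long) value);
        }
        return result;
    }

    public static void main(String[] args) {
        long id = 185894506410029056L;
        String shortCode = toOtherNumberSystem(id, 62);
        System.out.println(id + " -> " + shortCode + " -> " + toDecimalNumber(shortCode, 62));
    }
}
```

A positive 63‑bit Snowflake ID needs at most 11 base‑62 characters, which is why the short address stays compact.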
public class NumericConvertUtils {
    private static final char[] digits = {
        '0','1','2','3','4','5','6','7','8','9',
        'a','b','c','d','e','f','g','h','i','j','k','l','m',
        'n','o','p','q','r','s','t','u','v','w','x','y','z',
        'A','B','C','D','E','F','G','H','I','J','K','L','M',
        'N','O','P','Q','R','S','T','U','V','W','X','Y','Z'};
    // conversion methods as described above
}
5. Vesta Framework Introduction
Vesta is a generic ID generator (often called a unified ID allocator). It offers global uniqueness, approximate ordering, reversibility, and manufacturability. Vesta supports three deployment modes: embedded, central server, and REST. It can produce peak‑rate or fine‑granularity IDs and is designed for high performance, high availability, and scalability.
For detailed design and usage, refer to the repositories:
Gitee: https://gitee.com/robertleepeak/vesta-id-generator
GitHub: https://github.com/cloudatee/vesta-id-generator
Further articles will dive into Vesta's architecture.
Java Backend Technology
Focus on Java-related technologies: SSM, Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading. Occasionally cover DevOps tools like Jenkins, Nexus, Docker, and ELK. Also share technical insights from time to time, committed to Java full-stack development!
