An Overview of Six Distributed ID Generation Algorithms and Their Trade‑offs
This article introduces the concept of distributed IDs, outlines the required properties of global uniqueness, high availability, performance and simplicity, and compares six generation schemes—UUID/GUID, database auto‑increment, random numbers, random strings, Snowflake, and MongoDB ObjectID—detailing their structures, advantages and disadvantages.
The article begins by defining a distributed ID as a globally unique identifier required when systems scale beyond a single database, and lists the essential criteria: global uniqueness, high availability, high performance, and ease of use.
UUID/GUID are 128‑bit identifiers represented as 32 hexadecimal characters separated by hyphens (e.g., 550e8400‑e29b‑41d4‑a716‑446655440000). Different versions exist: version 1 based on time and MAC address, version 2 on user/group ID, versions 3 and 5 deterministic via hashing, and version 4 random. Advantages include easy implementation, uniqueness, no central server, and no leakage of business secrets; disadvantages are poor readability, large storage size, and potential impact on database performance.
Database Auto‑Increment leverages the auto‑increment primary key feature of relational databases (e.g., MySQL) or similar capabilities in NoSQL stores like Redis. It is simple to generate, human‑readable, and storage‑efficient, but requires a centralized service, can expose business volume if IDs are public, and incurs a round‑trip to the database for each ID.
Random Numbers can be used directly or via pseudo‑random generators such as skip32 (see https://github.com/dgryski/go-skip32) and hashids (https://hashids.org/). These methods provide high readability, small storage (as little as 4 bytes), and conceal business information, yet they still need a centralized service and often involve a two‑step generation process.
Random Strings generate short random strings (e.g., 5 bytes can encode up to a billion IDs) using techniques like MD5 hashing followed by Base62 encoding. They are compact, readable, and random, but may produce collisions, requiring additional uniqueness checks.
Snowflake‑style Algorithms (originating from Twitter) allocate a 64‑bit ID: 1 unused bit, 41 bits for a millisecond‑precision timestamp (≈69 years), 10 bits for a machine identifier (configurable as data‑center + worker IDs), and 12 bits for a per‑millisecond sequence (up to 4096 IDs). Benefits are low storage (8 bytes), good readability, and high performance with decentralized generation; drawbacks include possible duplicate IDs on clock rollback and the deterministic nature of IDs leaking information.
MongoDB ObjectID uses a 12‑byte structure: 4‑byte timestamp, 3‑byte machine ID, 2‑byte process ID, and 3‑byte incrementing counter. It offers high readability and good performance with decentralized generation, but consumes more storage than Snowflake, suffers from duplicate IDs on clock rollback, and its predictable pattern can expose information.
The author mentions future plans to write a multi‑part series on Redis topics (master‑slave sync, Sentinel, Codis, and Cluster) and signs off with a brief personal note.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Full-Stack Internet Architecture
Introducing full-stack Internet architecture technologies centered on Java
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
