Global Unique ID Overview and Generation Strategies
This article explains what a globally unique ID is, the characteristics it should have, and common generation strategies (database auto‑increment, UUID, Redis, ZooKeeper, and Twitter's Snowflake), along with their advantages, drawbacks, and practical optimization tips for building reliable distributed systems.
Global Unique ID Overview
In distributed systems a globally unique identifier (ID) is essential for uniquely marking data or messages, especially when sharding or partitioning databases; various generation strategies exist, each with specific use‑cases, benefits, and limitations.
Characteristics of Global Unique IDs
Global Uniqueness : No duplicate IDs are allowed.
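Trend Increment : IDs should be roughly increasing over time, so that inserts land near the end of a B‑tree index instead of at random positions.
Monotonic Increment : Each new ID is strictly larger than the previous one, which some use cases (such as ordering messages or versions) require.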
Security : Random or non‑sequential IDs can mitigate enumeration attacks.
High Availability : ID generation must avoid single points of failure.
Shard Support : IDs can embed shard information for efficient queries.
Reasonable Length : IDs should not be excessively long.
Common Global Unique ID Generation Strategies
1. Database Auto‑Increment Sequence or Field
Uses the database's native auto‑increment column or sequence to guarantee uniqueness within a single table (or across every table that draws from the same sequence).
Advantages : Simple, low cost, leverages existing DB functionality, provides monotonic IDs useful for pagination and sorting.
Disadvantages :
Strong dependency on a specific DB; migration or multi‑DB scenarios become complex.
Single‑point failure if the primary DB is unavailable.
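Consistency challenges with master‑slave replication: if a lagging replica is promoted after a failover, it may hand out IDs that the old primary already issued.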
Scalability limited by the write capacity of a single DB instance.
Partial Optimization : Deploy multiple master databases with distinct start values and identical step sizes (e.g., Master1 generates 1,4,7,10; Master2 generates 2,5,8,11; Master3 generates 3,6,9,12) to distribute load and maintain uniqueness.
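The start‑value and step arrangement above can be simulated in a few lines. The sketch below is only an in‑memory illustration: the class name SteppedSequence is hypothetical, and an AtomicLong stands in for each master's auto‑increment counter. In MySQL, for instance, the start value and step correspond to the auto_increment_offset and auto_increment_increment system variables.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * In-memory stand-in for one master's auto-increment counter.
 * Hypothetical class for illustration; a real deployment would
 * configure the start value and step in the database itself.
 */
class SteppedSequence {
    private final AtomicLong next;
    private final long step;

    SteppedSequence(long start, long step) {
        // Each master needs a distinct start in 1..step so the sequences never collide.
        if (step < 1 || start < 1 || start > step) {
            throw new IllegalArgumentException("start must be in 1..step");
        }
        this.next = new AtomicLong(start);
        this.step = step;
    }

    /** Returns the current value and advances the counter by the step. */
    long nextId() {
        return next.getAndAdd(step);
    }

    public static void main(String[] args) {
        int masters = 3; // the step equals the number of masters
        for (int m = 1; m <= masters; m++) {
            SteppedSequence seq = new SteppedSequence(m, masters);
            StringBuilder line = new StringBuilder("Master" + m + ":");
            for (int i = 0; i < 4; i++) {
                line.append(' ').append(seq.nextId());
            }
            System.out.println(line);
        }
    }
}
```

Because the step is fixed to the number of masters, adding a fourth master later means re‑planning the start values and step on every node, which is worth considering before choosing this layout.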
2. UUID
Universally Unique Identifier generated locally by the application.
Standard UUID format consists of 32 hexadecimal digits displayed as 8‑4‑4‑4‑12 (36 characters with hyphens), e.g., 550e8400-e29b-41d4-a716-446655440000.
Java example:
import java.util.UUID;
UUID uuid = UUID.randomUUID();  // random (version 4) UUID
String s = uuid.toString();     // 36-character string form of the same UUID
Advantages :
Simple local generation, no network overhead.
Globally unique, easing data migration and merging.
Disadvantages :
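High storage cost: 128 bits (16 bytes) in binary form, and often stored as a 36‑character string.
Potential information leakage: time‑based (version 1) UUIDs embed the generating machine's MAC address, although random (version 4) UUIDs such as those from UUID.randomUUID() do not.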
Unsuitable as primary keys in many databases due to size and lack of order.
Unordered nature harms B‑tree index performance.
Large transmission size and poor readability.
Partial Optimizations :
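Store the UUID in binary form (16 bytes, or two 64‑bit integers) instead of a 36‑character string. Reducing it to a single 64‑bit integer is more compact still, but it discards half of the 128 bits and weakens the uniqueness guarantee.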
Use COMB algorithm (combined GUID/timestamp) to add ordering information.
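Both optimizations can be sketched with the standard library alone. The class and method names below (UuidStorage, toBytes, fromBytes, combStyle) are hypothetical. The combStyle method is a simplified, time‑prefixed variant written for illustration: it is not the original COMB layout and the result is not a standards‑compliant UUID version, since the version bits are overwritten by the timestamp.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

/** Hypothetical helper illustrating compact storage and a COMB-style ID. */
class UuidStorage {

    /** Packs a UUID into its 16-byte binary form (vs. 36 characters as text). */
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    /** Restores a UUID from the 16 bytes produced by toBytes. */
    static UUID fromBytes(byte[] bytes) {
        if (bytes.length != 16) {
            throw new IllegalArgumentException("expected 16 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        return new UUID(buf.getLong(), buf.getLong());
    }

    /**
     * COMB-style value: the high 48 bits carry a millisecond timestamp and
     * the remaining 80 bits stay random, so values created later sort later.
     * Assumption: a time-prefixed layout; other COMB variants place the
     * timestamp elsewhere to match a specific database's sort order.
     */
    static UUID combStyle(long epochMillis) {
        UUID random = UUID.randomUUID();
        long high = (epochMillis << 16) | (random.getMostSignificantBits() & 0xFFFFL);
        return new UUID(high, random.getLeastSignificantBits());
    }

    public static void main(String[] args) {
        UUID original = UUID.randomUUID();
        byte[] packed = toBytes(original);
        System.out.println("text length:   " + original.toString().length());
        System.out.println("binary length: " + packed.length);
        System.out.println("round trip ok: " + original.equals(fromBytes(packed)));
        System.out.println("comb-style:    " + combStyle(System.currentTimeMillis()));
    }
}
```

Two values created in the same millisecond still differ in their random bits, so ordering within a millisecond is arbitrary; only the coarse time ordering is preserved.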
3. Redis‑Based ID Generation
Leverages Redis's atomic INCR/INCRBY commands; suitable when database performance is insufficient.
In a Redis cluster, each node can be assigned a distinct start value and step (e.g., five nodes with start values 1‑5 and step 5) producing sequences like 1,6,11,…, 2,7,12,… etc., thus avoiding single‑point failure and providing ordered numeric IDs.
Advantages :
Independent of databases, high performance.
Numeric IDs are naturally ordered, aiding pagination.
Disadvantages :
Introduces a new component, increasing system complexity.
Requires additional coding and configuration.
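A single Redis instance is itself a single point of failure, and a counter that is not persisted can restart from an earlier value after a crash and repeat IDs; this is why a multi‑node setup and persistence need to be planned.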
4. ZooKeeper‑Based ID Generation
Uses ZooKeeper znode version numbers to produce 32‑ or 64‑bit sequence values. However, obtaining each ID takes several API calls and may require a distributed lock, which makes this approach less suitable for high‑concurrency scenarios.
5. Twitter Snowflake Algorithm
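Snowflake generates 64‑bit IDs laid out, from the highest bit down, as: a sign bit fixed at 0, a 41‑bit millisecond timestamp (measured from a chosen epoch, giving roughly 69 years of range), a 10‑bit machine identifier (5 bits data‑center + 5 bits worker), and a 12‑bit sequence number that allows up to 4096 IDs per node within the same millisecond.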
Advantages include local generation with no reliance on external services at request time, configurable bit allocation, and strictly increasing IDs on each node. Drawbacks are the dependence on accurate system clocks (a clock rollback can cause duplicates) and the fact that IDs are strictly ordered only within one node; across nodes they are only roughly time‑ordered.
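The bit layout described above can be sketched as follows. This is a minimal illustration rather than Twitter's implementation: the class name SnowflakeIdGenerator, the custom epoch (2020‑01‑01 UTC), and the choice to throw an exception on clock rollback are all assumptions made for the example.

```java
/** Minimal Snowflake-style generator; names and epoch are illustrative assumptions. */
class SnowflakeIdGenerator {
    // Assumed custom epoch: 2020-01-01T00:00:00Z in milliseconds.
    private static final long EPOCH = 1577836800000L;

    private static final int WORKER_BITS = 5;
    private static final int DATACENTER_BITS = 5;
    private static final int SEQUENCE_BITS = 12;

    private static final long MAX_WORKER = (1L << WORKER_BITS) - 1;         // 31
    private static final long MAX_DATACENTER = (1L << DATACENTER_BITS) - 1; // 31
    private static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1;    // 4095

    private static final int WORKER_SHIFT = SEQUENCE_BITS;                   // 12
    private static final int DATACENTER_SHIFT = WORKER_SHIFT + WORKER_BITS;  // 17
    private static final int TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS; // 22

    private final long datacenterId;
    private final long workerId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    SnowflakeIdGenerator(long datacenterId, long workerId) {
        if (datacenterId < 0 || datacenterId > MAX_DATACENTER
                || workerId < 0 || workerId > MAX_WORKER) {
            throw new IllegalArgumentException("datacenterId and workerId must be in 0..31");
        }
        this.datacenterId = datacenterId;
        this.workerId = workerId;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            // Refusing to generate avoids duplicates after a clock rollback.
            throw new IllegalStateException(
                    "Clock moved backwards by " + (lastTimestamp - now) + " ms");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // All 4096 values used in this millisecond: spin until the next one.
                while (now <= lastTimestamp) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = now;
        return ((now - EPOCH) << TIMESTAMP_SHIFT)
                | (datacenterId << DATACENTER_SHIFT)
                | (workerId << WORKER_SHIFT)
                | sequence;
    }

    /** Extracts the millisecond timestamp (Unix epoch based) from an ID. */
    static long timestampOf(long id) {
        return (id >> TIMESTAMP_SHIFT) + EPOCH;
    }

    static long datacenterIdOf(long id) {
        return (id >> DATACENTER_SHIFT) & MAX_DATACENTER;
    }

    static long workerIdOf(long id) {
        return (id >> WORKER_SHIFT) & MAX_WORKER;
    }

    public static void main(String[] args) {
        SnowflakeIdGenerator generator = new SnowflakeIdGenerator(3, 7);
        long id = generator.nextId();
        System.out.println("id         = " + id);
        System.out.println("timestamp  = " + timestampOf(id));
        System.out.println("datacenter = " + datacenterIdOf(id));
        System.out.println("worker     = " + workerIdOf(id));
    }
}
```

Throwing on rollback is the simplest safe behavior; waiting until the clock catches up is a common alternative when the drift is small. The decoding helpers also show why IDs from different nodes are not strictly ordered: two nodes can issue IDs in the same millisecond, and their relative order is then decided by the machine bits rather than by time.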
Overall, selecting an ID generation strategy requires balancing uniqueness, performance, scalability, availability, and security requirements of the target application.
Code Ape Tech Column
Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn