
Distributed ID Generation Schemes and the rpcxio/did Service

This article reviews various ID generation methods—including UUID/GUID, auto‑increment integers, random numbers, Snowflake, and MongoDB ObjectID—explains their advantages and drawbacks, and introduces the rpcxio/did distributed ID service with performance benchmarks and deployment considerations.

Architect

Identifiers (IDs) are essential in both real‑world and computer systems for uniquely locating entities, and a well‑designed ID scheme balances uniqueness, storage cost, performance, and security.

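UUIDs (also called GUIDs) are 128-bit identifiers (16 bytes) originally defined by the OSF and later standardized in RFC 4122. Their canonical text form is 32 hexadecimal digits split by hyphens into 8-4-4-4-12 groups, 36 characters in total, e.g., 550e8400-e29b-41d4-a716-446655440000. They can be generated without a central authority, have near-zero collision probability, and are widely supported. The costs are storage (16 bytes in binary, more as text) and database performance: random values used as a primary key scatter inserts across the index instead of appending to its end.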
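The size figures above can be checked with Python's standard uuid module. This is a small illustration only; the variable names are chosen for this example.

```python
import uuid

# The example value from the text, parsed into a UUID object.
u = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")

raw_size = len(u.bytes)    # binary form: 16 bytes (128 bits)
text_size = len(str(u))    # canonical text: 32 hex digits plus 4 hyphens
group_sizes = [len(g) for g in str(u).split("-")]  # the 8-4-4-4-12 layout
version = u.version        # version number stored in the third group

# A random (version 4) UUID is generated locally, with no coordination
# between nodes.
fresh = uuid.uuid4()

print(raw_size, text_size, group_sizes, version, fresh)
```

Storing the 16-byte binary form rather than the 36-character text is the usual way to limit the storage cost.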

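Auto-increment integers are simple, readable, and compact (often 4 or 8 bytes). Relational databases support them directly (for example MySQL's AUTO_INCREMENT), and so do some NoSQL stores (for example Redis's INCR command). Their main drawbacks are the dependence on a central service, which becomes a single-point bottleneck, and information leakage: when sequential IDs are exposed, outsiders can estimate record counts and growth rates or enumerate neighboring records.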

Random numbers and random strings provide better privacy. Techniques include skip32-based pseudo-random increments, hashids (e.g., converting 347 to yr8), and short base-62 or base-58 strings used by URL shorteners. They keep IDs short and conceal the underlying sequence. Reversible encodings of a unique integer stay unique by construction, while purely random values may require a uniqueness check before use.
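As an illustration of the short-string idea, here is a minimal base-62 encoder and decoder. It is a sketch, not the hashids algorithm: it will not reproduce the yr8 example, and on its own it only shortens an integer without hiding its order. The alphabet ordering and function names are assumptions made for this example.

```python
import string

# Assumed alphabet order: digits, then lowercase, then uppercase.
# Any fixed ordering of 62 distinct symbols works the same way.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62


def base62_encode(n: int) -> str:
    """Encode a non-negative integer as a base-62 string."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def base62_decode(s: str) -> int:
    """Decode a base-62 string back to the original integer."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)
    return n


print(base62_encode(347), base62_decode(base62_encode(347)))
```

To conceal the sequence as well, the integer would first be passed through a keyed permutation (the role skip32 plays) and only then encoded.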

Twitter Snowflake is a popular 64-bit distributed ID algorithm. Below one unused sign bit, it uses 41 bits for a millisecond timestamp, 10 bits for a worker (or data-center + machine) ID, and 12 bits for a per-millisecond sequence, yielding up to 4096 IDs per millisecond per node. It offers small storage (8 bytes), high performance, and roughly time-sortable IDs, but a clock rollback on a node can produce duplicate IDs, and the IDs reveal when and where they were generated.
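The bit layout can be sketched as follows. This is a simplified illustration, not Twitter's or rpcxio/did's code: the epoch value, the class and function names, and the choice to raise an error on clock rollback are all assumptions made for this example.

```python
import threading
import time

EPOCH_MS = 1_577_836_800_000  # assumed custom epoch: 2020-01-01 00:00:00 UTC
WORKER_BITS = 10
SEQ_BITS = 12
MAX_WORKER = (1 << WORKER_BITS) - 1  # 1023
MAX_SEQ = (1 << SEQ_BITS) - 1        # 4095


class SnowflakeSketch:
    """Hypothetical single-process generator for the 41/10/12 layout."""

    def __init__(self, worker_id, clock=None):
        if not 0 <= worker_id <= MAX_WORKER:
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        # The clock is injectable so the behavior can be tested.
        self._clock = clock or (lambda: int(time.time() * 1000))
        self._last_ms = -1
        self._seq = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Refuse to issue IDs rather than risk duplicates.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._seq = (self._seq + 1) & MAX_SEQ
                if self._seq == 0:
                    # Sequence exhausted: wait for the next millisecond.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._seq = 0
            self._last_ms = now
            return (((now - EPOCH_MS) << (WORKER_BITS + SEQ_BITS))
                    | (self.worker_id << SEQ_BITS)
                    | self._seq)


def unpack(snowflake_id):
    """Split an ID into (timestamp_ms, worker_id, sequence)."""
    seq = snowflake_id & MAX_SEQ
    worker = (snowflake_id >> SEQ_BITS) & MAX_WORKER
    ts = (snowflake_id >> (WORKER_BITS + SEQ_BITS)) + EPOCH_MS
    return ts, worker, seq
```

Because the timestamp occupies the high bits, IDs from one worker sort by creation time. The unpack function also shows why these IDs leak generation time and worker identity.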

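MongoDB ObjectID uses a 12-byte structure, usually shown as 24 hexadecimal characters. In its classic layout this is a 4-byte timestamp in seconds, a 3-byte machine ID, a 2-byte process ID, and a 3-byte incrementing counter; newer drivers replace the machine and process fields with a single 5-byte random value. It can be generated by the server or independently by each client, yet it consumes more storage than Snowflake and shares the same clock-rollback risk.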
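The classic 4/3/2/3-byte layout described above can be packed and unpacked by hand. This sketch does not use any MongoDB driver; the function names and sample field values are made up for illustration.

```python
def pack_object_id(timestamp_s, machine, pid, counter):
    """Build 12 bytes: 4-byte timestamp, 3-byte machine, 2-byte pid, 3-byte counter."""
    return (timestamp_s.to_bytes(4, "big")
            + machine.to_bytes(3, "big")
            + pid.to_bytes(2, "big")
            + counter.to_bytes(3, "big"))


def unpack_object_id(raw):
    """Split 12 bytes back into (timestamp_s, machine, pid, counter)."""
    if len(raw) != 12:
        raise ValueError("an ObjectID is exactly 12 bytes")
    return (int.from_bytes(raw[0:4], "big"),
            int.from_bytes(raw[4:7], "big"),
            int.from_bytes(raw[7:9], "big"),
            int.from_bytes(raw[9:12], "big"))


# Sample values chosen for this example only.
raw = pack_object_id(1_600_000_000, 0xABCDEF, 4242, 7)
print(raw.hex())              # the 24-character text form
print(unpack_object_id(raw))
```

Big-endian packing keeps the timestamp in the leading bytes, which is what makes the hex strings sort roughly by creation time.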

The article then introduces the rpcxio/did distributed ID generator, which builds on Snowflake but allows the number of bits given to the worker ID and to the sequence to be customized. It runs as a centralized service and supports batch fetching to reduce network overhead. In performance tests with 256 concurrent clients, a single node generates about 120,000 IDs per second when each request fetches one ID, and about 2.97 million IDs per second when each request fetches a batch of 100 IDs.
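The trade-off behind a customizable bit allocation can be worked out with simple arithmetic. The sketch below assumes the timestamp keeps Snowflake's 41 bits, leaving 22 bits to split between worker ID and sequence; whether rpcxio/did fixes the timestamp width this way is an assumption, and the function name is hypothetical.

```python
CONFIGURABLE_BITS = 22  # assumed: 63 usable bits minus a 41-bit timestamp


def layout_capacity(worker_bits, seq_bits):
    """Return the capacity implied by one worker/sequence bit split."""
    if worker_bits < 0 or seq_bits < 0:
        raise ValueError("bit counts must be non-negative")
    if worker_bits + seq_bits != CONFIGURABLE_BITS:
        raise ValueError("worker_bits + seq_bits must equal 22")
    ids_per_ms = 1 << seq_bits
    return {
        "max_workers": 1 << worker_bits,
        "ids_per_ms_per_worker": ids_per_ms,
        "ids_per_second_per_worker": ids_per_ms * 1000,
    }


print(layout_capacity(10, 12))  # the standard Snowflake split
print(layout_capacity(4, 18))   # fewer workers, far more IDs per worker
```

A centralized service needs only a handful of workers, so moving bits from the worker ID to the sequence raises the per-node ceiling.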

1. With 256 concurrent clients, each request fetching a single ID, the service generates about 120,000 IDs per second.
./bclient -addr 192.168.15.225:8972 -n 100000

total IDs: 25600000, duration: 3m31.581592489s, id/s: 120993

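2. With batch fetching, which keeps network overhead to a minimum, 256 concurrent clients each fetching 100 IDs per request generate about 2.97 million IDs per second.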
./bclient -addr 192.168.15.225:8972 -n 1000000 -b 100

total IDs: 256000000, duration: 1m26.178942509s, id/s: 2970563
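The reported rates can be reproduced from the totals and durations in the two outputs above. The sketch assumes, based on those totals, that -n is the number of IDs each of the 256 clients requests; the function name is made up for this check.

```python
def ids_per_second(clients, ids_per_client, minutes, seconds):
    """Return (total IDs, whole IDs per second) for one benchmark run."""
    total = clients * ids_per_client
    duration_s = minutes * 60 + seconds
    return total, int(total / duration_s)


# Run 1: one ID per request, duration 3m31.581592489s.
single_total, single_rate = ids_per_second(256, 100_000, 3, 31.581592489)

# Run 2: 100 IDs per request, duration 1m26.178942509s.
batch_total, batch_rate = ids_per_second(256, 1_000_000, 1, 26.178942509)

# With a batch size of 100, run 2 issues far fewer requests per ID.
batch_requests = batch_total // 100

print(single_total, single_rate)
print(batch_total, batch_rate, batch_requests)
print(round(batch_rate / single_rate, 1))  # throughput gain from batching
```

Run 2 delivers ten times as many IDs as run 1 with one tenth as many requests, which suggests that per-request network cost, not ID generation itself, limits the single-ID case.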

Because the service can be clustered, the failure of a few nodes does not affect overall ID generation. And because generating a Snowflake-style ID is cheap, only a few machines per data center are needed.

References to further reading are provided at the end of the article.
