Why Our Custom Snowflake ID Collided and How to Build a Reliable Generator
A flawed custom Snowflake implementation recently produced duplicate order IDs in production. This article reviews the standard algorithm, analyzes what went wrong in our variant, and offers best‑practice recommendations for designing robust distributed ID generators.
1. Standard Snowflake Algorithm
The classic Snowflake ID is a 64‑bit integer composed of:
+-------+-------------------+----------------------+-------------------+------------------+
| 1 Bit | 41 Bits Timestamp | 5 Bits DataCenter ID | 5 Bits Machine ID | 12 Bits Sequence |
+-------+-------------------+----------------------+-------------------+------------------+
1 sign bit (always 0) ensures the ID is positive.
41‑bit timestamp stores the millisecond offset from a custom epoch, supporting about 69 years.
10‑bit node identifier (5‑bit data‑center + 5‑bit machine) distinguishes different machines.
12‑bit sequence allows up to 4096 IDs per millisecond on the same node.
Advantages include high‑performance, time‑ordered, globally unique IDs suitable for distributed environments.
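The layout above comes down to shift-and-mask arithmetic. The following is a minimal sketch of packing and unpacking the four fields; the class name SnowflakeLayout and its method names are illustrative assumptions, not part of any library.

```java
public final class SnowflakeLayout {
    // Field widths of the classic layout (the sign bit is implicit and stays 0).
    static final int TIMESTAMP_BITS = 41;
    static final int DATACENTER_BITS = 5;
    static final int MACHINE_BITS = 5;
    static final int SEQUENCE_BITS = 12;

    // Each field is shifted left by the total width of the fields to its right.
    static final int MACHINE_SHIFT = SEQUENCE_BITS;                         // 12
    static final int DATACENTER_SHIFT = MACHINE_SHIFT + MACHINE_BITS;       // 17
    static final int TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS;  // 22

    static final long MAX_TIMESTAMP = (1L << TIMESTAMP_BITS) - 1;
    static final long MAX_DATACENTER = (1L << DATACENTER_BITS) - 1;  // 31
    static final long MAX_MACHINE = (1L << MACHINE_BITS) - 1;        // 31
    static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;      // 4095

    /** Packs the four fields into one 64-bit ID, rejecting out-of-range values. */
    public static long compose(long timestampDelta, long datacenterId,
                               long machineId, long sequence) {
        check("timestampDelta", timestampDelta, MAX_TIMESTAMP);
        check("datacenterId", datacenterId, MAX_DATACENTER);
        check("machineId", machineId, MAX_MACHINE);
        check("sequence", sequence, MAX_SEQUENCE);
        return (timestampDelta << TIMESTAMP_SHIFT)
                | (datacenterId << DATACENTER_SHIFT)
                | (machineId << MACHINE_SHIFT)
                | sequence;
    }

    public static long timestampOf(long id)  { return id >>> TIMESTAMP_SHIFT; }
    public static long datacenterOf(long id) { return (id >>> DATACENTER_SHIFT) & MAX_DATACENTER; }
    public static long machineOf(long id)    { return (id >>> MACHINE_SHIFT) & MAX_MACHINE; }
    public static long sequenceOf(long id)   { return id & MAX_SEQUENCE; }

    private static void check(String name, long value, long max) {
        if (value < 0 || value > max) {
            throw new IllegalArgumentException(name + " out of range: " + value);
        }
    }
}
```

Rejecting out-of-range values matters here: a field that overflows its width would silently bleed into its neighbor and corrupt the ID.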
2. Our Custom Snowflake Variant and Its Failures
The in‑house shared library (a "second‑party" package) used a modified layout:
+------------------------+-----------------------+----------------+--------------------+-----------------+
| 31 Bits TimestampDelta | 13 Bits DataCenter ID | 4 Bits Work ID | 8 Bits Business ID | 8 Bits Sequence |
+------------------------+-----------------------+----------------+--------------------+-----------------+
Key problems:
Timestamp limited to 31 bits – a 31‑bit millisecond counter wraps after 2^31 ms, about 24.86 days, so timestamp values start repeating after a few weeks.
Business ID derived from the last octet of an IP address – highly repetitive and not unique across hosts.
Work ID and DataCenter ID were left at 0 – all instances shared the same node identifier, nullifying the uniqueness guarantee.
The combination of time rollover, IP‑based conflicts, and a static node ID resulted in complete ID collisions.
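The rollover figure can be checked with a few lines of arithmetic. This is a small sketch, assuming the timestamp field counts milliseconds as in the standard algorithm; the class name TimestampRange and its helper methods are illustrative.

```java
import java.time.Duration;

public final class TimestampRange {
    /** Time until a millisecond counter of the given bit width wraps back to zero. */
    public static Duration wrapPeriod(int bits) {
        return Duration.ofMillis(1L << bits);
    }

    /** Keeps only the low `bits` bits, as a fixed-width timestamp field does. */
    public static long truncate(long millis, int bits) {
        return millis & ((1L << bits) - 1);
    }

    public static void main(String[] args) {
        System.out.println(wrapPeriod(31).toDays());        // prints 24
        System.out.println(wrapPeriod(41).toDays() / 365);  // prints 69

        // Two instants exactly 2^31 ms apart map to the same 31-bit field value.
        long first = 1_000L;
        long later = first + (1L << 31);
        System.out.println(truncate(first, 31) == truncate(later, 31)); // prints true
    }
}
```

Once the timestamp field repeats, uniqueness rests entirely on the remaining bits, which in our variant were either constant or derived from an IP octet.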
3. Lessons Learned
Do not reinvent well‑tested components like Snowflake; mature libraries handle clock rollback, bit‑shifts, and coordination more reliably.
Always review third‑party code to understand its uniqueness guarantees before trusting it.
Avoid using IP suffixes for worker IDs; allocate them centrally and consistently.
Test edge cases such as long‑running processes, sequence overflow, and clock adjustments.
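To make those edge cases testable, it helps to inject the clock. The sketch below is a simplified generator under stated assumptions, not the code of any particular library: it uses a single 10‑bit node ID, rejects backward clock jumps with an exception, and spins until the next millisecond when the sequence is exhausted. SketchSnowflake and its constructor parameters are hypothetical names.

```java
import java.util.function.LongSupplier;

public final class SketchSnowflake {
    private static final long SEQUENCE_MASK = (1L << 12) - 1; // 4095
    private static final long MAX_NODE_ID = (1L << 10) - 1;   // 1023
    private static final long MAX_DELTA = (1L << 41) - 1;

    private final long epochMillis;
    private final long nodeId;
    private final LongSupplier clock; // injectable so tests can control time
    private long lastMillis = -1L;
    private long sequence = 0L;

    public SketchSnowflake(long epochMillis, long nodeId, LongSupplier clock) {
        if (nodeId < 0 || nodeId > MAX_NODE_ID) {
            throw new IllegalArgumentException("nodeId out of range: " + nodeId);
        }
        this.epochMillis = epochMillis;
        this.nodeId = nodeId;
        this.clock = clock;
    }

    public synchronized long nextId() {
        long now = clock.getAsLong();
        if (now < lastMillis) {
            // Clock rollback: refuse to issue IDs rather than risk duplicates.
            throw new IllegalStateException(
                    "clock moved backwards by " + (lastMillis - now) + " ms");
        }
        if (now == lastMillis) {
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // Sequence overflow: wait for the next millisecond.
                while ((now = clock.getAsLong()) <= lastMillis) {
                    Thread.onSpinWait();
                }
            }
        } else {
            sequence = 0;
        }
        lastMillis = now;
        long delta = now - epochMillis;
        if (delta < 0 || delta > MAX_DELTA) {
            throw new IllegalStateException("timestamp outside the 41-bit range");
        }
        return (delta << 22) | (nodeId << 12) | sequence;
    }
}
```

In production the clock would be System::currentTimeMillis; in a test, a fake clock such as `() -> now[0]` lets you step time backwards or hold it still to exercise each branch. Throwing on rollback is only one policy; some implementations wait out small backward jumps instead.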
4. Recommended Practices
Adopt proven open‑source implementations (e.g., Hutool, Baomidou):
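import cn.hutool.core.lang.Snowflake;
import cn.hutool.core.util.IdUtil;
import com.baomidou.mybatisplus.core.incrementer.DefaultIdentifierGenerator;

// Hutool example: workerId=1, datacenterId=1
Snowflake snowflake = IdUtil.getSnowflake(1, 1);
long hutoolId = snowflake.nextId();

// Baomidou (MyBatis-Plus) example; it also supports automatic IP/MAC derivation
DefaultIdentifierGenerator generator = new DefaultIdentifierGenerator(1, 1); // workerId=1, dataCenterId=1
long baomidouId = generator.nextId("user");
For medium‑to‑large systems, use DataCenterId to represent different data‑centers or availability zones.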
Worker‑ID assignment strategies:
Simple: manually configure in a properties file (suitable for development or single‑node deployments).
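Standard: hash the concatenation of IP and port (or process ID) and take the result modulo the worker‑ID range. Distinct instances can still hash to the same value, so this reduces collisions rather than ruling them out.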
Intermediate: rely on a service registry (Eureka, Nacos) to allocate IDs during registration.
Advanced: use centralized coordinators like Redis or Zookeeper for dynamic allocation and reclamation.
Scale the solution gradually; avoid over‑engineering early on.
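Two of these strategies can be sketched in a few lines. The code below is illustrative only: hashed shows the hash‑and‑modulo approach using CRC32, and InMemoryAllocator is a hypothetical in‑process stand‑in for what a coordinator such as Redis or Zookeeper would provide (hand out a free ID, reclaim it on release). None of the names come from an existing library.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.zip.CRC32;

public final class WorkerIds {
    /** Hash strategy: CRC32 of "host:port", reduced to the worker-ID range. */
    public static long hashed(String host, int port, int workerBits) {
        CRC32 crc = new CRC32();
        crc.update((host + ":" + port).getBytes(StandardCharsets.UTF_8));
        // getValue() is non-negative, so the remainder is in [0, 2^workerBits).
        return crc.getValue() % (1L << workerBits);
    }

    /** Central-allocation strategy, modeled in memory for illustration. */
    public static final class InMemoryAllocator {
        private final BitSet inUse = new BitSet();
        private final int capacity;

        public InMemoryAllocator(int workerBits) {
            this.capacity = 1 << workerBits;
        }

        /** Returns the lowest free worker ID, or fails when all are taken. */
        public synchronized int acquire() {
            int id = inUse.nextClearBit(0);
            if (id >= capacity) {
                throw new IllegalStateException("no free worker IDs");
            }
            inUse.set(id);
            return id;
        }

        /** Returns an ID to the pool, e.g. when an instance shuts down. */
        public synchronized void release(int id) {
            inUse.clear(id);
        }
    }
}
```

The hash version needs no coordination but cannot guarantee uniqueness, while the allocator guarantees it at the cost of a shared dependency. A real coordinator would also need leases or heartbeats so that IDs held by crashed instances are eventually reclaimed.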
5. Additional Advice: Keep Business Data Separate from IDs
Embedding business semantics (type prefixes, module codes) into IDs leads to non‑numeric values, irregular length, and potential compatibility issues when business rules change. Store such metadata in separate columns and let the ID remain a pure, sortable identifier.
6. Conclusion
Re‑creating a generic component like a Snowflake ID generator is risky; rely on battle‑tested libraries, understand their internals, and configure node identifiers thoughtfully to ensure global uniqueness.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
