Why Our Custom Snowflake ID Generator Failed and How to Build a Reliable One
A recent production incident revealed duplicate order IDs caused by a flawed custom Snowflake implementation; the article reviews the standard Snowflake structure, pinpoints design mistakes such as a 31‑bit timestamp, IP‑based business IDs, and zeroed worker IDs, and offers concrete recommendations and proven library alternatives for safe distributed ID generation.
1. Standard Snowflake Algorithm
The classic Snowflake ID is a 64‑bit long composed of several fields:
+----------------------------------------------------------------------------------------------------+
| 1 Bit Sign | 41 Bits Timestamp | 5 Bits DataCenter ID | 5 Bits Machine ID | 12 Bits Sequence |
+----------------------------------------------------------------------------------------------------+
Sign bit : always 0 to ensure positive numbers.
41‑bit timestamp : milliseconds since a custom epoch, supporting about 69 years.
5‑bit data‑center ID : distinguishes different data centers.
5‑bit machine ID : distinguishes nodes within a data center.
12‑bit sequence : allows up to 4096 IDs per millisecond on the same node.
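To make the layout concrete, here is a minimal sketch of how the four fields could be packed into and read back from a single long. The class name SnowflakeLayout and its method names are illustrative assumptions, not taken from any library.

```java
// Illustrative sketch only: names are assumptions, not a library API.
final class SnowflakeLayout {
    static final int SEQUENCE_BITS = 12;
    static final int MACHINE_BITS = 5;
    static final int DATACENTER_BITS = 5;
    static final int TIMESTAMP_BITS = 41;

    // Each field sits to the left of all fields that follow it in the layout.
    static final int MACHINE_SHIFT = SEQUENCE_BITS;                         // 12
    static final int DATACENTER_SHIFT = MACHINE_SHIFT + MACHINE_BITS;       // 17
    static final int TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS;  // 22

    /** Largest value that fits in the given number of bits. */
    static long mask(int bits) {
        return (1L << bits) - 1;
    }

    private static void check(String name, long value, int bits) {
        if (value < 0 || value > mask(bits)) {
            throw new IllegalArgumentException(name + " out of range: " + value);
        }
    }

    /** Packs the fields into one long; the sign bit stays 0 because the timestamp is at most 41 bits. */
    static long compose(long timestampDelta, long datacenterId, long machineId, long sequence) {
        check("timestampDelta", timestampDelta, TIMESTAMP_BITS);
        check("datacenterId", datacenterId, DATACENTER_BITS);
        check("machineId", machineId, MACHINE_BITS);
        check("sequence", sequence, SEQUENCE_BITS);
        return (timestampDelta << TIMESTAMP_SHIFT)
                | (datacenterId << DATACENTER_SHIFT)
                | (machineId << MACHINE_SHIFT)
                | sequence;
    }

    static long timestampDelta(long id) { return id >>> TIMESTAMP_SHIFT; }
    static long datacenterId(long id)   { return (id >>> DATACENTER_SHIFT) & mask(DATACENTER_BITS); }
    static long machineId(long id)      { return (id >>> MACHINE_SHIFT) & mask(MACHINE_BITS); }
    static long sequence(long id)       { return id & mask(SEQUENCE_BITS); }

    public static void main(String[] args) {
        long id = compose(1_000L, 3, 7, 42);
        System.out.println("id=" + id
                + " timestampDelta=" + timestampDelta(id)
                + " datacenterId=" + datacenterId(id)
                + " machineId=" + machineId(id)
                + " sequence=" + sequence(id));
    }
}
```

Because the timestamp occupies the highest bits below the sign bit, IDs produced later on the same node compare greater than earlier ones, which is where the rough time ordering comes from.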
2. Our Custom Snowflake Implementation and Issues
Our in‑house package used a different layout:
+----------------------------------------------------------------------------------------------------+
| 31 Bits TimestampDelta | 13 Bits DataCenter ID | 4 Bits Work ID | 8 Bits Business ID | 8 Bits Sequence |
+----------------------------------------------------------------------------------------------------+
Problems
Timestamp only 31 bits : supports roughly 24.85 days; after 2^31 ms the timestamp wraps, causing repeated IDs.
BusinessId derived from the last octet of an IP address : extremely prone to collisions across machines.
WorkId and DataCenterId left at 0 : all instances share the same node identifier, nullifying uniqueness.
These flaws resulted in time rollover, IP‑based conflicts, and sequence collisions, ultimately producing duplicate IDs.
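The first two problems can be checked with a few lines of arithmetic. The sketch below assumes the 31‑bit field simply keeps the low 31 bits of the millisecond delta (the source does not show the overflow behaviour) and uses hypothetical IP addresses; NarrowTimestampDemo and its methods are illustrative names.

```java
import java.time.Duration;

// Illustrative sketch only: shows why a 31-bit millisecond field and an
// IP-suffix business ID cannot guarantee uniqueness.
final class NarrowTimestampDemo {
    static final int TIMESTAMP_BITS = 31;
    static final long TIMESTAMP_MASK = (1L << TIMESTAMP_BITS) - 1;

    /** Assumption: the field keeps only the low 31 bits of the delta. */
    static long truncate(long millisSinceEpoch) {
        return millisSinceEpoch & TIMESTAMP_MASK;
    }

    /** Number of days until a 31-bit millisecond counter wraps around. */
    static double daysUntilWrap() {
        return (1L << TIMESTAMP_BITS) / (double) Duration.ofDays(1).toMillis();
    }

    /** Last octet of a dotted IPv4 string, the value used as the business ID. */
    static int lastOctet(String ipv4) {
        return Integer.parseInt(ipv4.substring(ipv4.lastIndexOf('.') + 1));
    }

    public static void main(String[] args) {
        System.out.printf("wraps after %.3f days%n", daysUntilWrap());

        // Two instants exactly 2^31 ms apart map to the same field value.
        long t = 12_345L;
        System.out.println("same timestamp field: "
                + (truncate(t) == truncate(t + (1L << TIMESTAMP_BITS))));

        // Hosts on different subnets can share the same last octet.
        System.out.println("same business ID: "
                + (lastOctet("10.0.1.17") == lastOctet("10.0.2.17")));
    }
}
```

With the worker and data‑center IDs also fixed at 0, nothing else in the ID separates two such instances once the timestamp and sequence values coincide.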
3. Lessons Learned
Avoid reinventing well‑known components; mature libraries are far more reliable.
Never trust a third‑party package blindly; always verify its uniqueness guarantees.
Configure worker and data‑center IDs properly instead of using fragile IP suffixes.
Test edge cases such as long‑running operation, sequence overflow, and clock rollback.
Do not embed business semantics into the ID; keep the ID purely for uniqueness and ordering.
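One way to act on the verification and testing points above is a small concurrency check that hammers a generator from several threads and counts repeated values. The following is a sketch under assumptions: UniquenessCheck is a hypothetical helper, and the generators in main are stand-ins rather than real Snowflake implementations.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Illustrative sketch only: a hypothetical helper for checking any ID generator.
final class UniquenessCheck {

    /** Calls the generator from several threads and returns how many IDs repeated an earlier one. */
    static long countDuplicates(LongSupplier generator, int threads, int idsPerThread) {
        Set<Long> seen = ConcurrentHashMap.newKeySet();
        AtomicLong duplicates = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < idsPerThread; i++) {
                    // add() returns false when the value was already present.
                    if (!seen.add(generator.getAsLong())) {
                        duplicates.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("generator threads did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return duplicates.get();
    }

    public static void main(String[] args) {
        // Stand-in generator that is unique by construction.
        AtomicLong counter = new AtomicLong();
        System.out.println("counter duplicates: "
                + countDuplicates(counter::incrementAndGet, 8, 100_000));

        // Deliberately broken stand-in: every call returns the same value.
        System.out.println("constant duplicates: "
                + countDuplicates(() -> 42L, 2, 10));
    }
}
```

A real generator could be passed in the same way, for instance as a method reference to its ID method. A check like this only covers a single process; wraparound after weeks of uptime, clock rollback, and collisions between separate instances need their own tests, for example with an injectable clock.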
4. Recommended Practices
Adopt proven open‑source implementations, for example Hutool or Baomidou:
import cn.hutool.core.lang.Snowflake;
import cn.hutool.core.util.IdUtil;
import com.baomidou.mybatisplus.core.incrementer.DefaultIdentifierGenerator;

// Hutool example: arguments are workerId and datacenterId
Snowflake snowflake = IdUtil.getSnowflake(1, 1);
long orderId = snowflake.nextId();

// Baomidou example (supports automatic IP/MAC derivation)
DefaultIdentifierGenerator generator = new DefaultIdentifierGenerator(1, 1);
long userId = generator.nextId("user");

For medium‑to‑large systems, DataCenterId typically identifies a data center or availability zone, while WorkerId can be assigned using several strategies:
Simple : manually set in configuration (suitable for development or single‑node deployments).
Standard : hash of IP + port (or process ID) modulo the total number of workers; works without external services.
Intermediate : use a service registry such as Eureka or Nacos to allocate IDs during registration.
Advanced : employ a centralized coordinator like Redis or Zookeeper for dynamic allocation, release, and conflict avoidance.
Start with the simplest strategy that fits and move to more sophisticated mechanisms only as the system scales, to avoid over‑engineering early on.
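The "Standard" strategy can be sketched as follows. This is a minimal illustration that assumes CRC32 as the hash and a hypothetical WorkerIdDeriver class; the addresses and the slot count of 32 (matching a 5‑bit worker field) are example values.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Illustrative sketch only: derives a worker ID from IP + port without external services.
final class WorkerIdDeriver {

    /** Hashes "ip:port" with CRC32 and maps the result into [0, maxWorkers). */
    static long deriveWorkerId(String ip, int port, int maxWorkers) {
        if (maxWorkers <= 0) {
            throw new IllegalArgumentException("maxWorkers must be positive");
        }
        CRC32 crc = new CRC32();
        crc.update((ip + ":" + port).getBytes(StandardCharsets.UTF_8));
        // CRC32.getValue() is non-negative, so the remainder is too.
        return crc.getValue() % maxWorkers;
    }

    public static void main(String[] args) {
        System.out.println(deriveWorkerId("10.0.1.17", 8080, 32));
        System.out.println(deriveWorkerId("10.0.2.17", 8080, 32));
    }
}
```

Hashing the full address and port avoids the guaranteed clash of identical last octets, but two instances can still land on the same slot, and with only 32 slots that becomes likely as the fleet grows. This is one reason the registry‑based and coordinator‑based strategies exist: they can detect and refuse an ID that is already taken.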
5. Additional Advice
Keep business fields separate from the ID. Embedding them makes the ID non‑numeric, breaks time‑ordered sorting, inflates storage, and creates compatibility problems when business meanings change.
Conclusion
Do not reinvent the wheel for generic components; rely on battle‑tested solutions to prevent critical failures in production.
Java Backend Technology
Focus on Java-related technologies: SSM, Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading. Occasionally cover DevOps tools like Jenkins, Nexus, Docker, and ELK. Also share technical insights from time to time, committed to Java full-stack development!
