Why Our Custom Snowflake ID Generator Failed and How to Build a Reliable One
A recent production incident produced duplicate order IDs. The cause was a home‑grown Snowflake implementation that truncated its timestamp, derived a business ID from the IP address, and left the worker and data‑center IDs unconfigured. This article analyzes the flaws, shares the lessons learned, and recommends proven libraries and sound ID‑generation strategies.
Standard Snowflake Algorithm
A standard Snowflake ID is a 64‑bit integer composed of:
+---------------------------------------------------------------+
| 1 Bit Sign | 41 Bits Timestamp | 5 Bits DataCenter ID | 5 Bits Machine ID | 12 Bits Sequence |
+---------------------------------------------------------------+
1‑bit sign : always 0 to keep the number positive.
41‑bit timestamp : milliseconds since a custom epoch, supports ~69 years.
5‑bit data‑center ID and 5‑bit machine ID distinguish nodes.
12‑bit sequence : up to 4096 IDs per millisecond per node.
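The layout above can be sketched as plain bit arithmetic. The class and method names below (`SnowflakeLayout`, `compose`, and the `...Of` accessors) are hypothetical and chosen for illustration; this is a minimal sketch of the field packing only, not a full generator.

```java
final class SnowflakeLayout {
    // Field widths taken from the standard layout: 41 + 5 + 5 + 12 bits below the sign bit.
    static final int SEQUENCE_BITS = 12;
    static final int MACHINE_BITS = 5;
    static final int DATACENTER_BITS = 5;
    static final int TIMESTAMP_BITS = 41;

    static final int MACHINE_SHIFT = SEQUENCE_BITS;                         // 12
    static final int DATACENTER_SHIFT = MACHINE_SHIFT + MACHINE_BITS;       // 17
    static final int TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS;  // 22

    static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;      // 4095
    static final long MAX_MACHINE = (1L << MACHINE_BITS) - 1;        // 31
    static final long MAX_DATACENTER = (1L << DATACENTER_BITS) - 1;  // 31
    static final long MAX_TIMESTAMP = (1L << TIMESTAMP_BITS) - 1;

    // Packs the four fields into one long; rejects values that do not fit their field.
    static long compose(long timestampDelta, long dataCenterId, long machineId, long sequence) {
        check(timestampDelta, MAX_TIMESTAMP, "timestampDelta");
        check(dataCenterId, MAX_DATACENTER, "dataCenterId");
        check(machineId, MAX_MACHINE, "machineId");
        check(sequence, MAX_SEQUENCE, "sequence");
        return (timestampDelta << TIMESTAMP_SHIFT)
                | (dataCenterId << DATACENTER_SHIFT)
                | (machineId << MACHINE_SHIFT)
                | sequence;
    }

    static long timestampOf(long id) { return id >>> TIMESTAMP_SHIFT; }
    static long dataCenterOf(long id) { return (id >>> DATACENTER_SHIFT) & MAX_DATACENTER; }
    static long machineOf(long id) { return (id >>> MACHINE_SHIFT) & MAX_MACHINE; }
    static long sequenceOf(long id) { return id & MAX_SEQUENCE; }

    private static void check(long value, long max, String name) {
        if (value < 0 || value > max) {
            throw new IllegalArgumentException(name + " out of range: " + value);
        }
    }
}
```

Because the four fields occupy exactly 63 bits, the sign bit stays 0 and every valid ID is a positive `long`.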
Issues in a Custom Implementation
The in‑house package used a modified layout:
+---------------------------------------------------------------+
| 31 Bits TimestampDelta | 13 Bits DataCenter ID | 4 Bits Work ID | 8 Bits Business ID | 8 Bits Sequence |
+---------------------------------------------------------------+
Timestamp limited to 31 bits : the timestamp delta is shifted left by 33 bits to make room for the lower fields, so only its low 31 bits survive in a 64‑bit value. The field therefore wraps every 2³¹ ms (about 24.86 days). With an epoch starting in 2018, the timestamp has already wrapped many times.
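Business ID derived from the last octet of the IP address : the field has only 256 possible values, and hosts in different subnets can share the same last octet, so this scheme is highly collision‑prone.
Work ID and DataCenter ID not configured (all zero) : every instance shares the same node identifier, so these fields contribute nothing to uniqueness.
Combined, these defects allow different instances, and even the same instance at different times, to generate identical IDs.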
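The arithmetic behind the first two defects can be checked with a short sketch. It is an illustration rather than the in‑house code: the class `CustomLayoutDefects`, its methods, the exact epoch `2018-01-01T00:00:00Z`, the sample date, and the sample IP addresses are all assumptions (the article only says the epoch is in 2018).

```java
import java.time.Instant;

final class CustomLayoutDefects {
    // 13 + 4 + 8 + 8 bits sit below the timestamp in the custom layout.
    static final int TIMESTAMP_SHIFT = 33;
    static final long WRAP_PERIOD_MS = 1L << (64 - TIMESTAMP_SHIFT); // 2^31 ms

    // Timestamp field as it survives in a 64-bit long: shifting left by 33
    // pushes every bit above the low 31 off the top, where it is lost.
    static long storedTimestamp(long deltaMs) {
        return (deltaMs << TIMESTAMP_SHIFT) >>> TIMESTAMP_SHIFT;
    }

    // Number of complete wraps of the 31-bit field between the epoch and `now`.
    static long wrapCount(Instant epoch, Instant now) {
        return (now.toEpochMilli() - epoch.toEpochMilli()) / WRAP_PERIOD_MS;
    }

    // Hypothetical reconstruction of an "IP suffix" business ID.
    static int lastOctet(String ipv4) {
        return Integer.parseInt(ipv4.substring(ipv4.lastIndexOf('.') + 1));
    }

    public static void main(String[] args) {
        Instant epoch = Instant.parse("2018-01-01T00:00:00Z");  // assumed epoch
        Instant sample = Instant.parse("2024-01-01T00:00:00Z"); // arbitrary sample date
        System.out.println("complete wraps: " + wrapCount(epoch, sample));

        // Two moments exactly one wrap period apart store the same timestamp.
        long delta = 1_000L;
        System.out.println(storedTimestamp(delta) == storedTimestamp(delta + WRAP_PERIOD_MS));

        // Hosts in different subnets can share a last octet.
        System.out.println(lastOctet("10.0.1.17") == lastOctet("10.0.2.17"));
    }
}
```

With the node fields all zero, two IDs differ only if their stored timestamp, business ID, or 8‑bit sequence differs, and the sketch shows how easily the first two coincide.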
Lessons Learned
Avoid reinventing well‑tested components; use proven Snowflake libraries.
Review third‑party implementations to verify uniqueness guarantees.
Assign machine IDs deliberately; avoid fragile schemes such as IP suffixes.
Test edge cases: long‑running processes, sequence overflow, and clock rollback.
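To make the last point concrete, the sketch below shows the two behaviours such tests usually target: refusing to issue IDs after a clock rollback, and waiting for the next millisecond when the sequence is exhausted. `SketchSnowflake` is a hypothetical teaching class using the standard layout, not a replacement for a library; the injectable `LongSupplier` clock is an assumption made so that tests can control time.

```java
import java.util.function.LongSupplier;

final class SketchSnowflake {
    private static final long MAX_SEQUENCE = 4095L; // 12-bit sequence

    private final LongSupplier clock; // assumed to return ms since the custom epoch
    private final long nodeBits;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    SketchSnowflake(long dataCenterId, long machineId, LongSupplier clock) {
        if (dataCenterId < 0 || dataCenterId > 31 || machineId < 0 || machineId > 31) {
            throw new IllegalArgumentException("node IDs must be in 0..31");
        }
        this.nodeBits = (dataCenterId << 17) | (machineId << 12);
        this.clock = clock;
    }

    synchronized long nextId() {
        long now = clock.getAsLong();
        if (now < lastTimestamp) {
            // Clock rollback: issuing an ID here could repeat an earlier one.
            throw new IllegalStateException(
                    "clock moved backwards by " + (lastTimestamp - now) + " ms");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // Sequence overflow: all 4096 values used, so wait for the next millisecond.
                while ((now = clock.getAsLong()) <= lastTimestamp) {
                    // busy-wait; a real implementation might sleep or yield
                }
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = now;
        return (now << 22) | nodeBits | sequence;
    }
}
```

A test can pass a lambda such as `() -> fakeNow[0]` as the clock, generate 4096 IDs within one fake millisecond, assert they are all distinct, then move the fake clock backwards and assert that `nextId()` throws.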
Recommended Practices
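Use mature open‑source implementations, for example Hutool or Baomidou (MyBatis‑Plus):
import cn.hutool.core.lang.Snowflake;
import cn.hutool.core.util.IdUtil;
import com.baomidou.mybatisplus.core.incrementer.DefaultIdentifierGenerator;
// Hutool example: workerId=1, datacenterId=1
Snowflake snowflake = IdUtil.getSnowflake(1, 1);
long hutoolId = snowflake.nextId();
// Baomidou example (supports automatic derivation from IP/MAC or manual configuration)
DefaultIdentifierGenerator generator = new DefaultIdentifierGenerator(1, 1); // workerId=1, dataCenterId=1
long baomidouId = generator.nextId("user"); // the argument is the entity the ID is generated for
Worker‑ID assignment strategies can evolve with system size: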
Simple : manually set IDs in configuration files (suitable for development or single‑node deployments).
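Standard : hash the combination of IP and port (or process ID) and take the result modulo the number of available worker IDs (32 for a 5‑bit field). Two instances can still hash to the same ID, so collisions remain possible.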
Intermediate : allocate IDs via a service registry such as Eureka or Nacos during service registration.
Advanced : use a centralized coordination service like Redis or Zookeeper to allocate and release Worker IDs dynamically, supporting scaling and conflict avoidance.
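As an illustration of the hash‑based strategy in the list above, the following sketch maps an IP and port to a worker ID. The class `WorkerIdStrategy`, the choice of CRC32 as the hash, and the 32‑slot limit (a 5‑bit worker field) are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class WorkerIdStrategy {
    // Assumes a 5-bit worker field, i.e. 32 usable worker IDs.
    static final int WORKER_ID_SLOTS = 32;

    // Hashes "ip:port" and folds the result into the worker-ID range.
    static int fromHostAndPort(String ip, int port) {
        CRC32 crc = new CRC32();
        crc.update((ip + ":" + port).getBytes(StandardCharsets.UTF_8));
        // CRC32.getValue() is a non-negative long, so the remainder is in 0..31.
        return (int) (crc.getValue() % WORKER_ID_SLOTS);
    }
}
```

The result is deterministic for a given instance but not guaranteed unique: with more than 32 instances a collision is unavoidable, and it is likely well before that. This is why larger deployments tend to move to registry‑based or centrally coordinated allocation.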
Do Not Embed Business Flags in IDs
Appending business information (e.g., type prefixes or module codes) makes IDs non‑numeric, breaks time‑ordered sorting, increases storage size, and can cause compatibility problems when business semantics change. Store business fields separately and keep IDs purely for uniqueness and ordering.
Conclusion
Rely on battle‑tested Snowflake libraries and understand their uniqueness guarantees to avoid costly ID collisions in production systems.
Architect's Tech Stack
Java backend, microservices, distributed systems, containerized programming, and more.
