Why Our Custom Snowflake ID Generator Failed and How to Fix It
A recent production incident produced duplicate order IDs, traced to a flawed custom Snowflake implementation. This article reviews the standard Snowflake structure, dissects the custom implementation's critical mistakes (a timestamp field that is too short, a business ID derived from the IP address, and worker and data‑center IDs left at zero), and offers best‑practice recommendations, including mature libraries and deliberate worker‑ID strategies.
Background
Our online system suffered a severe incident where order numbers and serial numbers were duplicated, breaking core business workflows. The root cause was a self‑developed Snowflake‑style ID generator that malfunctioned.
Standard Snowflake Algorithm
The canonical Snowflake ID is a 64‑bit integer composed of:
+----------------------------------------------------------------------------------------------------+
| 1 Bit | 41 Bits Timestamp | 5 Bits DataCenter ID | 5 Bits Machine ID | 12 Bits Sequence |
+----------------------------------------------------------------------------------------------------+
Sign bit (1) : always 0, so the ID is a positive signed 64‑bit integer.
Timestamp (41) : milliseconds since a fixed custom epoch; 2^41 ms covers about 69 years.
Machine ID (10) : 5 bits for the data‑center ID and 5 bits for the worker/machine ID, allowing up to 1024 nodes.
Sequence (12) : counter within the same millisecond, allowing up to 4096 IDs per millisecond per node.
Advantages include high‑performance, time‑ordered unique IDs suitable for distributed environments.
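The layout above can be sketched as plain bit shifts. This is a minimal illustration rather than any library's actual code; the class name SnowflakeLayout and the 2020-01-01 epoch are assumptions chosen for the example.

```java
final class SnowflakeLayout {
    // Field widths taken from the standard layout described above.
    static final int SEQUENCE_BITS = 12;
    static final int WORKER_BITS = 5;
    static final int DATACENTER_BITS = 5;
    static final int TIMESTAMP_BITS = 41;

    static final int WORKER_SHIFT = SEQUENCE_BITS;                         // 12
    static final int DATACENTER_SHIFT = WORKER_SHIFT + WORKER_BITS;        // 17
    static final int TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS; // 22

    // Hypothetical epoch for illustration: 2020-01-01T00:00:00Z.
    static final long EPOCH_MS = 1577836800000L;

    // Packs the four fields into one positive 64-bit value.
    static long compose(long timestampMs, long datacenterId, long workerId, long sequence) {
        long delta = timestampMs - EPOCH_MS;
        check(delta, TIMESTAMP_BITS, "timestamp delta");
        check(datacenterId, DATACENTER_BITS, "datacenterId");
        check(workerId, WORKER_BITS, "workerId");
        check(sequence, SEQUENCE_BITS, "sequence");
        return (delta << TIMESTAMP_SHIFT)
                | (datacenterId << DATACENTER_SHIFT)
                | (workerId << WORKER_SHIFT)
                | sequence;
    }

    // Decoders: shift the field down, then mask off the higher fields.
    static long timestampOf(long id)  { return (id >>> TIMESTAMP_SHIFT) + EPOCH_MS; }
    static long datacenterOf(long id) { return (id >>> DATACENTER_SHIFT) & mask(DATACENTER_BITS); }
    static long workerOf(long id)     { return (id >>> WORKER_SHIFT) & mask(WORKER_BITS); }
    static long sequenceOf(long id)   { return id & mask(SEQUENCE_BITS); }

    private static long mask(int bits) { return (1L << bits) - 1; }

    // Rejects values that would spill into a neighbouring field.
    private static void check(long value, int bits, String name) {
        if (value < 0 || value > mask(bits)) {
            throw new IllegalArgumentException(name + " out of range: " + value);
        }
    }
}
```

Validating every field before packing matters here: a value wider than its field silently corrupts the neighbouring bits, which is one way two distinct inputs can end up as the same ID.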
Problems in Our Custom Implementation
Our customized version used the following bit layout (inferred from investigation):
+----------------------------------------------------------------------------------------------------+
| 31 Bits TimestampDelta | 13 Bits DataCenter ID | 4 Bits Work ID | 8 Bits Business ID | 8 Bits Sequence |
+----------------------------------------------------------------------------------------------------+
1. Insufficient Timestamp Bits
Only 31 bits are allocated for the timestamp delta. Counted in milliseconds, 2^31 ms is only about 24.85 days.
After 2^31 milliseconds the timestamp field wraps around, so later IDs can repeat earlier ones.
With a start epoch in 2018, the field had already wrapped roughly a hundred times by 2025.
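The arithmetic behind this can be checked directly. The sketch below assumes the custom generator simply kept the low 31 bits of the millisecond delta (the incident report does not show the original code); TimestampWrap and its method names are hypothetical.

```java
final class TimestampWrap {
    static final long MS_PER_DAY = 24L * 60 * 60 * 1000;

    // How many days a millisecond counter of the given bit width can cover.
    static double rangeInDays(int bits) {
        return (double) (1L << bits) / MS_PER_DAY;
    }

    // Keeps only the low `bits` bits, as a fixed-width timestamp field would.
    static long truncate(long deltaMs, int bits) {
        return deltaMs & ((1L << bits) - 1);
    }

    public static void main(String[] args) {
        System.out.printf("31 bits cover %.2f days%n", rangeInDays(31));
        System.out.printf("41 bits cover %.2f years%n", rangeInDays(41) / 365.25);

        // Two moments exactly 2^31 ms apart look identical in a 31-bit field.
        long first = 123_456L;
        long later = first + (1L << 31);
        System.out.println("31-bit fields equal: " + (truncate(first, 31) == truncate(later, 31)));
        System.out.println("41-bit fields equal: " + (truncate(first, 41) == truncate(later, 41)));
    }
}
```

Once the timestamp field repeats, uniqueness depends entirely on the remaining fields, which is why the next two problems turned the wrap into actual duplicates.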
2. Business ID Derived from IP Suffix
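Using the last octet of the IP address (e.g., the “1” in 192.168.0.1) as the Business ID makes it highly collision‑prone: hosts in different subnets, such as 192.168.0.1 and 192.168.1.1, get the same value, and a single octet can distinguish at most 256 hosts.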
3. WorkId and DataCenterId Not Configured
Both fields default to 0, meaning every instance shares the same node identifier, effectively nullifying uniqueness guarantees.
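Problems 2 and 3 combine badly, which a small sketch makes concrete. It packs the custom layout as inferred above, with DataCenter ID and Work ID left at 0 and the Business ID taken from the last IP octet. The class CustomLayoutCollision, its methods, and the sample addresses are assumptions for illustration, not the original implementation.

```java
final class CustomLayoutCollision {
    // Field widths of the custom layout as inferred in the article.
    static final int SEQ_BITS = 8;
    static final int BIZ_BITS = 8;
    static final int WORK_BITS = 4;
    static final int DC_BITS = 13;
    static final int TS_BITS = 31;

    static final int BIZ_SHIFT = SEQ_BITS;               // 8
    static final int WORK_SHIFT = BIZ_SHIFT + BIZ_BITS;  // 16
    static final int DC_SHIFT = WORK_SHIFT + WORK_BITS;  // 20
    static final int TS_SHIFT = DC_SHIFT + DC_BITS;      // 33

    // Business ID as described: the last octet of a dotted IPv4 address.
    static int lastOctet(String ipv4) {
        return Integer.parseInt(ipv4.substring(ipv4.lastIndexOf('.') + 1));
    }

    // Packs the fields; the timestamp delta is truncated to 31 bits.
    static long compose(long deltaMs, long dcId, long workId, long bizId, long seq) {
        long ts = deltaMs & ((1L << TS_BITS) - 1);
        return (ts << TS_SHIFT) | (dcId << DC_SHIFT) | (workId << WORK_SHIFT)
                | (bizId << BIZ_SHIFT) | seq;
    }

    public static void main(String[] args) {
        long delta = 1_000L; // same millisecond on both hosts
        // Two different hosts, both with DataCenter ID = 0 and Work ID = 0.
        long a = compose(delta, 0, 0, lastOctet("192.168.0.1"), 0);
        long b = compose(delta, 0, 0, lastOctet("192.168.1.1"), 0);
        System.out.println(a == b); // prints true: two hosts, one ID
    }
}
```

With the node fields zeroed, the last octet is the only thing separating instances, so any two hosts sharing it produce identical IDs whenever their timestamp and sequence values coincide.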
Lessons Learned
Do not reinvent mature components. Clock‑backward handling, bit manipulation, and distributed coordination are error‑prone; proven libraries are safer.
Never trust third‑party packages blindly. Always review the implementation and understand its uniqueness guarantees.
Assign worker IDs deliberately. Avoid using IP suffixes; plan and allocate WorkerId and DataCenterId centrally.
Test edge cases. Simulate long‑running operation, sequence overflow, and clock rollback to ensure robustness.
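Edge cases such as sequence overflow and clock rollback become much easier to test when the clock is injectable. The following is a minimal sketch under that assumption; TestableSnowflake is a hypothetical class, it merges data‑center and worker bits into one 10‑bit field for brevity, and throwing on rollback is only one of several possible policies.

```java
import java.util.function.LongSupplier;

final class TestableSnowflake {
    static final long EPOCH_MS = 1577836800000L; // hypothetical epoch: 2020-01-01T00:00:00Z
    private static final int SEQUENCE_BITS = 12;
    private static final int WORKER_BITS = 10;   // data-center + worker bits combined
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;

    private final LongSupplier clock; // injected so tests can control time
    private final long workerId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    TestableSnowflake(long workerId, LongSupplier clock) {
        if (workerId < 0 || workerId >= (1L << WORKER_BITS)) {
            throw new IllegalArgumentException("workerId out of range: " + workerId);
        }
        this.workerId = workerId;
        this.clock = clock;
    }

    synchronized long nextId() {
        long now = clock.getAsLong();
        if (now < lastTimestamp) {
            // Clock rollback: refuse to issue IDs that could repeat earlier ones.
            throw new IllegalStateException(
                    "Clock moved backwards by " + (lastTimestamp - now) + " ms");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // Sequence exhausted for this millisecond: spin until the next one.
                do {
                    now = clock.getAsLong();
                } while (now <= lastTimestamp);
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = now;
        return ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }
}
```

In production the clock would be System::currentTimeMillis; in a test it can be a lambda over a counter, which allows simulating a frozen clock, a burst of more than 4096 requests in one millisecond, or a backwards jump without waiting for real time to pass.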
Recommended Practices
Use well‑maintained open‑source implementations such as Hutool or Baomidou:
// Hutool example
import cn.hutool.core.lang.Snowflake;
import cn.hutool.core.util.IdUtil;

Snowflake snowflake = IdUtil.getSnowflake(1, 1); // workerId=1, datacenterId=1
long orderId = snowflake.nextId();

// Baomidou (MyBatis-Plus) example (supports automatic IP/MAC derivation or manual settings)
import com.baomidou.mybatisplus.core.incrementer.DefaultIdentifierGenerator;

DefaultIdentifierGenerator generator = new DefaultIdentifierGenerator(1, 1); // workerId=1, dataCenterId=1
long userId = generator.nextId("user");

For medium‑to‑large systems, DataCenterId typically identifies a data‑center or availability zone. WorkerId assignment strategies can evolve:
Simple: manually set via configuration files (suitable for development or single‑node deployments).
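Standard: hash of IP + port (or process ID) modulo the total number of workers; no external dependencies, but two instances can still hash to the same WorkerId, so collisions must be detected or ruled out by the deployment plan.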
Intermediate: allocate IDs through a service registry like Eureka or Nacos during registration.
Advanced: use centralized coordinators such as Redis or Zookeeper to dynamically assign and release WorkerIds, supporting scaling and conflict avoidance.
Gradually adopt more sophisticated mechanisms as the system grows, avoiding premature over‑engineering.
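As an illustration of the "Standard" strategy, the sketch below derives a WorkerId from a hash of IP and port. The class WorkerIdAssigner, the method name, and the choice of CRC32 as the hash are assumptions made for the example; any stable, non‑negative hash would serve.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class WorkerIdAssigner {
    // Maps "ip:port" to a WorkerId in [0, workerCount) using a CRC32 checksum.
    static long fromHostAndPort(String ip, int port, int workerCount) {
        if (workerCount <= 0) {
            throw new IllegalArgumentException("workerCount must be positive");
        }
        CRC32 crc = new CRC32();
        crc.update((ip + ":" + port).getBytes(StandardCharsets.UTF_8));
        // CRC32.getValue() is always non-negative, so the remainder is too.
        return crc.getValue() % workerCount;
    }

    public static void main(String[] args) {
        // 32 matches the 5-bit worker field of the standard layout.
        System.out.println(fromHostAndPort("10.0.3.17", 8080, 32));
        System.out.println(fromHostAndPort("10.0.3.18", 8080, 32));
    }
}
```

Because the result space is small (32 values for a 5‑bit field), distinct instances will sometimes receive the same WorkerId. This strategy therefore only suits small fleets or setups where duplicates are checked at startup, which is the gap the registry‑based and coordinator‑based strategies close.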
Additional Advice: Keep Business Data Separate from IDs
Embedding business semantics (type prefixes, module codes) into IDs leads to non‑numeric IDs, loss of time‑ordered sorting, irregular length, increased storage, and potential compatibility issues if business meanings change. Store business attributes separately and let the ID serve solely as a unique, sortable identifier.
Conclusion
Do not reinvent wheels for generic components; rely on battle‑tested libraries and understand their inner workings to prevent costly failures.
Architect's Guide
Dedicated to sharing programmer-architect skills—Java backend, system, microservice, and distributed architectures—to help you become a senior architect.