Why Our Custom Snowflake ID Failed and How to Build a Reliable One

A recent production incident traced duplicate order numbers to a self-developed Snowflake-style ID generator. Three flaws were responsible: a truncated timestamp, business IDs derived unsafely from IP addresses, and worker and data-center IDs that were never configured. This article reviews the standard algorithm, analyzes the flaws in the custom design, and recommends practices for robust ID generation.

Java Architect Handbook

Standard Snowflake Algorithm

The canonical Snowflake ID is a 64‑bit long composed of:

+---------------------------------------------------------------+
| 1 Bit | 41 Bits Timestamp | 5 Bits DataCenter ID | 5 Bits Machine ID | 12 Bits Sequence |
+---------------------------------------------------------------+

1 Bit Sign: Always 0 to ensure a positive number.

41 Bits Timestamp: Millisecond offset from a fixed epoch, supporting about 69 years.

5 Bits DataCenter ID: Distinguishes different data centers.

5 Bits Machine ID: Distinguishes machines within a data center.

12 Bits Sequence: Allows up to 4096 IDs per millisecond on the same node.

Its advantages are high throughput, because each node generates IDs locally, and IDs that are unique and roughly time-ordered, which suits distributed environments.
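To make the layout concrete, here is a minimal Java sketch that packs the four fields into a long and unpacks them again. The class name `SnowflakeLayout` and the 2020-01-01 epoch are illustrative assumptions, not taken from any particular library.

```java
/** Packs and unpacks the standard 1 | 41 | 5 | 5 | 12 layout (illustrative sketch). */
final class SnowflakeLayout {
    static final int SEQUENCE_BITS = 12;
    static final int MACHINE_BITS = 5;
    static final int DATACENTER_BITS = 5;
    static final int TIMESTAMP_BITS = 41;

    static final int MACHINE_SHIFT = SEQUENCE_BITS;                         // 12
    static final int DATACENTER_SHIFT = MACHINE_SHIFT + MACHINE_BITS;       // 17
    static final int TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS;  // 22

    // Assumed custom epoch: 2020-01-01T00:00:00Z. Any fixed past instant works.
    static final long EPOCH_MILLIS = 1577836800000L;

    static long compose(long timestampMillis, long datacenterId, long machineId, long sequence) {
        long delta = timestampMillis - EPOCH_MILLIS;
        requireFits(delta, TIMESTAMP_BITS, "timestamp delta");
        requireFits(datacenterId, DATACENTER_BITS, "datacenterId");
        requireFits(machineId, MACHINE_BITS, "machineId");
        requireFits(sequence, SEQUENCE_BITS, "sequence");
        // The sign bit stays 0 because delta never exceeds 41 bits.
        return (delta << TIMESTAMP_SHIFT)
                | (datacenterId << DATACENTER_SHIFT)
                | (machineId << MACHINE_SHIFT)
                | sequence;
    }

    static long timestampOf(long id)  { return (id >>> TIMESTAMP_SHIFT) + EPOCH_MILLIS; }
    static long datacenterOf(long id) { return (id >>> DATACENTER_SHIFT) & mask(DATACENTER_BITS); }
    static long machineOf(long id)    { return (id >>> MACHINE_SHIFT) & mask(MACHINE_BITS); }
    static long sequenceOf(long id)   { return id & mask(SEQUENCE_BITS); }

    private static long mask(int bits) { return (1L << bits) - 1; }

    private static void requireFits(long value, int bits, String name) {
        if (value < 0 || value > mask(bits)) {
            throw new IllegalArgumentException(name + " does not fit in " + bits + " bits: " + value);
        }
    }
}
```

Because the timestamp occupies the highest bits, comparing two IDs from the same node numerically also compares their creation times.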

Our Custom "Snowflake" Implementation and Its Problems

The in‑house package we used had the following layout (inferred from investigation):

+---------------------------------------------------------------+
| 31 Bits TimestampDelta | 13 Bits DataCenter ID | 4 Bits Work ID | 8 Bits Business ID | 8 Bits Sequence |
+---------------------------------------------------------------+

Although it appears richer, three critical issues made it unreliable:

1. Timestamp Only 31 Bits – Supports Just 24.85 Days

The timestamp delta is shifted left by 33 bits to make room for the other fields, so only its low 31 bits survive in the 64-bit ID.

The counter wraps after 2³¹ milliseconds, causing time‑based collisions.

With a custom epoch in 2018, the field had already wrapped roughly a hundred times by 2025.
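The arithmetic behind this can be checked with a short Java sketch. It assumes the timestamp delta was simply shifted left by 33 bits in a long, and it uses 2018-01-01 as the epoch for illustration (the exact date within 2018 is an assumption). 2^31 ms divided by 86,400,000 ms per day comes to roughly 24.855 days.

```java
import java.time.Duration;
import java.time.Instant;

final class TimestampWrapDemo {
    static final int LOWER_FIELD_BITS = 33;                        // 13 + 4 + 8 + 8
    static final long WRAP_MILLIS = 1L << (64 - LOWER_FIELD_BITS); // 2^31 ms

    /** The timestamp part of the ID: bits shifted past position 63 are silently dropped. */
    static long timestampPart(long deltaMillis) {
        return deltaMillis << LOWER_FIELD_BITS;
    }

    /** Length of one wrap period in days. */
    static double wrapPeriodDays() {
        return WRAP_MILLIS / (double) Duration.ofDays(1).toMillis();
    }

    /** Number of full wrap periods that fit between the epoch and the given instant. */
    static long completedWraps(Instant epoch, Instant now) {
        return Duration.between(epoch, now).toMillis() / WRAP_MILLIS;
    }

    public static void main(String[] args) {
        System.out.printf("wrap period: %.3f days%n", wrapPeriodDays());
        long delta = 123_456_789L;
        System.out.println("same timestamp bits one period later: "
                + (timestampPart(delta) == timestampPart(delta + WRAP_MILLIS)));
        System.out.println("completed wraps: " + completedWraps(
                Instant.parse("2018-01-01T00:00:00Z"), Instant.parse("2025-01-01T00:00:00Z")));
    }
}
```

Two moments exactly one wrap period apart produce identical timestamp bits, so uniqueness then rests entirely on the remaining fields.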

2. Business ID Uses the IP’s Last Octet

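Using the final segment of an IPv4 address (e.g., "1" from 192.168.0.1) makes the identifier highly prone to duplication: the octet has only 256 possible values, and hosts in different subnets (say 192.168.0.1 and 192.168.1.1) share it.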
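A small sketch shows how easily this happens. The helper `lastOctet` is hypothetical; how the original package parsed the address is not known, so this only illustrates the idea.

```java
final class LastOctetDemo {
    /** Returns the final octet of a dotted-quad IPv4 string (hypothetical helper). */
    static int lastOctet(String ipv4) {
        String[] parts = ipv4.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not a dotted-quad IPv4 address: " + ipv4);
        }
        return Integer.parseInt(parts[3]);
    }

    public static void main(String[] args) {
        // Hosts in unrelated subnets end up with the same "business ID".
        System.out.println(lastOctet("192.168.0.1") == lastOctet("10.0.5.1"));
    }
}
```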

3. Work ID and DataCenter ID Are Fixed at Zero

All instances share the same node identifiers, effectively nullifying the uniqueness guarantees of the algorithm.
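The effect can be sketched by composing IDs in the custom 31 | 13 | 4 | 8 | 8 layout. The field order follows the diagram above, which was itself inferred from investigation, and the class and method names are illustrative assumptions.

```java
final class CustomLayoutDemo {
    // Field widths from the custom layout: 31 | 13 | 4 | 8 | 8.
    static final int SEQUENCE_BITS = 8;
    static final int BUSINESS_BITS = 8;
    static final int WORK_BITS = 4;
    static final int DATACENTER_BITS = 13;

    static final int BUSINESS_SHIFT = SEQUENCE_BITS;                        // 8
    static final int WORK_SHIFT = BUSINESS_SHIFT + BUSINESS_BITS;           // 16
    static final int DATACENTER_SHIFT = WORK_SHIFT + WORK_BITS;             // 20
    static final int TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS;  // 33

    static long compose(long deltaMillis, long datacenterId, long workId,
                        long businessId, long sequence) {
        return (deltaMillis << TIMESTAMP_SHIFT)
                | (datacenterId << DATACENTER_SHIFT)
                | (workId << WORK_SHIFT)
                | (businessId << BUSINESS_SHIFT)
                | sequence;
    }

    public static void main(String[] args) {
        long sameMillis = 1_000L;
        // Two different hosts: node IDs left at 0, same last octet, same sequence value.
        long hostA = compose(sameMillis, 0, 0, 1, 0);
        long hostB = compose(sameMillis, 0, 0, 1, 0);
        System.out.println("duplicate: " + (hostA == hostB));
        // Distinct work IDs alone would have separated them.
        System.out.println("distinct with work IDs: " + (compose(sameMillis, 0, 1, 1, 0) != compose(sameMillis, 0, 2, 1, 0)));
    }
}
```

With both node fields at zero, any two hosts that share a last octet and generate an ID in the same millisecond with the same sequence value produce the same number.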

Lessons Learned

Avoid reinventing common components: Snowflake involves clock-rollback handling, bit manipulation, and distributed coordination; mature libraries are far safer.

Never trust a third‑party package blindly: Review the implementation and understand its uniqueness guarantees.

Configure worker and data‑center IDs properly: Do not rely on fragile IP suffixes; allocate IDs centrally or via a deterministic scheme.

Test edge cases early: Simulate long‑running operation, sequence overflow, and clock rollback to ensure robustness.
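One way to make such tests practical is to inject the clock, so a test can rewind time or hold it still. The following is a minimal sketch under stated assumptions: the class `TestableSnowflake`, the 2020-01-01 epoch, and the policy of throwing on rollback are illustrative choices (some implementations wait for the clock to catch up instead).

```java
import java.util.function.LongSupplier;

final class TestableSnowflake {
    private static final long EPOCH_MILLIS = 1577836800000L; // assumed epoch: 2020-01-01T00:00:00Z
    private static final long MAX_SEQUENCE = (1L << 12) - 1;  // 4095

    private final LongSupplier clock; // returns current time in milliseconds
    private final long datacenterId;
    private final long machineId;
    private long lastMillis = -1L;
    private long sequence = 0L;

    TestableSnowflake(LongSupplier clock, long datacenterId, long machineId) {
        if (datacenterId < 0 || datacenterId > 31 || machineId < 0 || machineId > 31) {
            throw new IllegalArgumentException("node IDs must be in 0..31");
        }
        this.clock = clock;
        this.datacenterId = datacenterId;
        this.machineId = machineId;
    }

    synchronized long nextId() {
        long now = clock.getAsLong();
        if (now < lastMillis) {
            // Clock rollback: refuse to issue IDs rather than risk duplicates.
            throw new IllegalStateException("clock moved backwards by " + (lastMillis - now) + " ms");
        }
        if (now == lastMillis) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // Sequence exhausted for this millisecond: wait for the clock to advance.
                while ((now = clock.getAsLong()) <= lastMillis) {
                    Thread.onSpinWait();
                }
            }
        } else {
            sequence = 0;
        }
        lastMillis = now;
        return ((now - EPOCH_MILLIS) << 22) | (datacenterId << 17) | (machineId << 12) | sequence;
    }
}
```

In production the clock would typically be `System::currentTimeMillis`. In a test, a supplier backed by a mutable variable lets you step time backwards to confirm the rollback is rejected, or freeze it to confirm that more than 4096 calls in one millisecond still yield distinct IDs once time advances.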

Recommended Practices for Reliable ID Generation

Prefer proven open-source solutions such as Hutool or Baomidou's MyBatis-Plus:

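import cn.hutool.core.lang.Snowflake;
import cn.hutool.core.util.IdUtil;
import com.baomidou.mybatisplus.core.incrementer.DefaultIdentifierGenerator;

// Hutool example: arguments are workerId and datacenterId
Snowflake snowflake = IdUtil.getSnowflake(1, 1);
long hutoolId = snowflake.nextId();

// Baomidou (MyBatis-Plus) example (supports automatic IP/MAC derivation or manual config)
DefaultIdentifierGenerator generator = new DefaultIdentifierGenerator(1, 1); // workerId=1, dataCenterId=1
long mybatisPlusId = generator.nextId("user"); // distinct variable name so both examples compile in one scope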

For medium‑to‑large systems, treat the DataCenter ID as the identifier of a data center or availability zone.

Worker ID assignment strategies can evolve with system scale:

Simple: Manually set in a configuration file – suitable for development or single‑node deployments.

Standard: Hash the concatenation of IP and port (or process ID) and take the modulus of the allowed worker-ID range. This works for small to medium clusters without external services, but two instances can hash to the same value, so check for collisions when deploying.

Intermediate: Use a service registry (e.g., Eureka, Nacos) to allocate IDs at registration time, coupling the service ID with uniqueness.

Advanced: Leverage centralized coordinators like Redis or Zookeeper to dynamically assign and recycle worker IDs, supporting elastic scaling and conflict avoidance.

Gradually adopt more sophisticated mechanisms as the system grows, avoiding premature over‑engineering.
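As an illustration of the hash-based strategy from the list above, here is a minimal sketch. The class `WorkerIdDeriver` and the choice of `String.hashCode()` are assumptions made for brevity; any stable hash works, and none removes the risk of collisions within a 5-bit range.

```java
final class WorkerIdDeriver {
    static final int WORKER_ID_BITS = 5;
    static final int WORKER_ID_RANGE = 1 << WORKER_ID_BITS; // 32 possible values

    /** Hashes "ip:port" into 0..31. Deterministic, but NOT collision-free. */
    static int fromIpAndPort(String ip, int port) {
        String key = ip + ":" + port;
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), WORKER_ID_RANGE);
    }

    public static void main(String[] args) {
        System.out.println(fromIpAndPort("192.168.0.1", 8080));
        System.out.println(fromIpAndPort("192.168.0.2", 8080));
    }
}
```

With only 32 slots, collisions become likely well before the cluster reaches 32 instances, which is why the registry-based and coordinator-based strategies take over as the system grows.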

Other Advice: Keep Business Information Separate from IDs

Embedding business semantics (type prefixes, module codes, etc.) into the identifier leads to non‑numeric IDs, breaks time‑ordered sorting, inflates storage, and creates compatibility issues when business meanings evolve. Store such metadata in separate columns and let the ID remain a pure, unique, sortable key.
