How a Snowflake ID Overflow Crashed Our System and What We Learned

A production outage on 2024‑11‑20 was traced to a Snowflake‑based UID generator whose timestamp bits ran out, prompting a detailed post‑mortem that explains the root cause, bit‑allocation analysis, and the steps taken to fix and prevent future ID overflow issues.

Su San Talks Tech
Keywords: distributed ID, Snowflake algorithm, UidGenerator, production incident

Hello, I am Su San.

A recent production incident was caused by a seemingly minor detail in a Snowflake algorithm implementation. This post records the incident, reviews how it was handled, and reminds readers to check for similar problems in their own projects.

Incident Scene

The incident occurred on 2024‑11‑20 at 09:40. Operations detected an alert, and multiple business groups reported system anomalies.

Incident detection screenshot

An urgent review of the logs revealed the following error:

Error log screenshot

Exception details:

com.xxx.uid.exception.UidGenerateException: Timestamp bits is exhausted. Refusing UID generate. Now: 1732112168

Problem Investigation

Developers responded immediately. The project uses a custom 19-digit UID generated by Baidu's UidGenerator, which is based on the Snowflake algorithm.

The relevant code that threw the exception:

/**
 * Get current second
 */
private long getCurrentSecond() {
    long currentSecond = TimeUnit.MILLISECONDS.toSeconds(System.currentTimeMillis());
    if (currentSecond - epochSeconds > bitsAllocator.getMaxDeltaSeconds()) {
        throw new UidGenerateException("Timestamp bits is exhausted. Refusing UID generate. Now: " + currentSecond);
    }
    return currentSecond;
}

The exception means that the number of seconds elapsed since the configured epoch exceeded the maximum value that the timestamp bits of the UID can hold.
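To make the check concrete, here is a minimal sketch of the arithmetic behind it. It uses the 28-bit timestamp and the 2016-05-20 base date described in this article, and additionally assumes the base date is interpreted as midnight at UTC+8, which the article does not state. The class and method names are hypothetical and are not part of UidGenerator.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

public class TimestampExhaustion {
    /** Largest delta (in seconds) that fits in the given number of timestamp bits. */
    static long maxDeltaSeconds(int timeBits) {
        return ~(-1L << timeBits);
    }

    /** Last epoch second for which the delta still fits in the timestamp bits. */
    static long lastUsableSecond(long epochSeconds, int timeBits) {
        return epochSeconds + maxDeltaSeconds(timeBits);
    }

    public static void main(String[] args) {
        // Assumption: base date 2016-05-20, taken as midnight at UTC+8.
        long epochSeconds = LocalDate.of(2016, 5, 20)
                .atStartOfDay()
                .toEpochSecond(ZoneOffset.ofHours(8));
        long last = lastUsableSecond(epochSeconds, 28);

        // "Now" value copied from the exception message in the log.
        long loggedNow = 1732112168L;
        boolean exhausted = loggedNow - epochSeconds > maxDeltaSeconds(28);

        System.out.println("Last usable instant (UTC): " + Instant.ofEpochSecond(last));
        System.out.println("Exhausted at logged time:  " + exhausted);
    }
}
```

Under these assumptions the last usable second falls on 2024-11-20, and the logged timestamp is already past it, which is consistent with the exception being thrown.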

Tracing the code showed that too few bits had been allocated to the timestamp.

Timestamp bits configuration

Root Cause Analysis

UidGenerator principle

Reference: https://github.com/baidu/uid-generator/blob/master/README.zh_cn.md

The Snowflake-based UID is a 64-bit value made up of four parts:

sign (1 bit): fixed sign bit, ensures the UID is positive.

delta seconds (28 bits): seconds elapsed since the base date 2016-05-20, supporting about 8.5 years (2^28 seconds).

worker id (22 bits): machine identifier, supporting roughly 4.2 million node starts.

sequence (13 bits): per-second sequence, supporting up to 8,192 IDs per second.
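The layout above can be sketched as plain shift-and-mask arithmetic. This is an illustrative sketch, not UidGenerator's actual source; the class `UidLayout` and its methods are hypothetical names.

```java
public class UidLayout {
    private final int workerBits;
    private final int seqBits;

    public UidLayout(int timeBits, int workerBits, int seqBits) {
        // One sign bit plus the three fields must fill a 64-bit long exactly.
        if (1 + timeBits + workerBits + seqBits != 64) {
            throw new IllegalArgumentException("sign + time + worker + seq bits must total 64");
        }
        this.workerBits = workerBits;
        this.seqBits = seqBits;
    }

    /** Packs the three fields into one long. Each value must already fit its field. */
    public long compose(long deltaSeconds, long workerId, long sequence) {
        return (deltaSeconds << (workerBits + seqBits)) | (workerId << seqBits) | sequence;
    }

    /** Extracts the delta-seconds field (the highest bits below the sign bit). */
    public long deltaSeconds(long uid) {
        return uid >>> (workerBits + seqBits);
    }

    /** Extracts the worker-id field. */
    public long workerId(long uid) {
        return (uid >>> seqBits) & ~(-1L << workerBits);
    }

    /** Extracts the sequence field (the lowest bits). */
    public long sequence(long uid) {
        return uid & ~(-1L << seqBits);
    }

    public static void main(String[] args) {
        UidLayout layout = new UidLayout(28, 22, 13);
        long uid = layout.compose(1_000L, 5L, 7L);
        System.out.println("uid=" + uid
                + " delta=" + layout.deltaSeconds(uid)
                + " worker=" + layout.workerId(uid)
                + " seq=" + layout.sequence(uid));
    }
}
```

Because delta seconds occupy the highest bits, the field that overflows first is the one that determines how long the generator can keep running.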

In our project the delta seconds field uses 28 bits. Counting 2^28 seconds (about 8.5 years) from the 2016-05-20 base date lands on 2024-11-20, exactly the date of the outage.

To extend the usable period, we can increase the timestamp bits, e.g., to 31 bits.

Reasonableness Analysis

Proposed new bit allocation:

timeBits = 31: supports up to 2,147,483,647 seconds (~68 years), covering until about 2084.

workerBits = 15: supports 32,768 worker IDs (counted per node start, as with the 22-bit field above), sufficient for most systems.

seqBits = 17: supports up to 131,072 IDs per second, suitable for high-concurrency scenarios.
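The capacities quoted above follow directly from powers of two. Here is a minimal sketch for re-deriving them for any allocation, using hypothetical class and method names and taking a year as 365.25 days:

```java
import java.util.Locale;

public class BitBudget {
    /** Largest unsigned value representable in the given number of bits. */
    static long maxValue(int bits) {
        return ~(-1L << bits);
    }

    /** Years covered by a seconds counter of the given width (365.25-day years). */
    static double yearsCovered(int timeBits) {
        return (maxValue(timeBits) + 1) / (365.25 * 24 * 3600);
    }

    /** One-line summary of what an allocation can represent. */
    static String describe(int timeBits, int workerBits, int seqBits) {
        return String.format(Locale.ROOT,
                "time=%d bits (%.1f years), worker=%d bits (%d ids), seq=%d bits (%d per second)",
                timeBits, yearsCovered(timeBits),
                workerBits, maxValue(workerBits) + 1,
                seqBits, maxValue(seqBits) + 1);
    }

    public static void main(String[] args) {
        System.out.println(describe(28, 22, 13)); // allocation in use at the time of the outage
        System.out.println(describe(31, 15, 17)); // proposed allocation
    }
}
```

Running a check like this before choosing an allocation makes the trade-off explicit: every bit moved into the timestamp doubles the lifetime and halves the worker or sequence capacity.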

Key considerations:

Ensure correct bit‑wise operations for timestamp, worker ID, and sequence.

Maintain time synchronization across nodes.

Plan for future scalability by allowing flexible bit adjustments.
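One bit-wise consequence that may be worth checking before rollout: the timestamp is shifted left by workerBits + seqBits, and the proposed allocation reduces that shift from 35 (22 + 13) to 32 (15 + 17). The sketch below assumes the epoch is kept unchanged, which the article does not state, and uses hypothetical names. Under that assumption it compares the first ID of the new layout with the IDs of the old layout.

```java
public class LayoutChangeCheck {
    /** Packs the fields for a layout with the given worker and sequence widths. */
    static long compose(long deltaSeconds, long workerId, long sequence,
                        int workerBits, int seqBits) {
        return (deltaSeconds << (workerBits + seqBits)) | (workerId << seqBits) | sequence;
    }

    /** Smallest ID the new layout (31/15/17) can emit once the 28-bit budget is spent. */
    static long firstNewLayoutId() {
        return compose(1L << 28, 0L, 0L, 15, 17);
    }

    /** Largest ID the old layout (28/22/13) could ever emit. */
    static long lastOldLayoutId() {
        return compose((1L << 28) - 1, (1L << 22) - 1, (1L << 13) - 1, 22, 13);
    }

    /** Old-layout delta seconds whose IDs begin at the first new-layout ID's value. */
    static long oldDeltaInSameRange() {
        return firstNewLayoutId() >>> (22 + 13);
    }

    public static void main(String[] args) {
        System.out.println("first new-layout id: " + firstNewLayoutId());
        System.out.println("last old-layout id:  " + lastOldLayoutId());
        System.out.println("new ids sort below old ids: " + (firstNewLayoutId() < lastOldLayoutId()));
        System.out.println("old delta in same numeric range (days): "
                + oldDeltaInSameRange() / 86_400.0);
    }
}
```

Under these assumptions, new IDs start at 2^60 while the old layout had already reached values close to 2^63, so new IDs sort below recently issued old ones and fall in the numeric range the old layout used roughly 388 days after its epoch. Whether that matters depends on whether IDs were being issued at that time and on whether anything downstream relies on IDs increasing over time.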

Implementation Measures

Adjust timestamp bits and redeploy the program.

/** Bits allocate */
protected int timeBits = 31; // 28 -> 31
protected int workerBits = 15; // 22 -> 15
protected int seqBits = 17; // 13 -> 17

The new bit configuration is illustrated below:

New bit allocation diagram

Roll out updates in batches, prioritizing core services.

Maintain a release list to track affected services.

Verify versions to ensure production correctness.

Monitor basic flows and confirm the issue is resolved.

Report the incident.

Post‑mortem Summary

When an outage hits, no Snowflake implementation can be assumed innocent.

Developers must understand the core principles of the frameworks they use in order to locate problems quickly.

Analyze an emergency fix carefully before applying it, so that it neither introduces new problems nor leaves old ones behind.

Map out every global reference to the ID generator, including code, databases, and business logic.

Learn how long-standing issues like this one come to be discovered.

Check other projects for the same problem promptly.

Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.