
Why MyBatis‑Plus ID Collisions Occur and How Seata’s Optimized Snowflake Solves Them

This article explains the primary‑key duplication issue caused by MyBatis‑Plus in clustered Docker/K8S environments, analyzes the limitations of the standard Snowflake algorithm, and presents Seata’s improved Snowflake implementation that provides globally unique, high‑performance IDs while minimizing database page splits.

macrozheng

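Yesterday a teammate using MyBatis‑Plus in a clustered K8S environment encountered primary‑key duplication errors. MyBatis‑Plus initializes workerId and dataCenterId via com.baomidou.mybatisplus.core.toolkit.Sequence.getMaxWorkerId() and getDatacenterId(). The workerId is a hash of the dataCenterId concatenated with the process‑ID part of the JVM runtime name (the text before the "@"), while dataCenterId is derived from the last two bytes of the MAC address. Both inputs can easily repeat in Docker containers, where the application process often has the same PID in every container and MAC addresses may differ only slightly or not at all.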

<code>protected long getMaxWorkerId(long datacenterId, long maxWorkerId) {
    StringBuilder mpid = new StringBuilder();
    mpid.append(datacenterId);
    // The runtime name usually looks like "pid@hostname"
    String name = ManagementFactory.getRuntimeMXBean().getName();
    if (StringUtils.isNotBlank(name)) {
        // Keep only the JVM process ID
        mpid.append(name.split("@")[0]);
    }
    // Low 16 bits of the hash of (datacenterId + pid), folded into the worker ID range
    return (mpid.toString().hashCode() & 0xffff) % (maxWorkerId + 1L);
}

protected long getDatacenterId(long maxDatacenterId) {
    // ... omitted ...
    byte[] mac = network.getHardwareAddress();
    if (null != mac) {
        // Combine the last two MAC bytes, drop the low 6 bits, then fold into the data center ID range
        long id = ((0x000000FFL & (long) mac[mac.length - 2])
                | (0x0000FF00L & (((long) mac[mac.length - 1]) << 8))) >> 6;
        id %= maxDatacenterId + 1L;
        return id;
    }
    return 0L;
}
</code>

Because both identifiers can collide in containerized deployments, this article recommends replacing MyBatis‑Plus’s ID generation with the optimized Snowflake algorithm provided by Seata.
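To make the collision concrete, the following sketch reproduces the arithmetic of the two methods above in a standalone class. The class name, the MAC addresses, the runtime names, and the limits of 31 for both IDs are assumptions chosen for illustration; they are not taken from a real cluster.

```java
// Hypothetical demo class: it mirrors the arithmetic of the two methods above,
// but takes the MAC address and JVM runtime name as parameters (an assumption made
// here so that two "containers" can be compared inside one process).
final class WorkerIdCollisionDemo {

    static long datacenterId(byte[] mac, long maxDatacenterId) {
        long id = ((0x000000FFL & (long) mac[mac.length - 2])
                | (0x0000FF00L & (((long) mac[mac.length - 1]) << 8))) >> 6;
        return id % (maxDatacenterId + 1L);
    }

    static long workerId(long datacenterId, String jvmName, long maxWorkerId) {
        String pid = jvmName.split("@")[0];   // "pid@hostname" -> "pid"
        String mpid = datacenterId + pid;     // same concatenation as the StringBuilder above
        return (mpid.hashCode() & 0xffff) % (maxWorkerId + 1L);
    }

    public static void main(String[] args) {
        // Made-up values: two containers whose MACs differ in the last byte,
        // both running the application as PID 1 under different host names.
        byte[] macA = {0x02, 0x42, (byte) 0xAC, 0x11, 0x00, 0x02};
        byte[] macB = {0x02, 0x42, (byte) 0xAC, 0x11, 0x00, 0x0A};

        long dcA = datacenterId(macA, 31L);
        long dcB = datacenterId(macB, 31L);
        long workerA = workerId(dcA, "1@pod-a", 31L);
        long workerB = workerId(dcB, "1@pod-b", 31L);

        System.out.println("datacenterId: " + dcA + " vs " + dcB);
        System.out.println("workerId:     " + workerA + " vs " + workerB);
    }
}
```

With these inputs both containers end up with the same datacenterId and the same workerId, even though their MAC addresses and host names differ: the shift and modulo discard most of the MAC bits, and the host name never enters the hash.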

Overview

In software development, globally unique and incrementally ordered identifiers are often required for distributed primary keys, especially when using MySQL. Incremental IDs help reduce InnoDB page splits, lower I/O pressure, and improve server performance.

The standard Snowflake algorithm generates IDs based on the current system timestamp, which makes it sensitive to clock rollback. If the OS clock moves backward, duplicate IDs may be produced.
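The clock dependency can be illustrated with a deliberately naive generator. This is a sketch under stated assumptions, not any library's implementation: it uses the classic layout of timestamp, 10-bit worker ID, and 12-bit sequence, reads time from an injected clock so a rollback can be simulated, ignores sequence overflow, and has no rollback guard at all.

```java
import java.util.function.LongSupplier;

// Hypothetical, simplified Snowflake generator with no clock-rollback protection.
final class NaiveSnowflakeSketch {
    private static final int WORKER_ID_BITS = 10;
    private static final int SEQUENCE_BITS = 12;

    private final long workerId;
    private final LongSupplier clock;   // injected so the demo can move time backward
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    NaiveSnowflakeSketch(long workerId, LongSupplier clock) {
        this.workerId = workerId;
        this.clock = clock;
    }

    synchronized long nextId() {
        long now = clock.getAsLong();
        if (now == lastTimestamp) {
            // Same millisecond: bump the sequence (overflow handling omitted in this sketch)
            sequence = (sequence + 1) & ((1L << SEQUENCE_BITS) - 1);
        } else {
            // A "different" millisecond, which after a rollback may be one already used
            sequence = 0L;
        }
        lastTimestamp = now;
        return (now << (WORKER_ID_BITS + SEQUENCE_BITS)) | (workerId << SEQUENCE_BITS) | sequence;
    }

    // Generates three IDs while the fake clock goes 1000 -> 1001 -> back to 1000.
    static long[] idsAcrossRollback() {
        long[] ticks = {1000L, 1001L, 1000L};
        int[] cursor = {0};
        NaiveSnowflakeSketch generator = new NaiveSnowflakeSketch(7L, () -> ticks[cursor[0]++]);
        return new long[] {generator.nextId(), generator.nextId(), generator.nextId()};
    }

    public static void main(String[] args) {
        long[] ids = idsAcrossRollback();
        System.out.println("duplicate after rollback: " + (ids[0] == ids[2]));
    }
}
```

The third ID equals the first because the timestamp and sequence fields are rebuilt from the clock on every call. Real implementations usually detect the rollback and wait or throw, which trades duplicates for stalls.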

Seata’s Optimized Snowflake Scheme

Seata modifies the original Snowflake format by swapping the positions of the node ID and timestamp, weakening the tight coupling with the OS clock. The generator captures the timestamp once at startup and then relies on a sequence counter for monotonic growth.

The core implementation stores the timestamp and sequence together in a single AtomicLong:

<code>/**
 * timestamp and sequence mix in one Long
 * highest 11 bit: not used
 * middle 41 bit: timestamp
 * lowest 12 bit: sequence
 */
private AtomicLong timestampAndSequence;

private final int sequenceBits = 12;

private void initTimestampAndSequence() {
    long timestamp = getNewestTimestamp();
    long timestampWithSequence = timestamp << sequenceBits;
    this.timestampAndSequence = new AtomicLong(timestampWithSequence);
}
</code>

The worker ID is taken from the low 10 bits of the MAC address, so there are at most 1024 distinct worker IDs:

<code>private long generateWorkerIdBaseOnMac() throws Exception {
    Enumeration<NetworkInterface> all = NetworkInterface.getNetworkInterfaces();
    while (all.hasMoreElements()) {
        NetworkInterface networkInterface = all.nextElement();
        // Skip loopback and virtual interfaces
        if (networkInterface.isLoopback() || networkInterface.isVirtual()) {
            continue;
        }
        // Note: getHardwareAddress() may return null; this excerpt does not guard against that
        byte[] mac = networkInterface.getHardwareAddress();
        // Low 2 bits of the fifth byte plus all 8 bits of the sixth byte form the 10-bit worker ID
        return ((mac[4] & 0B11) << 8) | (mac[5] & 0xFF);
    }
    throw new RuntimeException("no available mac found");
}
</code>

The final ID generation combines the pre‑shifted worker ID with the timestamp‑sequence value:

<code>private final int timestampBits = 41;
private final int sequenceBits = 12;
private final long timestampAndSequenceMask = ~(-1L << (timestampBits + sequenceBits));

public long nextId() {
    long next = timestampAndSequence.incrementAndGet();
    long timestampWithSequence = next & timestampAndSequenceMask;
    return workerId | timestampWithSequence;
}
</code>
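The fragments above can be assembled into one small runnable class to see the bit layout end to end. This is a minimal sketch, not Seata's actual class: the class name, the constructor that takes the worker ID and startup timestamp as plain arguments, and the three decoding helpers are assumptions added for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the layout: [1 unused bit][10-bit worker ID][41-bit timestamp][12-bit sequence]
final class SeataStyleIdSketch {
    private static final int WORKER_ID_BITS = 10;
    private static final int TIMESTAMP_BITS = 41;
    private static final int SEQUENCE_BITS = 12;
    private static final long TIMESTAMP_AND_SEQUENCE_MASK = ~(-1L << (TIMESTAMP_BITS + SEQUENCE_BITS));

    private final long shiftedWorkerId;
    private final AtomicLong timestampAndSequence;

    // Assumption: the caller supplies the worker ID and the timestamp captured at startup.
    SeataStyleIdSketch(long workerId, long startTimestamp) {
        if (workerId < 0 || workerId >= (1L << WORKER_ID_BITS)) {
            throw new IllegalArgumentException("workerId must fit in " + WORKER_ID_BITS + " bits");
        }
        this.shiftedWorkerId = workerId << (TIMESTAMP_BITS + SEQUENCE_BITS);
        this.timestampAndSequence = new AtomicLong(startTimestamp << SEQUENCE_BITS);
    }

    long nextId() {
        // One atomic increment advances the sequence; an overflow carries into the timestamp bits.
        long next = timestampAndSequence.incrementAndGet();
        return shiftedWorkerId | (next & TIMESTAMP_AND_SEQUENCE_MASK);
    }

    static long workerIdOf(long id) {
        return (id >>> (TIMESTAMP_BITS + SEQUENCE_BITS)) & ((1L << WORKER_ID_BITS) - 1);
    }

    static long timestampOf(long id) {
        return (id >>> SEQUENCE_BITS) & ((1L << TIMESTAMP_BITS) - 1);
    }

    static long sequenceOf(long id) {
        return id & ((1L << SEQUENCE_BITS) - 1);
    }

    public static void main(String[] args) {
        SeataStyleIdSketch generator = new SeataStyleIdSketch(5L, System.currentTimeMillis());
        long id = generator.nextId();
        System.out.println("id=" + id + " worker=" + workerIdOf(id)
                + " timestamp=" + timestampOf(id) + " sequence=" + sequenceOf(id));
    }
}
```

Because the clock is read only once, the timestamp field is simply the upper part of a counter. After 4096 IDs the 12-bit sequence wraps and the carry increments the timestamp field, so the generator can run ahead of the wall clock under sustained load rather than waiting for the next millisecond.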

Defects of the Improved Scheme

While the algorithm guarantees monotonic IDs within a single node, it does not ensure global monotonicity across multiple nodes because the node ID occupies the high bits. Consequently, IDs generated by a node with a larger ID are always greater than those from a node with a smaller ID, regardless of generation time.
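A short arithmetic check shows the effect. The helper below is a hypothetical stand-in that composes an ID from its three fields using the layout described earlier (worker ID above 41 timestamp bits and 12 sequence bits); the field values are made up.

```java
// Hypothetical helper: composes an ID as [worker ID][41-bit timestamp][12-bit sequence].
final class CrossNodeOrderDemo {

    static long compose(long workerId, long timestamp, long sequence) {
        return (workerId << 53) | (timestamp << 12) | sequence;
    }

    public static void main(String[] args) {
        long laterOnNode1 = compose(1L, 2_000L, 0L);    // generated later, on the smaller node ID
        long earlierOnNode2 = compose(2L, 1_000L, 0L);  // generated earlier, on the larger node ID

        // The worker ID sits in the high bits, so it dominates the comparison.
        System.out.println(earlierOnNode2 > laterOnNode1); // prints "true"
    }
}
```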

B+‑Tree Principle

In InnoDB, the primary‑key index is stored as a B+‑tree. A page split occurs when a row must be inserted into a leaf page that is already full, which costs additional I/O. Incremental IDs (e.g., auto_increment) minimize page splits because new records are always appended to the tail of the leaf list.

Random IDs (e.g., UUID) cause frequent page splits because inserts are distributed across pages.

Impact of Seata’s Scheme on Page Splits

Although IDs are globally unordered, each node produces a strictly increasing sub‑sequence. After a few initial page splits, the system reaches a stable state where subsequent inserts for a given node are appended to the tail of its own sub‑sequence, avoiding further splits.
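The "tail of its own sub-sequence" behavior can be checked with a small simulation. This sketch models only the sorted key order of the index with a TreeSet; it does not model InnoDB pages or splits. The class name, the interleaved insertion order, and the fixed startup timestamp of 1000 are assumptions for illustration.

```java
import java.util.TreeSet;

// Hypothetical simulation: several nodes insert IDs in interleaved order into one sorted index.
final class SubSequenceInsertDemo {

    static long compose(long workerId, long timestampAndSequence) {
        return (workerId << 53) | timestampAndSequence;
    }

    // Counts inserts whose sorted predecessor is NOT the previous ID generated by the same node.
    static int countOutOfPlaceInserts(int nodes, int idsPerNode) {
        TreeSet<Long> index = new TreeSet<>();
        long[] lastIdOfNode = new long[nodes];
        int outOfPlace = 0;
        for (int round = 0; round < idsPerNode; round++) {
            for (int node = 0; node < nodes; node++) {
                // Each node counts upward from the same assumed startup timestamp.
                long id = compose(node, (1_000L << 12) + round + 1);
                if (round > 0) {
                    Long predecessor = index.lower(id);
                    if (predecessor == null || predecessor != lastIdOfNode[node]) {
                        outOfPlace++;
                    }
                }
                index.add(id);
                lastIdOfNode[node] = id;
            }
        }
        return outOfPlace;
    }

    public static void main(String[] args) {
        System.out.println("out-of-place inserts: " + countOutOfPlaceInserts(4, 1_000));
    }
}
```

Every insert after a node's first lands directly behind that node's previous key, so the index behaves like a handful of independent append-only runs rather than a randomly scattered key space.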

Thus, the algorithm provides high performance and global uniqueness (as long as worker IDs are distinct) and, after an initial stabilization period, does not cause frequent page splits.

Conclusion

The improved Snowflake algorithm does not guarantee global monotonicity, but it maintains monotonicity per node and quickly reaches a stable state that prevents frequent B+‑tree page splits. It is suitable for long‑lived tables whose rows are rarely deleted; frequent deletions could trigger page merges that interfere with the stabilization process.

Written by

macrozheng

Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.
