Why MyBatis‑Plus ID Collisions Occur and How Seata’s Optimized Snowflake Solves Them
This article explains the primary‑key duplication issue caused by MyBatis‑Plus in clustered Docker/K8S environments, analyzes the limitations of the standard Snowflake algorithm, and presents Seata’s improved Snowflake implementation that provides globally unique, high‑performance IDs while minimizing database page splits.
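Yesterday a teammate using MyBatis‑Plus in a clustered K8S environment encountered primary‑key duplication errors. MyBatis‑Plus initializes workerId and dataCenterId via com.baomidou.mybatisplus.core.toolkit.Sequence.getMaxWorkerId() and getDatacenterId(). The workerId is derived from the dataCenterId combined with the JVM process ID (the part of the runtime name before "@"), while dataCenterId is derived from the last two bytes of the MAC address. Both inputs can easily repeat when deployed in Docker containers: the JVM is often PID 1 inside every container, and MAC addresses are not guaranteed to differ across hosts.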
<code>// Excerpt from MyBatis-Plus Sequence
protected long getMaxWorkerId(long datacenterId, long maxWorkerId) {
    StringBuilder mpid = new StringBuilder();
    mpid.append(datacenterId);
    // On HotSpot the runtime name is typically "pid@hostname"; only the PID is used,
    // and inside a container the JVM frequently runs as PID 1.
    String name = ManagementFactory.getRuntimeMXBean().getName();
    if (StringUtils.isNotBlank(name)) {
        mpid.append(name.split("@")[0]);
    }
    // Low 16 bits of the hash, folded into the workerId range.
    return (long)(mpid.toString().hashCode() & '\uffff') % (maxWorkerId + 1L);
}

protected long getDatacenterId(long maxDatacenterId) {
    // ... omitted ...
    byte[] mac = network.getHardwareAddress();
    if (null != mac) {
        // Only the last two MAC bytes contribute; the result is shifted right by 6
        // and then folded into the datacenterId range.
        long id = (255L & (long)mac[mac.length - 2] | 65280L & (long)mac[mac.length - 1] << 8) >> 6;
        id %= maxDatacenterId + 1L;
        return id;
    }
    return 0L;
}
</code>
Because both identifiers can collide in containerized deployments, this article recommends replacing MyBatis‑Plus’s ID generation with an optimized Snowflake algorithm provided by Seata.
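To make the failure mode concrete, the sketch below re‑implements the two formulas quoted above as plain static methods and feeds them hypothetical inputs: two containers whose MAC addresses (resembling Docker's default bridge style) differ only in the last byte, and whose JVMs both run as PID 1. The class and method names are hypothetical, and the limit of 31 for both identifiers is an assumption about MyBatis‑Plus's 5‑bit defaults.

```java
public class MpWorkerIdCollision {
    // Same expression as getDatacenterId(): last two MAC bytes, shifted right by 6, folded into range.
    static long datacenterId(byte[] mac, long maxDatacenterId) {
        long id = (255L & (long) mac[mac.length - 2] | 65280L & (long) mac[mac.length - 1] << 8) >> 6;
        return id % (maxDatacenterId + 1L);
    }

    // Same expression as getMaxWorkerId(): datacenterId + PID string, low 16 hash bits, folded into range.
    static long workerId(long datacenterId, String runtimeName, long maxWorkerId) {
        String mpid = datacenterId + runtimeName.split("@")[0];
        return (long) (mpid.hashCode() & 0xffff) % (maxWorkerId + 1L);
    }

    public static void main(String[] args) {
        // Hypothetical MACs: different devices, identical in the bits that survive ">> 6" and "% 32".
        byte[] macA = {0x02, 0x42, (byte) 0xac, 0x11, 0x00, 0x02};
        byte[] macB = {0x02, 0x42, (byte) 0xac, 0x11, 0x00, 0x0a};
        long dcA = datacenterId(macA, 31L);
        long dcB = datacenterId(macB, 31L);
        // Both JVMs run as PID 1; the hostname after '@' is ignored by the formula.
        long wA = workerId(dcA, "1@container-a", 31L);
        long wB = workerId(dcB, "1@container-b", 31L);
        System.out.println(dcA + " " + dcB); // prints "8 8"
        System.out.println(wA == wB);        // prints "true"
    }
}
```

Two distinct containers end up with the same (datacenterId, workerId) pair, so their ID streams can overlap.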
Overview
In software development, globally unique and incrementally ordered identifiers are often required for distributed primary keys, especially when using MySQL. Incremental IDs help reduce InnoDB page splits, lower I/O pressure, and improve server performance.
The standard Snowflake algorithm generates IDs based on the current system timestamp, which makes it sensitive to clock rollback. If the OS clock moves backward, duplicate IDs may be produced.
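For comparison, here is a minimal sketch of the classic layout (1 sign bit, 41‑bit timestamp, 10‑bit worker ID, 12‑bit sequence) that refuses to issue IDs when the clock goes backwards. The class name, the injectable clock, and the throw‑on‑rollback policy are illustrative choices, not any particular library's API.

```java
import java.util.function.LongSupplier;

/** Classic Snowflake sketch: 1 sign bit | 41-bit timestamp | 10-bit worker | 12-bit sequence. */
public class ClassicSnowflake {
    private static final long SEQUENCE_MASK = (1L << 12) - 1;

    private final long workerId;      // 0..1023
    private final long epoch;         // custom epoch in ms, chosen by the caller
    private final LongSupplier clock; // injectable so a clock rollback can be simulated
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    public ClassicSnowflake(long workerId, long epoch, LongSupplier clock) {
        if (workerId < 0 || workerId > 1023) {
            throw new IllegalArgumentException("workerId must be in [0, 1023]");
        }
        this.workerId = workerId;
        this.epoch = epoch;
        this.clock = clock;
    }

    public synchronized long nextId() {
        long ts = clock.getAsLong();
        if (ts < lastTimestamp) {
            // The OS clock moved backwards: continuing could reissue an ID already handed out.
            throw new IllegalStateException("clock moved backwards by " + (lastTimestamp - ts) + " ms");
        }
        if (ts == lastTimestamp) {
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // 4096 IDs used in this millisecond: spin until the clock advances.
                while ((ts = clock.getAsLong()) <= lastTimestamp) { }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = ts;
        return ((ts - epoch) << 22) | (workerId << 12) | sequence;
    }

    public static void main(String[] args) {
        long[] now = {1_000L};
        ClassicSnowflake gen = new ClassicSnowflake(1L, 0L, () -> now[0]);
        long a = gen.nextId();
        long b = gen.nextId();
        System.out.println(b > a); // true: same millisecond, sequence 0 then 1
        now[0] = 999L;             // simulate the OS clock stepping back 1 ms
        try {
            gen.nextId();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // clock moved backwards by 1 ms
        }
    }
}
```

Every ID depends on the live clock reading, which is exactly the coupling the next section's scheme tries to loosen.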
Seata’s Optimized Snowflake Scheme
Seata modifies the original Snowflake format by swapping the positions of the node ID and timestamp, moving the node ID into the high bits and weakening the tight coupling with the OS clock. The generator captures the timestamp once at startup and then relies on a sequence counter for monotonic growth; when the 12‑bit sequence overflows, the carry simply advances the timestamp field.
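A consolidated, runnable sketch of this idea follows. It is not Seata's source: the class and helper names are hypothetical, and the start timestamp is passed in rather than read from the clock so the behavior is deterministic. It shows what happens when the 12‑bit sequence overflows: after 4096 IDs the sequence field wraps to 0 and the timestamp field advances by one, without consulting the clock.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Seata-style sketch: [1 sign | 10 worker | 41 timestamp | 12 sequence]. */
public class SeataStyleIdSketch {
    static final int SEQUENCE_BITS = 12;
    static final int TIMESTAMP_BITS = 41;
    static final long TS_AND_SEQ_MASK = ~(-1L << (TIMESTAMP_BITS + SEQUENCE_BITS)); // low 53 bits

    private final long shiftedWorkerId;
    private final AtomicLong timestampAndSequence;

    // workerId is assumed to be in [0, 1023]; no validation in this sketch.
    SeataStyleIdSketch(long workerId, long startTimestamp) {
        this.shiftedWorkerId = workerId << (TIMESTAMP_BITS + SEQUENCE_BITS);
        // Timestamp captured once; from now on only the counter moves.
        this.timestampAndSequence = new AtomicLong(startTimestamp << SEQUENCE_BITS);
    }

    long nextId() {
        long next = timestampAndSequence.incrementAndGet(); // sequence overflow carries into the timestamp
        return shiftedWorkerId | (next & TS_AND_SEQ_MASK);
    }

    static long workerOf(long id)    { return id >>> (TIMESTAMP_BITS + SEQUENCE_BITS); }
    static long timestampOf(long id) { return (id & TS_AND_SEQ_MASK) >>> SEQUENCE_BITS; }
    static long sequenceOf(long id)  { return id & ((1L << SEQUENCE_BITS) - 1); }

    public static void main(String[] args) {
        SeataStyleIdSketch gen = new SeataStyleIdSketch(5L, 1_000L);
        long first = gen.nextId();
        long last = first;
        for (int i = 1; i < 4096; i++) {
            last = gen.nextId();
        }
        System.out.println(workerOf(first));                               // 5
        System.out.println(timestampOf(first) + "/" + sequenceOf(first)); // 1000/1
        System.out.println(timestampOf(last) + "/" + sequenceOf(last));   // 1001/0
    }
}
```

The first ID carries sequence 1 rather than 0 because incrementAndGet() increments before returning.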
The core implementation stores the timestamp and sequence together in a single AtomicLong:
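<code>import java.util.concurrent.atomic.AtomicLong;

/**
 * timestamp and sequence mixed in one long
 * highest 11 bits: not used (left for the sign bit and the 10-bit worker ID)
 * middle 41 bits: timestamp
 * lowest 12 bits: sequence
 */
private AtomicLong timestampAndSequence;
private final int sequenceBits = 12;
// Custom epoch (assumed value for illustration: 2020-05-03 00:00 UTC+8).
private final long twepoch = 1588435200000L;

private void initTimestampAndSequence() {
    // The clock is read once here; afterwards IDs only increment this value.
    long timestamp = getNewestTimestamp();
    long timestampWithSequence = timestamp << sequenceBits;
    this.timestampAndSequence = new AtomicLong(timestampWithSequence);
}

// Milliseconds elapsed since the custom epoch.
private long getNewestTimestamp() {
    return System.currentTimeMillis() - twepoch;
}
</code>
Worker IDs are derived from the MAC address; the 10 available bits allow at most 1024 distinct nodes: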
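<code>import java.net.NetworkInterface;
import java.util.Enumeration;

private long generateWorkerIdBaseOnMac() throws Exception {
    Enumeration<NetworkInterface> all = NetworkInterface.getNetworkInterfaces();
    while (all.hasMoreElements()) {
        NetworkInterface networkInterface = all.nextElement();
        byte[] mac = networkInterface.getHardwareAddress();
        // Skip loopback/virtual interfaces and those without a usable 6-byte MAC
        // (getHardwareAddress() returns null when no hardware address is available).
        if (networkInterface.isLoopback() || networkInterface.isVirtual()
                || mac == null || mac.length < 6) {
            continue;
        }
        // Low 2 bits of byte 4 plus all 8 bits of byte 5: a 10-bit worker ID (0..1023).
        return ((mac[4] & 0B11) << 8) | (mac[5] & 0xFF);
    }
    throw new RuntimeException("no available mac found");
}
</code>
The final ID generation ORs the worker ID, which is pre‑shifted left by timestampBits + sequenceBits (53 bits), with the timestamp‑sequence value: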
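<code>private final int timestampBits = 41;
private final int sequenceBits = 12; // same field as in the earlier excerpt
// Low 53 bits set: keeps only the timestamp + sequence part.
private final long timestampAndSequenceMask = ~(-1L << (timestampBits + sequenceBits));

public long nextId() {
    // Atomic increment: a sequence overflow carries into the timestamp bits.
    long next = timestampAndSequence.incrementAndGet();
    long timestampWithSequence = next & timestampAndSequenceMask;
    // workerId is already shifted into the high bits, so OR-ing combines the two parts.
    return workerId | timestampWithSequence;
}
</code>
Defects of the Improved Scheme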
While the algorithm guarantees monotonic IDs within a single node, it does not ensure global monotonicity across multiple nodes because the node ID occupies the high bits. Consequently, IDs generated by a node with a larger ID are always greater than those from a node with a smaller ID, regardless of generation time.
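One way to see this is to pack the same (worker, timestamp, sequence) triples into both layouts. The helper names and sample values below are hypothetical; timestamps are milliseconds since an arbitrary epoch.

```java
public class CrossNodeOrdering {
    // Seata-style layout: worker ID above the 41-bit timestamp and 12-bit sequence.
    static long seataStyle(long workerId, long timestamp, long sequence) {
        return (workerId << 53) | (timestamp << 12) | sequence;
    }

    // Classic layout: timestamp above the 10-bit worker ID and 12-bit sequence.
    static long classic(long workerId, long timestamp, long sequence) {
        return (timestamp << 22) | (workerId << 12) | sequence;
    }

    public static void main(String[] args) {
        // Node 2 generates an ID at t = 1,000 ms; node 1 generates one much later, at t = 2,000,000 ms.
        System.out.println(seataStyle(2, 1_000, 0) > seataStyle(1, 2_000_000, 0)); // true: node ID dominates
        System.out.println(classic(2, 1_000, 0) > classic(1, 2_000_000, 0));       // false: time dominates
    }
}
```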
B+‑Tree Principle
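In InnoDB, the primary‑key (clustered) index is stored as a B+‑tree. When a record must be inserted into a leaf page that is already full, the page is split: a new page is allocated and records are redistributed, which causes additional I/O. Incremental IDs (e.g., auto_increment) minimize page splits because new records are always appended to the tail of the leaf list.
Random IDs (e.g., UUID) cause frequent page splits because inserts land at arbitrary positions across the whole index, so they keep hitting full pages in the middle of the tree.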
Impact of Seata’s Scheme on Page Splits
Although IDs are not ordered by generation time across nodes, each node produces a strictly increasing sub‑sequence within its own key range (the range selected by its worker ID in the high bits). After a few initial page splits, the system reaches a stable state where subsequent inserts for a given node are appended to the tail of its own sub‑sequence, avoiding further splits.
Thus, the algorithm provides high performance, global uniqueness, and, after an initial stabilization period, does not cause frequent page splits.
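The stabilization argument can be checked with a deliberately simplified leaf‑page model. This is a toy, not InnoDB's actual split logic: pages hold a fixed 16 keys, an insert that lands at the end of a full page simply opens a new page (nothing moves), and an insert in the middle of a full page moves the keys after it to a new page, which is counted as a split. The class name, page capacity, and key sets are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Toy leaf-page model for counting splits; deliberately NOT InnoDB's real algorithm. */
public class PageSplitToy {
    static final int CAP = 16; // keys per leaf page (arbitrary toy value)

    /** Inserts unique keys in the given order; returns how many splits had to move existing keys. */
    static int countMovingSplits(List<Long> keys) {
        List<List<Long>> pages = new ArrayList<>(); // leaf pages, ordered by key range
        pages.add(new ArrayList<>());
        int splits = 0;
        for (long key : keys) {
            // Target page: the last page whose first key is <= key (page 0 for the smallest keys).
            int p = 0;
            while (p + 1 < pages.size() && pages.get(p + 1).get(0) <= key) {
                p++;
            }
            List<Long> page = pages.get(p);
            int pos = -Collections.binarySearch(page, key) - 1; // insertion point (key not present)
            page.add(pos, key);
            if (page.size() <= CAP) {
                continue;
            }
            List<Long> fresh = new ArrayList<>();
            if (pos == page.size() - 1) {
                // Landed at the end of a full page: open a new page for it; nothing else moves.
                fresh.add(page.remove(pos));
            } else {
                // Landed mid-page: the keys after it move to a new page.
                List<Long> moved = page.subList(pos + 1, page.size());
                fresh.addAll(moved);
                moved.clear();
                splits++;
            }
            pages.add(p + 1, fresh);
        }
        return splits;
    }

    public static void main(String[] args) {
        int n = 10_000;
        List<Long> sequential = new ArrayList<>();
        for (long i = 1; i <= n; i++) {
            sequential.add(i);
        }
        List<Long> shuffled = new ArrayList<>(sequential);
        Collections.shuffle(shuffled, new Random(42));
        // Two nodes with Seata-style IDs (worker ID in the high bits), inserted alternately.
        List<Long> twoNodes = new ArrayList<>();
        for (long i = 1; i <= n / 2; i++) {
            twoNodes.add((1L << 53) | i);
            twoNodes.add((2L << 53) | i);
        }
        System.out.println("sequential: " + countMovingSplits(sequential)); // 0
        System.out.println("two nodes:  " + countMovingSplits(twoNodes));   // 1
        System.out.println("shuffled:   " + countMovingSplits(shuffled));
    }
}
```

In this model, sequential keys never move records, the two interleaved per‑node sequences pay a single split before their key ranges separate into different pages, and shuffled keys keep splitting pages throughout.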
Conclusion
The improved Snowflake algorithm does not guarantee global monotonicity but maintains monotonicity per node and quickly reaches a stable state that prevents frequent B+‑tree page splits. It is suitable for long‑lived tables where IDs are rarely deleted; frequent deletions could trigger page merges that interfere with the stabilization process.
macrozheng
Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.