Resolving ID Conflicts After Sharding: 5 Practical Solutions
When sharding databases, independent auto‑increment IDs can collide; this article explains why the conflict occurs and evaluates five concrete solutions—UUID, MySQL auto‑increment step, segment allocation, Snowflake algorithm, and Meituan Leaf—detailing their mechanisms, trade‑offs, performance, and suitability.
In a single‑database, single‑table setup, using AUTO_INCREMENT guarantees unique primary keys. After splitting a table into multiple shards across different databases, each shard maintains its own auto‑increment sequence, so identical IDs (e.g., 1000) can appear in different shards, causing ambiguity during cross‑shard queries or data migration.
Solution 1: UUID
Generate a 128‑bit UUID as the primary key:
String id = UUID.randomUUID().toString(); // e.g., 550e8400-e29b-41d4-a716-446655440000

Pros: Simple, globally unique without coordination. Cons: Stored as a 36‑character string (CHAR(36) or larger), which consumes far more space than an 8‑byte BIGINT; random ordering causes B‑tree page splits in InnoDB, degrading write performance; poor readability.
Solution 2: Database Auto‑Increment Step
Configure each MySQL instance with a distinct auto_increment_offset and a common auto_increment_increment equal to the number of shards. Example for two shards:
DB 1: offset 1, increment 2 → IDs 1, 3, 5, …
DB 2: offset 2, increment 2 → IDs 2, 4, 6, …
-- DB 1
SET @@auto_increment_offset = 1;
SET @@auto_increment_increment = 2;

-- DB 2
SET @@auto_increment_offset = 2;
SET @@auto_increment_increment = 2;

(These SET statements are session‑scoped; put the variables in my.cnf to apply them to all connections.) Pros: Minimal changes, IDs remain numeric and ordered. Cons: Hard to scale—adding new shards breaks the fixed offset/step pattern, making smooth expansion difficult.
Solution 3: Segment (Range) Allocation
Create a dedicated table (e.g., leaf_alloc) that stores the current maximum ID and the step size for each business tag. When a service needs IDs, it atomically advances max_id by step (an UPDATE followed by a SELECT of the new max_id in the same transaction) and caches the resulting range in memory.
CREATE TABLE `leaf_alloc` (
`biz_tag` VARCHAR(128) NOT NULL COMMENT 'business tag, e.g., order',
`max_id` BIGINT(20) NOT NULL COMMENT 'current max ID',
`step` INT(11) NOT NULL COMMENT 'range size',
`description` VARCHAR(256) DEFAULT NULL,
`update_time` TIMESTAMP NOT NULL,
PRIMARY KEY (`biz_tag`)
);

UPDATE leaf_alloc
SET max_id = max_id + step, update_time = NOW()
WHERE biz_tag = 'order';

Pros: Reduces database load by fetching IDs in batches; IDs are long integers and mostly ordered. Cons: When a segment is exhausted, a brief pause may occur while fetching the next batch; relies heavily on DB availability.
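A minimal in‑memory client for this scheme might look like the following sketch. The class name is illustrative (this is not Leaf's actual API), and fetchFromDb() is a stand‑in for the real UPDATE/SELECT round trip against leaf_alloc:

```java
// Sketch of a segment-allocation client. Names are illustrative;
// fetchFromDb() simulates "UPDATE leaf_alloc SET max_id = max_id + step ...;
// SELECT max_id" executed in one transaction.
public class SegmentAllocator {
    private final long step;   // range size fetched per DB round trip
    private long current = 0;  // last ID handed out
    private long maxId = 0;    // upper bound (inclusive) of the cached range
    private long dbMaxId = 0;  // simulated max_id column in leaf_alloc

    public SegmentAllocator(long step) {
        this.step = step;
    }

    public synchronized long nextId() {
        if (current >= maxId) {     // cached range exhausted
            maxId = fetchFromDb();  // new upper bound from the DB
            current = maxId - step; // restart just below the new range
        }
        return ++current;
    }

    // Stand-in for the DB round trip.
    private long fetchFromDb() {
        dbMaxId += step;
        return dbMaxId;
    }
}
```

With step = 100, only every hundredth call touches the (simulated) database; the other 99 are served from memory, which is where the batching win comes from.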
Solution 4: Snowflake Algorithm
Twitter's 64‑bit Snowflake ID consists of:
1 sign bit (always 0)
41 bits timestamp (ms since custom epoch, ~69 years)
10 bits machine ID (supports 1024 nodes)
12 bits sequence within the same millisecond (4096 IDs/ms)
Typical throughput: up to ~4.1 million IDs per second per node (4096 IDs/ms × 1000 ms), generated locally without network calls.
public synchronized long nextId() {
    long currentMs = System.currentTimeMillis();
    if (currentMs < lastMs) {
        // Clock moved backwards: refuse to issue IDs to avoid duplicates
        throw new IllegalStateException(
                "clock moved backwards by " + (lastMs - currentMs) + " ms");
    }
    if (currentMs == lastMs) {
        sequence = (sequence + 1) & MAX_SEQUENCE; // wrap within the same ms
        if (sequence == 0) {
            // 4096 IDs already issued this millisecond: spin until the next one
            currentMs = waitNextMs(lastMs);
        }
    } else {
        sequence = 0L;
    }
    lastMs = currentMs;
    return (currentMs - EPOCH) << TIMESTAMP_SHIFT
            | workerId << WORKER_ID_SHIFT
            | sequence;
}

Pros: High performance, IDs are time‑ordered, small storage footprint. Cons: Clock rollback can cause duplicate IDs; the worker ID must be unique across nodes, which is hard to manage in dynamic container environments.
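Reading the bit layout back out of an ID is a quick sanity check on the 41/10/12 structure described above. The class and constant names below are illustrative, assuming the standard layout:

```java
// Decomposes a 64-bit Snowflake ID into its fields, assuming the
// standard layout: 41-bit timestamp | 10-bit worker ID | 12-bit sequence.
public class SnowflakeParts {
    static final long SEQUENCE_BITS = 12;
    static final long WORKER_BITS = 10;
    static final long WORKER_SHIFT = SEQUENCE_BITS;                  // 12
    static final long TIMESTAMP_SHIFT = SEQUENCE_BITS + WORKER_BITS; // 22

    static long timestamp(long id) {
        return id >>> TIMESTAMP_SHIFT;
    }

    static long workerId(long id) {
        return (id >>> WORKER_SHIFT) & ((1L << WORKER_BITS) - 1);
    }

    static long sequence(long id) {
        return id & ((1L << SEQUENCE_BITS) - 1);
    }
}
```

The same shift constants are the ones nextId() uses to compose the ID, so composing and decomposing round‑trip exactly.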
Typical mitigation strategies for clock rollback:
Wait until the system clock catches up.
Throw an exception for large rollbacks and let the caller retry.
Switch to a standby worker ID during the rollback period.
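The first two strategies above can be sketched as a small pure helper. The class name and the 5 ms tolerance threshold are illustrative assumptions, not taken from any particular library:

```java
// Sketch of clock-rollback handling: tolerate small rollbacks by waiting,
// fail fast on large ones. Threshold is an illustrative choice.
public class ClockGuard {
    static final long MAX_TOLERATED_MS = 5; // illustrative tolerance

    // Returns how many ms the caller should sleep before retrying,
    // or throws if the rollback is too large to wait out.
    static long handleRollback(long nowMs, long lastMs) {
        if (nowMs >= lastMs) {
            return 0; // clock is fine, no waiting needed
        }
        long offset = lastMs - nowMs;
        if (offset <= MAX_TOLERATED_MS) {
            return offset; // strategy 1: wait until the clock catches up
        }
        // strategy 2: fail fast and let the caller retry or alert
        throw new IllegalStateException("clock moved backwards by " + offset + " ms");
    }
}
```

Keeping the decision in a pure function like this makes the policy easy to unit‑test, independent of the ID generator's synchronized state.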
Solution 5: Meituan Leaf (Segment + Snowflake)
Leaf provides industrial‑grade implementations of both segment and Snowflake modes. It uses ZooKeeper to automatically assign worker IDs and to detect clock rollback:
Worker ID is obtained from a sequential persistent ZK node at startup.
Each service periodically uploads its current timestamp to its ZK node; on restart, a local clock earlier than the stored timestamp triggers a startup failure, preventing rolled‑back clocks from issuing duplicate IDs.
Example Maven dependency:
<dependency>
    <groupId>com.sankuai.inf.leaf</groupId>
    <artifactId>leaf-core</artifactId>
    <version>1.0.1</version>
</dependency>

After configuring the DB or ZooKeeper addresses, calling IDGen#get(key) returns a new ID with minimal integration effort.
Comparison of the Five Approaches
UUID : 36‑char string, unordered, extremely fast locally, no dependencies, but large storage and poor index performance.
Database Step : Long, ordered, medium performance (depends on DB), MySQL required, hard to expand.
Segment Mode : Long, near‑ordered, high performance (in‑memory cache), MySQL required, risks include ID pattern exposure and strong DB HA requirements.
Snowflake : Long, trend‑ordered, ultra‑high local performance, no external dependencies, but vulnerable to clock rollback and requires careful worker‑ID management.
Meituan Leaf : Long, trend‑ordered, ultra‑high performance, depends on MySQL and ZooKeeper, adds component complexity.
Choosing the Right Solution
Most applications: use Snowflake directly or Leaf‑snowflake for simplicity and speed.
Financial systems or other scenarios demanding strict ID continuity: prefer Segment or Leaf‑segment.
Temporary identifiers or idempotency keys: UUID is sufficient (but not as a DB primary key).
Avoid the database step method unless the shard count is fixed and predictable.
If you already use ShardingSphere, you can enable its built‑in Snowflake generator by setting type: SNOWFLAKE in the configuration.
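For reference, a sketch of what that might look like in ShardingSphere 5.x YAML configuration. Exact key names vary between versions, and the table and column names below are hypothetical, so treat this as illustrative rather than copy‑paste ready:

```yaml
# Illustrative ShardingSphere 5.x sharding rule with a SNOWFLAKE key generator.
rules:
- !SHARDING
  tables:
    t_order:                     # hypothetical sharded table
      actualDataNodes: ds_${0..1}.t_order_${0..1}
      keyGenerateStrategy:
        column: order_id         # hypothetical primary-key column
        keyGeneratorName: snowflake
  keyGenerators:
    snowflake:
      type: SNOWFLAKE
```

With this in place, inserts that omit order_id have a Snowflake ID filled in by the framework, so no application‑side generator is needed.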
Conclusion
The root cause of ID collisions after sharding is that each shard independently generates auto‑increment values. Centralising ID generation—via UUID, stepped auto‑increment, segment allocation, Snowflake, or an industrial solution like Meituan Leaf—eliminates the conflict. Among them, Snowflake offers the best balance of performance and ordering for most workloads, while Segment excels where ID monotonicity is critical. UUID remains useful for non‑primary‑key identifiers, and the database‑step approach is generally discouraged for new projects.