How to Resolve ID Conflicts After Sharding? 5 Proven Solutions
When sharding databases, independent auto‑increment IDs can collide, causing data integrity issues; this article explains why the conflict occurs and walks through five practical solutions—including UUID, custom auto‑increment steps, segment mode, Snowflake, and Meituan Leaf—detailing their trade‑offs, performance, and implementation steps.
Why IDs Conflict After Sharding
In a single‑database, single‑table design AUTO_INCREMENT guarantees uniqueness. After splitting a table (e.g., order) into multiple shards across databases db_0 and db_1, each shard keeps its own auto‑increment counter. If order_0 in db_0 and order_2 in db_1 both reach 1000, the same ID appears in two places, breaking cross‑shard queries and data migration.
The root cause is that ID generation must be global rather than per‑table.
Solution 1: UUID
Generate a 128‑bit UUID as the primary key:
String id = UUID.randomUUID().toString(); // e.g., 550e8400-e29b-41d4-a716-446655440000
Storage impact: stored as VARCHAR(36) (or VARCHAR(32) without hyphens), a UUID consumes several times more space than an 8-byte BIGINT.
Index ordering: UUIDs are random, so InnoDB B-tree inserts land on scattered pages, causing page splits and degraded write throughput.
Readability: the string carries no creation-time hint.
UUIDs suit identifiers with low performance requirements, such as idempotency tokens or tracing IDs, but are not recommended as primary keys for high-concurrency tables.
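For reference, a minimal Java sketch showing the two storage forms mentioned above (the class name is illustrative):

```java
import java.util.UUID;

public class UuidKeyDemo {
    public static void main(String[] args) {
        UUID uuid = UUID.randomUUID();
        String full = uuid.toString();          // 36 chars with hyphens -> VARCHAR(36)
        String compact = full.replace("-", ""); // 32 chars, no hyphens  -> VARCHAR(32)
        System.out.println(full.length());      // 36
        System.out.println(compact.length());   // 32
    }
}
```

Either form still stores at least four times the 8 bytes a BIGINT needs, which is why the space and index concerns above apply to both.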
Solution 2: Database Auto‑Increment Step
Configure each MySQL instance with a distinct auto_increment_offset and set auto_increment_increment to the number of shards. Example for two databases:
DB 1: offset 1, step 2 → IDs 1, 3, 5, …
DB 2: offset 2, step 2 → IDs 2, 4, 6, …
-- DB 1
SET @@auto_increment_offset = 1;
SET @@auto_increment_increment = 2;
-- DB 2
SET @@auto_increment_offset = 2;
SET @@auto_increment_increment = 2;
Note the @@ prefix: these are session system variables, and SET @auto_increment_offset = 1 would only create an unrelated user variable with no effect on auto-increment behavior. Pros: minimal schema change; IDs stay numeric and monotonic per shard. Cons: scaling is painful, because adding shards changes the step pattern and can collide with previously generated IDs. Best for systems with a fixed shard count.
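The offset/step arithmetic can be verified in a few lines of Java (the helper and class names are hypothetical, purely to illustrate the sequences each shard produces):

```java
import java.util.Arrays;

public class StepIdDemo {
    // Each shard generates: offset, offset + step, offset + 2*step, ...
    static long[] idsFor(long offset, long step, int count) {
        long[] ids = new long[count];
        for (int i = 0; i < count; i++) {
            ids[i] = offset + i * step;
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(idsFor(1, 2, 3))); // DB 1: [1, 3, 5]
        System.out.println(Arrays.toString(idsFor(2, 2, 3))); // DB 2: [2, 4, 6]
    }
}
```

Because the step equals the shard count, the two arithmetic progressions never intersect; adding a third shard would require a step of 3 everywhere, which is exactly why later expansion conflicts with IDs already issued under step 2.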
Solution 3: Segment Mode (Leaf)
Allocate a block of IDs from a dedicated table leaf_alloc and cache the block in memory.
CREATE TABLE `leaf_alloc` (
`biz_tag` VARCHAR(128) NOT NULL COMMENT 'business tag, e.g., order, user',
`max_id` BIGINT(20) NOT NULL COMMENT 'current max allocated ID',
`step` INT(11) NOT NULL COMMENT 'size of each allocated segment',
`description` VARCHAR(256) DEFAULT NULL,
`update_time` TIMESTAMP NOT NULL,
PRIMARY KEY (`biz_tag`)
);
When a service needs IDs, it runs:
UPDATE leaf_alloc
SET max_id = max_id + step,
update_time = NOW()
WHERE biz_tag = 'order';
If max_id was 1000 and step is 500, the update makes max_id 1500, and the service receives the range 1001-1500. The database is hit only once per segment, dramatically reducing load.
Potential pause: when the current segment is nearly exhausted, fetching the next one can introduce a short delay. Meituan's Leaf mitigates this with a double-buffer strategy that asynchronously pre-loads the next segment before the current one runs out, achieving a seamless handover.
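A stripped-down, single-node sketch of segment allocation, using an in-memory stand-in for the leaf_alloc row (names are illustrative; the real Leaf adds the double buffer and asynchronous refill described above, plus a real database transaction):

```java
public class SegmentAllocator {
    private final int step;
    private long dbMaxId;    // simulates leaf_alloc.max_id for one biz_tag
    private long current;    // next ID to hand out
    private long segmentMax; // inclusive upper bound of the cached segment

    public SegmentAllocator(long initialMaxId, int step) {
        this.dbMaxId = initialMaxId;
        this.step = step;
        fetchSegment();
    }

    // Simulates: UPDATE leaf_alloc SET max_id = max_id + step WHERE biz_tag = ...
    private void fetchSegment() {
        long newMax = dbMaxId + step;
        current = dbMaxId + 1;
        segmentMax = newMax;
        dbMaxId = newMax;
    }

    public synchronized long nextId() {
        if (current > segmentMax) {
            fetchSegment(); // one DB round-trip per 'step' IDs, not per ID
        }
        return current++;
    }

    public static void main(String[] args) {
        SegmentAllocator gen = new SegmentAllocator(1000, 500);
        System.out.println(gen.nextId()); // 1001, first ID of the 1001-1500 segment
    }
}
```

The synchronized method is enough for a single process; across processes, uniqueness comes from the atomic UPDATE, since each caller receives a disjoint range.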
Solution 4: Snowflake Algorithm
Twitter’s open‑source Snowflake generates a 64‑bit long ID composed of:
1 sign bit (always 0)
41 bits timestamp (milliseconds since a custom epoch, ~69 years)
10 bits machine ID (1024 nodes, often split into data‑center + worker)
12 bits sequence (4096 IDs per ms per node)
Throughput can theoretically reach about 4.096 million IDs per second per node (4096 sequence values per millisecond), generated locally without any network calls.
public synchronized long nextId() {
    long currentMs = System.currentTimeMillis();
    if (currentMs == lastMs) {
        // Same millisecond: bump the 12-bit sequence.
        sequence = (sequence + 1) & MAX_SEQUENCE;
        if (sequence == 0) {
            // Sequence exhausted for this millisecond; spin until the next one.
            currentMs = waitNextMs(lastMs);
        }
    } else {
        sequence = 0L;
    }
    lastMs = currentMs;
    return (currentMs - EPOCH) << TIMESTAMP_SHIFT
            | workerId << WORKER_ID_SHIFT
            | sequence;
}
Advantages: time-ordered IDs, high performance, small storage (8 bytes). Drawbacks:
Clock rollback can cause duplicate IDs when currentMs < lastMs.
Unique worker IDs must be managed; manual configuration becomes cumbersome in dynamic container environments.
Typical mitigation strategies for clock rollback:
Wait until the clock catches up (suitable for small drifts).
Throw an exception and let the caller retry (for larger drifts).
Switch to a standby worker ID and continue generation.
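Putting the bit layout and the wait-until-caught-up mitigation together, a self-contained sketch might look like this (constants and names are illustrative, not Twitter's exact code; for large drifts, throwing instead of spinning is the safer choice):

```java
public class Snowflake {
    private static final long EPOCH = 1288834974657L; // custom epoch (Twitter's)
    private static final long WORKER_ID_BITS = 10L;
    private static final long SEQUENCE_BITS = 12L;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;           // 4095
    private static final long WORKER_ID_SHIFT = SEQUENCE_BITS;                    // 12
    private static final long TIMESTAMP_SHIFT = SEQUENCE_BITS + WORKER_ID_BITS;   // 22

    private final long workerId;
    private long lastMs = -1L;
    private long sequence = 0L;

    public Snowflake(long workerId) {
        if (workerId < 0 || workerId >= (1L << WORKER_ID_BITS)) {
            throw new IllegalArgumentException("workerId out of range");
        }
        this.workerId = workerId;
    }

    public synchronized long nextId() {
        long currentMs = System.currentTimeMillis();
        if (currentMs < lastMs) {
            // Small clock rollback: spin until the clock catches up.
            currentMs = waitNextMs(lastMs);
        }
        if (currentMs == lastMs) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                currentMs = waitNextMs(lastMs); // sequence exhausted this millisecond
            }
        } else {
            sequence = 0L;
        }
        lastMs = currentMs;
        return (currentMs - EPOCH) << TIMESTAMP_SHIFT
                | workerId << WORKER_ID_SHIFT
                | sequence;
    }

    private long waitNextMs(long lastMs) {
        long ms = System.currentTimeMillis();
        while (ms <= lastMs) {
            ms = System.currentTimeMillis();
        }
        return ms;
    }

    public static void main(String[] args) {
        Snowflake sf = new Snowflake(1);
        System.out.println(sf.nextId());
        System.out.println(sf.nextId());
    }
}
```

Because a single instance is synchronized and the sequence resets each millisecond, IDs from one worker are strictly increasing; the spin-on-rollback guard only helps with small drifts, which is why production systems pair it with the exception or standby-worker strategies above.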
Solution 5: Meituan Leaf (Combined Segment & Snowflake)
Leaf implements both segment and Snowflake modes with industrial‑grade enhancements.
Automatic worker‑ID allocation: Leaf registers a sequential persistent node in ZooKeeper; the node's sequence number becomes the worker ID, eliminating manual configuration.
Clock‑rollback detection: Leaf periodically writes the current timestamp to ZooKeeper. On startup, if the local clock is earlier than the stored timestamp, the service aborts with an alarm.
<dependency>
<groupId>com.sankuai.inf.leaf</groupId>
<artifactId>leaf-core</artifactId>
<version>1.0.1</version>
</dependency>
After configuring the DB and ZooKeeper addresses, calling IDGen#get(key) returns a unique ID with minimal integration effort.
Comparison of All Solutions
UUID
ID type: 36‑char string
Ordered: No
Performance: Very high (local)
Dependencies: None
Risks: Large index size, random inserts degrade write performance
Database Step
ID type: Long (incremental)
Ordered: Yes
Performance: Medium (DB bound)
Dependencies: MySQL
Risks: Hard to scale, step fixed after deployment
Segment Mode
ID type: Long (trend‑incremental)
Ordered: Near‑ordered
Performance: High (in‑memory cache)
Dependencies: MySQL
Risks: ID pattern exposure, requires DB HA; short pause when a segment is exhausted
Snowflake
ID type: Long (trend‑incremental)
Ordered: Trend‑ordered
Performance: Very high (local)
Dependencies: None
Risks: Clock rollback, worker‑ID management
Meituan Leaf
ID type: Long (trend‑incremental)
Ordered: Trend‑ordered
Performance: Very high
Dependencies: MySQL + ZooKeeper
Risks: Additional component complexity
How to Choose
General business scenarios: use Snowflake directly or Leaf‑snowflake for high performance and simple integration.
Financial or strict sequential‑ID requirements: prefer Segment mode or Leaf‑segment to avoid clock dependence and obtain compact IDs.
Temporary identifiers / idempotency keys: UUID is sufficient but should not be a primary key.
Database step solution: avoid unless the shard count is guaranteed never to change.
If ShardingSphere is in use, enable its built‑in Snowflake generator with type: SNOWFLAKE in the configuration.
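As an illustration, a ShardingSphere-style YAML fragment might look like the following; exact keys vary across ShardingSphere versions, so treat this as a sketch rather than a drop-in configuration:

```yaml
rules:
- !SHARDING
  tables:
    t_order:
      actualDataNodes: ds_${0..1}.t_order_${0..1}
      keyGenerateStrategy:
        column: order_id            # primary-key column filled by the generator
        keyGeneratorName: snowflake
  keyGenerators:
    snowflake:
      type: SNOWFLAKE               # built-in Snowflake key generator
```

With this in place, inserts that omit order_id receive a Snowflake ID from the framework, so application code needs no ID-generation logic of its own.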
Conclusion
ID conflicts after sharding arise because each shard manages its own auto‑increment sequence. Centralizing ID generation eliminates the conflict. Among the five approaches, Snowflake offers the best balance of performance and ordering for most cases, while Segment mode excels where strict order and compactness are critical. UUID provides simplicity but is unsuitable for primary keys in high‑throughput tables. The DB step method is inflexible and should be avoided in new projects. In production, prefer battle‑tested implementations such as Meituan Leaf or Baidu’s UidGenerator rather than hand‑rolling Snowflake.
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.