
How to Resolve ID Conflicts After Sharding? 5 Proven Solutions

When sharding databases, independent auto‑increment IDs can collide, causing data integrity issues; this article explains why the conflict occurs and walks through five practical solutions—including UUID, custom auto‑increment steps, segment mode, Snowflake, and Meituan Leaf—detailing their trade‑offs, performance, and implementation steps.


Why IDs Conflict After Sharding

In a single‑database, single‑table design AUTO_INCREMENT guarantees uniqueness. After splitting a table (e.g., order) into multiple shards across databases db_0 and db_1, each shard keeps its own auto‑increment counter. If order_0 in db_0 and order_2 in db_1 both reach 1000, the same ID appears in two places, breaking cross‑shard queries and data migration.

The root cause is that ID generation must be global rather than per‑table.

Solution 1: UUID

Generate a 128‑bit UUID as the primary key:

String id = UUID.randomUUID().toString(); // e.g., 550e8400-e29b-41d4-a716-446655440000

Storage impact: stored as VARCHAR(36) (or VARCHAR(32) without hyphens), a UUID consumes several times more space than a BIGINT.

Index ordering: UUIDs are random; InnoDB B‑tree inserts become scattered, causing page splits and degraded write throughput.

Readability: the string provides no creation‑time hint.
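Regarding the storage point, the 32‑character hyphen‑free form can be produced with a one‑line transform; a minimal sketch using only java.util.UUID (class and method names here are illustrative):

```java
import java.util.UUID;

public class CompactUuid {
    // Remove hyphens so the key fits VARCHAR(32) instead of VARCHAR(36).
    static String compactUuid() {
        return UUID.randomUUID().toString().replace("-", "");
    }

    public static void main(String[] args) {
        String id = compactUuid();
        System.out.println(id + " (" + id.length() + " chars)");
    }
}
```

This saves four bytes per key but does nothing for the random‑insert problem, which is inherent to the UUID's bit layout.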

Suitable for identifiers with low performance requirements, such as idempotency tokens or tracing IDs, but not recommended as the primary key of high‑concurrency tables.

Solution 2: Database Auto‑Increment Step

Configure each MySQL instance with a distinct auto_increment_offset and set auto_increment_increment to the number of shards. Example for two databases:

DB 1: offset 1, step 2 → IDs 1, 3, 5, …

DB 2: offset 2, step 2 → IDs 2, 4, 6, …

-- DB 1 (session scope shown; set GLOBAL or use my.cnf to apply server‑wide)
SET @@auto_increment_offset = 1;
SET @@auto_increment_increment = 2;

-- DB 2
SET @@auto_increment_offset = 2;
SET @@auto_increment_increment = 2;

Pros: minimal schema change, numeric and monotonic IDs. Cons: scaling is painful—adding new shards changes the step pattern and creates conflicts with previously generated IDs. Best for systems with a fixed shard count.
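The no‑collision property of offset/step sequences can be verified with a quick simulation (plain Java, not tied to MySQL; class and method names are illustrative):

```java
import java.util.HashSet;
import java.util.Set;

public class StepSimulation {
    // Generate the first n IDs that an auto-increment counter with the
    // given offset and step would produce.
    static Set<Long> sequence(long offset, long step, int n) {
        Set<Long> ids = new HashSet<>();
        for (int i = 0; i < n; i++) {
            ids.add(offset + i * step);
        }
        return ids;
    }

    public static void main(String[] args) {
        Set<Long> db1 = sequence(1, 2, 1000); // 1, 3, 5, ...
        Set<Long> db2 = sequence(2, 2, 1000); // 2, 4, 6, ...
        db1.retainAll(db2);                   // intersection of the two sequences
        System.out.println("collisions: " + db1.size()); // prints 0
    }
}
```

The same simulation also shows the scaling problem: adding a third shard requires step 3, and the new sequences overlap with IDs the old step‑2 configuration already handed out.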

Solution 3: Segment Mode (Leaf)

Allocate a block of IDs from a dedicated table leaf_alloc and cache the block in memory.

CREATE TABLE `leaf_alloc` (
  `biz_tag`   VARCHAR(128) NOT NULL COMMENT 'business tag, e.g., order, user',
  `max_id`    BIGINT(20)   NOT NULL COMMENT 'current max allocated ID',
  `step`      INT(11)      NOT NULL COMMENT 'size of each allocated segment',
  `description` VARCHAR(256) DEFAULT NULL,
  `update_time` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`biz_tag`)
);

When a service needs IDs, it runs:

UPDATE leaf_alloc
SET max_id = max_id + step,
    update_time = NOW()
WHERE biz_tag = 'order';

If max_id was 1000 and step is 500, the update sets max_id to 1500; a SELECT of the new max_id in the same transaction then tells the service it owns the range 1001–1500. The database is hit only once per segment, dramatically reducing load.
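The consumer side of this scheme can be sketched as a small in‑memory allocator. The DB round‑trip (the UPDATE followed by a SELECT of max_id) is stubbed out with a counter; class and method names are illustrative, not Leaf's API:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

public class SegmentAllocator {
    private final LongSupplier fetchMaxId; // stands in for UPDATE leaf_alloc ... then SELECT max_id
    private final long step;
    private long current;    // next ID to hand out
    private long segmentEnd; // inclusive upper bound of the cached segment

    SegmentAllocator(LongSupplier fetchMaxId, long step) {
        this.fetchMaxId = fetchMaxId;
        this.step = step;
        this.segmentEnd = 0;
        this.current = 1;
    }

    synchronized long nextId() {
        if (current > segmentEnd) {          // segment exhausted: one DB hit per `step` IDs
            segmentEnd = fetchMaxId.getAsLong();
            current = segmentEnd - step + 1; // e.g., max_id 1500, step 500 -> range 1001..1500
        }
        return current++;
    }

    public static void main(String[] args) {
        AtomicLong dbMaxId = new AtomicLong(1000); // simulated leaf_alloc.max_id
        SegmentAllocator alloc =
            new SegmentAllocator(() -> dbMaxId.addAndGet(500), 500);
        System.out.println(alloc.nextId()); // 1001: first ID of the freshly fetched segment
        System.out.println(alloc.nextId()); // 1002
    }
}
```

Note that the refetch happens synchronously inside nextId, which is exactly the pause discussed next.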

Potential pause: when the current segment is nearly exhausted, fetching the next segment can introduce a short delay. Meituan’s Leaf mitigates this with a double‑buffer strategy—asynchronously pre‑loading the next segment once 10 % of the current one has been consumed, achieving seamless handover.

[Figure: Leaf segment architecture]

Solution 4: Snowflake Algorithm

Twitter’s open‑source Snowflake generates a 64‑bit long ID composed of:

1 sign bit (always 0)

41 bits timestamp (milliseconds since a custom epoch, ~69 years)

10 bits machine ID (1024 nodes, often split into data‑center + worker)

12 bits sequence (4096 IDs per ms per node)

Theoretical throughput is about 4.09 million IDs/second per node (4096 IDs per millisecond), generated locally without network calls.

private static final long EPOCH = 1288834974657L;      // custom epoch (here: Twitter's)
private static final long MAX_SEQUENCE = ~(-1L << 12); // 4095
private static final long WORKER_ID_SHIFT = 12;        // sequence bits
private static final long TIMESTAMP_SHIFT = 22;        // sequence + worker bits

private final long workerId; // 0..1023, must be unique per node
private long lastMs = -1L;
private long sequence = 0L;

public synchronized long nextId() {
    long currentMs = System.currentTimeMillis();
    if (currentMs < lastMs) {
        // Clock rollback: refuse to generate rather than risk duplicate IDs.
        throw new IllegalStateException(
            "Clock moved backwards by " + (lastMs - currentMs) + " ms");
    }
    if (currentMs == lastMs) {
        sequence = (sequence + 1) & MAX_SEQUENCE;
        if (sequence == 0) {                // 4096 IDs exhausted in this millisecond
            currentMs = waitNextMs(lastMs); // spin until the next millisecond
        }
    } else {
        sequence = 0L;
    }
    lastMs = currentMs;
    return (currentMs - EPOCH) << TIMESTAMP_SHIFT
           | workerId << WORKER_ID_SHIFT
           | sequence;
}

Advantages: time‑ordered IDs, high performance, small storage (8 bytes). Drawbacks:

Clock rollback can cause duplicate IDs when currentMs < lastMs.

Unique worker IDs must be managed; manual configuration becomes cumbersome in dynamic container environments.

Typical mitigation strategies for clock rollback:

Wait until the clock catches up (suitable for small drifts).

Throw an exception and let the caller retry (for larger drifts).

Switch to a standby worker ID and continue generation.
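The first strategy (wait out small drifts) is essentially the same spin loop that nextId already relies on via waitNextMs when a millisecond's sequence space is exhausted. A standalone sketch of that helper (class name illustrative):

```java
public class ClockWait {
    // Spin until the wall clock passes lastMs. Suitable only for drifts of a
    // few milliseconds; for larger drifts, throw and let the caller retry.
    static long waitNextMs(long lastMs) {
        long now = System.currentTimeMillis();
        while (now <= lastMs) {
            now = System.currentTimeMillis();
        }
        return now;
    }

    public static void main(String[] args) {
        long last = System.currentTimeMillis();
        long next = waitNextMs(last);
        System.out.println(next > last); // true
    }
}
```

Busy‑waiting is acceptable here because the wait is bounded by the drift; a large drift should be treated as a fault, not waited out.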

Solution 5: Meituan Leaf (Combined Segment & Snowflake)

Leaf implements both segment and Snowflake modes with industrial‑grade enhancements.

Automatic worker‑ID allocation: Leaf registers a persistent sequential node in ZooKeeper; the node’s sequence number becomes the worker ID, eliminating manual configuration.

Clock‑rollback detection: Leaf periodically writes the current timestamp to ZooKeeper. On startup, if the local clock is earlier than the stored timestamp, the service aborts with an alarm.

<dependency>
    <groupId>com.sankuai.inf.leaf</groupId>
    <artifactId>leaf-core</artifactId>
    <version>1.0.1</version>
</dependency>

After configuring DB and ZooKeeper addresses, calling IDGen#get(key) returns a unique ID with minimal integration effort.

Comparison of All Solutions

| Solution | ID type | Ordered | Performance | Dependencies | Risks |
|---|---|---|---|---|---|
| UUID | 36‑char string | No | Very high (local) | None | Large index size; random inserts degrade write performance |
| Database step | Long (incremental) | Yes | Medium (DB bound) | MySQL | Hard to scale; step fixed after deployment |
| Segment mode | Long (trend‑incremental) | Near‑ordered | High (in‑memory cache) | MySQL | ID pattern exposure; requires DB HA; short pause when a segment is exhausted |
| Snowflake | Long (trend‑incremental) | Trend‑ordered | Very high (local) | None | Clock rollback; worker‑ID management |
| Meituan Leaf | Long (trend‑incremental) | Trend‑ordered | Very high | MySQL + ZooKeeper | Additional component complexity |

How to Choose

General business scenarios: use Snowflake directly or Leaf‑snowflake for high performance and simple integration.

Financial or strict sequential‑ID requirements: prefer Segment mode or Leaf‑segment to avoid clock dependence and obtain compact IDs.

Temporary identifiers / idempotent keys: UUID is sufficient but should not be a primary key.

Database step solution: avoid unless the shard count is guaranteed never to change.

If ShardingSphere is in use, enable its built‑in Snowflake generator with type: SNOWFLAKE in the configuration.
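As a sketch, a ShardingSphere 5.x YAML rule configuration enabling the built‑in generator might look like the following (the table, column, and data‑node names are illustrative, not from the original article):

```yaml
rules:
- !SHARDING
  tables:
    t_order:
      actualDataNodes: db_${0..1}.order_${0..3}
      keyGenerateStrategy:
        column: order_id
        keyGeneratorName: snowflake
  keyGenerators:
    snowflake:
      type: SNOWFLAKE
```

With this in place, ShardingSphere fills order_id on insert, so the application never touches ID generation directly.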

Conclusion

ID conflicts after sharding arise because each shard manages its own auto‑increment sequence. Centralizing ID generation eliminates the conflict. Among the five approaches, Snowflake offers the best balance of performance and ordering for most cases, while Segment mode excels where strict order and compactness are critical. UUID provides simplicity but is unsuitable for primary keys in high‑throughput tables. The DB step method is inflexible and should be avoided in new projects. In production, prefer battle‑tested implementations such as Meituan Leaf or Baidu’s UidGenerator rather than hand‑rolling Snowflake.

Tags: database, sharding, UUID, Leaf, Snowflake, ID generation, segment mode
Written by

Sohu Tech Products

A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.
