
Primary Key Strategies After Database Sharding

After splitting a database into multiple shards, generating globally unique primary keys becomes essential, and this article examines various solutions—including auto‑increment IDs, sequence steps, UUIDs, timestamp concatenation, and the Snowflake algorithm—detailing their advantages, drawbacks, and suitable scenarios.

Selected Java Interview Questions

Oct 11, 2019


Interviewer's Perspective

When a system adopts database sharding, the problem of generating a globally unique primary key inevitably arises; using simple incremental IDs per table would cause collisions, so a strategy that guarantees uniqueness across all shards is required.

Question Analysis

Database Auto‑Increment ID

This approach inserts a dummy row into a dedicated table to obtain an auto‑incremented value from the database, then uses that value as the primary key in the target shard.

Advantages

Simple and universally supported.

Disadvantages

Relies on a single database for ID generation, creating a bottleneck under high concurrency; improvements involve a separate service that allocates batches of IDs, but the underlying limitation of a single database remains.
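The batch-allocation improvement mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name SegmentIdAllocator, the batch size, and the segmentSource function are hypothetical, and segmentSource stands in for the single database call that reserves the next range (for example by advancing a counter row). An in-memory substitute is included so the sketch runs on its own.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongUnaryOperator;

/** Hands out IDs from a locally cached range and only contacts the source when the range is used up. */
final class SegmentIdAllocator {
    private final int batchSize;
    // Hypothetical stand-in for the database: reserves `batchSize` IDs and returns the first one.
    private final LongUnaryOperator segmentSource;
    private long next;   // next ID to hand out
    private long limit;  // exclusive upper bound of the current range

    SegmentIdAllocator(int batchSize, LongUnaryOperator segmentSource) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.segmentSource = segmentSource;
    }

    synchronized long nextId() {
        if (next >= limit) {
            // Range exhausted (or never fetched): one round trip reserves a whole batch.
            next = segmentSource.applyAsLong(batchSize);
            limit = next + batchSize;
        }
        return next++;
    }

    /** In-memory substitute for the counter table, present only to make the sketch self-contained. */
    static LongUnaryOperator inMemorySource(long firstId) {
        AtomicLong counter = new AtomicLong(firstId);
        return size -> counter.getAndAdd(size);
    }
}
```

With a batch size of 100, two allocators sharing one source receive the disjoint ranges 1..100 and 101..200, so the database is contacted once per 100 IDs instead of once per ID. The trade-off is that IDs are no longer strictly increasing across service instances, and the source is still a single point that every instance depends on.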

Suitable Scenarios

Sharding is usually driven by one of two things: write concurrency too high for a single database, or data volume too large for one. When the driver is data volume and the overall request rate stays modest (e.g., a few hundred per second), a single‑database auto‑increment can suffice.

Sequence or Table Auto‑Increment Step

By configuring a database sequence or setting a custom step size for an auto‑increment column, IDs can be spaced to allow horizontal scaling. For example, with eight service nodes, each node uses a sequence that starts at a different offset and increments by 8.

Suitable when the number of nodes is fixed and the fixed step size is acceptable; adding new nodes later becomes cumbersome.
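The offset-and-step idea can be modelled in a few lines. The class below is a hypothetical in-memory sketch, not how a database implements it; in practice the database itself does the stepping (in MySQL, for instance, through the auto_increment_offset and auto_increment_increment settings).

```java
/** Models one node's sequence: start at the node's own offset, then advance by the node count. */
final class SteppedSequence {
    private final long step;
    private long next;

    SteppedSequence(long offset, long step) {
        if (step <= 0 || offset < 1 || offset > step) {
            throw new IllegalArgumentException("require step > 0 and 1 <= offset <= step");
        }
        this.next = offset;
        this.step = step;
    }

    synchronized long nextId() {
        long id = next;
        next += step;
        return id;
    }
}
```

With a step of 8, the node at offset 1 produces 1, 9, 17 and the node at offset 2 produces 2, 10, 18, so no two nodes ever emit the same value. It also shows why growth is awkward: all eight residues modulo 8 are already taken, so a ninth node cannot join without changing the step on every existing node.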

UUID

UUIDs are generated locally without database interaction, but they are long, consume more storage, and perform poorly as primary keys because they lack ordering, causing excessive random writes in B‑Tree indexes.

UUID.randomUUID().toString().replace("-", "")  // a 32-character lowercase hex string, different on every call

Use UUIDs for non‑key purposes such as file names or random identifiers, but avoid them as primary keys.

Current Time Based ID

Using the current timestamp as an ID is simple, but under high concurrency (thousands of requests per second) several requests fall into the same millisecond and collide, so a timestamp alone is unsuitable as a primary key.

It can be used when the generated ID is a composite of time and additional fields that guarantee uniqueness.
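As a rough sketch of such a composite, the example below concatenates a millisecond timestamp with two business fields. The field names bizCode and userId, the field order, and the zero-padded width are assumptions made for illustration; the result is unique only if the same user cannot create two records with the same business code within one millisecond.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Builds a string ID of the form <timestamp to the millisecond><bizCode><zero-padded userId>. */
final class CompositeIdBuilder {
    private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS");

    // bizCode and userId are hypothetical business fields; pick whatever makes the combination unique.
    static String build(LocalDateTime now, String bizCode, long userId) {
        return now.format(STAMP) + bizCode + String.format("%010d", userId);
    }
}
```

For a timestamp of 2019-10-11 09:30:15.123, business code "01" and user 42, build returns 20191011093015123010000000042. Passing the time in as a parameter, rather than reading the clock inside the method, keeps the function deterministic and easy to test.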

Snowflake Algorithm

Twitter’s Snowflake is a distributed ID generation algorithm that produces a 64‑bit long ID composed of:

1 unused bit (ensures the ID is positive).

41 bits for the timestamp in milliseconds (covers ~69 years).

10 bits for machine identification (5 bits for data‑center ID, 5 bits for worker ID, supporting up to 1024 machines).

12 bits for a sequence number within the same millisecond (supports up to 4096 IDs per millisecond per machine).
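The capacity figures in the list follow directly from the bit widths. The short sketch below (class and method names are illustrative) derives them, assuming a 365-day year for the time span.

```java
/** Derives the capacity limits implied by the 41/10/12 bit split. */
final class SnowflakeCapacity {
    static final int TIMESTAMP_BITS = 41;
    static final int MACHINE_BITS = 10;   // 5 data-center bits + 5 worker bits
    static final int SEQUENCE_BITS = 12;

    static long maxMachines() {
        return 1L << MACHINE_BITS;        // 2^10
    }

    static long idsPerMillisecondPerMachine() {
        return 1L << SEQUENCE_BITS;       // 2^12
    }

    static double yearsCovered() {
        double millisPerYear = 365.0 * 24 * 60 * 60 * 1000;
        return (1L << TIMESTAMP_BITS) / millisPerYear;   // 2^41 ms expressed in years
    }
}
```

The methods return 1024 machines, 4096 IDs per millisecond per machine, and a little under 70 years of timestamps counted from whatever epoch the generator uses.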

Example binary representation:

0 | 0001100 10100010 10111110 10001001 01011100 00 | 10001 | 1 1001 | 0000 00000000
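To make the layout concrete, the following sketch splits an ID back into its fields with shifts and masks, and applies that to the example bit pattern above. The class and method names are hypothetical, and the code assumes exactly the 1/41/5/5/12 split described in this section.

```java
/** Splits a Snowflake-style ID into its fields, assuming the 1/41/5/5/12 layout. */
final class SnowflakeLayout {
    static final int SEQUENCE_BITS = 12;
    static final int WORKER_BITS = 5;
    static final int DATACENTER_BITS = 5;

    static final int WORKER_SHIFT = SEQUENCE_BITS;                          // 12
    static final int DATACENTER_SHIFT = WORKER_SHIFT + WORKER_BITS;         // 17
    static final int TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS;  // 22

    static long sequence(long id)     { return id & mask(SEQUENCE_BITS); }
    static long workerId(long id)     { return (id >>> WORKER_SHIFT) & mask(WORKER_BITS); }
    static long datacenterId(long id) { return (id >>> DATACENTER_SHIFT) & mask(DATACENTER_BITS); }

    /** Milliseconds since the generator's custom epoch, not since 1970. */
    static long timestampOffset(long id) { return id >>> TIMESTAMP_SHIFT; }

    static long compose(long timestampOffset, long datacenterId, long workerId, long sequence) {
        return (timestampOffset << TIMESTAMP_SHIFT)
                | (datacenterId << DATACENTER_SHIFT)
                | (workerId << WORKER_SHIFT)
                | sequence;
    }

    private static long mask(int bits) { return (1L << bits) - 1; }

    /** The example bit pattern from the text, written one group per literal. */
    static long exampleId() {
        String bits = "0"                                                         // unused sign bit
                + "0001100" + "10100010" + "10111110" + "10001001" + "01011100" + "00"  // 41-bit timestamp
                + "10001"                                                         // data-center ID
                + "11001"                                                         // worker ID
                + "000000000000";                                                 // sequence
        return Long.parseLong(bits, 2);
    }
}
```

For the example pattern, datacenterId returns 17, workerId returns 25, and sequence returns 0. Feeding the decoded fields back into compose reproduces the original value, which is a quick way to check that the shifts and masks agree.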

Java implementation (IdWorker) demonstrates the bit shifts, validation of worker and data‑center IDs, and the synchronized nextId() method that assembles the final 64‑bit ID.

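public class IdWorker {
    private static final long twepoch = 1288834974657L;  // custom epoch used by Twitter's original code
    private static final long workerIdBits = 5L;
    private static final long datacenterIdBits = 5L;
    private static final long sequenceBits = 12L;
    private static final long maxWorkerId = ~(-1L << workerIdBits);          // 31
    private static final long maxDatacenterId = ~(-1L << datacenterIdBits);  // 31
    private static final long sequenceMask = ~(-1L << sequenceBits);         // 4095
    private static final long workerIdShift = sequenceBits;
    private static final long datacenterIdShift = sequenceBits + workerIdBits;
    private static final long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits;
    private final long workerId;
    private final long datacenterId;
    private long sequence = 0L;
    private long lastTimestamp = -1L;
    public IdWorker(long workerId, long datacenterId) {
        if (workerId < 0 || workerId > maxWorkerId || datacenterId < 0 || datacenterId > maxDatacenterId) {
            throw new IllegalArgumentException("workerId and datacenterId must each be between 0 and 31");
        }
        this.workerId = workerId;
        this.datacenterId = datacenterId;
    }
    public synchronized long nextId() {
        long timestamp = timeGen();
        if (timestamp < lastTimestamp) {
            // Refuse to issue IDs rather than risk duplicates
            throw new RuntimeException("Clock moved backwards");
        }
        if (lastTimestamp == timestamp) {
            sequence = (sequence + 1) & sequenceMask;
            if (sequence == 0) {
                // All 4096 values used in this millisecond: wait for the next one
                timestamp = tilNextMillis(lastTimestamp);
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = timestamp;
        return ((timestamp - twepoch) << timestampLeftShift)
                | (datacenterId << datacenterIdShift)
                | (workerId << workerIdShift)
                | sequence;
    }
    private long tilNextMillis(long lastTimestamp) {
        long timestamp = timeGen();
        while (timestamp <= lastTimestamp) {
            timestamp = timeGen();
        }
        return timestamp;
    }
    private long timeGen() { return System.currentTimeMillis(); }
}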

The Snowflake algorithm is reliable for high‑throughput distributed systems, comfortably handling tens of thousands of IDs per second.

