
How to Build a High‑Availability, High‑Performance Distributed ID Generator

Distributed systems need globally unique, often monotonic IDs. This article examines common ID generation strategies (Snowflake, database auto‑increment, segment allocation, multi‑master databases, and Raft‑based consensus), evaluates each for high availability and high performance, and highlights the trade‑offs and implementation details.

Xiao Lou's Tech Notes

Background

Many distributed scenarios need globally unique IDs. For example, after a database is sharded, each shard's single‑machine auto‑increment column can no longer guarantee uniqueness, so IDs must come from a dedicated generator. The basic requirements for an ID generator are:

Globally unique, never duplicate

Monotonically increasing, which some scenarios (such as sorting by ID) also require

Many articles already cover this topic, such as Meituan's "Leaf" and Youzan's "How to Build a Reliable ID Generator". This article focuses on two further properties:

High availability: the service remains usable and no duplicate IDs are generated despite failures

High performance: the generator must handle very high concurrency and be horizontally scalable

Given these basic requirements, what common solutions exist, and are they truly high‑availability and high‑performance?

Snowflake solution

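Snowflake uses a 41‑bit timestamp, a 10‑bit machine ID, and a 12‑bit sequence number; together with an unused sign bit, they fill a 64‑bit long. The sequence can be generated with an AtomicLong. The 10‑bit machine ID supports up to 1024 machines, and the 12‑bit sequence allows 4096 IDs per machine per millisecond.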
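A minimal Java sketch of this layout follows. Instead of an AtomicLong it uses a synchronized method, because the timestamp and the sequence have to be updated together; the epoch value, the class name, and the injectable clock are illustrative assumptions rather than part of any specific implementation. A backward clock is reported as an exception instead of silently producing a duplicate.

```java
import java.util.function.LongSupplier;

/** Sketch of a Snowflake-style generator: sign bit | 41-bit timestamp | 10-bit machine ID | 12-bit sequence. */
final class SnowflakeIdGenerator {
    // Hypothetical custom epoch (2020-01-01T00:00:00Z); 41 bits of milliseconds last roughly 69 years from it.
    private static final long EPOCH_MILLIS = 1577836800000L;
    private static final int MACHINE_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_MACHINE_ID = (1L << MACHINE_BITS) - 1; // 1023
    private static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1; // 4095
    private static final int TIMESTAMP_SHIFT = MACHINE_BITS + SEQUENCE_BITS; // 22

    private final long machineId;
    private final LongSupplier clock; // injectable so tests can simulate clock changes
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    SnowflakeIdGenerator(long machineId) {
        this(machineId, System::currentTimeMillis);
    }

    SnowflakeIdGenerator(long machineId, LongSupplier clock) {
        if (machineId < 0 || machineId > MAX_MACHINE_ID) {
            throw new IllegalArgumentException("machineId must be in [0, " + MAX_MACHINE_ID + "]");
        }
        this.machineId = machineId;
        this.clock = clock;
    }

    synchronized long nextId() {
        long now = clock.getAsLong();
        if (now < lastTimestamp) {
            // Clock moved backward: fail instead of risking a duplicate ID.
            throw new IllegalStateException("clock moved backwards by " + (lastTimestamp - now) + " ms");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // All 4096 sequence values of this millisecond are used: wait for the next millisecond.
                while ((now = clock.getAsLong()) <= lastTimestamp) {
                    // busy-wait
                }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = now;
        return ((now - EPOCH_MILLIS) << TIMESTAMP_SHIFT) | (machineId << SEQUENCE_BITS) | sequence;
    }
}
```

For example, every ID from `new SnowflakeIdGenerator(5)` carries the machine ID in bits 12 to 21, so `(id >> 12) & 1023` recovers 5. When more than 4096 IDs are requested within one millisecond, the generator simply waits for the next millisecond.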

Advantages:

Simple algorithm, easy to implement, no third‑party dependencies, very high performance

Stateless cluster, easy to scale, and at first glance highly available

Disadvantages:

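On one machine the timestamp keeps IDs increasing, but across machines IDs are only roughly time‑ordered: clocks differ slightly, and within the same millisecond the order is decided by machine ID rather than by generation time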

Relies on the clock; if the clock moves backward (for example, after an NTP adjustment), duplicate IDs may be generated

Overall, Snowflake meets the basic requirements and offers very high performance, but due to clock‑rollback issues it is not a high‑availability solution.
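The clock‑rollback risk can be reproduced with a hypothetical naive variant that omits the backward‑clock check. Clock readings are passed in explicitly so that a rollback can be simulated; the class and method names are illustrative only.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical naive Snowflake variant that trusts whatever timestamp it is given (no rollback check). */
final class NaiveSnowflake {
    private final long machineId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    NaiveSnowflake(long machineId) {
        this.machineId = machineId;
    }

    long nextId(long nowMillis) {
        // The sequence restarts at 0 whenever the timestamp differs from the previous one,
        // including when the clock has jumped backward.
        sequence = (nowMillis == lastTimestamp) ? (sequence + 1) & 4095 : 0;
        lastTimestamp = nowMillis;
        return (nowMillis << 22) | (machineId << 12) | sequence;
    }

    /** Feeds clock readings to one generator and counts IDs that had already been issued. */
    static int countDuplicates(long[] clockReadings) {
        NaiveSnowflake gen = new NaiveSnowflake(1);
        Set<Long> issued = new HashSet<>();
        int duplicates = 0;
        for (long t : clockReadings) {
            if (!issued.add(gen.nextId(t))) {
                duplicates++;
            }
        }
        return duplicates;
    }
}
```

With readings 1000, 1000, 1001, 1000, 1000, 1001 (the clock steps back by one millisecond after 1001), `countDuplicates` returns 3: every ID issued after the rollback repeats an earlier one. The usual mitigations are to refuse to issue IDs while the clock is behind or to wait until it catches up, and either way the service is unavailable during that window.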

Database‑based solution

Using the database's auto‑increment feature has these advantages:

Simple implementation, only depends on the database

No clock‑rollback problem

Generated IDs are monotonic

Disadvantages:

Performance limited by the database’s single‑node write capacity; cannot scale horizontally

Single point of failure: in a master‑slave setup, consistency depends on the replication mode (asynchronous, semi‑synchronous, or fully synchronous). Only fully synchronous replication guarantees that a promoted slave has seen every ID the master issued; otherwise, a failover can lose the latest increments, and the new master will issue those IDs again.
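The failover problem can be modeled in a few lines. The sketch below is a hypothetical in‑memory model, not a database client: `master` is the auto‑increment value on the master, `slave` is its asynchronously replicated copy, and a failover promotes the slave with whatever value it last received.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of an auto-increment counter on a master with an asynchronously replicated slave. */
final class AsyncFailoverModel {
    private long master = 0; // auto-increment value on the current master
    private long slave = 0;  // replicated value on the slave; may lag behind the master

    long nextId() {
        return ++master;
    }

    void replicate() {
        slave = master; // asynchronous replication catches up
    }

    void failover() {
        master = slave; // slave is promoted; increments it never received are lost
    }

    /**
     * Issues `issuedBeforeCrash` IDs, replicating only after the first `replicatedBeforeCrash` of them,
     * then fails over and issues `issuedAfter` more. Returns the IDs that were issued twice.
     */
    static List<Long> duplicatesAfterFailover(int issuedBeforeCrash, int replicatedBeforeCrash, int issuedAfter) {
        AsyncFailoverModel db = new AsyncFailoverModel();
        Set<Long> issued = new HashSet<>();
        List<Long> duplicates = new ArrayList<>();
        for (int i = 0; i < issuedBeforeCrash; i++) {
            issued.add(db.nextId());
            if (i + 1 == replicatedBeforeCrash) {
                db.replicate();
            }
        }
        db.failover();
        for (int i = 0; i < issuedAfter; i++) {
            long id = db.nextId();
            if (!issued.add(id)) {
                duplicates.add(id);
            }
        }
        return duplicates;
    }
}
```

`duplicatesAfterFailover(5, 3, 4)` issues IDs 1 to 5, of which only 1 to 3 reached the slave before the crash; the promoted slave then issues 4, 5, 6, 7, so 4 and 5 are handed out twice. When every issued ID had been replicated, as in `duplicatesAfterFailover(5, 5, 4)`, the list is empty, which is the guarantee fully synchronous replication provides.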

The same idea can be implemented with Redis INCR, but Redis replication is asynchronous, so a failover can likewise lose recent increments and produce duplicate IDs.

In summary, without fully synchronous replication the database approach is not highly available, and with it every ID requires a synchronously replicated write, which makes performance even worse.

Database segment allocation solution

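This approach improves performance by fetching a range (segment) of IDs from the database at once and allocating them from memory, so the database is accessed once per segment instead of once per ID. It is much faster than the plain database approach, but IDs are monotonic only within a single generator instance, because different instances hand out numbers from different segments at the same time. Since the database is hit so rarely, the cost of fully synchronous replication is amortized, so with it this approach can be both highly available and high‑performance.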
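A minimal sketch of segment allocation follows, assuming a hypothetical `SimulatedIdTable` that stands in for a database row holding the current maximum ID; in a real deployment, `reserve` would be an atomic update of that row inside a transaction. Each generator instance touches the table only once per `step` IDs and serves the rest from memory.

```java
/** Hypothetical stand-in for the database row that stores the current maximum allocated ID. */
final class SimulatedIdTable {
    private long maxId = 0;

    /** Atomically advances max_id by `step` and returns the first ID of the reserved range. */
    synchronized long reserve(long step) {
        long start = maxId;
        maxId += step;
        return start;
    }
}

/** Hands out IDs from a locally cached segment, going back to the table only when it runs out. */
final class SegmentIdGenerator {
    private final SimulatedIdTable table;
    private final long step;
    private long next = 0; // next ID to hand out
    private long end = 0;  // exclusive end of the current segment

    SegmentIdGenerator(SimulatedIdTable table, long step) {
        this.table = table;
        this.step = step;
    }

    synchronized long nextId() {
        if (next == end) {
            // Segment exhausted (or none fetched yet): one round trip reserves `step` more IDs.
            next = table.reserve(step);
            end = next + step;
        }
        return next++;
    }
}
```

With `step = 10` and two generators A and B sharing one table, the first calls return 0 from A, 10 from B, and then 1 from A: every ID is unique, but global order no longer matches issue order. A larger `step` means fewer database round trips, at the cost of wasting the unused remainder of a segment when an instance restarts.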

Multi‑master database solution

This is similar to the segment approach, but uses multiple master databases with distinct auto‑increment start values and a common step (e.g., three masters with start values 1, 2, and 3 and step 3), so IDs from different masters never collide. Segments are fetched from the masters in round‑robin order; if one master fails, the others keep serving. The extra masters provide high availability and segment allocation keeps performance high, but horizontal scaling is difficult: adding a master changes the step, which must then be reconfigured on every existing master without producing collisions.
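One way to combine multiple masters with segment allocation is sketched below, under assumptions of our own: each simulated master returns auto‑increment values starting at its offset and advancing by the shared step (in MySQL these correspond to the `auto_increment_offset` and `auto_increment_increment` settings), and a value `n` is mapped to the segment `[n * segmentSize, (n + 1) * segmentSize)`. Because no two masters ever return the same `n`, their segments never overlap. All class names are hypothetical.

```java
import java.util.List;

/** Hypothetical simulated master whose auto-increment column starts at `offset` and advances by `step`. */
final class SimulatedMaster {
    private final long step;
    private long next;
    private volatile boolean alive = true;

    SimulatedMaster(long offset, long step) {
        this.next = offset;
        this.step = step;
    }

    void setAlive(boolean alive) {
        this.alive = alive;
    }

    synchronized long nextAutoIncrement() {
        if (!alive) {
            throw new IllegalStateException("master is down");
        }
        long value = next;
        next += step;
        return value;
    }
}

/** Fetches segments from the masters in round-robin order, skipping any master that is down. */
final class MultiMasterSegmentClient {
    private final List<SimulatedMaster> masters;
    private final long segmentSize;
    private int cursor = 0; // index of the master to try next

    MultiMasterSegmentClient(List<SimulatedMaster> masters, long segmentSize) {
        this.masters = masters;
        this.segmentSize = segmentSize;
    }

    /** Returns {start, endExclusive} of the next segment. */
    synchronized long[] nextSegment() {
        for (int attempt = 0; attempt < masters.size(); attempt++) {
            SimulatedMaster master = masters.get(cursor);
            cursor = (cursor + 1) % masters.size();
            try {
                long n = master.nextAutoIncrement();
                return new long[] {n * segmentSize, (n + 1) * segmentSize};
            } catch (IllegalStateException e) {
                // This master is unavailable; try the next one.
            }
        }
        throw new IllegalStateException("all masters are down");
    }
}
```

With masters starting at 1, 2, and 3, step 3, and `segmentSize = 100`, the first three calls return segments starting at 100, 200, and 300. If the second master then goes down, the client skips it and keeps receiving segments from the other two (400, 600, 700, ...); the segments the failed master would have issued are simply left unused.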

Consensus‑based solution

The high‑availability problem above stems from inconsistency between master and slave. A consensus protocol such as Raft ensures that data is replicated to a majority of nodes before it is considered committed. Each allocated segment is persisted to a majority (and eventually to all nodes); if the leader fails, Raft elects a new leader, and Raft's election rules guarantee that the new leader holds every committed entry, so no segment is handed out twice. Youzan's reliable ID generator uses etcd, which is built on Raft; open‑source Raft libraries such as Ant Financial's SOFAJRaft can also be used.
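What makes this safe is that any two majorities share at least one node. The sketch below models only that quorum rule, not Raft itself (there are no terms, logs, or elections): a hypothetical `QuorumHighWaterMark` stores the allocated high‑water mark on every reachable replica, reads the current value as the maximum over a reachable majority, and refuses to allocate when no majority is reachable.

```java
import java.util.Arrays;

/** Hypothetical quorum model: a segment counts as allocated only once a majority has persisted its end. */
final class QuorumHighWaterMark {
    private final long[] persisted; // high-water mark stored on each replica
    private final boolean[] up;     // whether each replica is currently reachable
    private final long step;

    QuorumHighWaterMark(int replicas, long step) {
        this.persisted = new long[replicas];
        this.up = new boolean[replicas];
        Arrays.fill(up, true);
        this.step = step;
    }

    void setUp(int replica, boolean isUp) {
        up[replica] = isUp;
    }

    private int majority() {
        return persisted.length / 2 + 1;
    }

    /** Reserves {start, endExclusive}; any reachable majority overlaps the one that stored the last value. */
    long[] allocate() {
        int reachable = 0;
        long current = 0;
        for (int i = 0; i < persisted.length; i++) {
            if (up[i]) {
                reachable++;
                current = Math.max(current, persisted[i]);
            }
        }
        if (reachable < majority()) {
            throw new IllegalStateException("no majority reachable; refusing to allocate");
        }
        long next = current + step;
        for (int i = 0; i < persisted.length; i++) {
            if (up[i]) {
                persisted[i] = next; // written to every reachable replica, i.e. at least a majority
            }
        }
        return new long[] {current, next};
    }
}
```

With three replicas and `step = 100`: allocate [0, 100); lose replica 0 and allocate [100, 200) from replicas 1 and 2; then restore replica 0 and lose replica 2, and the next allocation is still [200, 300), because replica 1 carries the latest value into the new majority. With two of the three replicas down, `allocate` throws instead of risking a duplicate segment.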

Summary

The high performance of an ID generator comes mainly from segment allocation.

High availability can be achieved through database high availability (fully synchronous replication), multi‑master setups, or consensus protocols.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
