How We Built a High‑Availability Distributed ID Service for Order Management
This article explains why Yanxuan needed a distributed ID system, describes the selection of Leaf's segment mode, details architectural optimizations such as double‑buffering and dynamic step adjustment, shares operational safeguards, and outlines the pitfalls and solutions discovered during implementation.
1. Why a Distributed ID?
In Yanxuan's order ecosystem the main site, distribution channel, and B2B services each generate their own order IDs. When synchronizing data to the order center, an internal order number is created, but the ID sent to downstream warehouses differs, causing inconsistencies.
1.1 Background
Multiple subsystems generate independent order IDs, leading to mismatched IDs during data aggregation.
1.2 Problem
The inconsistent use of order IDs creates communication barriers, complicates governance, and leads to code rot.
1.3 Goal
The distributed ID must be globally unique, highly secure, highly available, and low‑latency.
2. Architecture Principles
2.1 Technology Selection
Common distributed‑ID solutions were compared on horizontal scalability and configurable ID length; Leaf's Segment mode was chosen because it supports horizontal expansion and allows the ID length to be configured explicitly.
2.2 Architecture Overview
Leaf pre‑allocates blocks of IDs from a database to several servers. Each server fetches a fixed‑size block of IDs into memory at startup, enabling fast in‑memory distribution.
Only the maximum ID of each block is persisted, reducing DB write pressure.
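As a rough single‑buffer sketch of this segment mode (all names here are illustrative, and the "DB" is simulated by a plain field), each fetch advances the persisted max ID by one step, and every ID below it is then handed out without touching the database:

```java
// Minimal single-buffer sketch of Leaf's segment mode. The "DB" here is a
// plain field; in reality each fetch is an UPDATE that advances max_id by
// step, followed by a SELECT of the new value for this business key.
public class SegmentAllocator {
    private long persistedMaxId = 0;  // stand-in for the DB row's max_id
    private final long step;          // IDs fetched per DB round trip

    private long current;             // next ID to hand out
    private long segmentMax;          // upper bound of the in-memory block

    public SegmentAllocator(long step) {
        this.step = step;
        fetchNextSegment();
    }

    // One "DB" round trip: only the new maximum ID is persisted, not each ID.
    private void fetchNextSegment() {
        persistedMaxId += step;
        segmentMax = persistedMaxId;
        current = segmentMax - step + 1;
    }

    public synchronized long nextId() {
        if (current > segmentMax) {
            fetchNextSegment();       // block exhausted: one more DB write
        }
        return current++;
    }
}
```

With a step of 1,000, this costs one DB write per 1,000 IDs served, which is where the reduced write pressure comes from.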
2.3 Availability Optimizations
Two main issues were observed:
When a segment is exhausted, the next request blocks on a DB round trip, causing occasional latency spikes.
If the DB fails, or a master‑slave switchover occurs while a segment is being updated, the service becomes unavailable.
To address them, a double‑buffer mechanism and asynchronous update strategy were introduced. When one buffer reaches a threshold, an async task loads the next segment into memory.
This ensures that even if the DB is down, a buffered segment can continue serving requests, provided the DB recovers within the buffer’s cycle.
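The double‑buffer idea can be sketched as follows (the class and field names are ours, not Leaf's, and the DB is again simulated in memory): once the active segment is about 90% consumed, the next segment is fetched off the request path, so callers block on the DB only if both buffers run dry.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical double-buffer sketch: the refill is submitted to a background
// thread at a usage threshold, so an ID request normally never waits on DB I/O.
public class DoubleBufferIdGen {
    private static final double THRESHOLD = 0.9;   // refill trigger point

    private final long step;                        // segment length
    private final Object dbLock = new Object();
    private long dbMaxId = 0;                       // stand-in for the DB row

    private long current;                           // next ID to hand out
    private long segmentMax;                        // end of active segment
    private Future<long[]> nextSegment;             // [min, max] being loaded
    private final ExecutorService loader = Executors.newSingleThreadExecutor();

    public DoubleBufferIdGen(long step) {
        this.step = step;
        long[] seg = loadFromDb();
        current = seg[0];
        segmentMax = seg[1];
    }

    // One "DB" round trip: advance the persisted max_id by step.
    private long[] loadFromDb() {
        synchronized (dbLock) {
            dbMaxId += step;
            return new long[] { dbMaxId - step + 1, dbMaxId };
        }
    }

    public synchronized long nextId() {
        long segmentMin = segmentMax - step + 1;
        double used = (double) (current - segmentMin) / step;
        if (used >= THRESHOLD && nextSegment == null) {
            nextSegment = loader.submit(this::loadFromDb);  // async refill
        }
        if (current > segmentMax) {                 // active segment exhausted
            try {
                long[] seg = nextSegment.get();     // swap in the ready buffer
                current = seg[0];
                segmentMax = seg[1];
                nextSegment = null;
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException("segment refill failed", e);
            }
        }
        return current++;
    }

    public void shutdown() { loader.shutdown(); }
}
```

Because the refill fires at the 90% mark rather than at exhaustion, a DB outage shorter than one buffer's consumption cycle is invisible to callers, which is exactly the availability property described above.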
2.4 Dynamic Step Adjustment
Fixed segment length can become inefficient under traffic spikes or drops. The relationship Q·T = L (Q = QPS, L = segment length, T = update period) guides dynamic adjustment:
T < 15 min → nextStep = step × 2
15 min ≤ T ≤ 30 min → nextStep = step
T > 30 min → nextStep = step ÷ 2
Initial step ≤ nextStep ≤ max (customizable, e.g., 1 000 000).
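The adjustment rule above can be expressed directly in code (a sketch; the minimum step and method names are assumptions, while the 1,000,000 cap comes from the article):

```java
// Sketch of the dynamic step-adjustment rule: T is the minutes elapsed
// since the last segment update, step is the current segment length L.
public class StepAdjuster {
    static final long MIN_STEP = 1_000;       // assumed initial/minimum step
    static final long MAX_STEP = 1_000_000;   // customizable cap

    static long nextStep(long step, long minutesSinceLastUpdate) {
        long next;
        if (minutesSinceLastUpdate < 15) {
            next = step * 2;                  // consumed too fast: double
        } else if (minutesSinceLastUpdate <= 30) {
            next = step;                      // in the sweet spot: keep
        } else {
            next = step / 2;                  // consumed too slowly: halve
        }
        // Clamp into [initial step, max], per the constraint above.
        return Math.max(MIN_STEP, Math.min(MAX_STEP, next));
    }
}
```

For example, a 10,000‑ID segment consumed in 10 minutes grows to 20,000, so the update period drifts back toward the 15–30 minute target window.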
3. What We Improved
3.1 Feature Enrichment
Based on Yanxuan’s business scenarios, we added batch ID acquisition, pre‑scaling for large promotions, and early‑segment‑jump handling.
3.2 Availability Guarantees
3.2.1 DB
MySQL uses a master‑slave setup with read/write separation, semi‑synchronous replication, and a "double 1" durability configuration (sync_binlog=1 and innodb_flush_log_at_trx_commit=1) to avoid data loss.
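As a sketch, the durability settings above correspond to a my.cnf fragment like the following (the semi‑sync timeout value is illustrative):

```ini
[mysqld]
# "Double 1": fsync the binlog on every commit...
sync_binlog = 1
# ...and flush and fsync the InnoDB redo log on every commit.
innodb_flush_log_at_trx_commit = 1
# Semi-synchronous replication: a commit returns only after at least
# one replica has acknowledged receipt of the transaction.
rpl_semi_sync_master_enabled = 1
rpl_semi_sync_master_timeout = 1000  ; ms before falling back to async
```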
3.2.2 SDK
The SDK reduces integration cost for downstream services, eases load on Leaf servers, and provides a short‑term fallback when the Leaf service is unavailable.
The SDK follows the same double‑buffer principle as the server side.
3.3 Stability Guarantees
3.3.1 Operations
Three pillars: log monitoring, traffic monitoring, and online health checks.
Log monitoring detects unexpected anomalies.
Traffic monitoring helps evaluate segment length usage and prevents rapid consumption.
Online health checks continuously verify service liveness.
3.3.2 SLA
SLA metrics focus on request latency and error rate, establishing SLO targets for interface stability.
4. Pitfalls Encountered
4.1 Issue Discovery
Immediately after each service startup, response times were unusually high; they gradually returned to normal.
4.2 Investigation – JVM Tiered Compilation
Java bytecode can be interpreted or JIT‑compiled. HotSpot uses a tiered compilation pipeline (L0‑L4). The JVM decides when to promote methods from interpreter (L0) to C1 (L1‑L3) and finally to C2 (L4) based on execution counts and profiling.
Tiered compilation levels:
L0 – interpreter (with profiling)
L1 – C1 without profiling
L2 – C1 with call‑count profiling
L3 – C1 with full profiling
L4 – C2 (optimizing compiler)
The typical promotion path is 0→3→4, but depending on compiler queue pressure the JVM may route a method through other paths (for example 0→2→3→4, or straight to level 1 for trivial methods).
4.3 Solutions
Disable tiered compilation (-XX:-TieredCompilation) or lower the compilation thresholds (e.g., -XX:CompileThreshold) so hot methods reach C2 sooner.
Replay mocked interface traffic at startup to trigger JIT and C2 compilation before real requests arrive.
Use Java 9's AOT compilation (jaotc) to pre‑compile code ahead of time, eliminating early‑stage interpretation overhead and improving start‑up performance.
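The mock‑traffic idea can be sketched as a startup warm‑up routine (the class name, call count, and hot path are assumptions, not Leaf's actual code): drive the hot ID‑generation path with mock requests before the instance reports healthy, so the JIT has promoted it by the time real traffic arrives.

```java
// Illustrative startup warm-up: invoke the hot path enough times that the
// JIT's invocation counters trip before real traffic hits the service.
public class Warmup {
    // The classic server-compiler invocation threshold is 10,000
    // (-XX:CompileThreshold); we loop comfortably past it. With tiered
    // compilation the exact triggers differ, so treat this as a heuristic.
    static final int WARMUP_CALLS = 50_000;

    public static void warmUp(Runnable hotPath) {
        for (int i = 0; i < WARMUP_CALLS; i++) {
            hotPath.run();            // mock invocation; result discarded
        }
    }
}
```

In use this might look like `Warmup.warmUp(() -> idService.nextId("mock-tag"))` before flipping the health check to ready (`idService.nextId` is a hypothetical name for the hot path).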
5. Production Usage
Leaf is now deployed in production, integrated by the main site, channel, and B2B services. It generates three types of IDs: order ID, order snapshot ID, and order item snapshot ID, and has withstood Double 11 and Double 12 traffic spikes.
6. Summary
This article presented the selection, design, and implementation of a distributed ID service at Yanxuan, focusing on improving system availability and stability. While performance gains have been achieved, further optimizations remain possible, and broader adoption across business lines is anticipated.
Yanxuan Tech Team
NetEase Yanxuan Tech Team shares e-commerce tech insights and quality finds for mindful living. This is the public portal for NetEase Yanxuan's technology and product teams, featuring weekly tech articles, team activities, and job postings.
