How We Built a High‑Availability Distributed ID Service for Order Management
This article explains why Yanxuan needed a distributed ID system, describes the selection of Leaf's segment mode, details architectural optimizations such as double‑buffering and dynamic step adjustment, shares operational safeguards, and outlines the pitfalls and solutions discovered during implementation.
1. Why a Distributed ID?
In Yanxuan's order ecosystem the main site, distribution channel, and B2B services each generate their own order IDs. When synchronizing data to the order center, an internal order number is created, but the ID sent to downstream warehouses differs, causing inconsistencies.
1.1 Background
Multiple subsystems generate independent order IDs, leading to mismatched IDs during data aggregation.
1.2 Problem
The inconsistent use of order IDs creates communication barriers, complicates governance, and leads to code rot.
1.3 Goal
The distributed ID must be globally unique, highly secure, highly available, and low‑latency.
2. Architecture Principles
2.1 Technology Selection
Common distributed‑ID solutions were compared on horizontal scalability and configurable ID length; Leaf's Segment mode was chosen because it supports horizontal expansion and allows the ID length to be configured explicitly.
2.2 Architecture Overview
Leaf pre‑allocates blocks of IDs from a database to several servers. Each server fetches a fixed‑size block of IDs into memory at startup, enabling fast in‑memory distribution.
Only the maximum ID of each block is persisted, reducing DB write pressure.
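As a rough single‑buffer sketch of this segment mode (all names here are illustrative, and the "DB" is simulated by a plain field), each fetch advances the persisted max ID by one step, and every ID below it is then handed out without touching the database:

```java
// Minimal single-buffer sketch of Leaf's segment mode. The "DB" here is a
// plain field; in reality each fetch is an UPDATE that advances max_id by
// step, followed by a SELECT of the new value for this business key.
public class SegmentAllocator {
    private long persistedMaxId = 0;  // stand-in for the DB row's max_id
    private final long step;          // IDs fetched per DB round trip

    private long current;             // next ID to hand out
    private long segmentMax;          // upper bound of the in-memory block

    public SegmentAllocator(long step) {
        this.step = step;
        fetchNextSegment();
    }

    // One "DB" round trip: only the new maximum ID is persisted, not each ID.
    private void fetchNextSegment() {
        persistedMaxId += step;
        segmentMax = persistedMaxId;
        current = segmentMax - step + 1;
    }

    public synchronized long nextId() {
        if (current > segmentMax) {
            fetchNextSegment();       // block exhausted: one more DB write
        }
        return current++;
    }
}
```

With a step of 1,000, this costs one DB write per 1,000 IDs served, which is where the reduced write pressure comes from.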
2.3 Availability Optimizations
Two main issues were observed:
When a segment is exhausted, the next request blocks on a DB round trip, causing occasional latency spikes.
If the DB fails, or a master‑slave switchover occurs while a segment is being updated, the service becomes unavailable.
To address them, a double‑buffer mechanism and asynchronous update strategy were introduced. When one buffer reaches a threshold, an async task loads the next segment into memory.
This ensures that even if the DB is down, a buffered segment can continue serving requests, provided the DB recovers within the buffer’s cycle.
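The double‑buffer idea can be sketched as follows (the class and field names are ours, not Leaf's, and the DB is again simulated in memory): once the active segment is about 90% consumed, the next segment is fetched off the request path, so callers block on the DB only if both buffers run dry.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical double-buffer sketch: the refill is submitted to a background
// thread at a usage threshold, so an ID request normally never waits on DB I/O.
public class DoubleBufferIdGen {
    private static final double THRESHOLD = 0.9;   // refill trigger point

    private final long step;                        // segment length
    private final Object dbLock = new Object();
    private long dbMaxId = 0;                       // stand-in for the DB row

    private long current;                           // next ID to hand out
    private long segmentMax;                        // end of active segment
    private Future<long[]> nextSegment;             // [min, max] being loaded
    private final ExecutorService loader = Executors.newSingleThreadExecutor();

    public DoubleBufferIdGen(long step) {
        this.step = step;
        long[] seg = loadFromDb();
        current = seg[0];
        segmentMax = seg[1];
    }

    // One "DB" round trip: advance the persisted max_id by step.
    private long[] loadFromDb() {
        synchronized (dbLock) {
            dbMaxId += step;
            return new long[] { dbMaxId - step + 1, dbMaxId };
        }
    }

    public synchronized long nextId() {
        long segmentMin = segmentMax - step + 1;
        double used = (double) (current - segmentMin) / step;
        if (used >= THRESHOLD && nextSegment == null) {
            nextSegment = loader.submit(this::loadFromDb);  // async refill
        }
        if (current > segmentMax) {                 // active segment exhausted
            try {
                long[] seg = nextSegment.get();     // swap in the ready buffer
                current = seg[0];
                segmentMax = seg[1];
                nextSegment = null;
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException("segment refill failed", e);
            }
        }
        return current++;
    }

    public void shutdown() { loader.shutdown(); }
}
```

Because the refill fires at the 90% mark rather than at exhaustion, a DB outage shorter than one buffer's consumption cycle is invisible to callers, which is exactly the availability property described above.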
2.4 Dynamic Step Adjustment
Fixed segment length can become inefficient under traffic spikes or drops. The relationship Q·T = L (Q = QPS, L = segment length, T = update period) guides dynamic adjustment:
T < 15 min → nextStep = step × 2
15 min ≤ T ≤ 30 min → nextStep = step
T > 30 min → nextStep = step ÷ 2
Initial step ≤ nextStep ≤ max (customizable, e.g., 1 000 000).
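The adjustment rule above can be expressed directly in code (a sketch; the minimum step and method names are assumptions, while the 1,000,000 cap comes from the article):

```java
// Sketch of the dynamic step-adjustment rule: T is the minutes elapsed
// since the last segment update, step is the current segment length L.
public class StepAdjuster {
    static final long MIN_STEP = 1_000;       // assumed initial/minimum step
    static final long MAX_STEP = 1_000_000;   // customizable cap

    static long nextStep(long step, long minutesSinceLastUpdate) {
        long next;
        if (minutesSinceLastUpdate < 15) {
            next = step * 2;                  // consumed too fast: double
        } else if (minutesSinceLastUpdate <= 30) {
            next = step;                      // in the sweet spot: keep
        } else {
            next = step / 2;                  // consumed too slowly: halve
        }
        // Clamp into [initial step, max], per the constraint above.
        return Math.max(MIN_STEP, Math.min(MAX_STEP, next));
    }
}
```

For example, a 10,000‑ID segment consumed in 10 minutes grows to 20,000, so the update period drifts back toward the 15–30 minute target window.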
3. What We Improved
3.1 Feature Enrichment
Based on Yanxuan’s business scenarios, we added batch ID acquisition, pre‑scaling for large promotions, and early‑segment‑jump handling.
3.2 Availability Guarantees
3.2.1 DB
MySQL uses a master‑slave setup with read/write separation, semi‑synchronous replication, and a "double 1" durability configuration (sync_binlog=1 and innodb_flush_log_at_trx_commit=1) to avoid data loss.
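As a sketch, the durability settings above correspond to a my.cnf fragment like the following (the semi‑sync timeout value is illustrative):

```ini
[mysqld]
# "Double 1": fsync the binlog on every commit...
sync_binlog = 1
# ...and flush and fsync the InnoDB redo log on every commit.
innodb_flush_log_at_trx_commit = 1
# Semi-synchronous replication: a commit returns only after at least
# one replica has acknowledged receipt of the transaction.
rpl_semi_sync_master_enabled = 1
rpl_semi_sync_master_timeout = 1000  ; ms before falling back to async
```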
3.2.2 SDK
The SDK reduces integration cost for downstream services, eases load on Leaf servers, and provides a short‑term fallback when the Leaf service is unavailable.
The SDK follows the same double‑buffer principle as the server side.
3.3 Stability Guarantees
3.3.1 Operations
Three pillars: log monitoring, traffic monitoring, and online health checks.
Log monitoring detects unexpected anomalies.
Traffic monitoring helps evaluate segment length usage and prevents rapid consumption.
Online health checks continuously verify service liveness.
3.3.2 SLA
SLA metrics focus on request latency and error rate, establishing SLO targets for interface stability.
4. Pitfalls Encountered
4.1 Issue Discovery
Immediately after each service startup, response times were unusually high; they gradually returned to normal.
4.2 Investigation – JVM Tiered Compilation
Java bytecode can be interpreted or JIT‑compiled. HotSpot uses a tiered compilation pipeline (L0‑L4). The JVM decides when to promote methods from interpreter (L0) to C1 (L1‑L3) and finally to C2 (L4) based on execution counts and profiling.
Tiered compilation levels:
L0 – interpreter (with profiling)
L1 – C1 without profiling
L2 – C1 with call‑count profiling
L3 – C1 with full profiling
L4 – C2 (optimizing compiler)
The typical promotion path is 0→3→4, but depending on compiler queue pressure the JVM may route a method through other paths (for example 0→2→3→4, or straight to level 1 for trivial methods).
4.3 Solutions
Disable tiered compilation (-XX:-TieredCompilation) or lower the compilation thresholds (e.g., -XX:CompileThreshold) so hot methods reach C2 sooner.
Replay mocked interface traffic at startup to trigger JIT and C2 compilation before real requests arrive.
Use Java 9's AOT compilation (jaotc) to pre‑compile code ahead of time, eliminating early‑stage interpretation overhead and improving start‑up performance.
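The mock‑traffic idea can be sketched as a startup warm‑up routine (the class name, call count, and hot path are assumptions, not Leaf's actual code): drive the hot ID‑generation path with mock requests before the instance reports healthy, so the JIT has promoted it by the time real traffic arrives.

```java
// Illustrative startup warm-up: invoke the hot path enough times that the
// JIT's invocation counters trip before real traffic hits the service.
public class Warmup {
    // The classic server-compiler invocation threshold is 10,000
    // (-XX:CompileThreshold); we loop comfortably past it. With tiered
    // compilation the exact triggers differ, so treat this as a heuristic.
    static final int WARMUP_CALLS = 50_000;

    public static void warmUp(Runnable hotPath) {
        for (int i = 0; i < WARMUP_CALLS; i++) {
            hotPath.run();            // mock invocation; result discarded
        }
    }
}
```

In use this might look like `Warmup.warmUp(() -> idService.nextId("mock-tag"))` before flipping the health check to ready (`idService.nextId` is a hypothetical name for the hot path).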
5. Production Usage
Leaf is now deployed in production, integrated by the main site, channel, and B2B services. It generates three types of IDs: order ID, order snapshot ID, and order item snapshot ID, and has withstood Double 11 and Double 12 traffic spikes.
6. Summary
This article presented the selection, design, and implementation of a distributed ID service at Yanxuan, focusing on improving system availability and stability. While performance gains have been achieved, further optimizations remain possible, and broader adoption across business lines is anticipated.
Yanxuan Tech Team
NetEase Yanxuan Tech Team shares e-commerce tech insights and quality finds for mindful living. This is the public portal for NetEase Yanxuan's technology and product teams, featuring weekly tech articles, team activities, and job postings.
