How JD Daojia Scaled Its Order System to Billion‑Scale: Architecture, Evolution, and High‑Availability Practices

This article details JD Daojia's order system architecture, tracing its evolution from a monolithic design to a micro‑service, multi‑cluster setup with Redis, MySQL, and Elasticsearch, and explains the high‑availability, disaster‑recovery, capacity‑planning, and alerting techniques that keep billions of orders running smoothly.

dbaplus Community

JD Daojia, a leading local instant‑retail platform, needed to support one‑hour delivery for tens of millions of users, prompting a continuous evolution of its order system from a single module to a distributed, high‑concurrency, high‑availability architecture.

1. JD Daojia System Architecture

Business Architecture

The business spans consumers (C‑end users), merchants (B‑end, e.g., Walmart and Yonghui), the merchant picking app, the delivery side where riders grab orders, and settlement for both riders and merchants.

Operational Support Architecture

Five core modules support operations: merchant management, CMS, marketing, finance, and operational data, backed by caching (LocalCache, Redis), databases (MySQL, MongoDB), and Elasticsearch.

Backend Architecture

All user requests pass through a gateway, then reach business services such as home page, store page, cart, checkout, order creation, payment, and personal order management. Core services rely on foundational services like inventory, product, store, and pricing.

Order Data Ingestion Flow

After a user places an order, the order data is sharded and stored. An asynchronous pipeline distributes the order to the production system at a controlled rate, preventing spikes during large promotions.
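The controlled‑rate hand‑off can be sketched with a token bucket in front of the production system. This is a minimal single‑process sketch under stated assumptions. The article does not say which throttling algorithm JD Daojia uses. The `TokenBucket` and `OrderDispatcher` classes, the capacity and the refill rate are all hypothetical. The current time is passed in explicitly so the behaviour is deterministic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical token bucket: refills ratePerSecond tokens per second, up to capacity. */
class TokenBucket {
    private final double capacity;
    private final double ratePerSecond;
    private double tokens;
    private long lastRefillMillis;

    TokenBucket(double capacity, double ratePerSecond, long nowMillis) {
        this.capacity = capacity;
        this.ratePerSecond = ratePerSecond;
        this.tokens = capacity;           // start with a full bucket
        this.lastRefillMillis = nowMillis;
    }

    /** Refills according to elapsed time, then takes one token if available. */
    boolean tryAcquire(long nowMillis) {
        double elapsedSeconds = (nowMillis - lastRefillMillis) / 1000.0;
        tokens = Math.min(capacity, tokens + elapsedSeconds * ratePerSecond);
        lastRefillMillis = nowMillis;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}

/** Hypothetical dispatcher: buffers new orders and forwards them to production at a bounded rate. */
class OrderDispatcher {
    private final Deque<String> pending = new ArrayDeque<>();
    private final TokenBucket bucket;

    OrderDispatcher(TokenBucket bucket) {
        this.bucket = bucket;
    }

    /** Called on order creation; never blocks the user. */
    void accept(String orderId) {
        pending.addLast(orderId);
    }

    /** Forwards as many buffered orders as the bucket currently allows, oldest first. */
    List<String> drain(long nowMillis) {
        List<String> forwarded = new ArrayList<>();
        while (!pending.isEmpty() && bucket.tryAcquire(nowMillis)) {
            forwarded.add(pending.pollFirst());
        }
        return forwarded;
    }

    int backlog() {
        return pending.size();
    }
}
```

Orders always enter the buffer, so checkout is never blocked. The bucket only limits how quickly the backlog drains into production. In a real deployment the buffer would be a durable queue rather than an in‑memory deque.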

Personal order DBs are separated from production order DBs to improve stability and match different query patterns.

2. Order System Architecture Evolution

Order Fulfillment Process

After payment is completed, orders are dispatched to merchants through the open platform or the merchant center. Merchants pick the items and adjust orders for out‑of‑stock goods. Delivery supports both single‑order and batch‑order dispatch.

RPC Microservice Cluster

Splitting the order system into RPC microservices brings three main benefits:

Lower complexity by splitting domains into separate services.

Independent deployment for faster releases.

Scalable architecture that can be expanded based on traffic.

Redis Cluster

Redis is used for distributed locks, caching, ordered queues, and scheduled tasks. Core and non‑core workloads run on two separate clusters, so large keys from non‑core features cannot degrade core order operations.
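For distributed locks, the commonly documented Redis pattern has two steps. A client acquires the lock with `SET key token NX PX ttl`. It releases the lock only if the stored token still matches, usually via a small Lua script. The sketch below models those semantics in memory to show why the token and TTL matter. `LockModel` is hypothetical and is not a Redis client, and the article does not describe JD Daojia's exact lock implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/**
 * Hypothetical in-memory model of the Redis lock pattern (not a Redis client):
 * acquire ~ SET key token NX PX ttl, release ~ delete only if the token matches.
 */
class LockModel {
    private static final class Entry {
        final String token;
        final long expiresAt;

        Entry(String token, long expiresAt) {
            this.token = token;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Entry> store = new HashMap<>();

    /** Succeeds only if the key is absent or expired; returns the holder's token, or null. */
    synchronized String tryLock(String key, long ttlMillis, long nowMillis) {
        Entry current = store.get(key);
        if (current != null && current.expiresAt > nowMillis) {
            return null;  // still held by someone else
        }
        String token = UUID.randomUUID().toString();
        store.put(key, new Entry(token, nowMillis + ttlMillis));
        return token;
    }

    /** Compare-and-delete: only the current, unexpired holder can release the lock. */
    synchronized boolean unlock(String key, String token, long nowMillis) {
        Entry current = store.get(key);
        if (current == null || current.expiresAt <= nowMillis || !current.token.equals(token)) {
            return false;
        }
        store.remove(key);
        return true;
    }
}
```

The TTL keeps a crashed holder from blocking the resource forever. The token check stops a client whose lock has already expired from deleting a lock that now belongs to someone else.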

MySQL Cluster

Initially, a single master‑slave pair handled all order traffic, and the master came under heavy load. Read‑write separation moved queries to the slave. The data was later split into a hot database (recent days) and a cold database (historical orders), which reduced load and improved latency.
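Routing an order lookup to the hot or cold database by creation time could look like the sketch below. The 7‑day hot window, the `OrderDbRouter` class and the `HOT`/`COLD` targets are assumptions for illustration. The article only says the hot database holds recent days.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical router choosing between the hot (recent) and cold (historical) order databases. */
class OrderDbRouter {
    enum Target { HOT, COLD }

    private final Duration hotWindow;

    OrderDbRouter(Duration hotWindow) {
        this.hotWindow = hotWindow;
    }

    /** Orders created within the hot window go to the hot DB; older orders go to the cold DB. */
    Target route(Instant orderCreatedAt, Instant now) {
        Instant hotBoundary = now.minus(hotWindow);
        return orderCreatedAt.isBefore(hotBoundary) ? Target.COLD : Target.HOT;
    }
}
```

Archiving runs as a separate job, so rows just past the boundary may not have moved yet. A production router would therefore probably fall back to the other database on a miss.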

Elasticsearch Cluster

ES handles complex order queries. A hot‑cold dual‑cluster design stores recent production data in the hot cluster and archives older data in the cold cluster. Data sync uses Canal + Kafka to capture binlog changes and push them asynchronously to ES.
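The Canal + Kafka path boils down to turning each binlog row change into a write against the order index. The sketch below assumes a simplified message shape, `RowChange` (hypothetical), which carries the change type, the order ID and the full row after the change. An in‑memory map stands in for Elasticsearch. A real consumer would issue index and delete requests through an ES client instead.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified stand-in for a Canal row-change message consumed from Kafka. */
class RowChange {
    enum Type { INSERT, UPDATE, DELETE }

    final Type type;
    final String orderId;
    final Map<String, Object> columns;  // full row image after the change (empty for DELETE)

    RowChange(Type type, String orderId, Map<String, Object> columns) {
        this.type = type;
        this.orderId = orderId;
        this.columns = columns;
    }
}

/** Applies row changes to an in-memory map standing in for the ES order index. */
class EsOrderSync {
    private final Map<String, Map<String, Object>> index = new HashMap<>();

    void apply(RowChange change) {
        switch (change.type) {
            case INSERT:
            case UPDATE:
                // Upsert keyed by order ID, so replaying the same message leaves the same document.
                index.put(change.orderId, new HashMap<>(change.columns));
                break;
            case DELETE:
                index.remove(change.orderId);
                break;
        }
    }

    Map<String, Object> get(String orderId) {
        return index.get(orderId);
    }

    int size() {
        return index.size();
    }
}
```

Keying documents by order ID makes replaying a Kafka message harmless. This matters because an asynchronous pipeline typically delivers messages at least once.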

3. Order System Stability Practices

Availability Construction

Availability work follows a three‑stage model (before, during, and after an incident) built on four capabilities: prevention, diagnosis, resolution, and avoidance.

Disaster Recovery

ES hot‑cold clusters, Redis isolated clusters, API degradation, and multi‑datacenter deployments ensure continuity. Critical interfaces can fall back to asynchronous processing when needed.
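Falling back to asynchronous processing can be as simple as a switch in front of a critical call. The article does not describe its degradation mechanism, and all names in this sketch are hypothetical. A request runs synchronously unless the degrade flag is set or the downstream call throws. In either of those cases it is queued for a background worker.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

/**
 * Hypothetical wrapper around a critical interface: runs the downstream call synchronously,
 * or parks the request for asynchronous processing when degraded or when the call fails.
 */
class DegradableCall<T> {
    enum Outcome { DONE_SYNC, DEFERRED_ASYNC }

    private final Consumer<T> downstream;
    private final AtomicBoolean degraded = new AtomicBoolean(false);  // hypothetically toggled by an operator
    private final Queue<T> deferred = new ConcurrentLinkedQueue<>();

    DegradableCall(Consumer<T> downstream) {
        this.downstream = downstream;
    }

    void setDegraded(boolean on) {
        degraded.set(on);
    }

    Outcome submit(T request) {
        if (!degraded.get()) {
            try {
                downstream.accept(request);
                return Outcome.DONE_SYNC;
            } catch (RuntimeException e) {
                // Synchronous path failed: defer the request instead of failing the caller.
            }
        }
        deferred.offer(request);
        return Outcome.DEFERRED_ASYNC;
    }

    /** Polled by a background worker to retry deferred requests; returns null when empty. */
    T pollDeferred() {
        return deferred.poll();
    }

    int deferredCount() {
        return deferred.size();
    }
}
```

In production the deferred queue would need to be durable (for example an MQ topic); otherwise queued requests are lost on restart.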

Capacity Planning

Capacity is assessed with load‑testing tools and scenario‑based stress tests. It is then expanded in stages through vertical sharding, redundancy, and automatic archiving. Monitoring covers CPU, load, network I/O, thread count, disk usage, and TCP connections.
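A common rule of thumb turns load‑test results into an instance count. It is an illustrative assumption, not a formula from the article. Q_peak is the forecast peak QPS, q_inst is the QPS one instance sustained in load tests at acceptable latency, and k ≥ 1 is a redundancy factor. The numbers in the worked example are hypothetical.

```latex
N = \left\lceil \frac{k \cdot Q_{\text{peak}}}{q_{\text{inst}}} \right\rceil
\qquad \text{e.g.} \qquad
\left\lceil \frac{1.5 \times 12\,000}{1\,500} \right\rceil = 12
```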

Alerting

UMP, JD's in‑house monitoring platform, tracks interface metrics (availability, latency, QPS, error thresholds) and application metrics (GC, heap, CPU, thread count). Alerts trigger throttling, circuit breaking, or manual intervention.
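UMP is JD's internal platform and its API is not public. The sketch below therefore shows only the generic mechanism an alert threshold can drive: a consecutive‑failure circuit breaker. It opens after N failures, rejects calls during a cool‑down, then lets a single trial call through. The `CircuitBreaker` class, the threshold and the cool‑down are hypothetical, and the clock is passed in explicitly.

```java
/** Hypothetical consecutive-failure circuit breaker with an explicit clock for determinism. */
class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openMillis;
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    CircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    /** Returns whether a call may proceed at time nowMillis. */
    boolean allowRequest(long nowMillis) {
        if (state == State.OPEN && nowMillis - openedAt >= openMillis) {
            state = State.HALF_OPEN;  // cool-down over: let exactly one trial call through
            return true;
        }
        return state == State.CLOSED;
    }

    void recordSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    void recordFailure(long nowMillis) {
        consecutiveFailures++;
        // A failed trial, or too many failures in a row, (re)opens the breaker.
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAt = nowMillis;
        }
    }

    State state() {
        return state;
    }
}
```

Real breakers usually also track error rates over a sliding window and time out a half‑open trial that never reports back. Both are left out here to keep the state machine visible.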

4. Summary

Architects must balance ROI with technical debt, continuously simplify complex problems, and adapt the stack (Redis, MySQL, ES) to meet growing scale and performance demands. Ongoing research into high‑availability design, multi‑region deployment, and automated monitoring remains essential as JD Daojia expands.

Q&A Highlights

Cluster size: ES holds ~30 billion documents (~1.3 TB) with 8 primary shards, each with 2 replicas (24 shards total).
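The shard total in the answer above follows from the primary count P and the replicas per primary R. The article does not say whether the ~1.3 TB includes replicas, so both readings of the per‑shard size are shown.

```latex
\begin{aligned}
N_{\text{copies}} &= P\,(1 + R) = 8 \times (1 + 2) = 24 \\
S_{\text{primary}} &\approx 1.3\ \text{TB} / 8 \approx 0.16\ \text{TB} && \text{(if 1.3 TB is primary data only)} \\
S_{\text{copy}} &\approx 1.3\ \text{TB} / 24 \approx 0.054\ \text{TB} && \text{(if 1.3 TB includes replicas)}
\end{aligned}
```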

ES best practices: Use static shard counts (powers of two), keep shard size reasonable, avoid frequent updates, and delete by index (time‑based indices).
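Time‑based indices make "delete by index" cheap, because expired data is removed by dropping whole indices rather than deleting documents one by one. The sketch assumes monthly indices named `order_yyyyMM` and a retention period in months. Both the naming scheme and the `OrderIndexNaming` helper are hypothetical.

```java
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical monthly index naming for order documents: order_yyyyMM. */
class OrderIndexNaming {
    private static final String PREFIX = "order_";
    private static final DateTimeFormatter SUFFIX = DateTimeFormatter.ofPattern("uuuuMM");

    /** Index that documents created in the given month are written to. */
    static String indexFor(YearMonth month) {
        return PREFIX + month.format(SUFFIX);
    }

    /** Indices older than the last retentionMonths months; these can be dropped whole. */
    static List<String> expired(List<String> existing, YearMonth current, int retentionMonths) {
        YearMonth oldestKept = current.minusMonths(retentionMonths - 1);
        List<String> result = new ArrayList<>();
        for (String name : existing) {
            if (!name.startsWith(PREFIX)) {
                continue;  // ignore indices that do not follow the naming scheme
            }
            YearMonth month = YearMonth.parse(name.substring(PREFIX.length()), SUFFIX);
            if (month.isBefore(oldestKept)) {
                result.add(name);
            }
        }
        return result;
    }
}
```

The returned names would then be passed to the delete‑index API, or the same policy could be expressed as an index lifecycle policy.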

Canal sync delay: Near‑real‑time (≈1 s) but depends on pipeline bottlenecks; not suitable for strict real‑time requirements.

Redis usage: Caching, distributed locks, and delayed tasks via zAdd and zRangeByScore (see code below).

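MySQL read‑write split: Use the MySQL Connector/J replication driver, configured here as Maven properties:

<mvn.jdbc.driver>com.mysql.jdbc.ReplicationDriver</mvn.jdbc.driver>

<mvn.jdbc.url>jdbc:mysql:replication://m.xxx.com:3306,s.xxx.com:3306/dbName</mvn.jdbc.url>

With this URL the first host (m.xxx.com) is treated as the master and the remaining host(s) as slaves. Connections switched to read‑only via Connection.setReadOnly(true) send their queries to a slave, and all other traffic goes to the master.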

Redis sorted‑set methods used for delayed tasks:

public Boolean zAdd(String key, final double score, String value);
public Set<String> zRangeByScore(String key, final double min, final double max);
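The delayed‑task pattern behind these two methods works as follows. Each task is stored with its due time as the score, and a poller periodically fetches everything scored up to "now". To keep the example self‑contained, the sketch replaces Redis with a hypothetical in‑memory `SortedSetStandIn` (a single set, so there is no key parameter). Its methods mirror ZADD, ZRANGEBYSCORE and ZREM. `DelayedTaskQueue` and the task IDs are likewise illustrative, and the article does not show its poller.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** In-memory stand-in for one Redis sorted set (not a Redis client). Score = due time in epoch millis. */
class SortedSetStandIn {
    private final Map<String, Double> scores = new HashMap<>();
    private final TreeMap<Double, TreeSet<String>> byScore = new TreeMap<>();

    /** Like ZADD: adds the member or updates its score; true only if it was newly added. */
    boolean zAdd(double score, String member) {
        Double old = scores.put(member, score);
        if (old != null) {
            removeFromIndex(old, member);
        }
        byScore.computeIfAbsent(score, s -> new TreeSet<>()).add(member);
        return old == null;
    }

    /** Like ZRANGEBYSCORE with inclusive bounds: ordered by score, ties by member name. */
    List<String> zRangeByScore(double min, double max) {
        List<String> result = new ArrayList<>();
        for (TreeSet<String> members : byScore.subMap(min, true, max, true).values()) {
            result.addAll(members);
        }
        return result;
    }

    /** Like ZREM: true only for the caller that actually removed the member. */
    boolean zRem(String member) {
        Double old = scores.remove(member);
        if (old == null) {
            return false;
        }
        removeFromIndex(old, member);
        return true;
    }

    private void removeFromIndex(double score, String member) {
        TreeSet<String> members = byScore.get(score);
        members.remove(member);
        if (members.isEmpty()) {
            byScore.remove(score);
        }
    }
}

/** Hypothetical delayed-task queue built on the sorted-set operations above. */
class DelayedTaskQueue {
    private final SortedSetStandIn zset = new SortedSetStandIn();

    /** Schedules a task to become due at dueAtMillis (stored as the score). */
    void schedule(String taskId, long dueAtMillis) {
        zset.zAdd(dueAtMillis, taskId);
    }

    /** Fetches tasks with score <= nowMillis and claims each with zRem so it is handed out once. */
    List<String> pollDue(long nowMillis) {
        List<String> claimed = new ArrayList<>();
        for (String taskId : zset.zRangeByScore(0, nowMillis)) {
            if (zset.zRem(taskId)) {
                claimed.add(taskId);
            }
        }
        return claimed;
    }
}
```

Against a real Redis, the poller would run ZRANGEBYSCORE key 0 now and then ZREM each member. ZREM reports how many members it actually removed, so only one of several competing pollers succeeds for a given task. That is what lets multiple instances poll the same set safely.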
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.