How JD Daojia Scaled Its Order System to Billion‑Scale: Architecture, Evolution, and High‑Availability Practices
This article details JD Daojia's order system architecture, tracing its evolution from a monolithic design to a micro‑service, multi‑cluster setup with Redis, MySQL, and Elasticsearch, and explains the high‑availability, disaster‑recovery, capacity‑planning, and alerting techniques that keep billions of orders running smoothly.
JD Daojia, a leading local instant‑retail platform, needed to support one‑hour delivery for tens of millions of users, prompting a continuous evolution of its order system from a single module to a distributed, high‑concurrency, high‑availability architecture.
1. JD Daojia System Architecture
Business Architecture
The business involves C‑end users (consumers), B‑end merchants (e.g., Walmart, Yonghui) and their picking apps, a delivery side where riders grab orders, and settlement for both riders and merchants.
Operational Support Architecture
Five core modules support operations: merchant management, CMS, marketing, finance, and operational data, backed by caching (LocalCache, Redis), databases (MySQL, MongoDB), and Elasticsearch.
Backend Architecture
All user requests pass through a gateway, then reach business services such as home page, store page, cart, checkout, order creation, payment, and personal order management. Core services rely on foundational services like inventory, product, store, and pricing.
Order Data Ingestion Flow
After a user places an order, the order data is sharded and stored. An asynchronous pipeline distributes the order to the production system at a controlled rate, preventing spikes during large promotions.
Personal order DBs are separated from production order DBs to improve stability and match different query patterns.
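The controlled‑rate distribution described above can be sketched as a buffer that accepts orders immediately but releases only a bounded batch per tick. This is a minimal in‑memory illustration, not JD Daojia's implementation: the class name PacedOrderDispatcher, the per‑tick limit, and the use of an in‑process queue (rather than a durable message queue) are all assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical sketch: buffers accepted orders and releases at most N per tick. */
final class PacedOrderDispatcher {
    private final Queue<String> pending = new ArrayDeque<>();
    private final int maxPerTick; // assumed tuning knob, not from the source

    PacedOrderDispatcher(int maxPerTick) {
        if (maxPerTick <= 0) {
            throw new IllegalArgumentException("maxPerTick must be positive");
        }
        this.maxPerTick = maxPerTick;
    }

    /** Order-creation path: only enqueues, so a traffic spike never hits downstream directly. */
    synchronized void accept(String orderId) {
        pending.add(orderId);
    }

    /** Called once per tick by a scheduler; returns the batch to push to the production system. */
    synchronized List<String> releaseTick() {
        List<String> batch = new ArrayList<>();
        while (batch.size() < maxPerTick && !pending.isEmpty()) {
            batch.add(pending.poll());
        }
        return batch;
    }

    /** Number of orders still waiting to be distributed. */
    synchronized int backlog() {
        return pending.size();
    }
}
```

During a promotion the backlog grows while the release rate stays flat, so the production system sees a steady load instead of the spike.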
2. Order System Architecture Evolution
Order Fulfillment Process
After payment, order information is completed, and the order is then dispatched to merchants via open platforms or merchant centers. Merchants pick the items and handle out‑of‑stock adjustments; delivery is supported for both single orders and batches of orders.
RPC Microservice Cluster
Splitting the order system into RPC microservices brings three benefits:
Lower complexity: each domain lives in its own service, which reduces coupling.
Independent deployment: services are released separately, so iteration is faster.
Scalability: each service can be scaled horizontally based on its traffic.
Redis Cluster
Redis provides distributed locks, caching, ordered queues, and scheduled tasks. A dual‑cluster setup isolates core and non‑core workloads to avoid large‑key performance issues.
MySQL Cluster
Initially, the single master‑slave setup suffered from high write pressure. Read‑write separation, followed by splitting the data into a hot database (recent days) and a cold database (historical orders), reduced load and improved latency.
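As a rough illustration of the hot/cold split, a query can be routed by the age of the order. The sketch below assumes a fixed hot window measured in days; the source only says "recent days", so the window length, the enum OrderStore, and the class HotColdRouter are hypothetical.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

/** Hypothetical target stores for an order query. */
enum OrderStore { HOT, COLD }

/** Sketch of age-based routing between a hot and a cold order database. */
final class HotColdRouter {
    private final int hotDays; // assumed window length; the source does not give a number

    HotColdRouter(int hotDays) {
        if (hotDays <= 0) {
            throw new IllegalArgumentException("hotDays must be positive");
        }
        this.hotDays = hotDays;
    }

    /** Orders younger than the hot window go to HOT; everything older goes to COLD. */
    OrderStore route(LocalDate orderDate, LocalDate today) {
        long ageInDays = ChronoUnit.DAYS.between(orderDate, today);
        return ageInDays < hotDays ? OrderStore.HOT : OrderStore.COLD;
    }
}
```

With a 7‑day window, an order placed 6 days ago is served from the hot database, while one placed exactly 7 days ago is already served from the cold one.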
Elasticsearch Cluster
ES handles complex order queries. A hot‑cold dual‑cluster design stores recent production data in the hot cluster and archives older data in the cold cluster. Data sync uses Canal + Kafka to capture binlog changes and push them asynchronously to ES.
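The consuming end of such a sync pipeline can be pictured as a handler that turns each row change into an index operation. The sketch below does not use the Canal, Kafka, or Elasticsearch client APIs; RowOp, RowChange, and OrderIndexSync are hypothetical types, and an in‑memory map stands in for the ES index. It assumes each change carries the full row, so writes overwrite the whole document.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical row-level operations carried by a binlog change event. */
enum RowOp { INSERT, UPDATE, DELETE }

/** Hypothetical change event: the operation, the order id, and the full row. */
final class RowChange {
    final RowOp op;
    final String orderId;
    final Map<String, String> columns;

    RowChange(RowOp op, String orderId, Map<String, String> columns) {
        this.op = op;
        this.orderId = orderId;
        this.columns = columns;
    }
}

/** Applies change events to a map that stands in for the search index. */
final class OrderIndexSync {
    private final Map<String, Map<String, String>> index = new HashMap<>();

    void apply(RowChange change) {
        switch (change.op) {
            case INSERT:
            case UPDATE:
                // Overwrite the whole document keyed by order id, so a replayed event is harmless.
                index.put(change.orderId, new HashMap<>(change.columns));
                break;
            case DELETE:
                index.remove(change.orderId);
                break;
        }
    }

    Map<String, String> get(String orderId) {
        return index.get(orderId);
    }

    int size() {
        return index.size();
    }
}
```

Keying documents by order id and writing full rows is one simple way to tolerate redelivered messages; it does not by itself handle out‑of‑order delivery.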
3. Order System Stability Practices
Availability Construction
Availability work follows a three‑stage reliability model (before, during, and after an incident) built on four capability pillars: prevention, diagnosis, resolution, and avoidance.
Disaster Recovery
ES hot‑cold clusters, Redis isolated clusters, API degradation, and multi‑datacenter deployments ensure continuity. Critical interfaces can fall back to asynchronous processing when needed.
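The fallback to asynchronous processing can be sketched as a wrapper that tries the synchronous path first and parks the request in a queue when that path fails or has been switched off. This is an illustrative sketch only: DegradingSubmitter, its degraded switch, and the in‑memory queue are assumptions, and a production system would use a durable queue.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

/** Hypothetical sketch: synchronous handling with an asynchronous fallback. */
final class DegradingSubmitter {
    enum Result { PROCESSED, QUEUED }

    private final Consumer<String> syncHandler;
    private final Queue<String> retryQueue = new ConcurrentLinkedQueue<>();
    private volatile boolean degraded; // manual degradation switch (assumed)

    DegradingSubmitter(Consumer<String> syncHandler) {
        this.syncHandler = syncHandler;
    }

    void setDegraded(boolean degraded) {
        this.degraded = degraded;
    }

    /** Tries the synchronous path unless degraded; on failure the request is queued instead of lost. */
    Result submit(String request) {
        if (!degraded) {
            try {
                syncHandler.accept(request);
                return Result.PROCESSED;
            } catch (RuntimeException e) {
                // Fall through to the asynchronous path.
            }
        }
        retryQueue.add(request);
        return Result.QUEUED;
    }

    /** Replays queued requests in order; stops at the first failure and returns the number replayed. */
    synchronized int drain() {
        int replayed = 0;
        String next;
        while ((next = retryQueue.peek()) != null) {
            try {
                syncHandler.accept(next);
            } catch (RuntimeException e) {
                break;
            }
            retryQueue.poll();
            replayed++;
        }
        return replayed;
    }

    int pendingCount() {
        return retryQueue.size();
    }
}
```

The caller gets a QUEUED result rather than an error, which keeps the critical interface responsive while the backlog is replayed later.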
Capacity Planning
Capacity is evaluated via load‑testing tools, scenario‑based stress tests, and staged scaling (vertical sharding, redundancy, auto‑archiving). Monitoring covers CPU, load, network I/O, thread count, disk usage, and TCP connections.
Alerting
JD's self‑developed UMP monitors interface metrics (availability, latency, QPS, error thresholds) and application metrics (GC, heap, CPU, thread count). Alerts trigger throttling, circuit‑breakers, or manual intervention.
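To make the circuit‑breaker reaction concrete, here is a minimal sketch of the general pattern. It is not UMP's API or behavior: the class SimpleCircuitBreaker, the consecutive‑failure threshold, and the cool‑down period are assumptions, and the clock is injected so the logic can be exercised without waiting.

```java
import java.util.function.LongSupplier;

/** Hypothetical minimal circuit breaker based on consecutive failures. */
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis
    private State state = State.CLOSED;
    private int consecutiveFailures;
    private long openedAt;

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    /** Returns false while the breaker is open; after the cool-down it lets trial calls through. */
    synchronized boolean allowRequest() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= openMillis) {
            state = State.HALF_OPEN;
        }
        return state != State.OPEN;
    }

    /** Any success closes the breaker and resets the failure count. */
    synchronized void recordSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    /** Opens the breaker on a failed trial call or once the threshold is reached. */
    synchronized void recordFailure() {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAt = clock.getAsLong();
        }
    }

    synchronized State state() {
        return state;
    }
}
```

A caller would check allowRequest() before invoking the protected interface and report the outcome with recordSuccess() or recordFailure(); rejected calls can be throttled or degraded.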
4. Summary
Architects must balance ROI with technical debt, continuously simplify complex problems, and adapt the stack (Redis, MySQL, ES) to meet growing scale and performance demands. Ongoing research into high‑availability design, multi‑region deployment, and automated monitoring remains essential as JD Daojia expands.
Q&A Highlights
Cluster size: ES holds ~30 billion documents (~1.3 TB) with 8 primary shards, each with 2 replicas (24 shards total).
ES best practices: Use static shard counts (powers of two), keep shard size reasonable, avoid frequent updates, and delete by index (time‑based indices).
Canal sync delay: Near‑real‑time (≈1 s) but depends on pipeline bottlenecks; not suitable for strict real‑time requirements.
Redis usage: Caching, distributed locks, and delayed tasks via zAdd and zRangeByScore (see code below).
MySQL read‑write split: For master‑slave routing, set the JDBC driver and URL as follows:
<mvn.jdbc.driver>com.mysql.jdbc.ReplicationDriver</mvn.jdbc.driver>
<mvn.jdbc.url>jdbc:mysql:replication://m.xxx.com:3306,s.xxx.com:3306/dbName</mvn.jdbc.url>
public Boolean zAdd(String key, final double score, String value);
public Set<String> zRangeByScore(String key, final double min, final double max);
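The delayed‑task usage from the Redis answer (zAdd to schedule, zRangeByScore to find due tasks) can be sketched as follows. To keep the example runnable without a Redis server, an in‑memory class stands in for the sorted set and mirrors the two signatures shown. The zRem method, the class names, and the use of the run‑at timestamp in milliseconds as the score are assumptions, not part of the source.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** In-memory stand-in for Redis sorted sets, for illustration only. */
final class InMemoryZSet {
    private final Map<String, Map<String, Double>> sets = new HashMap<>();

    /** Adds or updates a member; returns true if the member is new. */
    public synchronized Boolean zAdd(String key, final double score, String value) {
        return sets.computeIfAbsent(key, k -> new HashMap<>()).put(value, score) == null;
    }

    /** Returns members with min <= score <= max, ordered by score, then by member. */
    public synchronized Set<String> zRangeByScore(String key, final double min, final double max) {
        List<Map.Entry<String, Double>> hits = new ArrayList<>();
        Map<String, Double> members = sets.getOrDefault(key, Collections.emptyMap());
        for (Map.Entry<String, Double> e : members.entrySet()) {
            if (e.getValue() >= min && e.getValue() <= max) {
                hits.add(e);
            }
        }
        hits.sort((a, b) -> {
            int byScore = Double.compare(a.getValue(), b.getValue());
            return byScore != 0 ? byScore : a.getKey().compareTo(b.getKey());
        });
        Set<String> ordered = new LinkedHashSet<>();
        for (Map.Entry<String, Double> e : hits) {
            ordered.add(e.getKey());
        }
        return ordered;
    }

    /** Assumed helper modeled on the Redis ZREM command; returns true if the member was removed. */
    public synchronized Boolean zRem(String key, String value) {
        Map<String, Double> members = sets.get(key);
        return members != null && members.remove(value) != null;
    }
}

/** Delayed tasks: the score is the time (epoch millis) at which the task becomes due. */
final class DelayedTasks {
    private final InMemoryZSet zset;
    private final String key;

    DelayedTasks(InMemoryZSet zset, String key) {
        this.zset = zset;
        this.key = key;
    }

    void schedule(String taskId, long runAtMillis) {
        zset.zAdd(key, runAtMillis, taskId);
    }

    /** Returns tasks due at or before nowMillis; only the caller that removes a member gets it. */
    List<String> pollDue(long nowMillis) {
        List<String> due = new ArrayList<>();
        for (String taskId : zset.zRangeByScore(key, 0, nowMillis)) {
            if (zset.zRem(key, taskId)) {
                due.add(taskId);
            }
        }
        return due;
    }
}
```

A worker would call pollDue with the current time on a short interval. Removing the member before running the task is what keeps two workers from executing the same task.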
dbaplus Community
