Backend Development 12 min read

How JD.com Scaled Its Order System with Elasticsearch: Architecture Evolution

This article details how JD.com's order center migrated from MySQL‑only reads to a high‑throughput Elasticsearch cluster, describing each architectural phase—from the initial bare‑metal setup, through isolation, replica tuning, primary‑secondary adjustments, to the current real‑time dual‑cluster—while sharing synchronization strategies and performance pitfalls.

Programmer DD

Sep 13, 2020

How JD.com Scaled Its Order System with Elasticsearch: Architecture Evolution

JD.com’s order center stores order data in MySQL, but the massive query volume makes a pure DB solution impractical, so Elasticsearch (ES) is used to handle the majority of order queries.

ES Cluster Architecture Evolution

1. Initial Stage

The ES cluster started with default configurations on elastic cloud instances, resulting in a chaotic node layout and single‑point failures that were unacceptable for order processing.

2. Cluster Isolation Stage

Mixed‑deployment caused resource contention; high‑resource nodes were moved to dedicated physical machines, improving stability and performance.

3. Node Replica Optimization

Each ES node was placed on its own physical server to avoid intra‑node resource competition, and the replica factor was increased from 1 primary + 1 replica to 1 primary + 2 replicas, boosting throughput.

4. Primary‑Secondary Cluster Adjustment

A standby cluster was introduced for failover. Data is written to the primary cluster synchronously and to the standby asynchronously; during primary upgrades, the standby takes over to ensure zero‑downtime service.

5. Current Real‑time Dual‑Cluster Stage

The primary cluster was upgraded from ES 1.7 to 6.x, and both clusters now operate with a dual‑write strategy, where the standby holds recent hot data (≈10% of total) and the primary stores the full dataset, effectively separating hot and cold workloads.

ES Order Data Synchronization

Two approaches were considered: listening to MySQL binlog or directly using the ES API. JD.com chose the API method for its simplicity and low latency, writing orders to ES in real time.

When a write fails, a compensating task is created; a worker later reprocesses these tasks to ensure eventual consistency between MySQL and ES.

Pitfalls Encountered

1. High Real‑time Requirements

Because ES refreshes shards every second, queries needing absolute real‑time data are routed directly to MySQL.

2. Avoid Deep Pagination

Deep pagination (large from values) forces each shard to generate huge result sets, consuming CPU and memory; thus it should be avoided.

3. FieldData vs. Doc Values

FieldData, stored in JVM heap, caused OOM and slowdowns during sorting; switching to Doc Values (column‑oriented storage on disk) resolved the issue.

Conclusion

The rapid iteration of the order center’s architecture was driven by business growth; continuous optimization of ES clusters—through isolation, replica scaling, dual‑cluster deployment, and version upgrades—has significantly improved throughput, performance, and stability.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Elasticsearch High Availability Data synchronization cluster scaling order system

Written by

Programmer DD

A tinkering programmer and author of "Spring Cloud Microservices in Action"

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.