How JD.com Scaled Its Order System with Elasticsearch: A Journey Through Cluster Evolution
This article details how JD Daojia's order center moved its order queries from MySQL to Elasticsearch, refined its ES cluster architecture over five stages, tackled scalability and reliability challenges, and built data synchronization and tuning practices that now support billions of documents and hundreds of millions of queries per day.
JD Daojia's order center experiences massive query traffic from both external merchants and internal systems, leading to a read‑heavy, write‑light workload that MySQL alone cannot sustain. To handle this, the team introduced Elasticsearch (ES) as the primary engine for order queries.
ES Cluster Evolution Roadmap
1. Initial Stage
The ES cluster initially ran with default configuration on elastic cloud instances. Node placement was haphazard and the cluster had single points of failure, which is unacceptable for order processing.
2. Cluster Isolation Stage
Co‑locating ES nodes with other workloads on shared hosts caused resource contention, so the most resource‑hungry nodes were first moved off the elastic cloud. Eventually the whole cluster was moved to dedicated high‑spec physical machines, which improved stability.
3. Node Replica Tuning Stage
To make full use of the hardware, each ES node was given a physical server to itself. The replica count was raised from one to two (each shard now has one primary and two replica copies), and more machines were added. Because replicas also serve searches, this increased query throughput.
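A minimal sketch of the arithmetic behind this stage, assuming a hypothetical index with 8 primary shards (the article does not give shard counts). Every extra replica adds a full set of shard copies, and Elasticsearch never allocates two copies of the same shard to one node, so one primary plus two replicas needs at least three data nodes before every copy can be assigned. The settings body is what one would send with `PUT /<index>/_settings`.

```python
import json


def shard_copies(primaries: int, replicas: int) -> int:
    """Total shard copies (primaries plus replicas) the cluster must host."""
    return primaries * (1 + replicas)


def min_data_nodes(replicas: int) -> int:
    """Elasticsearch will not place two copies of the same shard on one node,
    so fully assigning 1 primary + N replicas needs at least N + 1 data nodes."""
    return replicas + 1


def replica_settings_body(replicas: int) -> str:
    """Request body for PUT /<index>/_settings; number_of_replicas is a dynamic setting."""
    return json.dumps({"index": {"number_of_replicas": replicas}})


if __name__ == "__main__":
    primaries = 8  # hypothetical shard count, not taken from the article
    for replicas in (1, 2):
        print(f"replicas={replicas}: "
              f"{shard_copies(primaries, replicas)} shard copies, "
              f"at least {min_data_nodes(replicas)} data nodes")
    print(replica_settings_body(2))
```

In this example, going from one replica to two raises the number of shard copies by half, which is consistent with the team adding machines at the same time.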
4. Primary‑Backup Cluster Stage
A standby cluster was introduced so that order queries keep working if the primary cluster fails. Data is written to both clusters (dual‑write), and older orders are archived to a historical database so that the backup cluster stays at roughly one‑tenth the size of the primary.
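The dual‑write and failover idea can be sketched with in‑memory stand‑ins. Everything here is hypothetical: `FakeCluster` is not a real Elasticsearch client, and recording failed writes in a list is an assumption, since the article does not say how a failed write to one of the two clusters is handled.

```python
from dataclasses import dataclass, field


class ClusterDown(Exception):
    """Raised by the stand-in client when its cluster is unreachable."""


@dataclass
class FakeCluster:
    """In-memory stand-in for one ES cluster (hypothetical, not a real client)."""
    name: str
    available: bool = True
    docs: dict = field(default_factory=dict)

    def index(self, doc_id: str, doc: dict) -> None:
        if not self.available:
            raise ClusterDown(self.name)
        self.docs[doc_id] = dict(doc)

    def get(self, doc_id: str):
        if not self.available:
            raise ClusterDown(self.name)
        return self.docs.get(doc_id)


@dataclass
class DualWriter:
    primary: FakeCluster
    backup: FakeCluster
    failed: list = field(default_factory=list)  # (cluster name, doc_id) pairs to retry later

    def write(self, doc_id: str, doc: dict) -> None:
        # Write to each cluster independently so an outage in one does not block the other.
        for cluster in (self.primary, self.backup):
            try:
                cluster.index(doc_id, doc)
            except ClusterDown:
                self.failed.append((cluster.name, doc_id))

    def read(self, doc_id: str):
        # Serve from the primary; fail over to the standby if the primary is down.
        try:
            return self.primary.get(doc_id)
        except ClusterDown:
            return self.backup.get(doc_id)


if __name__ == "__main__":
    primary, backup = FakeCluster("primary"), FakeCluster("backup")
    writer = DualWriter(primary, backup)
    writer.write("o1", {"status": "PAID"})
    primary.available = False
    print(writer.read("o1"))   # {'status': 'PAID'}, served by the backup
    writer.write("o2", {"status": "NEW"})
    print(writer.failed)       # [('primary', 'o2')]
```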
5. Real‑Time Mutual Backup Dual‑Cluster Stage
The primary cluster was upgraded from ES 1.7 straight to 6.x by rebuilding its indices, because indices created in 1.x cannot be read by a 6.x cluster. During the upgrade, the backup cluster temporarily served all traffic, so the service stayed available throughout. After the upgrade, the primary holds the full data set, including cold (older) orders, while the backup holds only recent hot data, and either cluster can take over the other's traffic.
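A minimal sketch of read routing between the two clusters, under loudly stated assumptions: the 30‑day hot window is hypothetical (the article gives no figure), and the sketch assumes, for illustration only, that queries confined to recent orders go to the hot backup cluster by default. Because only the primary holds old orders, a cold query that fails over to the backup is flagged as incomplete.

```python
from datetime import datetime, timedelta

# Hypothetical hot-data window; the article does not state the real value.
HOT_WINDOW = timedelta(days=30)


def route(query_start: datetime, now: datetime, primary_up: bool, backup_up: bool):
    """Pick a cluster for a query over orders created in [query_start, now].

    Returns (cluster_name, complete). Assumes the backup holds only orders newer
    than HOT_WINDOW, while the primary holds the full data set.
    """
    hot = query_start >= now - HOT_WINDOW
    # Assumption for illustration: recent-order queries prefer the hot cluster.
    preferred, fallback = ("backup", "primary") if hot else ("primary", "backup")
    up = {"primary": primary_up, "backup": backup_up}
    if up[preferred]:
        return preferred, True
    if up[fallback]:
        # A hot query loses nothing on the primary (it has everything); a cold
        # query on the backup can only see recent orders, so mark it incomplete.
        return fallback, hot
    raise RuntimeError("no order cluster available")
```

Returning a completeness flag, rather than silently serving partial results, lets the caller decide whether a degraded answer is acceptable during a primary outage.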
Order Data Synchronization Strategies
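Listen to the MySQL binlog and sync changes to ES: decoupled from the business code, but asynchronous, so ES lags further behind.
Write to ES directly through its API: synchronous and lower latency, but coupled to the business code.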
The team chose the API‑based approach for its simplicity and to meet real‑time requirements. When a write to ES fails, a compensation task is recorded; a background task then re‑reads the failed orders from the database and writes them to ES again, so ES eventually converges with MySQL.
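The write path and compensation loop can be sketched as follows. `FlakyES`, `OrderService`, and the set used as a compensation table are hypothetical stand‑ins, not the team's actual code; the point is the order of operations: commit to the database, attempt the ES write, record a task on failure, and let a worker retry from the database.

```python
class FlakyES:
    """In-memory stand-in for the ES index; writes fail while `down` is True."""

    def __init__(self):
        self.docs = {}
        self.down = False

    def index(self, order_id, doc):
        if self.down:
            raise ConnectionError("ES unavailable")
        self.docs[order_id] = dict(doc)


class OrderService:
    """Hypothetical order service: MySQL is the source of truth, ES is written via API."""

    def __init__(self, es):
        self.db = {}               # stand-in for the MySQL order table
        self.compensation = set()  # stand-in for a compensation-task table (order IDs)
        self.es = es

    def save_order(self, order_id, row):
        self.db[order_id] = dict(row)        # 1. commit to the database first
        try:
            self.es.index(order_id, row)     # 2. synchronous write to ES
        except ConnectionError:
            self.compensation.add(order_id)  # 3. on failure, record a compensation task

    def run_compensation(self):
        """One worker pass: re-index each pending order from its *current* DB row."""
        for order_id in sorted(self.compensation):  # iterate a copy; we discard below
            try:
                self.es.index(order_id, self.db[order_id])
            except ConnectionError:
                continue                     # still failing; retry on the next pass
            self.compensation.discard(order_id)


if __name__ == "__main__":
    es = FlakyES()
    svc = OrderService(es)
    es.down = True
    svc.save_order("o1", {"status": "NEW"})
    svc.save_order("o1", {"status": "PAID"})  # same order, one pending task
    es.down = False
    svc.run_compensation()
    print(es.docs)            # {'o1': {'status': 'PAID'}}
    print(svc.compensation)   # set()
```

Re‑reading the current row instead of replaying the payload that failed means a retry always carries the latest order state, so a delayed retry cannot overwrite a newer status with an older one.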
Key Pitfalls and Solutions
1. Queries That Need Strictly Fresh Data Should Bypass ES
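Elasticsearch is near‑real‑time: a newly indexed document becomes searchable only after the next refresh (every 1 s by default), so a query may briefly miss the latest changes. Queries that must see the very latest data are therefore routed to MySQL.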
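One way such a routing rule might look, as a sketch only: `choose_store` is a hypothetical helper, the extra one‑second safety margin is an assumption, and knowing an order's last write time presumes the write path records it somewhere. The 1 s value is Elasticsearch's default `index.refresh_interval`.

```python
import time

ES_REFRESH_INTERVAL_S = 1.0  # Elasticsearch's default index.refresh_interval
SAFETY_MARGIN_S = 1.0        # hypothetical extra slack, not taken from the article


def choose_store(last_write_ts=None, now=None, must_be_latest=False):
    """Return "mysql" or "es" for reading one order.

    last_write_ts: epoch seconds of the order's most recent write, if the caller
    knows it (hypothetical; e.g. kept by the write path). A document indexed into
    ES becomes searchable only after the next refresh, so reads that must see a
    just-written order are sent to MySQL instead.
    """
    now = time.time() if now is None else now
    if must_be_latest:
        return "mysql"
    if last_write_ts is not None and now - last_write_ts < ES_REFRESH_INTERVAL_S + SAFETY_MARGIN_S:
        return "mysql"
    return "es"
```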
2. Avoid Deep Pagination
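With from/size pagination, every shard must build a priority queue of its own top from + size hits and send them to the coordinating node, which merges shards × (from + size) hits only to return size of them. CPU, memory, and network cost therefore grow with page depth. Use search_after for deep, user‑facing paging, or the scroll API for bulk export.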
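The cost and the alternative can be sketched in a few lines. `coordinator_hits` shows the merge cost of one from/size page; `search_after_body` builds a request body in the shape the search API accepts, with hypothetical field names `created_at` and `order_id` (the latter used as a unique tiebreaker); `page_in_memory` is an illustrative emulation of search_after semantics over a Python list, not an Elasticsearch call. Note also that by default Elasticsearch rejects from + size beyond `index.max_result_window` (10,000).

```python
def coordinator_hits(num_shards: int, from_: int, size: int) -> int:
    """Hits the coordinating node must merge for one from/size page: every
    shard sends back its own top (from + size) hits."""
    return num_shards * (from_ + size)


def search_after_body(size: int, last_sort=None) -> dict:
    """Request body for the next page. Field names are hypothetical; order_id is
    a unique tiebreaker so no two hits share the same sort values."""
    body = {
        "size": size,
        "query": {"match_all": {}},
        "sort": [{"created_at": "desc"}, {"order_id": "asc"}],
    }
    if last_sort is not None:
        # Sort values of the last hit on the previous page.
        body["search_after"] = list(last_sort)
    return body


def _sort_key(doc: dict):
    # created_at descending, then order_id ascending (created_at assumed numeric).
    return (-doc["created_at"], doc["order_id"])


def page_in_memory(docs: list, size: int, last_sort=None):
    """Emulate search_after over a list: sort, keep hits strictly after the
    cursor, take one page, and return the cursor for the next call."""
    ordered = sorted(docs, key=_sort_key)
    if last_sort is not None:
        cursor = (-last_sort[0], last_sort[1])
        ordered = [d for d in ordered if _sort_key(d) > cursor]
    page = ordered[:size]
    next_sort = [page[-1]["created_at"], page[-1]["order_id"]] if page else None
    return page, next_sort


if __name__ == "__main__":
    print(coordinator_hits(5, 9_990, 10))  # 50000 hits merged to return 10
```

Each search_after page costs about the same no matter how deep it is, because shards only need the next size hits after the cursor rather than everything before it.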
3. FieldData vs. Doc Values
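In ES 1.x, sorting on most fields relied on fielddata, a structure built at query time by uninverting the index and held in the JVM heap, which led to OOM errors and latency spikes. Switching to doc values (a column‑oriented structure written to disk at index time, and the default for non‑analyzed fields from ES 2.0 onward) removed that heap pressure and improved stability.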
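The difference is easiest to see by building the per‑document column by hand. The sketch below is a conceptual illustration, not Elasticsearch internals: `uninvert` turns a term‑to‑documents index into a document‑to‑value column, which is roughly the work fielddata does in the heap at query time and doc values do on disk at index time. It assumes every document has exactly one value. The mapping fragment uses hypothetical field names; in 6.x, `keyword` and `date` fields already have doc values enabled, so `"doc_values": True` is spelled out only for clarity.

```python
def uninvert(inverted_index: dict, num_docs: int) -> list:
    """Build a doc_id -> value column from a term -> [doc_ids] inverted index.

    Assumes single-valued fields: every doc_id appears under exactly one term.
    """
    column = [None] * num_docs
    for term, doc_ids in inverted_index.items():
        for doc_id in doc_ids:
            column[doc_id] = term
    return column


def sort_docs_by_column(column: list) -> list:
    """Sorting needs a doc -> value lookup, which is exactly what a column provides."""
    return sorted(range(len(column)), key=lambda doc_id: column[doc_id])


# "properties" part of a 6.x-style mapping (hypothetical field names).
ORDER_MAPPING = {
    "properties": {
        "order_status": {"type": "keyword", "doc_values": True},
        "created_at": {"type": "date", "doc_values": True},
    }
}
```

An inverted index answers "which documents contain this term?", while sorting asks the opposite question, "what is this document's value?". Storing that answer as a column on disk is what lets doc values avoid rebuilding it in the heap.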
Summary
JD Daojia's rapid business growth forced the order center's ES architecture to evolve continuously. By isolating, scaling, and replicating the cluster and then adding dual‑cluster backups, the team reached billions of documents, hundreds of millions of queries per day, and high availability. The lesson is that there is no single "best" architecture, only the one that best fits current needs.
Java Backend Technology
Focus on Java-related technologies: SSM, Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading. Occasionally cover DevOps tools like Jenkins, Nexus, Docker, and ELK. Also share technical insights from time to time, committed to Java full-stack development!
