How JD.com Scaled Its Order Search with Elasticsearch: Lessons from a Billion-Document Cluster
This article details JD.com’s journey of evolving the Elasticsearch cluster for its order system—from a chaotic initial deployment to a real‑time dual‑cluster architecture—highlighting scaling strategies, data sync methods, and the performance pitfalls they overcame.
ES Cluster Architecture Evolution
In JD Daojia’s order center, both external merchant order creation and internal system dependencies generate massive query traffic, leading to a read‑heavy workload on MySQL. To avoid overloading MySQL and to support complex queries, the system adopted Elasticsearch as the primary query engine.
Elasticsearch now stores over 1 billion documents and handles 500 million queries per day. As business grew, the ES architecture evolved through several stages.
1. Initial Stage
The cluster was initially deployed on elastic cloud with default settings, resulting in a single‑point‑of‑failure design and a chaotic node layout.
2. Cluster Isolation Stage
Mixed‑tenant deployments caused resource contention. High‑resource‑consuming nodes were moved off the elastic cloud, and eventually the cluster was migrated to dedicated high‑spec physical machines for better stability.
3. Node Replica Optimization
To maximize hardware utilization, each ES node was placed on its own physical machine. The replica factor was increased from 1 primary + 1 replica to 1 primary + 2 replicas, and additional machines were added to boost throughput.
4. Primary‑Secondary Cluster Adjustment
A standby cluster was introduced to ensure continuity during primary failures. Data is written synchronously to the primary and asynchronously to the standby. Historical orders are archived, and the standby stores only recent hot data, while the primary holds the full dataset.
5. Current Real‑Time Dual‑Cluster Stage
The primary cluster was upgraded from ES 1.7 to ES 6.x, requiring index recreation. During upgrades, the standby temporarily assumes the primary role to avoid service disruption. The standby now handles hot‑data queries, while the primary serves cold‑data and full‑search scenarios.
ES Order Data Synchronization
Two synchronization approaches were considered:
Listening to MySQL binlog and pushing changes to ES (asynchronous, higher latency).
Directly writing to ES via its API (synchronous, lower latency).
The API‑based method was chosen for its simplicity and real‑time characteristics. A compensation mechanism records failed writes in MySQL; a worker later retries to ensure eventual consistency.
Common Pitfalls
1. High‑Latency Queries Should Use DB
Because ES refreshes shards every second, near‑real‑time visibility may not meet strict latency requirements; critical queries are therefore routed to MySQL.
2. Avoid Deep Pagination
Deep pagination (large from values) forces each shard to build large priority queues, consuming CPU and memory; it should be avoided.
3. FieldData vs. Doc Values
Sorting on older ES versions used FieldData, which occupies JVM heap and can cause OOM. Switching to Doc Values stores sorting data on disk, improving stability.
Summary
The rapid iteration of the architecture mirrors JD Daojia’s fast business growth. Continuous optimization—moving from a single cluster to a real‑time dual‑cluster, adjusting replica counts, upgrading versions, and refining sync mechanisms—has dramatically increased throughput, performance, and stability, and the journey will continue as data volumes expand.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Java Backend Technology
Focus on Java-related technologies: SSM, Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading. Occasionally cover DevOps tools like Jenkins, Nexus, Docker, and ELK. Also share technical insights from time to time, committed to Java full-stack development!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
