Backend Development 12 min read

How JD.com Scaled Its Order Search with Elasticsearch: From Chaos to Real‑Time Dual Clusters

This article details how JD.com’s order center migrated from a MySQL‑only design to a high‑performance Elasticsearch cluster, evolving through isolation, replica tuning, master‑slave adjustments, and real‑time dual‑cluster architecture to achieve billions of documents, hundreds of millions of daily queries, and robust fault tolerance.

21CTO

Jul 29, 2020

How JD.com Scaled Its Order Search with Elasticsearch: From Chaos to Real‑Time Dual Clusters

In JD.com’s order‑to‑home business, both external merchant orders and internal system dependencies generate massive query traffic, creating a read‑heavy, write‑light workload that MySQL alone could not sustain.

The team initially stored order data in MySQL but quickly adopted Elasticsearch (ES) to offload the majority of query load, as ES provides near‑real‑time search capabilities.

Today the ES cluster holds over 1 billion documents and handles about 500 million queries per day. As the business grew, the ES architecture evolved through several stages:

1. Initial Stage

The cluster was first deployed on elastic cloud with default settings, leading to single‑point failures and resource contention.

2. Isolation Stage

To avoid resource competition, the ES nodes were moved from shared cloud instances to dedicated high‑spec physical machines, improving stability and performance.

3. Replica Optimization Stage

Each node was placed on its own physical server, and the replica factor was increased from 1 primary + 1 replica to 1 primary + 2 replicas, adding more machines to boost throughput.

4. Master‑Slave Adjustment Stage

To guarantee high availability, a standby cluster was introduced. Data is written synchronously to the primary cluster and asynchronously to the standby. When the primary fails, the standby takes over, and a ZooKeeper‑controlled switch directs traffic.

5. Real‑Time Dual‑Cluster Stage

The primary cluster was upgraded from ES 1.7 to ES 6.x, requiring index recreation. During the upgrade, the standby cluster served all queries, ensuring zero downtime. The standby now stores hot recent data (≈10 % of total docs) and handles the majority of query traffic, while the primary stores the full cold dataset.

Data Synchronization

Two approaches were considered: listening to MySQL binlog or using the ES API directly. JD.com chose the API method for lower latency and simpler maintenance, writing orders to ES in real time.

When write failures occur, a compensating task is recorded in the database; a worker later re‑processes these entries to ensure eventual consistency.

Common Pitfalls

High‑real‑time queries should bypass ES and hit the database directly. ES refreshes every second, so newly indexed documents may not be immediately searchable.

Avoid deep pagination. Large from values cause each shard to build huge priority queues, consuming CPU and memory.

FieldData vs. Doc Values. Sorting on older ES versions used FieldData, which occupies JVM heap and can cause OOM; newer versions default to Doc Values, storing data on disk and improving stability.

Conclusion

The rapid iteration of the order‑center architecture mirrors JD.com’s fast‑growing business. Continuous upgrades—isolating resources, expanding replicas, introducing dual clusters, and migrating to newer ES versions—have dramatically increased throughput, performance, and reliability, illustrating that there is no “best” architecture, only the one that best fits current needs.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Backend Engineering Elasticsearch High Availability Data synchronization Search Architecture

Written by

21CTO

21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.