Scaling JD Daojia Order Search with Elasticsearch: Cluster Evolution Journey

JD Daojia's order center faced query loads that MySQL alone could not serve, which prompted a move to Elasticsearch and a multi-stage evolution of its ES cluster: from a loosely configured initial setup, through isolation, replica tuning, and a primary/backup arrangement, to a real-time dual-cluster architecture. Each stage improved stability, throughput, or scalability.

Java Backend Technology

JD Daojia's order center system experiences extremely high query volumes, making MySQL alone insufficient for order searches; therefore, Elasticsearch (ES) is adopted to handle the primary query load.

ES provides near-real-time storage and search. The cluster currently holds over 1 billion documents and serves around 500 million queries per day.

ES Cluster Architecture Evolution

1. Initial Stage

The ES cluster started out on the elastic cloud with essentially default configuration. Node placement was unplanned, and the cluster as a whole was exposed to single points of failure.

2. Cluster Isolation Stage

In the mixed deployment, ES nodes competed with co-located workloads for system resources. The most resource-hungry nodes were migrated off the elastic cloud onto dedicated physical machines, which improved stability.

3. Node Replica Tuning Stage

Each ES node was placed on a separate physical machine to maximize resource usage. Replicas were increased from one to two, and additional machines were added, boosting throughput.

4. Master‑Slave Adjustment Stage

A standby (backup) cluster was introduced for high availability. The application writes every order to both the primary and the backup cluster (dual write), and an archival mechanism moves older orders to a history store. ZooKeeper controls traffic switching, so queries can fall back to the backup cluster when the primary is unavailable.
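
The fallback behaviour can be sketched roughly as follows. This is a minimal illustration, not the team's actual code: the ClusterRouter class and its search functions are hypothetical, and the in-memory "active" field stands in for a value that would really be stored in and watched on ZooKeeper.

```python
from typing import Callable, Dict, List, Optional

SearchFn = Callable[[dict], List[dict]]


class ClusterRouter:
    """Sends searches to the active cluster and falls back to the other one."""

    def __init__(self, primary: SearchFn, backup: SearchFn) -> None:
        self.clusters: Dict[str, SearchFn] = {"primary": primary, "backup": backup}
        # Assumption: in production this flag would live in ZooKeeper so that
        # operators can switch traffic without redeploying the application.
        self.active = "primary"

    def switch_to(self, name: str) -> None:
        """Manually point query traffic at the named cluster."""
        if name not in self.clusters:
            raise ValueError(f"unknown cluster: {name}")
        self.active = name

    def search(self, query: dict) -> List[dict]:
        """Try the active cluster first, then the remaining one."""
        order = [self.active] + [n for n in self.clusters if n != self.active]
        last_error: Optional[Exception] = None
        for name in order:
            try:
                return self.clusters[name](query)
            except Exception as exc:  # any failure triggers the fallback
                last_error = exc
        raise RuntimeError("all clusters failed") from last_error
```

Because both clusters receive the same writes at this stage, a query that fails on the primary can be answered by the backup without the caller noticing.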

5. Current Stage: Real‑Time Dual‑Cluster

The primary ES cluster was upgraded from version 1.7 to 6.x, which required rebuilding the indices. During the upgrade, the backup cluster temporarily acted as the primary to avoid downtime. In the current design, the backup cluster stores only recent hot data (about 10% of the primary's size) and handles most query traffic, while the primary stores the full dataset and serves the less frequent searches across all orders.
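
One way to picture the hot/full split is a small routing function. The sketch below is illustrative only: the function name is hypothetical, and the 7-day window is an assumed value, since the article says only that the backup cluster holds "recent" orders.

```python
from datetime import datetime, timedelta

# Assumption: the real retention window of the hot cluster is not given.
HOT_WINDOW = timedelta(days=7)


def choose_cluster(oldest_order_time: datetime, now: datetime) -> str:
    """Pick the cluster for a query whose oldest requested order is known.

    Queries that only touch recent orders go to the small hot ("backup")
    cluster; anything reaching further back goes to the full primary.
    """
    if now - oldest_order_time <= HOT_WINDOW:
        return "backup"
    return "primary"
```

Since the hot cluster holds roughly a tenth of the data, routing the bulk of traffic there keeps most queries on the smaller, faster index.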

ES Order Data Synchronization

Option 1: Listen to MySQL binlog and sync to ES (asynchronous, adds system complexity).

Option 2: Directly write to ES via its API (synchronous, simpler, meets real‑time needs).

The team chose the direct API approach. If a write fails, a compensating task is recorded in the database; a worker later retries the ES update to ensure eventual consistency.
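
The write-then-compensate flow might look like the sketch below. All names are hypothetical: es_write stands in for the real ES client call, and a plain list stands in for the database table of compensating tasks.

```python
from typing import Callable, List, Tuple

Task = Tuple[str, dict]


class OrderIndexer:
    """Writes orders to ES and records a compensating task on failure."""

    def __init__(self, es_write: Callable[[str, dict], None],
                 task_store: List[Task]) -> None:
        self.es_write = es_write      # raises an exception when the write fails
        self.task_store = task_store  # stand-in for a database table

    def index(self, order_id: str, doc: dict) -> bool:
        """Try the synchronous ES write; on failure, queue it for retry."""
        try:
            self.es_write(order_id, doc)
            return True
        except Exception:
            self.task_store.append((order_id, doc))
            return False

    def run_compensation(self) -> int:
        """Worker pass: retry every pending task, return how many succeeded."""
        pending = list(self.task_store)
        self.task_store.clear()
        succeeded = 0
        for order_id, doc in pending:
            # index() re-queues the task itself if the retry fails again.
            if self.index(order_id, doc):
                succeeded += 1
        return succeeded
```

A task that keeps failing simply stays in the store for the next worker pass, which is what gives the eventual consistency described above.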

Common Pitfalls

1. High Real‑Time Query Requirements

By default, ES refreshes each shard once per second, so a newly indexed document may not be searchable until the next refresh. Queries that must see the very latest data therefore go to the database instead of ES.

2. Avoid Deep Pagination

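With from/size pagination, every shard has to build a priority queue of from + size entries, and the coordinating node then merges the queues from all shards. Large from values therefore consume significant CPU, memory, and bandwidth, so deep pagination should be avoided.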
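
A quick back-of-the-envelope calculation shows why. The helper below is only an illustration (the function name and the shard count in the test are made up); it counts how many entries each shard must rank and how many the coordinating node must merge for a single from/size request.

```python
def deep_pagination_cost(from_: int, size: int, shards: int) -> dict:
    """Estimate the work behind one from/size search request."""
    per_shard = from_ + size          # entries each shard must collect and sort
    merged = per_shard * shards       # entries sent to the coordinating node
    return {
        "per_shard_queue": per_shard,
        "coordinator_merge": merged,
        "returned_to_client": size,   # everything else is discarded
    }
```

For example, with 5 shards, from=10000 and size=10, each shard ranks 10,010 entries and the coordinating node merges 50,050 of them, only to return 10.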

3. FieldData vs. Doc Values

Fielddata loads the field values used for sorting into the JVM heap, which can cause long pauses and even OOM errors. Switching to Doc Values, a column-oriented structure stored on disk, mitigates the problem.
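
For reference, in Elasticsearch 1.x doc values had to be enabled per field in the mapping, whereas from 2.0 onward they are on by default for most non-analyzed field types. The mapping below is a sketch of what an opt-in might have looked like on a 1.x cluster; the type name and field names are hypothetical and not taken from the article.

```python
import json

# Hypothetical 1.x-style mapping for an "order" type. In 1.x, string fields
# needed "index": "not_analyzed" before doc values could be enabled on them.
order_mapping = {
    "mappings": {
        "order": {
            "properties": {
                "create_time": {"type": "date", "doc_values": True},
                "order_status": {
                    "type": "string",
                    "index": "not_analyzed",
                    "doc_values": True,
                },
            }
        }
    }
}

# JSON body that would be sent when creating the index.
mapping_json = json.dumps(order_mapping, indent=2)
```

Changing this setting on an existing field generally means reindexing the data, since doc values are built at index time.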

Conclusion

The rapid iteration of the ES architecture mirrors JD Daojia's fast business growth. Continuous optimization through isolation, replica tuning, the dual-cluster design, and version upgrades has significantly improved throughput, performance, and stability. There is no single best architecture, though: the right solution depends on the business context.
