Operations 14 min read

How JD Daojia Scaled Its Elasticsearch Cluster to Billions of Docs: Lessons and Pitfalls

This article details JD Daojia's order center Elasticsearch architecture evolution—from a chaotic initial deployment to a real‑time dual‑cluster backup—covering scaling strategies, data synchronization methods, and the practical pitfalls encountered along the way.

21CTO

Dec 3, 2018

How JD Daojia Scaled Its Elasticsearch Cluster to Billions of Docs: Lessons and Pitfalls

Background

In JD Daojia's order center system, both external merchant order creation and internal upstream/downstream dependencies generate massive query traffic, leading to a read‑heavy, write‑light workload. Order data is stored in MySQL, but relying solely on the database for such query volume is impractical, and MySQL lacks sufficient support for complex queries, so the order center uses Elasticsearch to shoulder the main query load.

ES Cluster Architecture Evolution

1. Initial Stage

The ES cluster started as a blank slate with default configurations on elastic cloud, resulting in chaotic node deployment and single‑point failures, which were unacceptable for order‑center business.

2. Cluster Isolation Stage

Mixed‑deployment caused resource contention that occasionally disrupted the ES service. To mitigate this, high‑resource nodes were migrated away, and later the cluster was moved to high‑spec physical machines, improving performance.

3. Node Replica Optimization Stage

Deploying each ES node on a dedicated physical machine maximized resource usage. To avoid bottlenecks, the replica factor was increased from 1 primary + 1 replica to 1 primary + 2 replicas, and additional machines were added.

The architecture uses a VIP for external load balancing. The first layer consists of gateway nodes acting as ES client nodes (smart load balancers). The second layer comprises data nodes that store and process data. With one primary and two replicas, request distribution is balanced via round‑robin, increasing throughput and query performance. Performance graphs for each stage are shown below.

4. Primary‑Secondary Cluster Adjustment Stage

To ensure high availability, a standby cluster was introduced. When the primary cluster experiences a node failure, traffic can be switched to the standby, preventing order‑center disruptions.

5. Current: Real‑time Mutual Backup Dual‑Cluster Stage

The primary cluster was upgraded from ES 1.7 directly to 6.x, requiring index recreation. During the upgrade, the standby cluster temporarily took over all queries to avoid downtime. The standby now stores recent hot data (about one‑tenth of the primary’s document count) and handles the majority of query traffic, while the primary stores the full dataset for cold‑data queries and special scenarios.

ES Order Data Sync Scheme

Two main approaches are used to sync MySQL data to ES:

Solution 1: Listen to MySQL binlog and sync to ES

Advantages: Low coupling with business logic; the application does not need to handle ES writes.

Disadvantages: Requires ROW‑based binlog, introduces a new sync service, increases development and maintenance effort, and adds synchronization risk.

Solution 2: Directly write to ES via its API

Advantages: Simple and flexible control over data writes.

Disadvantages: Strong coupling with business code; relies heavily on the application’s write logic.

Given the high real‑time requirement of order data, the binlog method’s latency is unacceptable, and it adds extra system complexity. Therefore, JD Daojia adopts direct ES API writes, which are concise, flexible, and meet the synchronization needs.

When an ES write fails, the operation is not retried to avoid impacting response time. Instead, a compensating task is recorded in the database; a worker later reads these tasks and updates ES, ensuring eventual consistency between the database and ES.

Pitfalls

1. High‑real‑time queries use DB

ES refreshes shards by default every second, making it near‑real‑time but not truly real‑time. For queries requiring the absolute latest data, the system falls back to direct DB queries.

2. Avoid deep pagination

Deep pagination (large from values) forces each shard to build large priority queues, consuming CPU, memory, and bandwidth, and can lead to OOM or severe performance degradation. It should be avoided.

3. FieldData vs. Doc Values

FieldData: Used for sorting in ES 1.x, stored in JVM heap, can cause OOM or long GC pauses when the cache exceeds its threshold.

Doc Values: Column‑oriented storage placed in Lucene files, not in heap, providing stable performance; default from ES 2.x onward.

Conclusion

The rapid architectural iterations stem from JD Daojia’s fast business growth. Continuous optimization—ranging from initial chaotic deployment to a high‑throughput, highly available dual‑cluster setup—demonstrates that there is no single “best” architecture, only the most suitable one for current needs. Future upgrades will aim for even higher throughput, better performance, and stronger stability.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Elasticsearch High Availability Data synchronization Cluster Architecture

Written by

21CTO

21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.