How JD.com Scaled POP Order Elasticsearch to Handle Billions of Orders
This article analyzes the challenges of JD.com's POP order Elasticsearch storage—including data skew, oversized shards, frequent updates, and high maintenance costs—and details the multi‑layered architectural redesign that introduced tenant isolation, dual‑hash routing, differentiated shard strategies, and a dual‑active physical foundation to achieve high performance, scalability, and availability.
Introduction
With JD.com’s rapid business growth, the number of platform merchants and order volume surged dramatically, especially after the 2025 push into local life services. The POP order Elasticsearch (ES) storage faced severe pressure, prompting an urgent system architecture upgrade.
Current System Overview
The POP order ES system originally served merchant fulfillment queries (merchant‑side, not consumer‑facing). It consisted of write services that consumed various upstream messages (order pipelines, binlog, OFW changes, reconciliation messages) to update ES, and read services that provided merchant‑level order lists and details.
Initially, only paid, fulfillable orders were indexed, but over time additional order types (pre‑sale, split, etc.) and unpaid orders were added, increasing data diversity. Demands on the open API also grew, with callers expecting to retrieve more data through fewer interfaces. The result was a heterogeneous system that also incorporated invoice, promise, after‑sale, and rating statuses, further stressing ES storage.
Core Pain Points
Data skew: Merchant‑level routing caused a few large merchants (e.g., JD Xi Self‑Operated) to account for up to 25% of total order volume, resulting in TB‑scale shards and performance volatility.
Growing data volume: Shard sizes far exceeded ES's recommended 50‑100 GB, with some shards over 1 TB.
Increasing update frequency: Up to 300,000 updates per minute, causing update conflicts and latency.
High maintenance cost: The twice‑yearly migration of >6‑month‑old data from hot to cold clusters required complex DUCC changes, data comparison, and gray‑scale traffic switching.
Solution Overview
The redesign combines tenant‑level isolation, dual‑hash routing, differentiated shard strategies, and dual‑active physical infrastructure to build a high‑performance, highly‑scalable, and highly‑available order retrieval platform.
3.1 Resolving Data Skew
Physical isolation: Deploy dedicated clusters for large merchants, creating separate indices with shard counts tuned to their volume, and add virtual routing logic to direct large‑merchant traffic to these clusters.
Flexible routing strategy: Extend the proxy layer to support order‑level routing (instead of merchant‑level) for large‑merchant clusters, ensuring even data distribution across shards.
3.2 Addressing Oversized Shards
ES recommends single shard sizes of 50‑100 GB and a maximum of 100 nodes per cluster. To stay within these limits, the hot cluster’s ordinary merchant data was expanded from one to three ES clusters, with the proxy layer hashing merchant IDs to distribute data evenly.
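Taken together, sections 3.1 and 3.2 describe a dual‑hash routing scheme: one hash picks the cluster, and the routing key controls shard placement within it. A minimal sketch follows; the cluster names, the large‑merchant list, and the `route` function are illustrative assumptions, not JD's actual implementation:

```python
import zlib

# Hypothetical tenant list and cluster names, for illustration only.
LARGE_MERCHANTS = {"jd_xi_self_operated"}
ORDINARY_CLUSTERS = ["es-pop-1", "es-pop-2", "es-pop-3"]
DEDICATED_CLUSTER = "es-pop-large"

def route(merchant_id: str, order_id: str) -> tuple:
    """Return (cluster, routing_key) for an order document.

    Large merchants go to a dedicated cluster and are routed by
    order_id, so their documents spread evenly across shards.
    Ordinary merchants are hashed by merchant_id onto one of three
    clusters and routed by merchant_id, so one merchant's orders
    stay together and merchant-level queries hit a single shard.
    """
    if merchant_id in LARGE_MERCHANTS:
        return DEDICATED_CLUSTER, order_id          # order-level routing
    # crc32 gives a stable hash across processes (unlike Python's hash()).
    idx = zlib.crc32(merchant_id.encode()) % len(ORDINARY_CLUSTERS)
    return ORDINARY_CLUSTERS[idx], merchant_id      # merchant-level routing
```

The key design choice is that the proxy layer owns both hashes, so clusters can be added (or a merchant promoted to a dedicated cluster) without touching callers.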
3.3 Reducing Frequent Update Pressure
The system originally used ES’s optimistic lock update flow (read version → set fields → save). To lower update load, a buffering layer aggregates messages before applying them, decreasing the number of ES writes and mitigating conflicts.
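The buffering layer can be sketched as a small coalescing buffer: updates to the same order arriving within a window merge into a single document update before a bulk flush. Everything here (the class name, flush thresholds, the `bulk_update` callback) is an assumption for illustration, not JD's actual code:

```python
import time
from collections import OrderedDict

class UpdateBuffer:
    """Coalesce per-order field updates and flush them in batches.

    Multiple messages for the same order merge into one document
    update, so ES sees one write instead of many, and optimistic-lock
    conflicts on hot documents drop accordingly.
    """

    def __init__(self, bulk_update, max_size=500, max_age_s=1.0):
        self.bulk_update = bulk_update  # callable(dict[order_id, fields])
        self.max_size = max_size        # flush when this many orders pend
        self.max_age_s = max_age_s      # ...or when the batch is this old
        self.pending = OrderedDict()
        self.oldest = None

    def add(self, order_id, fields):
        # Later fields for the same order overwrite earlier ones.
        self.pending.setdefault(order_id, {}).update(fields)
        if self.oldest is None:
            self.oldest = time.monotonic()
        if (len(self.pending) >= self.max_size
                or time.monotonic() - self.oldest >= self.max_age_s):
            self.flush()

    def flush(self):
        if self.pending:
            self.bulk_update(self.pending)
            self.pending = OrderedDict()
            self.oldest = None
```

The trade‑off is a small, bounded delay (`max_age_s`) in exchange for far fewer ES writes on frequently updated orders.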
3.4 Automating Data Maintenance
A two‑stage migration was implemented:
Stage 1: Use reindex to move data, then replay archival messages to catch up, cutting migration time from ~20 days to ~10 days.
Stage 2: Create a daily archival ES index as a cache, store daily order changes, and run a scheduled task to batch read, compare, write, and delete original data, achieving fully automated migration without manual intervention.
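The stage‑2 scheduled task might look like the following sketch, where callbacks stand in for the daily archival index, the cold cluster, and the hot cluster; the function names and batch flow are hypothetical, not JD's actual implementation:

```python
def migrate_day(read_archival_batch, write_cold, read_cold, delete_hot,
                batch_size=1000):
    """Drain one day's archived order changes from hot to cold storage.

    For each batch: write to the cold cluster, read back and compare,
    and only delete hot copies whose cold copy verified. Returns the
    number of documents migrated.
    """
    migrated = 0
    while True:
        batch = read_archival_batch(batch_size)  # docs changed that day
        if not batch:
            break
        write_cold(batch)
        # Compare before deleting: never drop a hot copy that did not
        # land intact in the cold cluster.
        verified = [doc for doc in batch
                    if read_cold(doc["order_id"]) == doc]
        delete_hot([doc["order_id"] for doc in verified])
        migrated += len(verified)
    return migrated
```

Because the task is idempotent (unverified documents simply stay hot and are retried on the next run), it can be scheduled without manual intervention.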
3.5 Final Architecture
The final solution delivers a high‑performance, highly‑scalable, and highly‑available enterprise‑grade order search and analysis platform, fully supporting explosive business growth.
Appendix
4.1 Challenges of Maintaining Ultra‑Large Clusters
Elasticsearch’s peer‑to‑peer architecture requires every node to store the full cluster state. When node count exceeds 100, broadcasting the cluster state leads to network storms, slow convergence, and increased risk of Full GC pauses.
4.2 ES Update Mechanics
Updates follow a “delete + insert” model:
Read: Retrieve the old document (or the full source, for partial updates).
Soft delete: Mark the old document as deleted in its segment.
Insert: Write the new document as a fresh entry with a new version.
This process incurs significant CPU for re‑analysis and indexing, not just I/O.
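A toy in‑memory model (not real Lucene) makes the flow concrete: each update soft‑deletes the old copy, writes a fresh entry in a new segment, and leaves a tombstone behind until a merge compacts it away:

```python
class Segment:
    def __init__(self):
        self.docs = {}        # doc_id -> (version, source)
        self.deleted = set()  # soft-deleted doc ids ("tombstones")

class ToyIndex:
    """Simplified delete + insert update model, for illustration only."""

    def __init__(self):
        self.segments = [Segment()]

    def update(self, doc_id, source):
        # 1. Read the current live version (if any) and soft-delete it.
        version = 0
        for seg in self.segments:
            if doc_id in seg.docs and doc_id not in seg.deleted:
                version = seg.docs[doc_id][0]
                seg.deleted.add(doc_id)   # mark deleted; bytes stay on disk
        # 2. Insert the new document as a fresh entry in a new segment.
        fresh = Segment()
        fresh.docs[doc_id] = (version + 1, source)
        self.segments.append(fresh)

    def tombstones(self):
        return sum(len(seg.deleted) for seg in self.segments)

    def merge(self):
        # Compact all segments, physically dropping deleted documents.
        merged = Segment()
        for seg in self.segments:
            for doc_id, entry in seg.docs.items():
                if doc_id not in seg.deleted:
                    merged.docs[doc_id] = entry
        self.segments = [merged]
```

Running a few updates against this model shows why frequent updates are expensive: every update adds a segment and a tombstone, and only a merge (itself I/O‑heavy in a real cluster) reclaims the space.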
4.3 Pressures from Frequent Updates
Disk I/O and segment merge storms: Each update creates a small segment; merging them generates heavy I/O and can throttle writes.
Query performance degradation: Deleted documents ("tombstones") remain until merged, increasing scan overhead and memory usage.
Cache invalidation: Frequent segment changes invalidate filter caches, leading to low cache hit rates.
GC pressure: Update operations generate many temporary objects, causing frequent Young GC and risking Full GC pauses.
JD Retail Technology
Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.