How JD Retail Scales Billion‑Item Selection with ClickHouse & Elasticsearch

This article details JD Retail's strategic "Nirvana" product‑selection platform, describing the technical challenges of handling billions of items and hundreds of tags, and presenting a dual‑engine solution using ClickHouse and Elasticsearch with Spark‑driven data pipelines to achieve fast filtering, multidimensional analytics, and efficient storage.

JD Cloud Developers

Dec 15, 2021

Background Introduction

The "Nirvana" selection platform is a strategic, leadership‑sponsored project within JD Retail. It aims to build foundational product‑selection capabilities, streamline the proposal and placement processes, and make product selection online, rule‑based, and intelligent through collaboration across the marketing, category, and operations teams.

The diverse business requirements created numerous technical difficulties at the project's outset.

Problem Solving for R&D

To address the technical challenges, JD Retail designed a comprehensive solution.

Specific Technical Solution

The solution is divided into three major modules:

Module 1: ClickHouse and Elasticsearch storage structure design.

Module 2: ClickHouse data push and validation.

Module 3: Elasticsearch data push and validation.

The core problems addressed are: (1) reconciling fast filtering with fast multidimensional statistical queries, (2) improving the import efficiency of massive product feature data, and (3) reducing storage consumption of massive feature data.

The proposed method combines Elasticsearch with ClickHouse, introduces snapshot tables, and uses Spark for offline data import and validation, thereby drastically lowering storage usage while supporting both fast filtering and multidimensional analytics.

1. ClickHouse & Elasticsearch Storage Design

a) Dual‑engine storage ensures quick filtering via Elasticsearch and fast multidimensional statistics via ClickHouse, with consistency guaranteed by validation during data import.

b) Real‑time queries for new tasks use daily tables in ClickHouse and daily indices in Elasticsearch, both cleared after validation.

c) Historical queries use snapshot tables created nightly, storing immutable snapshots for each task, enabling efficient query performance.

d) Real‑time secondary selection leverages Elasticsearch parent‑child documents to store a small number of real‑time tags in child documents, improving update efficiency.
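Point d) can be illustrated with Elasticsearch's `join` field type, which keeps parent and child documents in one index and requires each child to be routed to its parent's shard. The following is a minimal sketch, not the platform's actual mapping: the index name, the field names (`sku_id`, `doc_relation`), the relation names, and the `#rt` id suffix are all hypothetical.

```python
# Hypothetical relation names; the article does not state the real ones.
PARENT = "product"
CHILD = "realtime_tags"


def build_join_mapping():
    """Index mapping with a join field relating parent and child documents."""
    return {
        "mappings": {
            "properties": {
                "sku_id": {"type": "keyword"},
                "doc_relation": {"type": "join", "relations": {PARENT: CHILD}},
            }
        }
    }


def parent_action(index, sku_id, features):
    """Bulk action and source for a parent document holding offline features."""
    action = {"index": {"_index": index, "_id": sku_id}}
    source = {"sku_id": sku_id, **features, "doc_relation": PARENT}
    return action, source


def child_action(index, sku_id, tags):
    """Bulk action and source for a child document holding real-time tags.

    The child is routed by the parent id so both land on the same shard.
    """
    action = {"index": {"_index": index, "_id": f"{sku_id}#rt", "routing": sku_id}}
    source = {**tags, "doc_relation": {"name": CHILD, "parent": sku_id}}
    return action, source
```

With this layout, updating a real‑time tag rewrites only the small child document rather than the wide parent document, which is presumably where the update‑efficiency gain comes from.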

2. ClickHouse Data Push & Validation

Implementation steps:

Generate a wide feature table by merging product, user, and traffic data.

Use Spark to transform data types, fill nulls, and handle complex structures for ClickHouse.

Check whether the target ClickHouse tables exist, creating them with the ReplicatedReplacingMergeTree engine if they do not, and keep the schema consistent with the wide table.

Assign rows to shards by hashing the primary key, repartition the data accordingly, and write it with JDBC batch inserts.

Validate each category’s row count between ClickHouse and the source, handling three cases (match, ClickHouse excess, ClickHouse deficit) with optimize or re‑push operations.
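The three validation cases can be sketched as a small decision function. The article names the two remedies but does not say which case triggers which, so the mapping below is an assumption: excess rows are treated as unmerged duplicates (ReplacingMergeTree only removes duplicates when parts are merged, which an OPTIMIZE forces), and a deficit is treated as a failed or partial write that needs a re‑push. The function and enum names are hypothetical.

```python
from enum import Enum


class Action(Enum):
    OK = "ok"              # counts match, nothing to do
    OPTIMIZE = "optimize"  # assumed remedy when ClickHouse has excess rows
    REPUSH = "repush"      # assumed remedy when ClickHouse has fewer rows


def plan_validation(source_counts, clickhouse_counts):
    """Compare per-category row counts and choose an action for each category."""
    plan = {}
    for category, expected in source_counts.items():
        actual = clickhouse_counts.get(category, 0)
        if actual == expected:
            plan[category] = Action.OK
        elif actual > expected:
            plan[category] = Action.OPTIMIZE
        else:
            plan[category] = Action.REPUSH
    return plan


if __name__ == "__main__":
    source = {"phones": 100, "books": 250, "toys": 75}
    clickhouse = {"phones": 100, "books": 260, "toys": 60}
    for category, action in plan_validation(source, clickhouse).items():
        print(category, action.value)
```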

The same approach applies to snapshot and secondary‑selection data, differing only in cleanup strategy.
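The shard‑distribution and batch‑write steps can be sketched in plain Python. This is an illustration only: the production pipeline runs in Spark and writes over JDBC, and the article does not name the hash function, so CRC32, the `sku_id` key, and the helper names are assumptions. A stable hash is used because Python's built‑in `hash()` is salted per process for strings and would route the same key to different shards across runs.

```python
import zlib
from itertools import islice


def shard_for(primary_key: str, num_shards: int) -> int:
    """Map a primary key to a shard with a hash that is stable across runs."""
    return zlib.crc32(primary_key.encode("utf-8")) % num_shards


def group_by_shard(rows, num_shards, key="sku_id"):
    """Bucket rows by target shard, analogous to repartitioning before the write."""
    shards = {i: [] for i in range(num_shards)}
    for row in rows:
        shards[shard_for(str(row[key]), num_shards)].append(row)
    return shards


def batches(rows, batch_size):
    """Yield fixed-size chunks, analogous to JDBC batch inserts."""
    it = iter(rows)
    while chunk := list(islice(it, batch_size)):
        yield chunk


if __name__ == "__main__":
    rows = [{"sku_id": f"sku-{i}", "price": i} for i in range(10)]
    for shard, shard_rows in group_by_shard(rows, num_shards=4).items():
        for batch in batches(shard_rows, batch_size=3):
            print(f"shard {shard}: insert {len(batch)} rows")
```

Because the same key always lands on the same shard, a re‑pushed row replaces its earlier version within one ReplacingMergeTree table instead of leaving a stale copy on another shard.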

3. Elasticsearch Data Push & Validation

Implementation steps:

Reuse the wide feature table from the ClickHouse process.

Transform data types for Elasticsearch, handling nulls and complex structures; restructure for parent‑child documents when needed.

Allocate categories to a set of indices based on per‑category row statistics so that data volume is balanced across indices.

Bucket data per target index and bulk‑load using Elasticsearch BulkProcessor, using the primary key as _id.

Validate index document counts per category against the source, re‑importing if counts are lower.
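The article does not say how categories are allocated to indices. One common way to balance volume is a greedy "largest first" heuristic: sort categories by row count and always place the next one on the currently lightest index. The sketch below uses that heuristic as a stand‑in, with hypothetical names and counts.

```python
import heapq


def allocate_categories(category_counts, num_indices):
    """Assign each category to an index number, balancing total row counts.

    Greedy largest-first: the biggest remaining category goes to the index
    with the smallest accumulated load.
    """
    heap = [(0, idx) for idx in range(num_indices)]  # (current load, index number)
    heapq.heapify(heap)
    assignment = {}
    ordered = sorted(category_counts.items(), key=lambda kv: kv[1], reverse=True)
    for category, count in ordered:
        load, idx = heapq.heappop(heap)
        assignment[category] = idx
        heapq.heappush(heap, (load + count, idx))
    return assignment


if __name__ == "__main__":
    counts = {"a": 50, "b": 30, "c": 20, "d": 10}
    print(allocate_categories(counts, num_indices=2))
    # → {'a': 0, 'b': 1, 'c': 1, 'd': 0}
```

In this example the two indices end up holding 60 and 50 rows. Keeping a whole category inside one index also means a per‑category count check or re‑import touches only that index.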

The method also supports snapshot and secondary‑selection data with different cleanup policies.
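The bulk‑load and validation steps can be sketched as follows. The article uses the Java BulkProcessor; this sketch instead builds the newline‑delimited request body that the Elasticsearch bulk API accepts, to show the role of `_id`. The index name, the `sku_id` field, and the helper names are hypothetical.

```python
import json


def to_bulk_ndjson(index, docs, id_field="sku_id"):
    """Build a bulk request body: one action line, then one source line, per doc.

    Using the primary key as _id means re-sending a document overwrites it
    instead of creating a duplicate.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc[id_field])}}))
        lines.append(json.dumps(doc))
    # Every line, including the last, must end with a newline.
    return "".join(line + "\n" for line in lines)


def categories_to_reimport(source_counts, index_counts):
    """Return categories whose indexed document count is below the source count."""
    return sorted(
        category
        for category, expected in source_counts.items()
        if index_counts.get(category, 0) < expected
    )


if __name__ == "__main__":
    docs = [{"sku_id": 1, "category": "phones"}, {"sku_id": 2, "category": "books"}]
    print(to_bulk_ndjson("selection_idx_0", docs), end="")
    print(categories_to_reimport({"phones": 1, "books": 3}, {"phones": 1, "books": 2}))
```

One plausible reason the validation only reacts to lower counts is that overwrites by `_id` cannot inflate the document count, so a shortfall is the only mismatch a retry needs to repair.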

Test Conclusions

Currently, over 1 billion product feature records with 490 tags are processed. The daily offline import into ClickHouse (40 shards) takes 40 minutes, an 80% reduction versus previous methods. The Elasticsearch import takes 2 hours, a 60% reduction. The online search platform handles up to 300 QPS with sub‑millisecond TP99, while storage usage drops by 60–70% compared to conventional approaches.
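For scale, the stated percentages imply rough baselines. The arithmetic below is derived only from the figures in the paragraph above; the baseline durations and throughput are implied values, not measurements reported in the article, and the row count is taken as a flat 1 billion even though the article says "over".

```python
def implied_baseline_minutes(current_minutes: int, reduction_percent: int) -> float:
    """Duration before the improvement, given the current duration and the cut."""
    return current_minutes * 100 / (100 - reduction_percent)


clickhouse_before = implied_baseline_minutes(40, 80)      # 200.0 minutes
elasticsearch_before = implied_baseline_minutes(120, 60)  # 300.0 minutes

# Lower-bound ClickHouse import throughput: 1 billion rows in 40 minutes.
rows = 1_000_000_000
rows_per_second = rows / (40 * 60)
rows_per_shard_per_second = rows_per_second / 40  # spread over 40 shards

if __name__ == "__main__":
    print(clickhouse_before, elasticsearch_before)
    print(round(rows_per_second), round(rows_per_shard_per_second))
```

That works out to a little over 10,000 rows per second per shard for the ClickHouse import, with each row carrying up to 490 tag columns.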

Future Outlook

The solution has supported multiple scenarios and business units, enabling rule‑based, online, and intelligent product selection. However, challenges remain:

Dual‑engine consistency still relies on daily validation, affecting update latency.

Only a small number of tags are real‑time today; expanding real‑time tagging will introduce new consistency and architectural challenges.
