
Design and Architecture of JD Retail Product Selection Platform

This article details the design and implementation of JD Retail’s product selection platform, covering its business background, core data retrieval capabilities, domain model, and system architecture, including frontend configurability, the backend query engine, ClickHouse indexing, and both offline and real-time data processing pipelines.

JD Retail Technology

Jul 19, 2022


Background

With the rapid development of the internet and changing consumer shopping habits, e‑commerce’s share of the retail market has been increasing year by year. The number of products on JD Retail has grown to tens of billions, creating a massive catalog that is both an opportunity and an operational challenge.

Core Challenge

The key difficulty is how to identify high‑quality, operation‑relevant products from this massive pool.

Platform Initiative

Our team took part in building JD Retail’s product selection platform from scratch, and during the 2022 618 promotion we were responsible for 50% of the product selection on the promotional floors.

Purpose of This Article

This article shares the technical experience we accumulated from Q3 2021 through the 2022 618 event, in the hope that it will be useful to industry peers.

Selection Core Capability Breakdown

The most essential function of a selection system is to help operators pick target data out of a large dataset, which makes it, at its core, a data retrieval service. Its three core elements are data, filtering, and sorting. A comparative table of everyday tools is shown below:

JD Retail Selection Business Situation

As Steve Jobs once said, “We should find the right technology for a good product, not design a product to promote technology.” Similarly, we first investigated the business scenarios and technical requirements.

Key business observations:

Different selection scenarios work with data volumes ranging from millions to billions of records, each with its own set of metrics.

JD Retail’s business is complex, requiring support for many use cases.

Selection results must be delivered in real time, including lists and distribution data.

System requirements derived from these observations:

Support real‑time queries over large data volumes.

Handle diverse selection data with flexible configuration per business.

Support both OLTP and OLAP queries.

Selection Domain Model Design

The domain model consists of two parts: input (data, filter, sort) and output (view capabilities and data export). The model diagram is shown below:
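As a complement to the diagram, the input/output split can be expressed as plain data types. The following is a minimal Python sketch under our own assumptions: every class and field name (`SelectionInput`, `FilterRule`, `ViewType`, `data_source`, `group_by`, and so on) is hypothetical, and the article does not describe the platform's model at the code level.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, List, Optional


class SortOrder(Enum):
    ASC = "asc"
    DESC = "desc"


class ViewType(Enum):
    LIST = "list"                  # paged product list
    DISTRIBUTION = "distribution"  # aggregated counts per dimension


@dataclass
class FilterRule:
    column: str  # attribute or metric name in the product data
    op: str      # logical operator code, e.g. "gt", "lt", "eq"
    value: Any


@dataclass
class SortRule:
    column: str
    order: SortOrder = SortOrder.DESC


@dataclass
class SelectionInput:
    """Input side of the model: data + filter + sort."""
    data_source: str  # hypothetical: which product pool to select from
    filters: List[FilterRule] = field(default_factory=list)
    sorts: List[SortRule] = field(default_factory=list)


@dataclass
class SelectionOutput:
    """Output side of the model: a view, optionally with a data export."""
    view: ViewType
    group_by: Optional[str] = None  # required for DISTRIBUTION views
    export: bool = False            # also produce a downloadable data export

    def __post_init__(self) -> None:
        if self.view is ViewType.DISTRIBUTION and not self.group_by:
            raise ValueError("a distribution view needs a group_by dimension")


@dataclass
class SelectionTask:
    request: SelectionInput
    outputs: List[SelectionOutput]


task = SelectionTask(
    request=SelectionInput(
        data_source="promo_pool",
        filters=[FilterRule("price", "lt", 2500)],
        sorts=[SortRule("sales_30d")],
    ),
    outputs=[
        SelectionOutput(ViewType.LIST, export=True),
        SelectionOutput(ViewType.DISTRIBUTION, group_by="category"),
    ],
)
```

Keeping input and output as separate types means one filter/sort definition can feed several views (a list and a distribution) without being restated.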

Selection System Architecture

After defining the domain capabilities, we designed an architecture that provides a unified backend to support many varied front‑end scenarios.

01 Frontend

The frontend is the user‑facing layer. We designed three core work areas: filter rule configuration, sort rule configuration, and display (product list and distribution). Rendering is driven by JSON‑Schema‑based configuration using JD Retail’s open‑source Waterdrop components (drip‑form and drip‑table).
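The idea behind configuration-driven rendering can be illustrated independently of any UI framework: one generic renderer reads a per-business config, and the same config also validates what the user submits. The sketch below is written in Python for consistency with the other examples; the config layout is hypothetical and only loosely modeled on JSON Schema keywords (`enum`, `minimum`). It is not the actual drip-form or drip-table schema format, and the widget descriptor strings are invented for illustration.

```python
import json
from typing import Any, Dict, List

# Hypothetical per-business configuration, loosely in the spirit of JSON Schema.
# This is NOT the actual drip-form / drip-table schema format.
PROMO_CONFIG: Dict[str, Any] = json.loads("""
{
  "filters": [
    {"name": "category", "type": "string", "enum": ["phone", "laptop"]},
    {"name": "price",    "type": "number", "minimum": 0}
  ],
  "columns": ["sku_id", "category", "price", "sales_30d"]
}
""")


def render_filter_panel(config: Dict[str, Any]) -> List[str]:
    """One generic renderer: each configured filter becomes a widget descriptor."""
    widgets = []
    for f in config["filters"]:
        if "enum" in f:
            widgets.append(f"select:{f['name']}({'|'.join(f['enum'])})")
        elif f["type"] == "number":
            widgets.append(f"number:{f['name']}")
        else:
            widgets.append(f"text:{f['name']}")
    return widgets


def validate(config: Dict[str, Any], values: Dict[str, Any]) -> List[str]:
    """Check submitted values against the same config that produced the form."""
    spec = {f["name"]: f for f in config["filters"]}
    errors = []
    for name, value in values.items():
        f = spec.get(name)
        if f is None:
            errors.append(f"unknown filter: {name}")
        elif "enum" in f and value not in f["enum"]:
            errors.append(f"{name}: {value!r} is not an allowed option")
        elif f["type"] == "number" and (
            not isinstance(value, (int, float)) or value < f.get("minimum", float("-inf"))
        ):
            errors.append(f"{name}: {value!r} is not a valid number")
    return errors
```

Under this approach, supporting a new business scenario means writing a new config rather than new rendering code.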

This configuration‑driven approach enables rapid construction of diverse front‑end selection interfaces on a common foundation.

02 Backend

The backend’s application layer focuses on two core functions: a protocol dictionary that maps front‑end configuration to logical operators (e.g., >, <, =), and a query parsing engine that transforms these logical expressions into executable query statements.
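The two backend pieces, a protocol dictionary and a query parsing engine, can be sketched as an operator lookup table plus a recursive compiler that turns an AND/OR expression tree into a parameterized `WHERE` clause. This is a minimal sketch, not the platform's implementation: the operator codes, the expression-tree shape, the column whitelist, and the `%s` placeholder style are all assumptions (the placeholder syntax depends on the database client driver actually used).

```python
from typing import Any, Dict, List, Tuple

# Hypothetical protocol dictionary: operator codes sent by the front end -> SQL operators.
OPERATORS: Dict[str, str] = {
    "gt": ">", "gte": ">=", "lt": "<", "lte": "<=", "eq": "=", "neq": "!=", "in": "IN",
}

# Column names cannot be bound as query parameters, so only whitelisted columns are emitted.
ALLOWED_COLUMNS = {"category", "brand", "price", "sales_30d"}


def compile_expr(node: Dict[str, Any], params: List[Any]) -> str:
    """Recursively compile an AND/OR expression tree into a WHERE fragment with %s placeholders."""
    for joiner in ("and", "or"):
        if joiner in node:
            children = node[joiner]
            if not children:
                raise ValueError(f"empty {joiner} group")
            # Children are compiled left to right, so params stay in placeholder order.
            parts = [compile_expr(child, params) for child in children]
            return "(" + f" {joiner.upper()} ".join(parts) + ")"
    column, op = node["field"], node["op"]
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unknown column: {column}")
    if op not in OPERATORS:
        raise ValueError(f"unknown operator: {op}")
    if op == "in":
        values = list(node["value"])
        if not values:
            raise ValueError("IN needs at least one value")
        params.extend(values)
        return f"{column} IN ({', '.join(['%s'] * len(values))})"
    params.append(node["value"])
    return f"{column} {OPERATORS[op]} %s"


def compile_query(table: str, expr: Dict[str, Any], order_by: str, limit: int) -> Tuple[str, List[Any]]:
    """Build a single-table SELECT; `table` is assumed to come from trusted server-side config."""
    if order_by not in ALLOWED_COLUMNS:
        raise ValueError(f"unknown sort column: {order_by}")
    params: List[Any] = []
    where = compile_expr(expr, params)
    sql = f"SELECT sku_id FROM {table} WHERE {where} ORDER BY {order_by} DESC LIMIT {int(limit)}"
    return sql, params
```

For example, the rule "phones that are either under 2500 or have at least 5000 sales" compiles to `(category = %s AND (price < %s OR sales_30d >= %s))` with parameters `["phone", 2500, 5000]`. Binding values as parameters rather than splicing them into the string keeps operator-entered values from being interpreted as SQL.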

For data storage we selected ClickHouse, which excels at OLAP queries and high‑throughput data ingestion.

Large‑scale data preprocessing is performed offline using Hive SQL or Spark, following a standardized pipeline: pool initialization → metric addition → offline derivation → data cleaning. This pipeline is parameterized per business scenario.
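The four-step pipeline can be pictured as a chain of functions that all take the same per-scenario parameters. In production these steps run as Hive SQL or Spark jobs; the sketch below uses in-memory Python lists purely to show the shape of the flow. `ScenarioParams`, the metric names, and the conversion-rate derivation are hypothetical examples, not the platform's actual parameters or metrics.

```python
from dataclasses import dataclass
from typing import Dict, List

Row = Dict[str, object]


@dataclass
class ScenarioParams:
    """Hypothetical knobs that vary per business scenario."""
    categories: List[str]  # which products enter the pool
    metrics: List[str]     # which metrics to attach
    min_sales: int         # cleaning threshold


def init_pool(catalog: List[Row], p: ScenarioParams) -> List[Row]:
    # 1. Pool initialization: choose the candidate products (copies, so the catalog is untouched).
    return [dict(r) for r in catalog if r["category"] in p.categories]


def add_metrics(rows: List[Row], metric_store: Dict[int, Row], p: ScenarioParams) -> List[Row]:
    # 2. Metric addition: attach only the metrics this scenario asks for (None if missing).
    for r in rows:
        found = metric_store.get(r["sku_id"], {})
        for name in p.metrics:
            r[name] = found.get(name)
    return rows


def derive(rows: List[Row], p: ScenarioParams) -> List[Row]:
    # 3. Offline derivation: compute derived fields, here a 30-day conversion rate.
    #    `p` is unused in this example but kept so every step has the same signature.
    for r in rows:
        sales, views = r.get("sales_30d"), r.get("views_30d")
        r["conversion"] = sales / views if sales is not None and views else None
    return rows


def clean(rows: List[Row], p: ScenarioParams) -> List[Row]:
    # 4. Data cleaning: drop rows with missing or below-threshold sales.
    return [r for r in rows if r.get("sales_30d") is not None and r["sales_30d"] >= p.min_sales]


def run_pipeline(catalog: List[Row], metric_store: Dict[int, Row], p: ScenarioParams) -> List[Row]:
    rows = init_pool(catalog, p)
    rows = add_metrics(rows, metric_store, p)
    rows = derive(rows, p)
    return clean(rows, p)
```

Because every step reads its behavior from `ScenarioParams`, a new scenario is a new parameter set rather than a new pipeline.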

We also implemented a quality‑monitoring system covering service‑chain tracing, scheduled inspections, throttling, and abnormal data handling.
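The article does not say which throttling algorithm the platform uses. A token bucket is one common choice for protecting a query backend, so the sketch below uses it as a stand-in: requests spend tokens, tokens refill at a fixed rate, and short bursts up to the bucket capacity are allowed. The class name, parameters, and injectable clock are our own assumptions.

```python
import threading
import time
from typing import Callable


class TokenBucket:
    """Token-bucket throttle: bursts up to `capacity`, sustained rate of `rate` requests per second."""

    def __init__(self, rate: float, capacity: float, clock: Callable[[], float] = time.monotonic):
        if rate <= 0 or capacity <= 0:
            raise ValueError("rate and capacity must be positive")
        self.rate = rate
        self.capacity = capacity
        self.clock = clock        # injectable so tests can control time
        self.tokens = capacity    # start with a full bucket
        self.last = clock()
        self.lock = threading.Lock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Return True and spend `cost` tokens if available; otherwise reject without blocking."""
        with self.lock:
            now = self.clock()
            # Refill in proportion to elapsed time, never beyond capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= cost:
                self.tokens -= cost
                return True
            return False
```

Rejecting immediately, rather than queuing, keeps a burst of expensive selection queries from piling up behind the storage layer; callers can retry or show a "try again" message.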

Selection Data Index Architecture

Given the billion‑scale data, we adopted a distributed indexing engine (ClickHouse) with a wide‑table design that merges product primary keys and business metrics, enabling single‑table queries without costly joins.
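The wide-table idea is to pay the join cost once, at write time, so that every selection query reads a single table. The sketch below shows that pre-join step on in-memory dictionaries keyed by product ID; the `sku_id` key and the metric names are assumptions, and in the real system the merged rows would be written into ClickHouse rather than kept in a Python list.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]


def build_wide_table(base: Iterable[Row], *metric_sources: Dict[int, Row], key: str = "sku_id") -> List[Row]:
    """Pre-join: fold every metric source into the base row keyed by product ID, once, at write time."""
    wide = []
    for row in base:
        merged = dict(row)
        for source in metric_sources:
            # Metrics missing for this product are simply absent from the merged row.
            for name, value in source.get(row[key], {}).items():
                if name != key:
                    merged[name] = value
        wide.append(merged)
    return wide


base = [{"sku_id": 1, "category": "phone"}, {"sku_id": 2, "category": "laptop"}]
sales = {1: {"sales_30d": 10}}
prices = {1: {"price": 999.0}, 2: {"price": 5999.0}}
wide = build_wide_table(base, sales, prices)

# A selection query is now a single-table scan with no joins.
cheap = [r["sku_id"] for r in wide if r.get("price", 0) < 1000]
```

The trade-off is write amplification and a wider schema in exchange for join-free reads, which suits a columnar store such as ClickHouse that only reads the columns a query touches.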

Data processing is split into offline (hour‑ or day‑level latency) and real‑time (near‑real‑time) pipelines. Offline processing uses custom hash partitioning to write data directly to index shards. Real‑time processing follows a two‑stage approach: fast ingestion of raw streams followed by a merge stage that normalizes diverse formats into the target schema.
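Both write paths can be sketched briefly. For the offline path, a stable hash of the product key decides the target shard, so rows can be grouped and written to each shard directly. For the real-time path, the fast-ingest stage lands raw messages unchanged, and a later merge stage maps each source's format onto the target schema. The article does not name the hash function, shard count, or stream formats; CRC32, the source names, and the raw field names below are hypothetical.

```python
import zlib
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def shard_for(sku_id: int, num_shards: int) -> int:
    """Stable hash routing: a given sku_id always maps to the same shard.

    zlib.crc32 is deterministic across processes, unlike Python's built-in hash() on
    strings, which is salted per process.
    """
    return zlib.crc32(str(sku_id).encode("utf-8")) % num_shards


def partition(rows: List[Row], num_shards: int) -> Dict[int, List[Row]]:
    """Offline path: group rows by target shard so each shard can be written directly."""
    buckets: Dict[int, List[Row]] = {i: [] for i in range(num_shards)}
    for r in rows:
        buckets[shard_for(r["sku_id"], num_shards)].append(r)
    return buckets


# Real-time path, merge stage: per-source normalizers map raw formats onto the target schema.
# Source names and raw field names are hypothetical.
NORMALIZERS: Dict[str, Callable[[Row], Row]] = {
    "price_stream": lambda m: {"sku_id": int(m["skuId"]), "price": float(m["p"])},
    "stock_stream": lambda m: {"sku_id": int(m["sku"]), "in_stock": m["qty"] > 0},
}


def merge_stage(raw_messages: List[Row]) -> Dict[int, Row]:
    """Normalize raw messages (as landed by the fast-ingest stage) and merge them per sku_id."""
    merged: Dict[int, Row] = {}
    for msg in raw_messages:
        row = NORMALIZERS[msg["source"]](msg["body"])
        # Later messages overwrite earlier values for the same field.
        merged.setdefault(row["sku_id"], {}).update(row)
    return merged
```

Splitting ingestion from normalization keeps the ingest stage fast and format-agnostic; adding a new stream only requires a new normalizer. A production merge stage would also need message ordering or versioning, which this sketch ignores.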

The overall architecture ensures efficient real‑time queries through the wide‑table index and maintains data freshness via both offline and real‑time pipelines.

Conclusion

The first version of the JD Retail selection platform launched in Q3 2021. Building its architecture, from system engineering to data processing, brought the team both challenges and opportunities. Alongside what we achieved, the problems we ran into were valuable lessons that deepened and broadened our technical expertise.

We invite readers to share feedback and discuss whether this architecture can be adapted to other scenarios.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
