
How JD Builds a Scalable AI‑Powered Recommendation Data System with Flink

This article explains JD's complex recommendation system data pipeline—from indexing, sampling, and feature engineering to explainability and real‑time metrics—highlighting challenges such as data consistency, latency, and the use of Flink for massive, low‑latency processing.

JD Retail Technology

JD's recommendation system relies on a sophisticated data architecture that supports recall, model, strategy, and effectiveness evaluation, requiring massive data processing capabilities. Real‑time and offline data inconsistencies, warehouse model deviations, and mismatched calculation standards can degrade recommendation performance.

Recommendation System Architecture

The system provides various recommendation scenarios (personalized, hot, new, diversified) and consists of three key modules: recall, model (coarse ranking, fine ranking, re‑ranking), and strategy. Recall reduces the candidate pool from billions of items to tens of millions.

Index

Indexes support recall and are divided into personalized, basic, and strategy types, each containing forward and inverted data structures. Index architecture includes real‑time, incremental, and full indexes, which complement each other to ensure stability. Index construction starts from Kafka streams, performs parsing, deduplication, attribute enrichment, and writes back to Kafka for downstream services. Incremental indexes update hourly or minutely, while full indexes update daily, weekly, or monthly.
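The parse → deduplicate → enrich flow described above can be sketched as follows. This is a minimal Python illustration of the logic, not JD's Flink job; the event fields (`sku_id`, `version`, `attrs`) and the side attribute store are assumptions for the example.

```python
from dataclasses import dataclass

# Hypothetical event shape; field names are illustrative, not JD's schema.
@dataclass
class ItemEvent:
    sku_id: str
    version: int
    attrs: dict

def build_index_updates(events, attr_store):
    """Parse -> deduplicate (keep newest version per SKU) -> enrich -> emit."""
    latest = {}
    for ev in events:
        # Deduplicate: drop events no newer than what we have already seen.
        if ev.sku_id in latest and ev.version <= latest[ev.sku_id].version:
            continue
        latest[ev.sku_id] = ev
    updates = []
    for sku_id, ev in latest.items():
        # Attribute enrichment from a side store (e.g., brand/category lookup).
        enriched = {**ev.attrs, **attr_store.get(sku_id, {})}
        updates.append({"sku_id": sku_id, "version": ev.version, "attrs": enriched})
    return updates
```

In the real pipeline this would run continuously over a Kafka stream, with the enriched records written back to Kafka for downstream index builders.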

Sample

Samples are built via streaming or batch pipelines. Streaming samples join user behavior (exposure, click) with features in near‑real‑time windows (5 min, 10 min, 20 min) to generate incremental samples for model training. Batch samples concatenate offline behavior tables with feature tables to create daily or monthly samples for full‑model training. Issues such as cold‑start, feature backtracking, delayed feedback, and sample mixing are addressed.
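The core of streaming sample generation is a windowed join between exposures and clicks. The sketch below shows that labeling logic in plain Python under assumed event dicts (`user`, `sku`, `ts`); a production version would be a Flink interval join rather than an in-memory pass.

```python
def join_samples(exposures, clicks, window_sec=300):
    """Label an exposure positive if a click on the same (user, sku)
    arrives within window_sec after it; otherwise negative."""
    click_times = {}
    for c in clicks:
        click_times.setdefault((c["user"], c["sku"]), []).append(c["ts"])
    samples = []
    for e in exposures:
        key = (e["user"], e["sku"])
        # Only clicks at or after the exposure, within the join window, count.
        clicked = any(0 <= t - e["ts"] <= window_sec
                      for t in click_times.get(key, []))
        samples.append({**e, "label": 1 if clicked else 0})
    return samples
```

The window length (5, 10, or 20 minutes in the article) trades label completeness against sample latency: longer windows catch more delayed clicks but delay training data.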

Feature

Feature development follows a three‑layer trigger‑flow architecture: behavior ingestion, behavior completion, and feature mining. It extracts user actions, enriches them (e.g., three‑day click aggregation, multi‑year order data), and computes statistical and sequence features for users and items. Cross‑features between user attributes and item categories/brands are also generated. Consistency between online and offline features is ensured by a shared C++ SDK, while feature leakage is mitigated via a Feature Dump mechanism that snapshots features at request time.
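A trailing-window aggregation like the three-day click count, crossed with item category, can be sketched as below. The schema (`user`, `category`, `ts`) and the window math are illustrative assumptions, not JD's feature definitions.

```python
def user_category_cross_features(clicks, now_ts, days=3):
    """Count a user's clicks per item category over a trailing window,
    producing crossed (user, category) count features."""
    horizon = now_ts - days * 86400  # window start, in epoch seconds
    feats = {}
    for c in clicks:
        if c["ts"] < horizon:
            continue  # outside the aggregation window
        key = (c["user"], c["category"])
        feats[key] = feats.get(key, 0) + 1
    return feats
```

Computing the same function from both the online stream and the offline tables (here trivially shared as one Python function, at JD via a shared C++ SDK) is what keeps online and offline feature values consistent.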

Explainability

Explainability covers ranking, model, and traffic aspects. Ranking explainability records the full trace from recall through filtering to final ranking, capturing feature inputs at each stage. Model explainability focuses on feature importance for SKU scores. Traffic explainability analyzes macro‑level flow, such as why certain items receive more exposure, using ClickHouse for multi‑dimensional queries and Flink for real‑time ETL.
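Recording the full trace from recall to final ranking amounts to snapshotting the surviving candidates after each stage, so that the stage at which any SKU was dropped can be recovered. A minimal sketch, with stage names and functions as placeholders:

```python
def trace_ranking(candidates, stages):
    """Run candidates through named (name, fn) stages, recording the
    survivors after each step for later explanation queries."""
    trace = [("recall", list(candidates))]
    current = list(candidates)
    for name, fn in stages:
        current = fn(current)
        trace.append((name, list(current)))
    return current, trace

def explain_sku(trace, sku):
    """Return the first stage at which a SKU was dropped, or None if it survived."""
    for i in range(1, len(trace)):
        _, before = trace[i - 1]
        name, after = trace[i]
        if sku in before and sku not in after:
            return name
    return None
```

In the article's setup these per-stage snapshots (plus the feature inputs at each stage) would be ETL'd by Flink into ClickHouse, where `explain_sku`-style questions become multi-dimensional queries.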

Metrics

Flink streams event data into OLAP systems to compute real‑time metrics (UCTR, UCVR, GMV, order volume, liquidity) across experiment, brand, and category dimensions. When OLAP joins become a bottleneck, Flink supplements metric calculations to maintain performance.
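As an illustration of one such metric, UCTR (user click-through rate) is the share of exposed users who clicked. The sketch below computes it over a batch of events; the event shape is an assumption, and the real system would compute this incrementally in Flink per experiment, brand, or category dimension.

```python
def uctr(events):
    """UCTR = distinct users who clicked / distinct users exposed."""
    exposed = {e["user"] for e in events if e["type"] == "exposure"}
    clicked = {e["user"] for e in events if e["type"] == "click"}
    # Guard against division by zero when no exposures have arrived yet.
    return len(clicked & exposed) / len(exposed) if exposed else 0.0
```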

The article concludes that inconsistencies at the index, sample, or feature stage can cause recommendation anomalies, and that JD addresses these through offline consistency checks, explainable systems, and robust data pipelines.

Big Data, Flink, feature engineering, recommendation system, real-time data, explainability
Written by

JD Retail Technology

Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.
