How OxygenREC Marries Fast and Slow Thinking to Revolutionize E‑commerce Recommendations

OxygenREC presents a fast‑slow thinking, instruction‑following generative framework that overcomes latency, reasoning, and multi‑scene scalability challenges in e‑commerce recommendation, delivering unified training, low‑latency inference, and significant business impact across JD.com scenarios.

JD Tech

Problem Statement

Industrial e‑commerce recommender systems face three intertwined challenges:

Limited deductive reasoning: Traditional cascade models rely on pattern extraction from massive behavior logs and cannot incorporate world knowledge to infer latent user needs (e.g., recommending infant sweat‑proof pajamas for a young mother in cold, humid Chengdu winter).

Multi‑scene scalability vs. resource efficiency: Deploying separate generative models for homepage, channel feed, cart, search, etc., leads to high training and serving costs, while a single unified model often suffers from negative transfer across heterogeneous scenes.

Industrial‑grade latency and engineering constraints: Combining a billion‑parameter LLM (for deep reasoning) with TB‑scale sparse embedding tables (for user/item features) while meeting strict online latency budgets requires a carefully engineered training and inference stack.

Fast‑Slow Thinking Architecture

Slow Thinking (offline LLM pipeline)

A near‑line large language model processes user spatiotemporal context, personalized attributes, and long‑term behavior to generate a high‑quality context‑reasoning instruction. This instruction encodes world knowledge and deep deductive reasoning, and because it is produced in batch, it adds no online latency.

Fast Thinking (online encoder‑decoder)

An efficient Transformer encoder‑decoder backbone receives the pre‑generated instruction together with real‑time signals (e.g., current page, recent clicks) and outputs a ranked recommendation sequence within the millisecond‑level latency SLA.

Instruction Control Mechanism

Query‑to‑Item (Q2I) alignment loss: During training, an auxiliary loss forces the instruction embedding and the target item embedding into a shared semantic space. Formally, for a training pair (instruction i, item j), the loss L_Q2I = ||E_i - V_j||_2^2 aligns the two embeddings, enabling the instruction to act as a query vector for item retrieval.
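For one (instruction, item) pair, the loss above is just a squared L2 distance between two embedding vectors. A minimal sketch (function name and embedding values are ours, for illustration only):

```python
# Sketch of the Q2I alignment loss for a single training pair:
# L_Q2I = ||E_i - V_j||_2^2, the squared L2 distance between the
# instruction embedding E_i and the target item embedding V_j.
# Plain Python lists stand in for learned embedding tensors.

def q2i_alignment_loss(instruction_emb, item_emb):
    """Squared L2 distance between an instruction and an item embedding."""
    return sum((e - v) ** 2 for e, v in zip(instruction_emb, item_emb))

loss = q2i_alignment_loss([1.0, 0.0, 2.0], [0.0, 0.0, 0.0])
# loss == 5.0 (= 1^2 + 0^2 + 2^2)
```

Minimizing this loss over many pairs pulls instructions and their target items into the shared space the paper describes, so an instruction vector can later score items directly.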

Instruction‑Guided Retrieval (IGR): At inference the aligned instruction vector queries the user’s long‑term behavior memory, retrieving the most relevant historical items and discarding noisy ones. This focuses the fast‑thinking model on behavior that matches the instruction intent, improving controllability and relevance.
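Because Q2I alignment puts instructions and items in one space, retrieval reduces to scoring history items against the instruction vector and keeping the top matches. A hedged sketch (item names, embeddings, and the dot-product scorer are illustrative assumptions, not OxygenREC internals):

```python
# Illustrative Instruction-Guided Retrieval (IGR): score each item in the
# user's long-term behavior memory by dot product with the aligned
# instruction vector, keep the top-k, and drop the rest as noise.

def igr_retrieve(instruction_vec, history, k=2):
    """Return the k history item IDs whose embeddings best match the instruction."""
    def score(item):
        item_id, emb = item
        return sum(a * b for a, b in zip(instruction_vec, emb))
    return [item_id for item_id, _ in sorted(history, key=score, reverse=True)[:k]]

history = [
    ("umbrella", [0.1, 0.9]),
    ("pajamas",  [0.9, 0.2]),
    ("heater",   [0.8, 0.1]),
]
selected = igr_retrieve([1.0, 0.0], history, k=2)
# selected == ["pajamas", "heater"]
```

The instruction vector here points along the first embedding dimension, so items aligned with that intent rank highest while the off-intent item is discarded.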

Unified Multi‑Scene Alignment

Scene instructions: Each recommendation scenario (homepage, search, cart, checkout, etc.) is encoded as a scene instruction that concatenates a scene identifier and optional trigger‑item IDs. The same model conditions on any scene by simply swapping this instruction.
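The concatenation itself is simple enough to sketch; the token format below (`<scene:...>`, `<trigger:...>`) is our assumption, since the source does not specify the exact serialization:

```python
# Hedged sketch of a scene instruction: a scene identifier concatenated
# with optional trigger-item IDs into one conditioning string. Swapping
# this string is all it takes to retarget the shared model to a new scene.

def scene_instruction(scene_id, trigger_item_ids=None):
    parts = [f"<scene:{scene_id}>"]
    for item_id in (trigger_item_ids or []):
        parts.append(f"<trigger:{item_id}>")
    return "".join(parts)

homepage = scene_instruction("homepage")          # "<scene:homepage>"
cart = scene_instruction("cart", ["sku123"])       # "<scene:cart><trigger:sku123>"
```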

Reward‑mapping service: Business objectives (GMV, conversion rate, legality, diversity) are normalized into a unified scalar reward via a learned mapping function, allowing a single reinforcement‑learning objective across scenes.
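As a toy illustration of the idea, heterogeneous objectives can be normalized and combined into one scalar. Note the paper learns this mapping; the hand-set weights and min/max normalizers below are placeholders:

```python
# Minimal sketch of a reward-mapping service: each business objective is
# min-max normalized, then combined by a weighted sum into one scalar
# reward usable by a single RL objective across scenes.

def unified_reward(metrics, weights, normalizers):
    """Weighted sum of min-max-normalized objective values."""
    total = 0.0
    for name, value in metrics.items():
        lo, hi = normalizers[name]
        norm = (value - lo) / (hi - lo) if hi > lo else 0.0
        total += weights[name] * norm
    return total

r = unified_reward(
    {"gmv": 50.0, "cvr": 0.05},
    weights={"gmv": 0.7, "cvr": 0.3},
    normalizers={"gmv": (0.0, 100.0), "cvr": (0.0, 0.1)},
)
# r == 0.5 (both objectives normalize to 0.5; 0.7*0.5 + 0.3*0.5)
```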

Soft Adaptive Group Clip Policy Optimization (SA‑GCPO): Replaces the hard clipping of traditional GRPO with an adaptive gating function g(a) = sigmoid(α·a + β) that scales the advantage term per‑group. This stabilizes multi‑task, multi‑scene policy updates and improves HR@1/HR@10 when synthetic data ratios vary.
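The gating function itself can be sketched directly from the formula in the text; the α and β values below are illustrative defaults, not tuned values from the paper:

```python
import math

# Sketch of the SA-GCPO soft gate: instead of hard-clipping the advantage
# at a fixed threshold (as in GRPO/PPO-style updates), a sigmoid gate
# g(a) = sigmoid(alpha * a + beta) smoothly rescales each group's advantage.

def soft_adaptive_gate(advantage, alpha=1.0, beta=0.0):
    return 1.0 / (1.0 + math.exp(-(alpha * advantage + beta)))

def gated_advantage(advantage, alpha=1.0, beta=0.0):
    # The gate damps extreme advantages smoothly rather than truncating
    # them, which is what stabilizes multi-scene policy updates.
    return soft_adaptive_gate(advantage, alpha, beta) * advantage
```

With α = 1 and β = 0, a zero advantage is gated at 0.5 and contributes nothing, while large positive advantages pass through almost unscaled and large negative ones are heavily damped.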

Production System

Training framework: Built on PyTorch, it integrates a sparse embedding engine for TB‑scale categorical features and a dense LLM engine for the slow‑thinking pipeline. Deployed on a 128‑GPU H800 cluster, the system achieves ~40 % FLOPs utilization.

Inference engine xLLM: Three orthogonal optimizations enable low‑latency generation for long contexts and large candidate pools: xSchedule (system‑level request scheduling), xAttention (a custom fused‑attention kernel for mixed sparse/dense inputs), and xBeam (efficient beam search with early‑stop heuristics).

Near‑line instruction service: Instructions are batch‑generated offline, stored in a key‑value store, and fetched by the online model at request time, eliminating any online LLM call.
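The serving pattern reduces to a batch write followed by a single key-value lookup per request. A toy sketch, with a dict standing in for the real KV store and a stub in place of the LLM call:

```python
# Toy sketch of the near-line instruction flow: instructions are produced
# offline in batch, written to a key-value store, and fetched at request
# time, so the online path never invokes the LLM.

instruction_store = {}

def batch_generate_instructions(users):
    # Offline slow-thinking step (a stub here instead of an LLM call).
    for user_id, context in users.items():
        instruction_store[user_id] = f"instruction for {context}"

def serve_request(user_id, default="generic instruction"):
    # Online fast path: one KV lookup, with a fallback for cold-start users.
    return instruction_store.get(user_id, default)
```

The fallback default is our assumption; the source does not describe how users without a precomputed instruction are handled.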

Experimental Evaluation

Offline Metrics

Hierarchical semantic IDs built via multi‑modal contrastive learning achieve 92.8 % category purity and near‑zero ID collisions.

Ablation studies show that inserting the instruction after the BOS token, combining scene‑ID and trigger‑item ID, and jointly applying IGR + Q2I alignment each yield measurable HR improvements.

Online A/B Tests (six JD.com core scenes)

Homepage: GMV uplift of 4.52 %–8.40 %.

Channel feed: Order volume increase up to 8.03 %.

Checkout (high purchase intent): GMV uplift up to 11.80 %.

SA‑GCPO Stability

When the proportion of synthetic data varies (e.g., 33 % synthetic), SA‑GCPO consistently outperforms GRPO and its variants on HR@1 and HR@10, demonstrating robust multi‑task learning.

Future Directions

Non‑autoregressive generation: Explore diffusion‑based or parallel decoding models to break the linear scaling of latency with recommendation‑list length and increase throughput.

Cross‑scene user trajectory modeling: Jointly model user behavior across homepage, search, cart, and checkout to capture long‑term intent and enable value‑oriented recommendation over extended sessions.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: e-commerce, Recommendation, LLM, multi-scene, Generative AI
Written by JD Tech

Official JD technology sharing platform. All the cutting‑edge JD tech, innovative insights, and open‑source solutions you’re looking for, all in one place.
