One4All: A Scalable Multi‑Task Generative Recommendation Framework for CPS Advertising
The paper introduces One4All, a scalable multi‑task generative recommendation framework for CPS advertising that combines few‑shot intent prompting, a Rewards‑in‑Context multi‑objective optimization, and an online model‑selection strategy, delivering 2‑3× offline HitRate/NDCG gains and notable online CTR, CVR, and commission improvements.
Following the success of large language models (LLMs) in natural language processing, the research community has been actively exploring how generative models can improve search‑advertising recommendation systems. Existing work falls roughly into two categories:
(1) Using LLMs for data and knowledge augmentation, representation extraction, or converting recommendation into a dialogue‑driven task. This approach does not modify the LLM itself and therefore cannot directly model massive collaborative signals.
(2) Modifying LLMs to directly model the massive collaborative signals in search‑advertising data by changing the input‑output paradigm and employing pre‑training or fine‑tuning. This is the prerequisite for scaling generative recommendation models and requires more complex engineering support. The second category is a frontier direction for search‑advertising recommendation.
Since 2024, industry has made progress in the second category, e.g., GR (Meta), HLLM (ByteDance), NoteLLM (Xiaohongshu), NoteLLM‑2 (Xiaohongshu), OneRec (Kuaishou). The CPS algorithm team has also conducted a series of works on generative recommendation, which are summarized in the author’s previous article "Generative Recommendation System and JD Alliance Advertising – Review and Application".
The current article refines business requirements and extracts core technical points, focusing on the characteristics of CPS advertising. It proposes a scalable multi‑task, multi‑scenario One4All generative recommendation framework.
Business Requirements & Core Technical Points
CPS advertising recommendation mainly targets off‑site users across multiple scenarios. The business requirements include precise user‑intent perception, multi‑objective optimization to balance revenue and user activity, and compatibility with diverse scenarios and tasks. Three core technologies are highlighted:
Explicit intent perception for controllable product recommendation.
Multi‑objective optimization of recommendation effectiveness.
The One4All generative recommendation framework.
Explicit Intent Perception
Traditional solutions for trigger‑induced recommendation (TIR) fall into three categories: implicit intent modeling from historical behavior, trigger‑based item‑to‑item recall or SKU‑to‑query generation, and three‑network architectures (e.g., DIHN, DIAN, DEI2N, DUIN). Their limitations motivate a controllable approach that automatically generates rich intent descriptions using few‑shot prompting and chain‑of‑thought (CoT) strategies with the Yanxi‑81B model, followed by self‑verification.
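The intent-generation step described above (few-shot prompting, chain-of-thought reasoning, then self-verification) could be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the prompt wording, the example demonstrations, and every function name are hypothetical, and `llm` stands for any text-in, text-out callable rather than a specific Yanxi‑81B API.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical few-shot demonstrations: (behavior history, reasoning, intent).
FEW_SHOT_EXAMPLES: List[Tuple[str, str, str]] = [
    (
        "clicked: trail running shoes; searched: marathon training plan",
        "The user is looking at running gear and preparing for a race.",
        "Lightweight running shoes and accessories for marathon training",
    ),
    (
        "clicked: baby stroller; purchased: newborn diapers",
        "Recent activity centers on products for a newborn.",
        "Everyday baby-care essentials for a newborn",
    ),
]

# Assumed instruction text; the source does not give the actual prompt.
INSTRUCTION = (
    "Describe the user's shopping intent. Reason step by step, "
    "then end with a single line in the same format as the examples."
)


def build_intent_prompt(trigger_item: str, behaviors: List[str]) -> str:
    """Assemble a few-shot, chain-of-thought prompt for one user."""
    parts = [INSTRUCTION]
    for history, reasoning, intent in FEW_SHOT_EXAMPLES:
        parts.append(f"History: {history}\nReasoning: {reasoning}\nIntent: {intent}")
    history = "; ".join(behaviors)
    parts.append(f"Trigger item: {trigger_item}\nHistory: {history}\nReasoning:")
    return "\n\n".join(parts)


def build_verification_prompt(trigger_item: str, intent: str) -> str:
    """Ask the model to check its own intent description."""
    return (
        "Answer yes or no: is the following intent consistent with the trigger item?\n"
        f"Trigger item: {trigger_item}\nIntent: {intent}"
    )


def extract_intent(completion: str) -> Optional[str]:
    """Return the text after the last 'Intent:' line, if there is one."""
    for line in reversed(completion.splitlines()):
        stripped = line.strip()
        if stripped.startswith("Intent:"):
            return stripped[len("Intent:"):].strip() or None
    return None


def generate_verified_intent(
    llm: Callable[[str], str],
    trigger_item: str,
    behaviors: List[str],
    max_attempts: int = 2,
) -> Optional[str]:
    """Generate an intent description and keep it only if self-verification passes."""
    prompt = build_intent_prompt(trigger_item, behaviors)
    for _ in range(max_attempts):
        intent = extract_intent(llm(prompt))
        if intent is None:
            continue
        verdict = llm(build_verification_prompt(trigger_item, intent))
        if verdict.strip().lower().startswith("yes"):
            return intent
    return None
```

Returning `None` when verification fails leaves the fallback policy (for example, using the trigger item alone) to the caller, which is a design choice of this sketch rather than something the source specifies.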
Multi‑Objective Optimization
Existing non‑LLM methods (Shared Bottom, MMoE, PLE, ESMM) and LLM‑based methods (MORLHF, MODPO, Rewarded Soups) are surveyed. The proposed solution integrates behavior and price data, designs reward signals for click, purchase, price, and commission, and employs a Rewards‑in‑Context (RiC) framework. Offline training incorporates multiple rewards into supervised fine‑tuning, while online training uses Pareto‑frontier‑enhanced data to alleviate sparsity. At inference time, user preferences are mapped to target rewards, enabling flexible adaptation.
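The three RiC stages mentioned above can be illustrated with a small sketch: rewards written into the prompt for supervised fine-tuning, a Pareto-front filter for the online data, and a preference-to-reward mapping at inference. The reward names come from the text. The tag format, the function names, and especially the linear preference mapping are assumptions made for illustration and are not claimed to match the authors' implementation.

```python
from typing import Dict, List, Sequence

# Reward dimensions named in the article.
REWARD_NAMES = ("click", "purchase", "price", "commission")


def format_ric_prompt(user_context: str, rewards: Dict[str, float]) -> str:
    """Prepend reward tags so the model is conditioned on them during SFT.

    The '<name> value' tag layout is a hypothetical format.
    """
    tags = " ".join(f"<{name}> {rewards[name]:.1f}" for name in REWARD_NAMES)
    return f"{tags} | {user_context}"


def dominates(a: Dict[str, float], b: Dict[str, float]) -> bool:
    """True if a is at least as good as b on every reward and better on one."""
    at_least = all(a[n] >= b[n] for n in REWARD_NAMES)
    strictly = any(a[n] > b[n] for n in REWARD_NAMES)
    return at_least and strictly


def pareto_front(samples: Sequence[Dict[str, float]]) -> List[int]:
    """Indices of samples that no other sample dominates."""
    return [
        i
        for i, s in enumerate(samples)
        if not any(dominates(o, s) for j, o in enumerate(samples) if j != i)
    ]


def preference_to_rewards(
    weights: Dict[str, float], r_min: float = 0.0, r_max: float = 1.0
) -> Dict[str, float]:
    """Map preference weights to target rewards for the inference prompt.

    Assumption: a simple linear scaling in which the most preferred
    objective is asked for its maximum reward.
    """
    top = max(weights[n] for n in REWARD_NAMES)
    if top <= 0:
        raise ValueError("at least one preference weight must be positive")
    return {n: r_min + (weights[n] / top) * (r_max - r_min) for n in REWARD_NAMES}
```

A typical use would be to filter generated samples with `pareto_front`, format the survivors with `format_ric_prompt` for a further fine-tuning round, and at serving time call `preference_to_rewards` to build the reward tags for a given business preference.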
One4All Generative Recommendation Framework
The framework is designed to be extensible across scenarios, jointly modeling behavior and semantics. It also includes an online model‑update strategy that selects the best model based on CVR and CTR thresholds. The pseudo‑code for the update logic is:
Compare Model_DPO_T with Model_SFT_T and select the winner, Model_A:
    if CVR increases and CTR drops by less than 10%: Model_A = Model_DPO_T
    else: Model_A = Model_SFT_T
Compare Model_BEST_T-1 with Model_A and select the winner, Model_BEST_T, to deploy online:
    if CVR increases and CTR drops by less than 10%: Model_BEST_T = Model_A
    else: Model_BEST_T = Model_BEST_T-1
Experimental results show that the intent‑aware controllable model improves HitRate and NDCG by 2‑3× offline, and online metrics such as SKUCTR (+3%), SKUCVR (+7%), and same‑store orders/commissions also see significant gains. The RiC‑based multi‑objective alignment further boosts HitRate@1 by over 10% across datasets.
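The two-stage model-selection rule of the online update strategy can also be written as a small runnable sketch. The `Candidate` class and function names are hypothetical, and two points the source leaves open are filled in as assumptions: the CTR drop is measured relative to the incumbent's CTR, and "CVR increases" means strictly higher CVR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A model together with its measured online metrics (hypothetical container)."""
    name: str
    cvr: float
    ctr: float


MAX_CTR_DROP = 0.10  # the 10% tolerance from the update rule


def challenger_wins(
    challenger: Candidate, incumbent: Candidate, max_ctr_drop: float = MAX_CTR_DROP
) -> bool:
    """Challenger wins if CVR goes up and CTR falls by less than the tolerance.

    Assumption: the drop is relative to the incumbent's CTR.
    """
    cvr_up = challenger.cvr > incumbent.cvr
    if incumbent.ctr > 0:
        ctr_drop = (incumbent.ctr - challenger.ctr) / incumbent.ctr
    else:
        ctr_drop = 0.0  # nothing to lose when the incumbent has no clicks
    return cvr_up and ctr_drop < max_ctr_drop


def select_model_to_deploy(
    dpo_t: Candidate, sft_t: Candidate, best_prev: Candidate
) -> Candidate:
    """Pick Model_BEST_T from the period-T DPO and SFT models and Model_BEST_T-1."""
    # Stage 1: the DPO model must beat the SFT model of the same period.
    model_a = dpo_t if challenger_wins(dpo_t, sft_t) else sft_t
    # Stage 2: the stage-1 winner must beat the currently deployed best model.
    return model_a if challenger_wins(model_a, best_prev) else best_prev
```

For example, a DPO model with CVR 2.2% and CTR 4.8% beats an SFT model at 2.0% and 5.0% (CVR is up and CTR falls by 4% in relative terms), and is then compared against the previous best model under the same rule before going online.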
Summary & Future Outlook
Future directions include interactive recommendation systems that jointly model search and recommendation, and multimodal understanding to leverage rich image and video signals in the front‑end pipeline.
JD Retail Technology