Generative Recommendation for CPS Advertising: Intent Sensing, Multi‑Objective Optimization, and the One4All Framework
This article surveys recent advances in generative recommendation for CPS advertising. It covers controllable product recommendation with explicit intent sensing, multi‑objective optimization based on Rewards‑in‑Context (RiC) and DPO, and the scalable One4All framework, which unifies behavior and language modeling across diverse ad scenarios.
1. Introduction
After large language models (LLMs) achieved breakthroughs in natural language processing, researchers began exploring how generative models can enhance search and recommendation systems. Two main directions exist: (1) using LLMs for data or knowledge augmentation without modifying the model, and (2) directly adapting LLMs to ingest massive collaborative signals from search and recommendation data. The article identifies the second direction as the frontier for CPS (cost‑per‑sale) advertising.
2. Business Requirements & Core Technical Points
CPS advertising serves multi‑scenario recommendations to off‑site users. This requires precise perception of user intent, multi‑objective optimization (balancing revenue and user activity), and compatibility with diverse tasks. Three core technologies are highlighted: controllable product recommendation via explicit intent sensing, multi‑objective optimization of recommendation effectiveness, and the One4All generative recommendation framework.
3. Controllable Product Recommendation with Explicit Intent Sensing
The problem is framed as Trigger‑Induced Recommendation (TIR). Existing solutions fall into three categories: implicit intent modeling from historical behavior, trigger‑item‑based I2I recall or SKU‑to‑query generation, and three‑network fusion architectures (e.g., DIHN, DIAN, DEI2N, DUIN). Their limitations are the lack of an explicit intent representation and poor controllability.
Figure: Summary of existing solutions
The proposed solution automatically generates rich intent descriptions from traditional recommendation data, reformulates the task as "intent text + historical item semantic IDs → target item semantic ID", and applies instruction‑following fine‑tuning so that the intent can be controlled dynamically.
Automation pipeline: (1) take user behavior and the target item as input; (2) apply few‑shot prompting with chain‑of‑thought (CoT) on the Yanxi‑81B model to summarize the behavior, reason about it, and predict the intent; (3) output a (summary, reason, prediction) triple; (4) run a self‑verification step that evaluates the generated intents.
Figure: Automated intent generation and evaluation
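The pipeline steps can be sketched roughly as follows. This is a minimal illustration, not the article's implementation: the demonstration example, the JSON answer format, and the function names `build_intent_prompt` and `parse_intent` are all assumptions, and the call to the LLM itself is left out.

```python
import json

# Hypothetical few-shot demonstration; the article does not publish its prompts.
FEW_SHOT = [
    {
        "history": ["trail running shoes", "hydration vest"],
        "target": "energy gels",
        "summary": "The user has been browsing trail-running gear.",
        "reason": "Long-run equipment suggests preparation for an endurance event.",
        "prediction": "Wants nutrition for long-distance running.",
    }
]
KEYS = ("summary", "reason", "prediction")


def build_intent_prompt(history, target, demos=FEW_SHOT):
    """Assemble a few-shot chain-of-thought prompt that asks for a JSON triple."""
    parts = [
        "Summarize the behavior, reason step by step, then predict the intent.",
        "Answer as JSON with keys: summary, reason, prediction.",
    ]
    for demo in demos:
        parts.append(f"Behavior: {', '.join(demo['history'])}\nTarget: {demo['target']}")
        parts.append(json.dumps({k: demo[k] for k in KEYS}))
    # The final block is the query the model is expected to complete.
    parts.append(f"Behavior: {', '.join(history)}\nTarget: {target}")
    return "\n\n".join(parts)


def parse_intent(raw):
    """Parse a model reply; return None unless it is a complete triple."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict):
        return None
    if not all(isinstance(obj.get(k), str) and obj[k].strip() for k in KEYS):
        return None
    return tuple(obj[k] for k in KEYS)
```

Replies that fail `parse_intent` would simply be discarded before the self‑verification step; how the article scores the surviving intents is not described, so that part is not sketched here.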
Input‑output paradigm: data are organized as "Input: [Prompt]. Output: [Response]" with three new task types, illustrated in the accompanying diagram.
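As a rough illustration of the "intent text + historical item semantic IDs → target item semantic ID" formulation in the "Input: [Prompt]. Output: [Response]" layout, a training sample could be assembled as below. The token syntax `<c{level}_{index}>`, the prompt wording, and both function names are assumptions made for this sketch; the article does not specify them.

```python
def format_semantic_id(codes):
    """Render codebook indices as tokens, one token per quantization level."""
    return "".join(f"<c{level}_{idx}>" for level, idx in enumerate(codes))


def build_sft_sample(intent_text, history_ids, target_id):
    """Build one instruction-tuning sample in the Input/Output layout."""
    history = " ".join(format_semantic_id(codes) for codes in history_ids)
    prompt = (
        f"User intent: {intent_text} "
        f"Interaction history: {history} "
        "Predict the next item"
    )
    response = format_semantic_id(target_id)
    return f"Input: {prompt}. Output: {response}"


sample = build_sft_sample(
    "Wants nutrition for long-distance running",
    [(3, 17, 5), (3, 20, 1)],
    (8, 2, 44),
)
print(sample)
```

Because the intent is ordinary text in the prompt, swapping in a different intent string at inference time is what would give the controllability described above.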
4. Multi‑Objective Optimization of Recommendation Effectiveness
Business need: jointly optimize click‑through rate, purchase conversion, price, and commission. Existing non‑LLM methods include Shared Bottom, MMoE, PLE, and ESMM; LLM‑based methods include MORLHF, MODPO, and Rewarded Soups.
Figure: Business requirements & limitations of existing solutions
The proposed DPO‑based preference alignment models click‑to‑purchase behavior (f) using positive examples (clicked and purchased) and negative examples (clicked but not purchased), formatted as "Input: [Prompt]. Output1: [Response+]. Output2: [Response-]". Its limitation is that only the relative ranking of f is considered; absolute reward values are not used.
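For reference, the standard DPO objective for a single preference pair can be written in a few lines. The sketch below takes the summed log‑probabilities of the preferred and rejected responses under the policy and under a frozen reference model as plain floats; the function name, argument names, and the default `beta=0.1` are illustrative choices, not values from the article.

```python
import math


def dpo_loss(pi_pos, pi_neg, ref_pos, ref_neg, beta=0.1):
    """DPO loss for one pair, given sequence log-probabilities.

    pi_*  : log-prob of the response under the policy being trained
    ref_* : log-prob of the same response under the frozen reference model
    *_pos : preferred response (click and purchase)
    *_neg : rejected response (click, no purchase)
    """
    margin = beta * ((pi_pos - ref_pos) - (pi_neg - ref_neg))
    # -log(sigmoid(margin)) == softplus(-margin), computed in a stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# When the policy equals the reference, the margin is 0 and the loss is log(2).
print(dpo_loss(-1.0, -2.0, -1.0, -2.0))
```

The loss depends only on the difference between the two log‑ratio terms, which is why this objective captures relative ranking but carries no absolute reward value.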
The RiC (Rewards‑in‑Context) framework places multiple rewards (click, purchase, price, commission) directly in the input as " r1 r2 …", so that supervised fine‑tuning learns policies under diverse reward combinations. It has three stages: offline training conditions the model on the fused rewards; online training augments the data with samples near the Pareto front; and at inference, user preferences are mapped to target rewards, allowing flexible user‑centric optimization.
Figure: RiC (Rewards‑in‑Context) based preference alignment algorithm
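A minimal sketch of the two RiC ingredients described above: writing reward values into the prompt, and mapping a preference vector to target rewards at inference. The `<name> value` tag syntax, the reward names, and the piecewise‑linear mapping are assumptions for illustration; the article does not state which exact format or mapping it uses.

```python
def rewards_in_context(prompt, rewards, ndigits=1):
    """Append reward values to the prompt as '<name> value' pairs (assumed syntax)."""
    tags = " ".join(f"<{name}> {value:.{ndigits}f}" for name, value in rewards.items())
    return f"{prompt} {tags}"


def preference_to_rewards(weights, bounds):
    """Map preference weights (summing to 1) to target reward values.

    Assumed rule: an objective weighted at least 1/N asks for its maximum
    reward; a lower weight scales linearly between the minimum and maximum.
    bounds maps each objective name to its (r_min, r_max) range.
    """
    n = len(weights)
    targets = {}
    for name, w in weights.items():
        r_min, r_max = bounds[name]
        if w >= 1.0 / n:
            targets[name] = r_max
        else:
            targets[name] = n * w * (r_max - r_min) + r_min
    return targets


bounds = {"click": (0.0, 1.0), "purchase": (0.0, 1.0)}
targets = preference_to_rewards({"click": 0.25, "purchase": 0.75}, bounds)
print(rewards_in_context("Recommend an item for this user.", targets))
```

During training the reward values would come from logged outcomes for each sample; at inference they are chosen from the desired trade‑off, which is what makes the same model steerable toward different objectives.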
Experimental results show offline HitRate@1 improvements of more than 10% across datasets and online gains of 1.5%–7% in SKUCTR and SKUCVR, along with significant lifts in same‑store orders and commissions.
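For readers unfamiliar with the offline metric, HitRate@K is the fraction of test cases whose ground‑truth item appears among the top K recommendations. The function below is a generic sketch of that definition with made‑up item IDs; it is not the article's evaluation code.

```python
def hit_rate_at_k(ranked_lists, targets, k=1):
    """Fraction of cases whose target item is in the top-k recommendations."""
    if len(ranked_lists) != len(targets):
        raise ValueError("ranked_lists and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(target in ranked[:k] for ranked, target in zip(ranked_lists, targets))
    return hits / len(targets)


ranked = [["sku_a", "sku_b"], ["sku_c", "sku_d"]]
truth = ["sku_a", "sku_d"]
print(hit_rate_at_k(ranked, truth, k=1))  # 0.5
print(hit_rate_at_k(ranked, truth, k=2))  # 1.0
```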
5. One4All Generative Recommendation Framework
To handle the heterogeneous CPS advertising scenarios, the One4All framework provides an extensible architecture that jointly models behavior and semantics, improving generalization. It also introduces an online model‑update strategy that ensures real‑time adaptability.
Figure: Extensible framework design
The framework supports cross‑scene adaptation, unified retrieval‑ranking, joint search‑recommendation modeling, behavior summarization, and personalized intent inference.
6. Summary and Future Outlook
Future directions include interactive recommendation (joint search and recommendation), deeper multimodal understanding of images and videos in the front‑end pipeline, and further use of generative models in interactive applications.
JD Tech Talk
Official JD Tech public account delivering best practices and technology innovation.