
How Generative LLMs Are Transforming CPS Advertising Recommendations

With large language models excelling at NLP, researchers are enhancing CPS advertising recommendation systems by integrating generative LLMs for explicit intent perception, multi‑objective optimization, and a unified One4All framework, achieving significant offline and online gains in click‑through, conversion, and revenue metrics.

JD Cloud Developers

1. Introduction

After large language models (LLMs) achieved notable success in natural language processing, the academic community began exploring how generative models can enhance search‑advertising recommendation systems. Existing work can be divided into two categories: (1) using LLMs for data and knowledge augmentation, representation extraction, and prompt‑driven dialogue recommendation without modifying the LLM—these methods cannot directly model massive collaborative signals; (2) modifying LLMs to directly model collaborative signals from large‑scale data via pre‑training or fine‑tuning, which is a frontier direction for scaling search‑advertising and requires more complex engineering. Recent industry progress includes Meta’s GR, ByteDance’s HLLM, Xiaohongshu’s NoteLLM and NoteLLM‑2, and Kuaishou’s OneRec.

2. Business Requirements & Core Technical Points

CPS advertising recommendation primarily targets off‑site users across multiple scenarios. Business requirements include precise perception of user intent, multi‑objective optimization that balances revenue and user activity, and compatibility with diverse scenarios and task data. Three core technologies are emphasized: explicit‑intent perception for controllable product recommendation, multi‑objective optimization of recommendation effectiveness, and the One4All generative recommendation framework, corresponding respectively to instruction‑following fine‑tuning, preference‑alignment, and an end‑to‑end data‑to‑model paradigm.

CPS advertising recommendation business requirements and core technical points relationship
Core technical points and generative recommendation framework

3. Explicit Intent Perception for Controllable Product Recommendation

The task of landing‑page product recommendation is a crucial form of off‑site advertising, corresponding to the research problem of Trigger‑Induced Recommendation (TIR). Existing solutions fall into three categories:

Implicit modeling of user intent based on historical behavior sequences.

Using trigger items for I2I recall or generating search queries (sku2query) for product retrieval.

Three‑network architectures that separately represent trigger items, model historical behavior, and estimate fusion weights (e.g., DIHN, DIAN, DEI2N, DUIN).
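The three‑network pattern these models share can be sketched as follows. This is a minimal illustration, not the exact DIHN/DIAN/DEI2N/DUIN architectures: the dimensions, the dot‑product attention, and the sigmoid gate are all assumptions made for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def three_network_score(trigger_emb, history_embs, candidate_emb, gate_w):
    """Score a candidate item given a trigger item and a behavior history.

    trigger_emb:   (d,)   embedding of the trigger item
    history_embs:  (n, d) embeddings of historical behaviors
    candidate_emb: (d,)   embedding of the candidate item
    gate_w:        (2d,)  parameters of the fusion-weight network
    """
    # Network 1: trigger representation (identity here, for simplicity)
    trig = trigger_emb
    # Network 2: trigger-aware attention over the behavior sequence
    attn = softmax(history_embs @ trigger_emb)            # (n,)
    hist = attn @ history_embs                            # (d,)
    # Network 3: estimate a fusion weight from both representations
    alpha = 1.0 / (1.0 + np.exp(-(gate_w @ np.concatenate([trig, hist]))))
    user = alpha * trig + (1.0 - alpha) * hist
    return float(user @ candidate_emb)
```

The gate lets the model decide, per request, how much the trigger item should dominate the long‑term behavior representation.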

Explicit intent perception controllable product recommendation diagram
Business requirements and limitations of existing solutions

Solution:

Automatically generate rich intent descriptions from traditional recommendation data, using intent text + historical product semantic ID sequence as input and target product semantic ID as output.

Reconstruct the task paradigm of trigger‑induced recommendation, leveraging generative instruction‑following fine‑tuning to perceive and dynamically control historical behavior and trigger items.

Automated Intent Generation and Evaluation

Input: user behavior data + target product.

Using Few‑shot Prompting and Chain‑of‑Thought (CoT) strategies, the Yanxi‑81B model summarizes, reasons, and predicts the current intent.

Output: a (summary, reason, prediction) triple.

Self‑Verification is employed to evaluate the generated explicit intent.

Automated intent generation pipeline
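The generation‑plus‑verification loop might look like this in outline. The prompts and function names are hypothetical, and `llm` stands in for a call to the Yanxi‑81B model:

```python
def generate_intent(llm, user_behaviors, target_item, few_shot_examples):
    """Produce a summary-reason-prediction triple for a user's current intent.

    `llm` is any callable prompt -> text; the real pipeline uses Yanxi-81B
    with few-shot prompting and chain-of-thought.
    """
    shots = "\n\n".join(few_shot_examples)
    prompt = (
        f"{shots}\n\n"
        f"User behaviors: {user_behaviors}\n"
        f"Target product: {target_item}\n"
        "Step 1 - Summary: summarize the behavior history.\n"
        "Step 2 - Reason: reason about what the user wants now.\n"
        "Step 3 - Prediction: state the current intent in one sentence."
    )
    return llm(prompt)

def self_verify(llm, intent_text, target_item):
    """Self-verification: ask the model whether the intent entails the target."""
    prompt = (
        f"Intent: {intent_text}\nTarget product: {target_item}\n"
        "Does this intent plausibly lead to the target product? Answer yes or no."
    )
    return llm(prompt).strip().lower().startswith("yes")
```

Triples that fail self‑verification can simply be dropped before fine‑tuning, so the generated intent corpus stays consistent with the observed targets.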

Input‑Output Paradigm + Instruction‑Following Fine‑Tuning

Data are organized as "Input: [Prompt]. Output: [Response]" and three additional task types are added on top of sequential recommendation. The input‑output definitions are illustrated below.

Explicit intent perception task definition and examples
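A minimal sketch of how one such record could be assembled, assuming a `<sid_…>` token rendering for semantic IDs (the actual tokenization is not specified in the text):

```python
def build_sample(intent_text, history_sids, target_sid):
    """Organize one training record as "Input: [Prompt]. Output: [Response]".

    history_sids / target_sid are semantic IDs; the <sid_N> token format
    used here is an illustrative assumption.
    """
    hist = " ".join(f"<sid_{s}>" for s in history_sids)
    prompt = (
        f"User intent: {intent_text}\n"
        f"Browsing history: {hist}\n"
        "Recommend the next product."
    )
    response = f"<sid_{target_sid}>"
    return {"input": prompt, "output": response}
```

Placing the explicit intent text alongside the semantic‑ID history is what makes the recommendation controllable: editing the intent string at inference time steers the generated target ID.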

Solution Effects

Offline: The controllable intent model improves HitRate and NDCG by 2–3× compared with non‑intent models, while demonstrating strong controllability.

Online: SKUCTR increases by >3%, and SKUCVR, same‑store orders, and same‑store commissions also see significant lifts.

Online performance gains

4. Multi‑Objective Optimization of Recommendation Effectiveness

Multi‑objective optimization diagram

Existing non‑LLM methods include Shared Bottom, MMoE, and PLE for balancing multiple tasks, and ESMM for addressing sample selection bias. LLM‑based methods comprise MORLHF and MODPO (linear weighting of multiple rewards under RLHF/DPO) and Rewarded Soups (interpolating the weights of multiple fine‑tuned LLMs).

Non‑LLM multi‑objective methods
LLM multi‑objective methods
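As a reference point for the non‑LLM side, a single MMoE forward pass can be sketched in a few lines of NumPy. The shapes, ReLU experts, and scalar towers are illustrative choices, not the production architecture:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def mmoe_forward(x, expert_ws, gate_ws, tower_ws):
    """One MMoE forward pass for a single input vector x of shape (d,).

    expert_ws: list of (d, h) expert weight matrices, shared by all tasks
    gate_ws:   per-task (d, n_experts) gating weights
    tower_ws:  per-task (h,) tower weights producing a scalar logit
    """
    # Shared experts: each maps x to an h-dimensional representation
    experts = np.stack([np.maximum(x @ w, 0.0) for w in expert_ws])  # (E, h)
    outputs = []
    for gw, tw in zip(gate_ws, tower_ws):
        gates = softmax(x @ gw)            # task-specific expert mixture
        mixed = gates @ experts            # (h,) weighted expert combination
        outputs.append(float(mixed @ tw))  # task logit (e.g. click, purchase)
    return outputs
```

Each task gets its own gate over the shared experts, which is what lets MMoE trade off loosely related objectives such as clicks and conversions.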

Solution

Integrating behavior and price data to improve click‑to‑purchase conversion and overall ad revenue.

DPO‑Based Preference Alignment Algorithm

Model purchase preference using a click‑to‑purchase prediction model f(click→purchase).

Treat “clicked & purchased” items as positive examples and “clicked & not purchased” items as negative examples, organizing data as "Input: [Prompt]. Output1: [Response+]. Output2: [Response‑]".

DPO preference alignment pipeline
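The per‑pair objective over such data is the standard DPO loss: a log‑sigmoid of the scaled margin between policy and reference log‑probabilities, with `beta` as a free hyperparameter.

```python
import math

def dpo_loss(pi_pos, pi_neg, ref_pos, ref_neg, beta=0.1):
    """DPO loss for one (clicked & purchased, clicked & not purchased) pair.

    Arguments are log-probabilities of the positive/negative responses under
    the trainable policy (pi_*) and the frozen reference model (ref_*).
    """
    margin = beta * ((pi_pos - ref_pos) - (pi_neg - ref_neg))
    # -log(sigmoid(margin)): small when the policy prefers the positive
    # response more strongly than the reference does
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When policy and reference agree exactly, the loss sits at log 2; pushing probability mass toward purchased items drives it below that baseline.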

Limitations: DPO only considers the relative relationship between positive and negative examples and does not leverage reward values.

RiC (Rewards‑in‑Context)-Based Preference Alignment Algorithm

RiC framework

Offline training incorporates multiple rewards (click, purchase, price, commission) into supervised fine‑tuning, enabling the model to learn strategies under various reward combinations.

Online training mitigates data sparsity by generating augmented samples near the Pareto front, followed by SFT generation, reward‑model scoring, and multi‑objective rejection sampling.

During inference, the learned mapping from preferences to rewards allows flexible adaptation to diverse user preferences.

RiC reward design
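One plausible way to render the reward‑conditioned prompts, assuming a `<rm_name=value>` tag format (the exact serialization used in RiC may differ):

```python
def ric_prompt(user_context, rewards):
    """Prefix a prompt with desired reward values (Rewards-in-Context style).

    `rewards` maps objective name -> value in [0, 1]. During offline SFT the
    values are observed rewards (click, purchase, price, commission); at
    inference the same tags express a target preference instead.
    """
    tags = " ".join(f"<rm_{k}={v:.1f}>" for k, v in sorted(rewards.items()))
    return f"{tags} {user_context} Recommend the next product."
```

Because the model learns to condition generation on these tags, serving can dial the click/purchase/revenue trade‑off per request without retraining.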

Solution Effects

Offline: HitRate@1 improves by over 10% on multiple datasets.

Online: SKUCTR rises >1.5%, SKUCVR >7%, and same‑store orders and commissions see notable gains.

5. One4All Generative Recommendation Framework

CPS advertising involves diverse business scenarios, requiring a framework that can handle cross‑scenario adaptation and flexible model‑update strategies.

One4All framework diagram

The design emphasizes extensibility, jointly modeling behavior and semantics to improve generalization, while optimizing model‑update policies for real‑time inference.

Online model update strategy
Online model update diagram

Solution Effects

Implemented routine online deployment supporting >10 million daily UV for CPS advertising real‑time inference.

Based on One4All, the system now accommodates additional behavior and language‑understanding tasks, enabling integrated search‑recommendation modeling, user‑behavior summarization, and personalized intent inference.

6. Summary and Future Outlook

Interactive recommendation (joint search and recommendation) remains an open frontier; further exploration of generative‑model capabilities and product‑level redesign is needed.

Multimodal information understanding and generation: leveraging rich images and videos in the upstream pipeline can enhance recommendation quality and presentation richness.

7. References

Xu L, Zhang J, Li B, et al. Prompting large language models for recommender systems: A comprehensive framework and empirical analysis. arXiv preprint arXiv:2401.04997, 2024.

Zhihu. "一文梳理工业界大模型推荐实战经验" (A Review of Industrial Experience with Large‑Model Recommendation). 2024.

Zhai J, Liao L, Liu X, et al. Actions speak louder than words: trillion‑parameter sequential transducers for generative recommendations. Proceedings of the 41st International Conference on Machine Learning, 2024.

Chen J, Chi L, Peng B, et al. HLLM: Enhancing sequential recommendations via hierarchical large language models for item and user modeling. arXiv preprint arXiv:2409.12740, 2024.

Zhang C, Wu S, Zhang H, et al. NoteLLM: A retrievable large language model for note recommendation. Companion Proceedings of the ACM Web Conference 2024, 2024.

Zhang C, Zhang H, Wu S, et al. NoteLLM‑2: Multimodal large representation models for recommendation. arXiv preprint arXiv:2405.16789, 2024.

Deng J, Wang S, Cai K, et al. OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment. arXiv preprint arXiv:2502.18965, 2025.

Ma J, Xiao Z, Yang L, et al. Modeling User Intent Beyond Trigger: Incorporating Uncertainty for Trigger‑Induced Recommendation. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 2024.

Shen Q, Wen H, Tao W, et al. Deep interest highlight network for click‑through rate prediction in trigger‑induced recommendation. ACM Web Conference 2022, 2022.

Xia Y, Cao Y, Hu S, et al. Deep intention‑aware network for click‑through rate prediction. ACM Web Conference 2023, 2023.

Xiao Z, Yang L, Zhang T, et al. Deep evolutional instant interest network for CTR prediction in trigger‑induced recommendation. ACM International Conference on Web Search and Data Mining, 2024.

Ma J, Zhao Z, Yi X, et al. Modeling task relationships in multi‑task learning with multi‑gate mixture‑of‑experts. KDD 2018.

Tang H, Liu J, Zhao M, et al. Progressive layered extraction (PLE): A novel multi‑task learning model for personalized recommendations. ACM RecSys 2020.

Ma X, Zhao L, Huang G, et al. Entire space multi‑task model: An effective approach for estimating post‑click conversion rate. SIGIR 2018.

Zhou Z, Liu J, Shao J, et al. Beyond One‑Preference‑Fits‑All Alignment: Multi‑Objective Direct Preference Optimization. ACL 2024.

Li K, Zhang T, Wang R. Deep reinforcement learning for multi‑objective optimization. IEEE Transactions on Cybernetics, 2020.

Rame A, Couairon G, Dancette C, et al. Rewarded soups: towards Pareto‑optimal alignment by interpolating weights fine‑tuned on diverse rewards. NeurIPS 2023.

Rafailov R, Sharma A, Mitchell E, et al. Direct preference optimization: Your language model is secretly a reward model. NeurIPS 2023.

Wu J, Xie Y, Yang Z, et al. β‑DPO: Direct Preference Optimization with Dynamic β. NeurIPS 2024.

Lin X, Chen H, Pei C, et al. A Pareto‑efficient algorithm for multiple objective optimization in e‑commerce recommendation. ACM RecSys 2019.

Hu J, Tao L, Yang J, et al. Aligning language models with offline learning from human feedback. arXiv preprint arXiv:2308.12050, 2023.

Yang R, Pan X, Luo F, et al. Rewards‑in‑Context: Multi‑objective alignment of foundation models with dynamic preference adjustment. ICML 2024.

Written by JD Cloud Developers

JD Cloud Developers (Developer of JD Technology) is a JD Technology Group platform for technical sharing and communication among AI, cloud computing, IoT, and related developers. It publishes JD product technical information, industry content, and tech event news.
