Personalized Approximate Pareto-Efficient Recommendation (PAPERec): A Multi‑Objective Reinforcement Learning Framework for User‑Level Objective Personalization

The paper introduces PAPERec, a personalized multi-objective recommendation framework that uses Pareto-oriented reinforcement learning to generate user-specific objective weights. By approximating Pareto-optimal solutions for each user, it achieves the best dwell-time results while keeping click-through rate competitive in both offline and online experiments.

DataFunTalk

Real‑world recommendation systems often need to optimize multiple objectives such as click‑through rate (CTR), dwell time, diversity, and retention. Existing multi‑objective recommendation (MOR) approaches typically use a single set of objective weights for all users, ignoring individual preferences.

To address this limitation, the authors propose Personalized Approximate Pareto‑Efficient Recommendation (PAPERec), a framework that generates personalized objective weights via a Pareto‑oriented reinforcement learning (RL) module. By approximating Pareto optimality, PAPERec tailors the trade‑off among objectives for each user.

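The model builds on the Multiple Gradient Descent Algorithm (MGDA) and its theory of Pareto stationary points: a solution is Pareto stationary when some convex combination of the objective gradients equals zero, so a smaller norm of the weighted gradient sum indicates a point closer to Pareto stationarity. The RL module treats the current recommendation list as the state and the generation of user-specific weights as the action. Its reward is based on the L2 norm of the weighted sum of the multi-objective gradients, which encourages the system to move toward Pareto-optimal solutions.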
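To make the gradient-norm idea concrete, here is a minimal sketch for the two-objective case. It uses the well-known closed-form minimizer of the norm of a convex combination of two gradients, as used in MGDA-style methods. The function names (`min_norm_weight`, `pareto_reward`) are hypothetical, and defining the reward as the negative squared norm is an assumption; the summary only says the reward is based on the L2 norm.

```python
def dot(u, v):
    """Inner product of two equal-length vectors given as lists."""
    return sum(x * y for x, y in zip(u, v))

def min_norm_weight(g1, g2):
    """Closed-form a in [0, 1] minimizing ||a*g1 + (1-a)*g2||^2."""
    diff = [x - y for x, y in zip(g1, g2)]
    denom = dot(diff, diff)
    if denom == 0.0:
        return 0.5  # identical gradients: every weight gives the same norm
    a = dot([y - x for x, y in zip(g1, g2)], g2) / denom
    return min(1.0, max(0.0, a))  # clip to the valid weight range

def pareto_reward(weights, grads):
    """Assumed reward: negative squared L2 norm of the weighted gradient sum."""
    dim = len(grads[0])
    combined = [sum(w * g[i] for w, g in zip(weights, grads)) for i in range(dim)]
    return -dot(combined, combined)

# Toy gradients for two objectives (illustrative values only).
g_ctr, g_dwell = [1.0, 0.0], [0.0, 1.0]
a = min_norm_weight(g_ctr, g_dwell)
print(a, pareto_reward([a, 1.0 - a], [g_ctr, g_dwell]))
```

Under this assumed sign convention, weights that shrink the combined gradient earn a higher reward, so an agent maximizing the reward is pushed toward Pareto stationary points.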

PAPERec is deployed in the WeChat "Look" list-wise recommendation pipeline, where it jointly optimizes CTR and dwell time. The overall loss combines the single-objective losses for each metric with the RL-derived reward. Training uses Deep Deterministic Policy Gradient (DDPG), and the architecture incorporates Transformer and gated recurrent unit (GRU) components for feature interaction and sequential modeling.
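As a rough illustration of how user-specific weights could scalarize the two objectives, the sketch below combines a CTR loss and a dwell-time loss for a single impression. The choice of binary cross-entropy for clicks and squared error for dwell time, along with every name in the code, is an assumption made for illustration; the summary does not specify the exact loss functions.

```python
import math

def bce(p_click, clicked):
    """Binary cross-entropy for a click label in {0, 1}."""
    eps = 1e-12
    p = min(max(p_click, eps), 1.0 - eps)  # avoid log(0)
    return -(clicked * math.log(p) + (1 - clicked) * math.log(1.0 - p))

def squared_error(pred, target):
    """Squared error, used here as a stand-in dwell-time loss."""
    return (pred - target) ** 2

def personalized_loss(w_ctr, w_dwell, p_click, clicked, dwell_pred, dwell_true):
    """Weighted sum of the two per-objective losses for one user.

    w_ctr and w_dwell are the user-specific weights (assumed to be
    non-negative and to sum to 1) that the RL module would output.
    """
    return (w_ctr * bce(p_click, clicked)
            + w_dwell * squared_error(dwell_pred, dwell_true))

# The same prediction scored for two hypothetical users with different weights.
click_heavy = personalized_loss(0.8, 0.2, 0.5, 1, 2.0, 3.0)
dwell_heavy = personalized_loss(0.2, 0.8, 0.5, 1, 2.0, 3.0)
print(click_heavy, dwell_heavy)
```

With a shared weight vector, both users would receive the same loss for the same prediction; per-user weights let the same dwell-time error count for more with one user than with another.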

Extensive offline and online experiments on the production system demonstrate that PAPERec achieves the best dwell‑time results while maintaining competitive CTR, outperforming baseline models in the Pareto‑dominance analysis. Further analysis shows that users and items with higher personalized weights for a specific objective indeed exhibit higher performance on that metric, confirming effective objective‑level personalization.
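The Pareto-dominance comparison mentioned above follows the standard definition: one model dominates another if it is at least as good on every metric and strictly better on at least one. The sketch below implements that check for metrics where higher is better. The model names and metric values are placeholders for illustration, not results from the paper.

```python
def dominates(a, b):
    """True if metric tuple a Pareto-dominates b (higher is better)."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better

def pareto_front(results):
    """Return the models that no other model dominates."""
    return {
        name: metrics
        for name, metrics in results.items()
        if not any(
            dominates(other, metrics)
            for other_name, other in results.items()
            if other_name != name
        )
    }

# Placeholder (CTR, dwell time) pairs, purely illustrative.
results = {
    "model_a": (0.10, 30.0),
    "model_b": (0.12, 25.0),
    "model_c": (0.09, 24.0),
}
print(sorted(pareto_front(results)))
```

Here `model_c` is dominated by both other models, while `model_a` and `model_b` trade CTR against dwell time and so both remain on the front.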

In summary, PAPERec introduces a novel personalized Pareto‑approximation approach for multi‑objective recommendation, combining scalarization, MGDA theory, and reinforcement learning to deliver both academic insights and practical improvements in large‑scale recommender systems.
