HRL-Rec: A Hierarchical Reinforcement Learning Framework for Integrated Recommendation

This article presents HRL-Rec, a hierarchical reinforcement learning model that jointly learns user preferences at the item and channel levels for integrated recommendation systems, and demonstrates its superior offline and online performance, stability, and scalability through extensive experiments on the WeChat "See" platform.

DataFunTalk

Real‑world information‑flow recommendation systems must simultaneously recommend heterogeneous items (articles, videos, news, products, etc.) from multiple sources, often using separate channels to decouple and customize models. In such integrated recommendation scenarios, both item‑level and channel‑level user preferences are crucial.

To capture both levels of preference, we propose HRL-Rec, a hierarchical reinforcement learning framework consisting of a low-level channel selector (LRA) and a high-level item recommender (HRA). The channel selector generates a sequence of channels for each request, while the item recommender selects specific items constrained by the chosen channels. Several loss functions, including supervised and similarity losses, together with multiple reward signals (clicks, dwell time, diversity, novelty), are designed to ensure fast, stable convergence.
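As a rough illustration of how several reward signals might be folded into one scalar, the sketch below uses a weighted sum. The `Feedback` fields, the weights, and the dwell-time cap are all hypothetical choices made for this example; the article does not state how HRL-Rec actually weights or normalizes its rewards.

```python
from dataclasses import dataclass


@dataclass
class Feedback:
    """Per-item feedback signals; field names are illustrative assumptions."""
    clicked: bool
    dwell_seconds: float
    diversity: float  # assumed to lie in [0, 1]
    novelty: float    # assumed to lie in [0, 1]


def combined_reward(fb, w_click=1.0, w_dwell=0.5, w_div=0.2, w_nov=0.2,
                    dwell_cap=60.0):
    """Weighted sum of the four signals; all weights are hypothetical."""
    # Cap and normalize dwell time so one very long read cannot dominate.
    dwell = min(fb.dwell_seconds, dwell_cap) / dwell_cap
    return (w_click * float(fb.clicked)
            + w_dwell * dwell
            + w_div * fb.diversity
            + w_nov * fb.novelty)
```

A weighted sum is only one option. It keeps the trade-off between engagement (clicks, dwell time) and list quality (diversity, novelty) explicit and easy to tune.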

HRL-Rec is built on a list-wise recommendation pipeline. For position t in the list, the channel selector predicts channel c_t based on previously recommended items and user/context features; the item recommender then selects item d_t within channel c_t. Both agents are trained with DDPG (Deep Deterministic Policy Gradient), augmented by the additional losses to improve the realism of generated actions.
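The position-by-position procedure can be sketched as a simple loop: choose a channel, then choose an item inside that channel. In this minimal sketch the two agents are replaced by hypothetical scoring callables (`channel_score`, `item_score`) and a greedy argmax, which is a simplification and not the trained DDPG actors described in the article.

```python
def recommend_list(candidates, channel_score, item_score, list_len):
    """Build a list position by position: pick a channel, then an item in it.

    candidates: dict mapping channel name -> list of item ids.
    channel_score(channel, history) and item_score(item, history) are
    hypothetical stand-ins for the two trained agents.
    """
    remaining = {ch: list(items) for ch, items in candidates.items()}
    history = []  # (channel, item) pairs already placed in the list
    for _ in range(list_len):
        # Only channels that still have unused items are eligible.
        open_channels = [ch for ch, items in remaining.items() if items]
        if not open_channels:
            break
        # Channel step: choose c_t given the list built so far.
        c_t = max(open_channels, key=lambda ch: channel_score(ch, history))
        # Item step: choose d_t restricted to the chosen channel c_t.
        d_t = max(remaining[c_t], key=lambda it: item_score(it, history))
        remaining[c_t].remove(d_t)
        history.append((c_t, d_t))
    return history


# Toy scorers (assumptions): each reuse of a channel lowers its score.
def toy_channel_score(ch, history):
    base = {"article": 1.0, "video": 0.8}[ch]
    return base - 0.5 * sum(1 for c, _ in history if c == ch)


def toy_item_score(item, history):
    return {"a1": 0.9, "a2": 0.4, "v1": 0.7}[item]


toy_candidates = {"article": ["a1", "a2"], "video": ["v1"]}
toy_list = recommend_list(toy_candidates, toy_channel_score, toy_item_score, 3)
```

Because the channel is fixed before the item is chosen, the item step only ever ranks candidates from one channel, which is the constraint the hierarchy imposes.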

The overall architecture (Figure 2) shows how HRL-Rec processes each user request, extracts features from historical items, encodes them with a sequential model, and feeds the state to actor‑critic networks for both channel selection and item recommendation.
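For readers unfamiliar with DDPG-style actor-critic training, the core update quantities are sketched below in plain Python: the critic's temporal-difference target, its squared-error loss, and the soft update of target-network parameters. These are the standard DDPG formulas rather than HRL-Rec's exact training code; the function names and the default values of `gamma` and `tau` are assumptions, and the article's additional supervised and similarity losses are not modeled here.

```python
def td_target(reward, next_q, gamma=0.9, done=False):
    """Critic target y = r + gamma * Q'(s', mu'(s')), or just r at list end."""
    return reward if done else reward + gamma * next_q


def critic_loss(q_values, targets):
    """Mean squared error between critic estimates and TD targets."""
    return sum((q - y) ** 2 for q, y in zip(q_values, targets)) / len(targets)


def soft_update(target_params, online_params, tau=0.01):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t
            for t, w in zip(target_params, online_params)]
```

In a hierarchical setup each agent would typically keep its own actor, critic, and target networks, with these same update rules applied per agent.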

Extensive offline and online experiments on the WeChat "See" system demonstrate that HRL-Rec achieves the best performance across all reported metrics, including click-through rate, average clicks, and diversity. Ablation studies confirm the effectiveness of each module, and stability tests show that HRL-Rec maintains consistent channel proportions over a two-week period, indicating strong system robustness.
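To make the evaluation quantities concrete, the sketch below computes click-through rate, per-channel impression shares, and a simple day-to-day drift measure over those shares. The drift measure (`max_proportion_drift`) is a hypothetical stand-in chosen for illustration; the article does not specify how channel-proportion stability was quantified.

```python
from collections import Counter


def click_through_rate(clicks, impressions):
    """CTR = clicks / impressions, defined as 0.0 when nothing was shown."""
    return clicks / impressions if impressions else 0.0


def channel_proportions(shown_channels):
    """Share of impressions per channel for one day of traffic."""
    counts = Counter(shown_channels)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}


def max_proportion_drift(daily_proportions):
    """Largest day-to-day change in any channel's share (hypothetical metric)."""
    drift = 0.0
    for prev, cur in zip(daily_proportions, daily_proportions[1:]):
        # A channel missing on a given day counts as a share of 0.0.
        for ch in set(prev) | set(cur):
            drift = max(drift, abs(cur.get(ch, 0.0) - prev.get(ch, 0.0)))
    return drift
```

A small drift over many consecutive days would correspond to the "consistent channel proportions" described above.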

In summary, HRL‑Rec systematically tackles the integrated recommendation problem by modeling multi‑granular user preferences with hierarchical reinforcement learning, delivering superior accuracy, diversity, and stability, and has already been deployed to serve millions of users.

Tags: recommendation system, online experiments, channel selector, hierarchical reinforcement learning, integrated recommendation, item recommender