
Distilled Reinforcement Learning Framework for Recommendation (DRL-Rec): Design, Modules, and Experimental Evaluation

This article presents DRL-Rec, a distilled reinforcement learning framework for recommendation that integrates an exploring‑filtering module and confidence‑guided distillation to compress RL‑based recommenders while improving accuracy, and reports significant offline and online performance gains on a large‑scale system.

DataFunTalk

This article is based on the CIKM 2021 paper "Explore, Filter and Distill: Distilled Reinforcement Learning in Recommendation" from the WeChat "Look" team. It addresses the high memory and latency costs of reinforcement-learning (RL) based recommendation models and proposes a knowledge-distillation solution.

DRL-Rec introduces three key components: a teacher/student recommendation network, an exploring-and-filtering module that selects high-information items from millions of candidates, and a confidence-guided distillation module that weights each distilled example by the teacher's confidence.

In the teacher/student module, both networks share the same architecture as HRL-Rec and differ only in their vector dimensions; DDQN (Double DQN, a value-based RL algorithm) is used for stable training. The model encodes the user's previously consumed items with a Transformer, aggregates the resulting features with a GRU, and feeds the output to an MLP that predicts per-item Q-values.
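To make the training signal concrete, here is a minimal sketch of the Double DQN target that both the teacher and student would regress toward. The Transformer/GRU/MLP stack is abstracted away as precomputed lists of Q-values over candidate actions; the function name and signature are illustrative, not from the paper.

```python
def ddqn_target(reward, gamma, q_online_next, q_target_next):
    """Double DQN target: the online network selects the best next action,
    while the separate target network evaluates it. Decoupling selection
    from evaluation reduces Q-value overestimation (the key DDQN idea)."""
    # Action chosen by the online network at the next state
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # Value of that action according to the target network
    return reward + gamma * q_target_next[best_action]
```

The squared difference between this target and the online network's current Q-value would then serve as the TD loss for training.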

The exploring-and-filtering module first retrieves a candidate set on the order of a thousand items, then has both the teacher and the student score these items. The top-k items from each model are selected for distillation, ensuring that only the most informative items are used.
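The selection step above can be sketched as follows. This is an assumption about the mechanics (the paper's exact merge rule is not spelled out here): each model contributes its top-k candidates, and the two lists are deduplicated into one distillation set. All names are hypothetical.

```python
def filter_candidates(candidates, teacher_score, student_score, k):
    """Keep the union of the teacher's and the student's top-k items;
    only these survive to the distillation stage."""
    top_teacher = sorted(candidates, key=teacher_score, reverse=True)[:k]
    top_student = sorted(candidates, key=student_score, reverse=True)[:k]
    # Merge the two lists, preserving order and removing duplicates
    seen, selected = set(), []
    for item in top_teacher + top_student:
        if item not in seen:
            seen.add(item)
            selected.append(item)
    return selected
```

Letting the student nominate items too is what makes this "exploring": items the student is uncertain or wrong about are exactly where teacher supervision is most informative.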

The confidence‑guided distillation module combines a list‑wise KL‑divergence loss and a Hint loss, both weighted by a confidence score derived from the agreement between the teacher’s predicted Q‑value and the true Q‑value. This guides the student to learn more from reliable teacher predictions.

Experiments show that the student model's size and inference latency are reduced to 49.7% and 76.7% of the teacher's, respectively, while it achieves higher offline AUC and online click-through metrics. Ablation studies confirm the effectiveness of each module, and further analysis demonstrates robust performance across different compression rates and numbers of filtered items.

In summary, DRL-Rec successfully applies knowledge distillation to RL-based recommendation, delivering both accuracy and efficiency improvements; it has been deployed in the WeChat "Look" system, serving millions of users.

Tags: model compression, recommendation systems, reinforcement learning, knowledge distillation, online experiments
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
