RMTL: A Reinforcement Learning Based Multi‑Task Learning Framework for Session‑Level Recommendation

The paper proposes RMTL, a reinforcement‑learning driven multi‑task learning framework that builds session‑level MDPs, trains a multi‑task actor‑critic network with dynamic loss weighting, and demonstrates significant AUC improvements over state‑of‑the‑art MTL recommendation models on public datasets.

Recent advances in multi‑task learning (MTL) have greatly benefited recommendation systems, yet most existing MTL‑based models ignore the session‑level interaction patterns between users and the system because they are built on single‑item datasets. To address this gap, the authors introduce RMTL, a reinforcement learning (RL) based MTL framework that employs dynamic weighting to balance the losses of different recommendation tasks.

Problem Modeling: The authors construct a session‑based Markov Decision Process (MDP) where each state consists of user‑item features, the action space contains continuous predictions for CTR and CTCVR, and the reward is defined as the negative binary cross‑entropy (BCE) between the predicted action and the observed label, so that maximizing cumulative reward is consistent with minimizing the supervised BCE loss. This formulation captures the sequential nature of user behavior and enables adaptive loss weighting via the critic network.
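
As a concrete illustration, here is a minimal PyTorch sketch of that reward; the function name and tensor shapes are our own, and the paper defines the reward per task and per session step:

```python
import torch
import torch.nn.functional as F

def step_reward(action: torch.Tensor, label: torch.Tensor) -> torch.Tensor:
    """Per-step, per-task reward r_t = -BCE(a_t, y_t).

    action: predicted probability for one task (e.g. CTR or CTCVR), in (0, 1).
    label:  the observed binary feedback for that task.
    A perfect prediction yields a reward near 0, a bad one is strongly
    negative, so maximizing return matches minimizing the supervised BCE loss.
    """
    return -F.binary_cross_entropy(action, label, reduction="none")
```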

Algorithm: RMTL consists of a state representation network (embedding layer + MLP) that converts raw features into state vectors, an Actor network (any base MTL model) that outputs task‑specific actions, and a multi‑critic architecture with two parallel MLPs sharing a bottom network. The critics estimate Q‑values for each task and generate adaptive loss weights, which are used to compute a weighted BCE loss for the overall objective.
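
A hedged sketch of such a multi‑critic in PyTorch follows; the layer widths and the exact way actions enter the network are illustrative assumptions, not the paper's published architecture:

```python
import torch
import torch.nn as nn

class MultiCritic(nn.Module):
    """Two parallel Q-value heads (CTR, CTCVR) over a shared bottom MLP.

    Assumed layout: the bottom network encodes the state, and each head
    scores that encoding together with its own task's action.
    """

    def __init__(self, state_dim: int, hidden_dim: int = 64, num_tasks: int = 2):
        super().__init__()
        self.bottom = nn.Sequential(nn.Linear(state_dim, hidden_dim), nn.ReLU())
        self.heads = nn.ModuleList([
            nn.Sequential(
                nn.Linear(hidden_dim + 1, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, 1),
            )
            for _ in range(num_tasks)
        ])

    def forward(self, state: torch.Tensor, actions: torch.Tensor) -> list[torch.Tensor]:
        # state: (batch, state_dim); actions: (batch, num_tasks)
        shared = self.bottom(state)
        return [
            head(torch.cat([shared, actions[:, k:k + 1]], dim=-1))
            for k, head in enumerate(self.heads)
        ]
```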

The overall training loop is as follows: given user‑item features, the state network produces a state; the Actor selects actions; the actions and features are processed by the critics to obtain Q‑values; the weighted BCE loss is computed and back‑propagated through the Actor, while the critics update their parameters using the temporal‑difference (TD) error.
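
The sketch below wires these pieces together for one update. The weight mapping w_k = 1 - sigmoid(Q_k), the batch keys, and the done flag are placeholder assumptions chosen to show the data flow, not the paper's exact formulas:

```python
import torch
import torch.nn.functional as F

def train_step(batch, state_net, actor, critic, target_critic,
               actor_opt, critic_opt, gamma: float = 0.99):
    state = state_net(batch["features"])    # (batch, state_dim)
    actions = actor(state)                  # (batch, 2): CTR, CTCVR probabilities
    labels = [batch["click"], batch["conversion"]]

    # Adaptive weights from the critics, detached so the supervised loss
    # does not back-propagate into the Q-heads.
    q_values = critic(state, actions)
    weights = [1.0 - q.detach().sigmoid() for q in q_values]  # assumed mapping

    # Weighted BCE objective for the Actor (the base MTL model).
    actor_loss = sum(
        (w * F.binary_cross_entropy(actions[:, k:k + 1], y, reduction="none")).mean()
        for k, (w, y) in enumerate(zip(weights, labels))
    )
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Critic update via the TD error, with rewards r_t = -BCE(a_t, y_t);
    # "done" marks the last step of a session (our convention).
    with torch.no_grad():
        next_state = state_net(batch["next_features"])
        next_q = target_critic(next_state, actor(next_state))
        targets = [
            -F.binary_cross_entropy(actions[:, k:k + 1], y, reduction="none")
            + gamma * (1.0 - batch["done"]) * nq
            for k, (y, nq) in enumerate(zip(labels, next_q))
        ]
    q_values = critic(state.detach(), actions.detach())
    critic_loss = sum(F.mse_loss(q, t) for q, t in zip(q_values, targets))
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()
    return actor_loss.item(), critic_loss.item()
```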

Experiments: The framework is evaluated on two benchmark datasets, RetailRocket and KuaiRand, using AUC, logloss, and session‑averaged logloss (s‑logloss). Across five baseline MTL models, the RMTL variants consistently outperform their non‑RL counterparts, achieving AUC gains of roughly 0.003–0.005 on RetailRocket. Transferability studies show that critics pretrained with one MTL model improve the performance of other models. Ablation experiments (constant weight, weight‑learning, and non‑linear weight variants) confirm the effectiveness of the adaptive loss weighting.
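
For concreteness, one plausible reading of the session‑averaged logloss (our interpretation of the metric's name, not a definition copied from the paper) is to compute logloss within each session and then average across sessions:

```python
import numpy as np

def session_logloss(y_true, y_pred, session_ids, eps: float = 1e-7):
    """Average of per-session logloss, so long sessions do not dominate."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    session_ids = np.asarray(session_ids)
    losses = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    per_session = [losses[session_ids == s].mean() for s in np.unique(session_ids)]
    return float(np.mean(per_session))
```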

Conclusion and Future Work: RMTL demonstrates that session‑level RL can effectively balance multiple recommendation tasks, improve prediction performance, and be compatible with existing MTL models, offering strong transferability. Future directions include exploring richer state representations and extending the framework to other multi‑task domains.

Tags: multi-task learning, reinforcement learning, actor-critic, adaptive loss weighting, session-based modeling
Written by: Kuaishou Tech

Official Kuaishou tech account, providing real-time updates on the latest Kuaishou technology practices.
