Two‑Stage Constrained Actor‑Critic Reinforcement Learning for Short‑Video Recommendation and a Multi‑Task RL Framework
This article presents a two‑stage constrained actor‑critic reinforcement learning algorithm for short‑video recommendation. It models the problem as a constrained Markov decision process (MDP), details the algorithm’s two stages, and reports offline and online experiments showing gains in watch‑time and interaction metrics. It then describes a multi‑task RL framework and its evaluations.
