Tagged articles

actor-critic

12 articles

May 4, 2026 · Artificial Intelligence

Understanding the Mathematical Foundations of Reinforcement Learning

This article provides a concise overview of a ten‑chapter reinforcement‑learning textbook, outlining the progression from basic concepts such as states and rewards to advanced algorithms like policy gradients and actor‑critic methods, and explains how each chapter builds on the previous ones.

Bellman equation · Monte Carlo · actor-critic

0 likes · 11 min read

Data Party THU

Nov 15, 2025 · Artificial Intelligence

How Reinforcement Learning Powers Intelligent AI Agents and LangGraph Workflows

This article explains how reinforcement learning (RL) underpins intelligent AI agents, covering the Markov Decision Process fundamentals, key RL components, multi‑hop reasoning on knowledge graphs, and a step‑by‑step LangGraph example that integrates an RL‑driven tutoring policy with Python code.

AI agents · Knowledge Graph · LangGraph

0 likes · 17 min read

Kuaishou Tech

Aug 6, 2025 · Artificial Intelligence

How Supervised Learning‑Enhanced Multi‑Group Actor‑Critic Boosts Live Stream Allocation in Short‑Video Feeds

This article presents the SL‑MGAC framework, a supervised‑learning‑enhanced multi‑group Actor‑Critic algorithm that improves live‑stream insertion decisions in mixed short‑video and live‑stream recommendation systems, achieving higher stability and better long‑term user engagement while satisfying platform constraints, as validated by extensive offline and online experiments.

KDD 2025 · actor-critic · live stream recommendation

0 likes · 9 min read

Baobao Algorithm Notes

Nov 18, 2024 · Artificial Intelligence

Demystifying Actor‑Critic and PPO: From Policy Gradients to Practical RL

This article provides a thorough, step‑by‑step explanation of reinforcement‑learning theory—covering policy‑based objectives, value‑function definitions, the derivation of policy gradients, actor‑critic architecture, advantage estimation, importance sampling, GAE, and the PPO algorithm—aimed at readers with little prior RL knowledge.

PPO · actor-critic · advantage estimation

0 likes · 31 min read

DataFunTalk

Mar 30, 2024 · Artificial Intelligence

Reinforcement Learning and Multi‑Task Recommendation: Two‑Stage Constrained Actor‑Critic and Multi‑Task RL Approaches at Kuaishou

This talk presents Kuaishou's research on combining reinforcement learning with multi‑task recommendation, detailing a two‑stage constrained actor‑critic method for short‑video ranking, a multi‑task RL framework, experimental results on offline and online systems, and practical Q&A insights.

Kuaishou · actor-critic · multi-task recommendation

0 likes · 18 min read

Sohu Tech Products

Nov 8, 2023 · Artificial Intelligence

Two‑Stage Constrained Actor‑Critic for Short‑Video Recommendation and a Reinforcement‑Learning Multi‑Task Recommendation Framework

The presentation introduces a two‑stage constrained actor‑critic algorithm that learns auxiliary policies for interaction signals before optimizing watch‑time under KL constraints, and a reinforcement‑learning multi‑task learning framework that models session‑level dynamics with adaptive multi‑critic weighting, both achieving significant offline and online gains in short‑video recommendation.

Multi-Task Learning · Recommendation Systems · actor-critic

0 likes · 16 min read

Baidu Geek Talk

Aug 16, 2023 · Artificial Intelligence

Understanding Reinforcement Learning: From Basics to PPO and Policy Gradient

This article provides a comprehensive overview of reinforcement learning, covering fundamental concepts, differences from supervised learning, algorithm families, policy gradient methods, practical tricks like baselines and reward‑to‑go, and detailed explanations of TRPO and PPO variants with illustrative diagrams.

PPO · actor-critic · machine learning

0 likes · 19 min read

Kuaishou Tech

Apr 27, 2023 · Artificial Intelligence

Two-Stage Constrained Actor‑Critic (TSCAC) for Short‑Video Recommendation

The paper models short‑video recommendation as a constrained Markov decision process and introduces a two‑stage constrained actor‑critic algorithm that jointly maximizes watch time while satisfying multiple interaction constraints, demonstrating superior offline and online performance on the KuaiRand dataset and Kuaishou app.

actor-critic · constrained optimization · offline evaluation

0 likes · 7 min read

HomeTech

Nov 16, 2022 · Artificial Intelligence

Fundamentals and Policy Gradient Algorithms in Reinforcement Learning with Applications to Scene Text Recognition

This article introduces the basic concepts of reinforcement learning, derives model‑based and model‑free policy gradient methods—including vanilla policy gradient and Actor‑Critic—explains their mathematical foundations, and demonstrates their use in scene text recognition and image captioning tasks.

AI · Attention Mechanism · actor-critic

0 likes · 22 min read

DaTaobao Tech

Aug 18, 2022 · Artificial Intelligence

Introduction to Deep Reinforcement Learning: Theory, Algorithms, and Applications

This article introduces deep reinforcement learning by explaining its Markov decision process foundations, then categorizes the main algorithm families—value‑based methods like DQN, policy‑based approaches such as PG/DPG/DDPG, and actor‑critic techniques including A3C, PPO, and DDPG—detailing their architectures, training procedures, and key advantages.

DQN · MDP · actor-critic

0 likes · 14 min read

IEG Growth Platform Technology Team

Aug 16, 2022 · Artificial Intelligence

Actor‑Critic Reinforcement Learning for Real‑Time Bidding in Mobile Game Advertising

The paper proposes an actor‑critic reinforcement‑learning model (ACRL) that leverages PPO and a deep structured semantic model to optimize real‑time bidding strategies for mobile game ads under CPM and budget constraints, addressing long user lifecycles and sparse conversion data while demonstrably improving ROI in both offline simulations and online A/B tests.

Mobile Advertising · Online Advertising · ROI

0 likes · 16 min read

Code DAO

Dec 3, 2021 · Artificial Intelligence

Understanding Actor‑Critic and A2C: From Policy Gradients to REINFORCE in RL

This article derives the policy‑gradient objective for discrete actions, implements the Monte‑Carlo REINFORCE algorithm in PyTorch, explains the actor‑critic framework, introduces Advantage Actor‑Critic (A2C) versus A3C, and demonstrates their performance on the OpenAI Gym CartPole‑v0 environment.

A2C · OpenAI Gym · Python

0 likes · 13 min read
