Tagged articles
4 articles
Machine Learning Algorithms & Natural Language Processing
May 12, 2026 · Artificial Intelligence

Breaking Off‑Policy Shift: Bengio’s TBA Decouples Sampling and Learning for 50× Faster LLM RL

Trajectory Balance with Asynchrony (TBA) separates sample generation (Searcher) from model updates (Trainer), uses a trajectory‑balance objective to incorporate off‑policy data, and achieves up to 50× speedup in large‑model RL post‑training while preserving or improving performance on math reasoning, preference fine‑tuning, and red‑team tasks.

LLM · Large Language Models · Reinforcement Learning
10 min read
AI Algorithm Path
May 22, 2025 · Artificial Intelligence

Monte Carlo Policy Improvement in RL: Epsilon‑Greedy, On‑Policy vs Off‑Policy, and Incremental Updates

This tutorial explains how Monte Carlo methods are enhanced in reinforcement learning through epsilon‑greedy and epsilon‑soft policies, Monte Carlo control, a Blackjack Q‑function example, the distinction between on‑policy and off‑policy learning, importance sampling, and efficient incremental update techniques.

Epsilon-Greedy · Importance Sampling · Monte Carlo
14 min read
360 Quality & Efficiency
Apr 17, 2020 · Artificial Intelligence

Extending APEX for Real Distributed Reinforcement Learning with tf2rl

The article examines the limitations of the single‑machine APEX framework in the tf2rl reinforcement‑learning library, proposes a cross‑machine distributed architecture using middleware such as Redis, compares alternative frameworks like EasyRL, and outlines expected performance gains and future development plans.

APEX · Distributed Training · Reinforcement Learning
5 min read
DataFunTalk
Sep 30, 2019 · Artificial Intelligence

Reinforcement Learning for Recommender Systems: Challenges, Solutions, and Key Papers

This article reviews recent advances in applying reinforcement learning to recommender systems, explains fundamental RL concepts, discusses specific challenges such as large action spaces, bias, and long‑term reward modeling, and summarizes two influential YouTube papers along with practical insights and future directions.

Reinforcement Learning · Top‑K · Long‑Term Reward
13 min read