Tagged articles

Offline RL

6 articles

Jul 2, 2026 · Artificial Intelligence

EMCES: How Episodic Memory Guides Controllable Sample Synthesis to Boost Reinforcement Learning

The paper introduces EMCES, a method that injects episodic memory into controllable diffusion models and uses a hash‑based state representation to generate high‑value synthetic samples, dramatically improving sample efficiency and downstream reinforcement‑learning performance while cutting storage and time costs.

Episodic Memory · Hashing · Offline RL

14 min read

Bighead's Algorithm Notes

Mar 29, 2026 · Artificial Intelligence

How MetaTrader Uses Reinforcement Learning to Boost Trading Strategy Generalization

The article reviews the MetaTrader method, which formulates sequential portfolio optimization as a partially offline reinforcement‑learning problem, introduces a double‑layer RL algorithm and a conservative TD objective to improve out‑of‑distribution generalization, and demonstrates superior performance on CSI‑300 and NASDAQ‑100 datasets compared with existing baselines.

Financial Trading · MetaTrader · OOD data augmentation

15 min read

Machine Heart

Mar 29, 2026 · Artificial Intelligence

Scaling World Model Dynamics to Over a Thousand Steps in Two ICLR Papers

The article reviews two ICLR papers by Haoxin Lin that advance world‑model dynamics from single‑step bootstrapping to any‑step direct prediction, introduce structured uncertainty via backtracking, and achieve stable full‑horizon roll‑outs of over a thousand steps, dramatically improving both online and offline reinforcement‑learning performance.

Offline RL · Reinforcement Learning · any-step prediction

16 min read

Data Party THU

Oct 24, 2025 · Artificial Intelligence

BREEZE: Enhancing Zero‑Shot Reinforcement Learning with Behavioral Regularization

The paper introduces BREEZE, a behavior‑regularized zero‑shot RL framework that improves stability, policy extraction, and representation quality by combining in‑sample learning, task‑conditioned diffusion models, and expressive attention‑based architectures, achieving near‑state‑of‑the‑art performance on benchmarks like ExORL and D4RL Kitchen.

Offline RL · Reinforcement Learning · behavioral regularization

3 min read

AI Frontier Lectures

Apr 18, 2025 · Artificial Intelligence

From RL’s Early Days to Its Future: A Four‑Stage Evolution of Reinforcement Learning

This reflective essay traces reinforcement learning’s decade‑long evolution through four stages (early algorithmic foundations, application‑driven growth, a focus on problem construction, and a speculative future) while critiquing the field’s expanding definition and its impact on research and industry.

AI research · Offline RL · RL evolution

9 min read

DataFunSummit

Jun 16, 2024 · Artificial Intelligence

Reinforcement Learning in Recommendation Systems: Practice, Challenges, and Industry Advances

This article gives an overview of applying reinforcement learning to recommendation systems, covering background challenges, practical exploration, frontier research directions, multi‑agent and inverse RL approaches, evaluation methods, and future outlook, based on a study published at KDD and on industry experience.

Evaluation · Inverse RL · Offline RL

24 min read
