Oct 24, 2025 · Artificial Intelligence

BREEZE: Enhancing Zero‑Shot Reinforcement Learning with Behavioral Regularization

The paper introduces BREEZE, a behavior‑regularized zero‑shot RL framework that improves stability, policy extraction, and representation quality by combining in‑sample learning, task‑conditioned diffusion models, and expressive attention‑based architectures, achieving near‑state‑of‑the‑art performance on benchmarks like ExORL and D4RL Kitchen.

Offline RL · behavioral regularization · diffusion model

3 min read

Baobao Algorithm Notes

Feb 10, 2025 · Artificial Intelligence

Why Base‑Model RL Beats Traditional SFT‑RL: Theory, Practice, and Zero‑RL Insights

The article argues that applying reinforcement learning directly to base LLMs can outperform conventional cold‑start SFT‑RL pipelines. It presents theoretical arguments, practical guidance, and experimental evidence for this claim, and also discusses zero‑RL approaches, KL constraints, and scaling considerations.

KL constraint · base-model RL · zero-shot RL

11 min read
