BREEZE: Enhancing Zero‑Shot Reinforcement Learning with Behavioral Regularization

The paper introduces BREEZE, a behavior‑regularized zero‑shot RL framework that improves stability, policy extraction, and representation quality by combining in‑sample learning, task‑conditioned diffusion models, and expressive attention‑based architectures, achieving near‑state‑of‑the‑art performance on benchmarks like ExORL and D4RL Kitchen.

Data Party THU

Background

Zero‑shot reinforcement learning (RL) seeks pre‑trained generalist policies that can adapt to new tasks without additional environment interactions. Forward‑Backward (FB) representations have shown promise but suffer from limited expressiveness and out‑of‑distribution (OOD) action extrapolation errors, which bias the learned representations and degrade performance.
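For reference, the FB framework factorizes the long-run successor measure into a forward embedding F and a backward embedding B, and the zero-shot policy for a task vector z simply maximizes the forward embedding projected onto z. This is the standard FB formulation (Touati & Ollivier), included here for context rather than taken from this summary:

```latex
M^{\pi_z}(s_0, a_0, \mathrm{d}s') \approx F(s_0, a_0, z)^{\top} B(s')\, \rho(\mathrm{d}s'),
\qquad
\pi_z(s) = \arg\max_{a} \, F(s, a, z)^{\top} z
```

Because the argmax ranges over all actions, including ones absent from the offline dataset, this step is where the OOD extrapolation errors described above enter.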

Proposed Method: BREEZE

BREEZE (Behavior‑REgularizEd Zero‑shot RL with Expressivity enhancement) extends the FB framework with three innovations:

Behavioral regularization: reformulates policy optimization as an in‑sample learning objective, avoiding value estimates at OOD actions and the extrapolation error they introduce.

Task‑conditioned diffusion model: serves as the policy extractor, generating high‑quality, multimodal action distributions conditioned on the task representation.

Expressive attention‑based architecture: employs multi‑head attention to capture complex state‑action dependencies, improving representation learning.
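The in-sample objective in the first point can be sketched with expectile regression, the mechanism used by IQL-style in-sample methods: value targets are fitted only on actions that appear in the dataset, so no OOD action is ever queried. Whether BREEZE uses exactly this loss is an assumption here; all variable names below are illustrative.

```python
import numpy as np

def expectile_loss(td_error, tau=0.7):
    """Asymmetric squared loss for in-sample value learning.

    Positive errors (dataset action better than the current estimate)
    are weighted by tau, negative ones by (1 - tau), so the value
    function tracks an upper expectile of returns without taking an
    argmax over out-of-distribution actions.
    """
    weight = np.where(td_error > 0, tau, 1.0 - tau)
    return weight * td_error ** 2

# In-sample update sketch: TD targets are built only from (s, a, r, s')
# tuples drawn from the behavior dataset, never from a max over all actions.
q_sa = np.array([1.0, 2.0])    # hypothetical Q(s, a) estimates
target = np.array([1.5, 1.5])  # hypothetical r + gamma * V(s') from dataset tuples
loss = expectile_loss(target - q_sa).mean()
```

With tau > 0.5, the loss pushes Q upward more strongly than downward, which is how the upper-expectile effect arises from dataset actions alone.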

Experiments

Evaluations on benchmark suites such as ExORL and D4RL Kitchen demonstrate that BREEZE attains performance comparable to or exceeding state‑of‑the‑art zero‑shot RL methods while providing substantially better robustness to OOD actions.

Implementation

The reference implementation is publicly available at https://github.com/Whiterrrrr/BREEZE.

Code example
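As a concrete illustration of the task-conditioned diffusion policy extractor, the sketch below runs a generic DDPM-style reverse process conditioned on a state and task embedding. The denoiser is a stand-in for a learned noise-prediction network; the network architecture, schedule, and names here are assumptions for illustration, not BREEZE's actual implementation (see the repository above for that).

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_denoiser(a_t, t, state, z):
    # Stand-in for a learned noise-prediction network eps_theta(a_t, t, s, z);
    # a real model would condition on the state and task embedding z.
    return 0.1 * a_t

def sample_action(state, z, steps=10, dim=2):
    """DDPM-style reverse process: start from Gaussian noise and
    iteratively denoise, conditioning each step on (state, z)."""
    betas = np.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    a = rng.standard_normal(dim)
    for t in reversed(range(steps)):
        eps = toy_denoiser(a, t, state, z)
        # Posterior mean of the reverse step given the predicted noise.
        a = (a - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            a += np.sqrt(betas[t]) * rng.standard_normal(dim)
    return np.tanh(a)  # squash into a bounded action range

action = sample_action(state=np.zeros(4), z=np.zeros(8))
```

Because sampling starts from noise and denoises per task embedding, the extractor can represent multimodal action distributions that a unimodal Gaussian policy head cannot.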

Source: Zhuanzhi
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: diffusion model, reinforcement learning, offline RL, zero-shot RL, behavioral regularization
Written by

Data Party THU

Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.
