Unlock Scalable RL: AReaL’s Decoupled Agentic Framework & Single‑Controller Design
This article explains how the open‑source AReaL framework boosts large‑scale reinforcement learning by separating agent execution from training logic, introducing a decoupled Agentic RL service and a Single‑Controller architecture that improves data flow, fault tolerance, and GPU utilization.
Overview
AReaL is an open‑source reinforcement‑learning (RL) framework designed for large‑scale models (e.g., trillion‑parameter Ring‑1T). It provides a lightweight API and an extensible plugin system that decouples agent execution from RL training, enabling developers to focus on algorithm design.
Decoupled Agentic RL
Traditional agentic RL tightly couples training logic with the agent, making reuse and debugging difficult. AReaL adopts an "Agent Autonomy + RL as Observer" design:
Agent Autonomy : The agent is a pure LLM‑based decision system that receives inputs, calls tools, generates actions, and returns results without awareness of the training process.
RL as Observer : AReaL records each interaction as a trajectory (input, thought chain, action, observation, reward) for downstream RL algorithms.
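The trajectory record described above can be sketched as a simple container. This is an illustrative data structure, not AReaL's actual internal type; the field names mirror the five elements listed (input, thought chain, action, observation, reward):

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Trajectory:
    """Illustrative record of one agent episode, as seen by the RL observer."""
    query: Any                                              # input the agent received
    thoughts: list[str] = field(default_factory=list)       # chain-of-thought text
    actions: list[str] = field(default_factory=list)        # tool calls / generated actions
    observations: list[Any] = field(default_factory=list)   # tool results
    reward: float = 0.0                                     # scalar reward at episode end

# The observer fills the record as the agent runs, without the agent knowing:
traj = Trajectory(query="What is 2+2?")
traj.actions.append("call_calculator(2, 2)")
traj.observations.append(4)
traj.reward = 1.0
```

The key property is one-way visibility: the agent writes nothing into this structure itself; the framework populates it from the session it wraps around each run.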
Workflow
Agent launch & proxy wrapper – Users implement a single async function, async def run_agent_return_reward(data: Any) -> float, that runs the agent and returns a scalar reward.
Trajectory collection – AReaL opens a session for each run, caching the input query, LLM token outputs, tool results, and the computed reward.
RL training – Collected trajectories are sorted, discounted (using a user‑specified discount factor), and fed to any standard RL algorithm. Updated policy weights are sent back to the agent.
Model deployment & closed‑loop iteration – Trained models can be exported in HuggingFace format and deployed without code changes; new interactions are continuously collected to form a feedback loop.
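The discounting in step 3 can be illustrated with a minimal return computation. This is a generic backward-pass Monte-Carlo return with a user-specified discount factor, not AReaL's exact implementation:

```python
def discounted_returns(rewards: list[float], gamma: float = 0.99) -> list[float]:
    """Compute the discounted return G_t for each step, working backwards:
    G_t = r_t + gamma * G_{t+1}."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Sparse reward at the final step, gamma = 0.5:
# G_2 = 1.0, G_1 = 0 + 0.5 * 1.0 = 0.5, G_0 = 0 + 0.5 * 0.5 = 0.25
print(discounted_returns([0.0, 0.0, 1.0], gamma=0.5))  # [0.25, 0.5, 1.0]
```

For the single-scalar-reward agent interface above, the episode reward typically sits at the final step, and discounting propagates credit back to earlier actions.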
Agent interface example
async def run_agent_return_reward(data: Any) -> float:
    """Run the agent on a single data sample and return the reward.

    Args:
        data: An element from the training dataset.

    Returns:
        reward: Float value representing the episode reward.
    """
    # User‑defined logic that calls the LLM, interacts with tools, etc.
    ...

Single‑Controller Architecture
The classic SPMD execution model suffers from long‑tail tasks and coarse‑grained control, limiting throughput and fault recovery in RL workloads. AReaL replaces it with a layered "Controller + Distributed Engine" design that separates control‑plane logic from data‑plane processing.
Controller (CPU node) : Handles distributed scheduling, data aggregation, and exposes the same interface as the engine.
Worker : Executes the engine, either co‑located in the same process or running as a separate process. It abstracts distributed data flow via DistributedBatch metadata.
Engine : Performs parallel computation and is compatible with native SGLang, FSDP, Megatron, etc.
Metadata structures
from dataclasses import dataclass, field

@dataclass
class TensorMetadata:
    """Metadata for a tensor field."""
    shape: tuple[int, ...]
    dtype: str
    device: str = "cpu"

@dataclass
class ShardMetadata:
    """Metadata for a single (sub‑)shard stored on one node."""
    node_id: str
    node_addr: str
    shard_id: str
    batch_size: int
    offset: int = 0
    fields: dict[str, TensorMetadata] = field(default_factory=dict)

@dataclass
class BatchMetadata:
    """Metadata for a distributed batch sharded across multiple nodes."""
    batch_id: str
    global_step: int
    total_batch_size: int
    shards: list[ShardMetadata] = field(default_factory=list)

Data‑flow RL process
Rollout Controller gathers metadata from inference engines.
Train Controller shards the metadata according to the data‑parallel strategy and dispatches shards to Workers.
Workers lazily pull required tensors via RPC, avoiding full tensor transfer.
This metadata‑driven approach eliminates the single‑point bottleneck of SPMD and improves scalability.
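A minimal sketch of step 2, the metadata-sharding step. The dataclasses are repeated here for self-containment, and the split_for_dp helper is hypothetical, not part of AReaL's API; it only shows that the controller distributes metadata, while tensors stay put until a worker pulls them:

```python
from dataclasses import dataclass, field

@dataclass
class ShardMetadata:
    """Metadata for a single shard stored on one node (tensors stay remote)."""
    node_id: str
    node_addr: str
    shard_id: str
    batch_size: int
    offset: int = 0

@dataclass
class BatchMetadata:
    """Metadata for a distributed batch sharded across multiple nodes."""
    batch_id: str
    global_step: int
    total_batch_size: int
    shards: list[ShardMetadata] = field(default_factory=list)

def split_for_dp(batch: BatchMetadata, dp_size: int) -> list[list[ShardMetadata]]:
    """Hypothetical helper: round-robin the shard metadata over data-parallel
    ranks. Only metadata moves here; each worker later pulls just the tensors
    its shards reference, via RPC."""
    buckets: list[list[ShardMetadata]] = [[] for _ in range(dp_size)]
    for i, shard in enumerate(batch.shards):
        buckets[i % dp_size].append(shard)
    return buckets

# Four shards produced by the inference engines, split over two DP ranks:
batch = BatchMetadata(
    batch_id="b0", global_step=0, total_batch_size=8,
    shards=[ShardMetadata(f"node{i}", f"10.0.0.{i}:9000", f"s{i}", 2)
            for i in range(4)],
)
assignments = split_for_dp(batch, dp_size=2)
# rank 0 receives shards s0 and s2; rank 1 receives s1 and s3
```

Because the controller only ever touches ShardMetadata-sized objects, its cost is independent of tensor sizes, which is what removes the SPMD single-point bottleneck mentioned above.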
API Demo
Launching with the built‑in launcher:
# python3 -m areal.launcher.local script.py --config xxx.yaml
def main(args):
    actor = FSDPPPOActor(config=config.actor)
    actor.create_process_group(parallel_strategy=parallel_strategy)

    rollout = RemoteSGLangEngine(config.rollout)
    rollout.initialize(train_data_parallel_size=parallel_strategy.dp_size)

    # Load data on the data-parallel head rank and broadcast to the rest
    batch = None
    if actor.is_data_parallel_head():
        batch = rollout.prepare_batch(...)
        batch = tensor_container_to(batch, actor.device)
    batch = broadcast_tensor_container(
        batch,
        src_rank=actor.current_data_parallel_head(),
        group=...
    )

Using the controller directly (no launcher needed):
# python script.py --config xxx.yaml
def main(args):
    actor = TrainController(
        engine=FSDPPPOActor(config=config.actor),
        scheduler=LocalScheduler(...)
    )
    rollout = RolloutController(
        engine=RemoteSGLangEngine(config=config.rollout),
        scheduler=LocalScheduler(...)
    )
    batch = rollout.prepare_batch(...)
    # Controller automatically handles data distribution

Future Outlook
AReaL currently supports basic agentic RL pipelines and the single‑controller mode. Planned enhancements include:
High‑efficiency data flow and distributed startup for the Single Controller mode.
Automatic scaling, fault‑tolerant high‑availability training.
Trajectory versioning, visualization platform, and richer analytics.
Further performance optimizations for large‑scale agentic scenarios.
Repository: https://github.com/inclusionAI/AReaL