How AReaL v1.0 Enables Scalable Agentic RL on Ascend NPU with AWEX Weight Sync
AReaL v1.0 brings full Ascend NPU support, detailed installation guides, and a best‑practice example for training a 30B MoE model across four nodes. Its integrated AWEX weight‑sync mechanism sharply reduces synchronization time, improving efficiency and stability for large‑scale Agentic RL workloads.
The open‑source Agentic RL framework AReaL has been upgraded to version 1.0 (released March 2, 2026). This release adds comprehensive support for Huawei's Ascend AI hardware, enabling developers to run full Agentic RL training pipelines on Ascend NPU clusters.
Installation on Ascend
A step‑by‑step installation guide is provided at:
https://inclusionai.github.io/AReaL/zh/tutorial/installation_npu.html
The guide covers dependency preparation, environment setup, and integration with Ascend‑specific libraries such as vLLM‑Ascend, MindSpeed, and Ray for multi‑node orchestration.
Best‑Practice Example: Tau2 Agent Training
A complete best‑practice example demonstrates training the Tau2 airline scheduling agent using a 30‑billion‑parameter MoE model (Qwen3‑30B‑A3B) on four Ascend A3 nodes. The example includes:
Training scenario: Tau2 Agent (tau2‑airline)
Model: Qwen3‑30B‑A3B (MoE)
Hardware: 4 × Ascend NPU A3 nodes
Configuration files are located in the repository: see examples/tau2/README_NPU.md and the accompanying YAML launch configuration in the same directory.
Distributed Training Loop
The training pipeline combines several components:
vLLM OpenAI‑compatible API server as the user‑simulator service.
Ray for launching and scheduling a 4‑node cluster.
AReaL for coordinated training and inference.
Megatron/MindSpeed for parallel model partitioning.
The recommended resource allocation mode is expressed as:
allocation_mode: vllm:d4t4+megatron:(attn:d2p4t4|ffn:d1p4e8)
This configuration directs vLLM to handle the inference side while Megatron manages the training side, enabling efficient weight sharing across the 30B‑parameter MoE model.
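To make the shorthand concrete, here is a small sketch that decodes the per‑component degree strings. The letter‑to‑meaning mapping (d=data, t=tensor, p=pipeline, e=expert parallelism) follows common Megatron/vLLM conventions; the helper `decode_degrees` is illustrative, not part of AReaL's API.

```python
import re

def decode_degrees(spec: str) -> dict:
    """Map each parallelism letter (d=data, t=tensor, p=pipeline,
    e=expert) in a shorthand like 'd2p4t4' to its integer degree."""
    return {letter: int(num) for letter, num in re.findall(r"([dtpe])(\d+)", spec)}

# The vLLM (inference) side of the allocation string: 4-way data x 4-way tensor.
vllm = decode_degrees("d4t4")
# The Megatron (training) side uses separate layouts for attention and MoE FFN.
attn = decode_degrees("d2p4t4")   # attention layers
ffn  = decode_degrees("d1p4e8")   # MoE FFN layers: 8-way expert parallel

print(vllm)  # {'d': 4, 't': 4}
print(attn)  # {'d': 2, 'p': 4, 't': 4}
print(ffn)   # {'d': 1, 'p': 4, 'e': 8}
```

Reading the string this way shows why the two sides can disagree: inference favors wide tensor parallelism for latency, while training adds pipeline and expert parallelism for the MoE layers.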
AWEX Weight‑Sync Integration
AWEX (Topology‑aware P2P) is now fully integrated into AReaL. It replaces naive full‑weight copying with shard‑level transfers, reducing memory footprint and improving stability for large‑scale, multi‑node RL workloads. Key features include:
Topology‑aware P2P weight exchange.
Transmission of only required parameter shards.
Elimination of redundant full‑weight copies.
Lowered memory and buffer overhead.
Improved stability for dense and MoE models.
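The payoff of shard‑level transfer is easy to see with back‑of‑the‑envelope arithmetic. The sketch below is illustrative only — the parameter count, dtype, and rank count are assumptions, not measured AReaL/AWEX figures, and this is not AWEX's actual protocol.

```python
# Illustrative only: contrast naive full-weight copying with shard-level
# transfer. All numbers are hypothetical, not measured AWEX figures.

PARAMS = 30e9           # ~30B parameters (Qwen3-30B-A3B scale)
BYTES_PER_PARAM = 2     # bf16 weights
N_INFERENCE_RANKS = 16  # e.g. 4 nodes x 4 devices

# Naive: every inference rank receives a full copy of the weights.
full_copy_bytes = PARAMS * BYTES_PER_PARAM * N_INFERENCE_RANKS

# Shard-level: each rank receives only the 1/N slice it actually serves,
# so the total traffic is one model's worth of bytes spread across ranks.
shard_bytes = PARAMS * BYTES_PER_PARAM

print(f"full copy : {full_copy_bytes/1e9:.0f} GB")  # 960 GB
print(f"sharded   : {shard_bytes/1e9:.0f} GB")      # 60 GB
```

Under these assumptions, shard‑level exchange moves 16× less data per sync, which is where the lower buffer overhead and reduced sync time come from.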
Developers can enable AWEX by setting actor.weight_update_mode: awex in the PPOTrainer configuration; the framework prepares the runtime environment automatically, with no manual setup required.
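In YAML form, the switch looks roughly like the fragment below. Only the actor.weight_update_mode key is confirmed by this release; the surrounding structure is an assumed sketch and the exact schema may differ in your AReaL version.

```yaml
# Illustrative PPOTrainer config fragment (surrounding keys are assumptions).
actor:
  weight_update_mode: awex   # switch from the naive full-copy path to AWEX
```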
Performance Evaluation
Benchmarks on the Qwen3‑30B‑A3B model show a reduction of weight‑sync time from ~50 seconds to ~15 seconds on a 4‑node Ascend A3 cluster. Larger models (e.g., Qwen3‑235B‑A30B) also benefit from lower buffer overhead and stable execution.
These results demonstrate that AWEX delivers tangible engineering gains for large‑scale, multi‑node RL systems, improving both efficiency and reliability.
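The per‑sync numbers compound over a training run. The speedup below follows directly from the benchmark figures above; the iteration count is a hypothetical illustration, not a reported workload.

```python
baseline_s, awex_s = 50.0, 15.0  # per-sync times from the 4-node A3 benchmark
speedup = baseline_s / awex_s
print(f"{speedup:.1f}x faster per sync")  # 3.3x faster per sync

# Hypothetical run: 500 RL iterations, one weight sync each.
iters = 500
saved_minutes = iters * (baseline_s - awex_s) / 60
print(f"~{saved_minutes:.0f} minutes saved")  # ~292 minutes
```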
Conclusion
AReaL v1.0 marks a transition from a research prototype to a production‑ready framework on Ascend. With full hardware support, robust distributed training pipelines, and the AWEX weight‑sync mechanism, developers can now build, train, and deploy Agentic RL agents at scale, paving the way for future extensions such as Code Agents, Deep Search Agents, and Tool‑Use Agents.
Huawei Cloud Developer Alliance
The Huawei Cloud Developer Alliance creates a tech sharing platform for developers and partners, gathering Huawei Cloud product knowledge, event updates, expert talks, and more. Together we continuously innovate to build the cloud foundation of an intelligent world.