Huawei Cloud Developer Alliance
Apr 13, 2026 · Artificial Intelligence

How AReaL v1.0 Enables Scalable Agentic RL on Ascend NPU with AWEX Weight Sync

The new AReaL v1.0 release brings full Ascend NPU support, detailed installation guides, and a best‑practice example for training a 30B MoE model across four nodes, while the integrated AWEX weight‑sync mechanism dramatically reduces synchronization time, improving efficiency and stability for large‑scale Agentic RL workloads.

AWEX · Agentic RL · Ascend NPU
12 min read
Alibaba Cloud Infrastructure
Mar 16, 2026 · Artificial Intelligence

Scaling Agentic Reinforcement Learning with a Decoupled T‑Architecture Using Verl and Argo Workflows

Agentic reinforcement learning is evolving from simple text generation to complex, scalable agents, but large‑scale deployment faces challenges like massive parallel rollout scheduling and reproducible environments; this article presents a decoupled T‑architecture that separates high‑level RL logic (Verl) from execution orchestration (Argo Workflows) to address these issues.

Agentic RL · Argo Workflows · Scalable Reinforcement Learning
10 min read
Machine Learning Algorithms & Natural Language Processing
Mar 6, 2026 · Artificial Intelligence

Why Reasoning and Tool-Use Clash in Agentic RL—and How DART Solves It

Recent studies reveal that in Agentic RL, jointly training reasoning and tool-use on shared parameters creates a persistent negative interaction, with gradients nearly orthogonal, limiting performance; a disentangled tuning approach (DART) using separate LoRA adapters isolates the two abilities and restores gains across benchmarks.

Agentic RL · DART · Gradient Interference
12 min read
Baobao Algorithm Notes
Feb 24, 2026 · Artificial Intelligence

The Bitter Lesson of Building Agentic RL in Terminal Environments

This article recounts the challenges of moving from single‑step RL with verifiable rewards to multi‑step agentic reinforcement learning in terminal environments, detailing infrastructure design, asynchronous pipelines, data quality checks, masking strategies, curriculum training, chunk‑based optimization, and practical lessons learned from large‑scale experiments.

Agentic RL · Credit Assignment · Environment Augmentation
33 min read
Old Zhang's AI Learning
Feb 19, 2026 · Artificial Intelligence

Inside GLM-5: Training Techniques, Architecture Innovations, and Benchmark Performance

The article dissects GLM-5’s 744B‑parameter MoE design, 28.5T‑token training corpus, novel Muon Split and MLA‑256 optimizations, DSA sparse attention, a fully asynchronous RL pipeline, extensive domestic chip adaptation, and benchmark results that place it on par with Claude Opus 4.5 and ahead of Gemini 3 Pro.

AI Architecture · Agentic RL · DSA
13 min read
Baobao Algorithm Notes
Feb 4, 2026 · Artificial Intelligence

Mastering Reinforcement Learning: From Basics to Advanced Agentic RL Techniques

This comprehensive guide walks through reinforcement learning fundamentals, MDP modeling, value functions, Bellman equations, and key algorithms such as Q‑learning, REINFORCE, PPO, DPO, and GRPO, then contrasts LLM‑RL with Agentic‑RL and surveys leading industry frameworks and real‑world applications.

Agentic RL · Artificial Intelligence · LLM
42 min read
Alimama Tech
Jan 7, 2026 · Artificial Intelligence

Can Text‑Driven Vibe Coding Tame Complex AI Infra? A Deep Dive into GPU Time‑Sharing for Agentic RL

This article examines the limitations of Vibe Coding for large AI infrastructure, proposes a text‑driven, document‑centric workflow, and presents a time‑multiplexed GPU scheduling solution that dramatically improves rollout throughput and reduces timeouts in large‑scale Agentic RL training.

Agentic RL · Design Documents · GPU Scheduling
21 min read
Baobao Algorithm Notes
Dec 22, 2025 · Artificial Intelligence

Which Agentic RL Framework Wins? A Deep Dive into AReal, Seer, Slime & verl

This article analyzes the training‑efficiency challenges of multi‑turn agentic reinforcement learning and compares four recent open‑source frameworks—AReal (Ant), Seer (Moonshot), Slime (Zhipu), and verl (Bytedance)—examining their asynchronous inference designs, rollout‑train separation, long‑context handling, off‑policy mitigation, and system‑level optimizations to guide framework selection.

Agentic RL · Asynchronous Inference · RL Systems
18 min read
AntTech
Dec 18, 2025 · Artificial Intelligence

How AEnvironment Powers Scalable Agentic RL with a Unified MCP Protocol

AEnvironment is an open‑source, unified environment platform for Agentic Reinforcement Learning that abstracts all resources as services via the MCP protocol, enabling trillion‑scale model training, rapid app generation, benchmark integration, and seamless deployment through a high‑performance ASandbox runtime.

AEnvironment · Agentic RL · Environment Platform
11 min read
Data Party THU
Sep 15, 2025 · Artificial Intelligence

Agentic RL: Transforming LLMs into Autonomous Decision‑Making Agents

This survey formalizes the shift from preference‑based reinforcement fine‑tuning to Agentic Reinforcement Learning, defines Agentic RL via MDP/POMDP abstractions, proposes a dual taxonomy of capabilities and task domains, compiles over 500 recent works, and outlines open challenges for scalable, robust AI agents.

AI agents · Agentic RL · LLM
12 min read