Huawei Cloud Developer Alliance
Apr 13, 2026 · Artificial Intelligence

How AReaL v1.0 Enables Scalable Agentic RL on Ascend NPU with AWEX Weight Sync

The new AReaL v1.0 release brings full Ascend NPU support, detailed installation guides, and a best‑practice example for training a 30B MoE model across four nodes, while the integrated AWEX weight‑sync mechanism dramatically reduces synchronization time, improving efficiency and stability for large‑scale Agentic RL workloads.

AWEX · Agentic RL · Ascend NPU
12 min read
Alibaba Cloud Infrastructure
Mar 16, 2026 · Artificial Intelligence

Scaling Agentic Reinforcement Learning with a Decoupled T‑Architecture Using Verl and Argo Workflows

Agentic reinforcement learning is evolving from simple text generation to complex, scalable agents, but large‑scale deployment faces challenges like massive parallel rollout scheduling and reproducible environments; this article presents a decoupled T‑architecture that separates high‑level RL logic (Verl) from execution orchestration (Argo Workflows) to address these issues.

Agentic RL · Argo Workflows · Scalable Reinforcement Learning
10 min read
Machine Learning Algorithms & Natural Language Processing
Mar 6, 2026 · Artificial Intelligence

Why Reasoning and Tool-Use Clash in Agentic RL—and How DART Solves It

Recent studies reveal that in Agentic RL, jointly training reasoning and tool-use on shared parameters creates a persistent negative interaction, with gradients nearly orthogonal, limiting performance; a disentangled tuning approach (DART) using separate LoRA adapters isolates the two abilities and restores gains across benchmarks.

Agentic RL · DART · Gradient Interference
12 min read
Baobao Algorithm Notes
Feb 24, 2026 · Artificial Intelligence

The Bitter Lesson of Building Agentic RL in Terminal Environments

This article recounts the challenges of moving from single‑step RL with verifiable rewards to multi‑step agentic reinforcement learning in terminal environments, detailing infrastructure design, asynchronous pipelines, data quality checks, masking strategies, curriculum training, chunk‑based optimization, and practical lessons learned from large‑scale experiments.

Agentic RL · Credit Assignment · Environment Augmentation
33 min read
Old Zhang's AI Learning
Feb 19, 2026 · Artificial Intelligence

Inside GLM-5: Training Techniques, Architecture Innovations, and Benchmark Performance

The article dissects GLM-5’s 744B‑parameter MoE design, 28.5T‑token training corpus, novel Muon Split and MLA‑256 optimizations, DSA sparse attention, a fully asynchronous RL pipeline, extensive domestic chip adaptation, and benchmark results that place it on par with Claude Opus 4.5 and ahead of Gemini 3 Pro.

AI Architecture · Agentic RL · DSA
13 min read
Baobao Algorithm Notes
Feb 4, 2026 · Artificial Intelligence

Mastering Reinforcement Learning: From Basics to Advanced Agentic RL Techniques

This comprehensive guide walks through reinforcement learning fundamentals, MDP modeling, value functions, Bellman equations, and key algorithms such as Q‑learning, REINFORCE, PPO, DPO, and GRPO, then contrasts LLM‑RL with Agentic‑RL and surveys leading industry frameworks and real‑world applications.

Agentic RL · Artificial Intelligence · LLM
42 min read
Alimama Tech
Jan 7, 2026 · Artificial Intelligence

Can Text‑Driven Vibe Coding Tame Complex AI Infra? A Deep Dive into GPU Time‑Sharing for Agentic RL

This article examines the limitations of Vibe Coding for large AI infrastructure, proposes a text‑driven, document‑centric workflow, and presents a time‑multiplexed GPU scheduling solution that dramatically improves rollout throughput and reduces timeouts in large‑scale Agentic RL training.

Agentic RL · Design Documents · GPU Scheduling
21 min read
Baobao Algorithm Notes
Dec 22, 2025 · Artificial Intelligence

Which Agentic RL Framework Wins? A Deep Dive into AReal, Seer, Slime & verl

This article analyzes the training‑efficiency challenges of multi‑turn agentic reinforcement learning and compares four recent open‑source frameworks—AReal (Ant), Seer (Moonshot), Slime (Zhipu), and verl (Bytedance)—examining their asynchronous inference designs, rollout‑train separation, long‑context handling, off‑policy mitigation, and system‑level optimizations to guide framework selection.

Agentic RL · Asynchronous Inference · RL Systems
18 min read
AntTech
Dec 18, 2025 · Artificial Intelligence

How AEnvironment Powers Scalable Agentic RL with a Unified MCP Protocol

AEnvironment is an open‑source, unified environment platform for Agentic Reinforcement Learning that abstracts all resources as services via the MCP protocol, enabling trillion‑scale model training, rapid app generation, benchmark integration, and seamless deployment through a high‑performance ASandbox runtime.

AEnvironment · Agentic RL · Environment Platform
11 min read
Data Party THU
Sep 15, 2025 · Artificial Intelligence

Agentic RL: Transforming LLMs into Autonomous Decision‑Making Agents

This survey formalizes the shift from preference‑based reinforcement fine‑tuning to Agentic Reinforcement Learning, defines Agentic RL via MDP/POMDP abstractions, proposes a dual taxonomy of capabilities and task domains, compiles over 500 recent works, and outlines open challenges for scalable, robust AI agents.

AI agents · Agentic RL · LLM
12 min read