Alibaba Cloud Infrastructure
Mar 16, 2026 · Artificial Intelligence

Scaling Agentic Reinforcement Learning with a Decoupled T‑Architecture Using Verl and Argo Workflows

Agentic reinforcement learning is evolving from simple text generation toward complex, scalable agents, but large‑scale deployment faces challenges such as scheduling massively parallel rollouts and keeping environments reproducible. This article presents a decoupled T‑architecture that separates high‑level RL logic (Verl) from execution orchestration (Argo Workflows) to address these issues.

Agentic RL · Argo Workflows · Scalable Reinforcement Learning
10 min read
Volcano Engine Developer Services
Oct 14, 2025 · Artificial Intelligence

How CollabLLM Redefines LLM Collaboration with Multi‑Turn Training

CollabLLM addresses the limitations of large language models in everyday multi‑turn dialogue with a user‑centric, multi‑turn training framework. The framework combines simulated interactions, multi‑round reward modeling, and veRL toolchain support, and it outperforms single‑turn baselines.

LLM · collaborative training · multi‑turn dialogue
13 min read
Baobao Algorithm Notes
Jun 3, 2025 · Artificial Intelligence

How to Train a 671B‑Scale Model with RL: Insights from a verl Internship

This article gives a detailed, first‑hand account of building reinforcement‑learning pipelines for a 671‑billion‑parameter model with the verl ecosystem. It covers the technical challenges, framework choices, memory management, weight conversion, precision alignment, and efficiency optimizations involved.

GPU memory management · Megatron · Model Parallelism
16 min read