How Alibaba’s ROCK & ROLL Enable Scalable Agentic AI Training

Alibaba’s open‑source ROCK environment sandbox and the ROLL reinforcement‑learning engine together provide a standardized, high‑throughput training loop that lets developers scale Agentic AI from a single machine to thousands of parallel instances while simplifying debugging and resource management.

Alimama Tech

Nov 26, 2025

Scalable training environments for Agentic AI

Large language models are evolving into agents that must interact with external tools, run code, and call APIs. Training such agents requires a high‑performance, reproducible sandbox environment that can be elastically scaled.

ROCK – Reinforcement Open Construction Kit

ROCK is an open‑source platform that turns a cluster of machines into an elastic pool of sandboxed environments. Each sandbox runs in an isolated container and provides programmatic Bash access via an SDK and HTTP API. Key capabilities:

Elastic scaling: launch thousands of sandbox instances in minutes.

Fault isolation: a crash in one sandbox does not affect others.

Resource scheduling: fine‑grained allocation keeps one sandbox's load from slowing its neighbors (noisy‑neighbor contention).

Fast state management: crashed environments are restored within seconds.

Programmatic Bash interaction: developers can query files, logs, processes, or modify environment variables through the SDK.
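To make the last capability concrete, here is a minimal local sketch of what "programmatic Bash interaction" can look like from the caller's side. It is not the ROCK SDK: `LocalSandbox` and its `run` method are hypothetical names, and the class simply runs commands with `sh` through Python's `subprocess` module in a temporary directory, with none of the container isolation ROCK provides.

```python
import os
import subprocess
import tempfile
from dataclasses import dataclass, field


@dataclass
class LocalSandbox:
    """Hypothetical stand-in for a sandbox handle; NOT the ROCK SDK."""
    workdir: str = field(default_factory=tempfile.mkdtemp)
    env: dict = field(default_factory=dict)

    def run(self, command: str, timeout: float = 10.0) -> tuple[int, str]:
        """Run one shell command and return (exit_code, combined output)."""
        proc = subprocess.run(
            ["sh", "-c", command],
            cwd=self.workdir,
            env={**os.environ, **self.env},  # sandbox variables override the host's
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return proc.returncode, proc.stdout + proc.stderr


sandbox = LocalSandbox()
sandbox.env["TASK_ID"] = "demo"           # modify an environment variable
sandbox.run("echo hello > notes.txt")     # write a file inside the working directory
code, out = sandbox.run('cat notes.txt; echo "$TASK_ID"')  # query file and variable
failed_code, _ = sandbox.run("exit 3")    # a failing command only affects this call
```

The caller sees only exit codes and text output, which is the same shape of interaction an agent or a reward function would need when inspecting files, logs, or processes.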

ROLL – High‑performance RL engine

ROLL is built on Ray and targets large‑scale LLM reinforcement learning. It integrates Megatron‑Core and DeepSpeed with 5‑dimensional parallelism and supports multi‑domain tasks, native Agentic RL, and asynchronous rollout with redundant sampling. The engine defines a minimal interface called GEM:

env.reset()
env.step(action)  # returns observation, reward, done, info

Implementing only reset and step is sufficient to plug any task—games, tool‑calling, or code execution—into ROLL.
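As an illustration, the sketch below wraps a toy number-guessing task in the two methods described above. The class `GuessNumberEnv` and the scripted `binary_search_policy` are hypothetical and stand in for a real task and an LLM policy. The four-value return of `step` follows the description in this article; the exact signature in the real interface may differ.

```python
import random


class GuessNumberEnv:
    """Toy task: guess a hidden integer in [low, high]."""

    def __init__(self, low=1, high=10, max_turns=5, seed=None):
        self.low, self.high, self.max_turns = low, high, max_turns
        self._rng = random.Random(seed)
        self._target = None
        self._turns = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self._target = self._rng.randint(self.low, self.high)
        self._turns = 0
        return f"Guess a number between {self.low} and {self.high}."

    def step(self, action):
        """Apply one action; return (observation, reward, done, info)."""
        self._turns += 1
        guess = int(action)
        if guess == self._target:
            return "Correct.", 1.0, True, {"turns": self._turns}
        hint = "higher" if guess < self._target else "lower"
        done = self._turns >= self.max_turns
        return f"Try {hint}.", 0.0, done, {"turns": self._turns}


def binary_search_policy(env):
    """Scripted policy standing in for an LLM; returns (total reward, turns)."""
    lo, hi = env.low, env.high
    env.reset()
    total, done = 0.0, False
    while not done:
        guess = (lo + hi) // 2
        obs, reward, done, info = env.step(str(guess))
        total += reward
        if "higher" in obs:
            lo = guess + 1
        elif "lower" in obs:
            hi = guess - 1
    return total, info["turns"]


total_reward, turns = binary_search_policy(GuessNumberEnv(seed=0))
```

Because the training loop only ever calls `reset` and `step`, swapping this toy task for a tool-calling or code-execution task would not change the loop itself.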

Asynchronous rollout and redundant sampling accelerate training throughput.

Standardized interface enables seamless integration of new environments.
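One way to picture asynchronous rollout with redundant sampling is to launch a few more rollouts than the batch needs and keep whichever finish first, so a slow or stuck environment does not hold up the training step. The sketch below shows that idea with Python's `asyncio`; it is an assumption about the general technique, not ROLL's implementation, and `rollout` and `collect` are hypothetical names with simulated latencies.

```python
import asyncio
import random


async def rollout(env_id: int, rng: random.Random) -> dict:
    """Simulated rollout whose latency varies per environment."""
    delay = rng.uniform(0.01, 0.2)
    await asyncio.sleep(delay)
    return {"env_id": env_id, "latency": delay}


async def collect(needed: int, redundancy: int, seed: int = 0) -> list[dict]:
    """Launch needed + redundancy rollouts and keep the first `needed` to finish."""
    rng = random.Random(seed)
    tasks = [
        asyncio.create_task(rollout(i, rng)) for i in range(needed + redundancy)
    ]
    results = []
    for finished in asyncio.as_completed(tasks):
        results.append(await finished)
        if len(results) == needed:
            break
    for task in tasks:
        task.cancel()  # no-op for finished tasks; stops the stragglers
    await asyncio.gather(*tasks, return_exceptions=True)  # let cancellations settle
    return results


batch = asyncio.run(collect(needed=4, redundancy=2))
```

The trade-off is extra sandbox work that gets discarded in exchange for a batch time bounded by the faster environments rather than the slowest one.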

ModelService – Decoupling agents from the RL engine

ModelService runs inside ROCK and acts as an intermediary between the agent and ROLL. The interaction follows three steps:

Agent constructs a prompt in the sandbox and sends a request.

ModelService intercepts the request and forwards the raw prompt to ROLL via a reverse channel.

ROLL performs inference, computes rewards, updates the policy, and returns the answer to ModelService, which then delivers it back to the agent.
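The three steps above can be sketched in a few lines. Everything here is a simplified in-process illustration under stated assumptions: `PolicyBackend`, `ModelServiceProxy`, and `agent_turn` are hypothetical names, the "model" is a fixed lambda, and the real system communicates over network channels rather than direct method calls.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PolicyBackend:
    """Stand-in for the RL engine side: answers prompts and logs them for training."""
    generate: Callable[[str], str]
    trajectory: list = field(default_factory=list)

    def infer(self, prompt: str) -> str:
        response = self.generate(prompt)
        # The trainer keeps the (prompt, response) pair; the agent never sees this log.
        self.trajectory.append((prompt, response))
        return response


@dataclass
class ModelServiceProxy:
    """Stand-in for the in-sandbox relay: forwards prompts and returns answers."""
    backend: PolicyBackend

    def chat(self, prompt: str) -> str:
        return self.backend.infer(prompt)


def agent_turn(llm: ModelServiceProxy, task: str) -> str:
    """The agent only builds a prompt and reads the reply."""
    prompt = f"Task: {task}\nReply with one shell command."
    return llm.chat(prompt)


backend = PolicyBackend(generate=lambda prompt: "ls -la")
proxy = ModelServiceProxy(backend)
reply = agent_turn(proxy, "list files")
```

The agent code depends only on the proxy's `chat` method, so the same agent could be pointed at a different backend without modification, which is the decoupling the section describes.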

Benefits:

Complete decoupling: agents only issue queries, ROLL only provides answers.

Training authority remains with ROLL, preserving control over reward computation.

Cost efficiency: GPU‑intensive inference stays in ROLL, while ROCK sandboxes run on inexpensive CPU instances.

Broad compatibility with custom agents.

End‑to‑end workflow

Developers can start with local sandbox testing, then use ROLL’s one‑click integration to launch ROCK sandboxes for full training runs, and finally deploy the same code to a cloud cluster without any configuration changes. This eliminates “works on my machine” issues and supports long‑running training jobs.

Open‑source repositories

ROCK and ROLL are released under open‑source licenses. The source code is available at:

https://github.com/alibaba/ROCK

https://github.com/alibaba/ROLL

The quick‑start documentation walks through a first agent training run in about five minutes: https://alibaba.github.io/ROCK/docs/Getting%20Started/rockroll/
