Tagged articles

Scalable Training

4 articles

Nov 26, 2025 · Artificial Intelligence

How Alibaba’s ROCK & ROLL Enable Scalable Agentic AI Training

Alibaba’s open‑source ROCK environment sandbox and ROLL reinforcement‑learning engine together provide a standardized, high‑throughput training loop. Developers can use it to scale agentic AI training from a single machine to thousands of parallel instances while simplifying debugging and resource management.

Scalable Training · agentic AI · infrastructure

0 likes · 12 min read

Data Party THU

Nov 4, 2025 · Artificial Intelligence

Why Evolution Strategies Beat Reinforcement Learning for Large‑Model Fine‑Tuning

This article reviews the paper “Evolution Strategies at Scale: LLM Fine‑Tuning Beyond Reinforcement Learning”, explaining how parameter‑space exploration via ES provides more stable, sample‑efficient, and reproducible fine‑tuning for billion‑parameter LLMs such as Qwen‑2.5 and LLaMA‑3, and detailing the algorithmic and engineering innovations that make full‑parameter ES practical.

Evolution Strategies · Parameter Space Optimization · Scalable Training

0 likes · 15 min read

Alimama Tech

Jun 25, 2025 · Artificial Intelligence

Introducing ROLL: A Scalable, User‑Friendly RL Framework for Large‑Scale LLM Training

ROLL is an open‑source reinforcement‑learning framework for post‑training large language models. It combines multi‑task RL, agentic support, flexible algorithm configuration, elastic resource scheduling, and rich observability, delivering significant accuracy gains across benchmarks while remaining easy to use for researchers, product developers, and infrastructure engineers.

AI Framework · Large Language Models · Open-source

0 likes · 11 min read

DataFunTalk

Oct 20, 2023 · Artificial Intelligence

Building the ATLAS Automated Machine Learning Platform at Du Xiaoman: Architecture, Practices, and Optimizations

This article describes how Du Xiaoman tackled the high cost, instability, and long cycles of AI algorithm deployment by building the ATLAS automated machine learning platform, detailing its four‑stage workflow, component platforms, scaling and efficiency techniques, and practical Q&A for practitioners.

AI Deployment · AutoML · Machine Learning Platform

0 likes · 22 min read
