Boost Post‑Training Efficiency with Cosmos‑RL, Ray, and VeRL on Alibaba PAI
This article introduces Alibaba Cloud's PAI platform and shows how open‑source reinforcement‑learning frameworks such as Cosmos‑RL, Ray, and VeRL accelerate post‑training for large language models, offering higher throughput, fault tolerance, and seamless integration for AI developers.
Introduction
Post‑training, the stage that follows large‑model pre‑training, can significantly improve model performance while requiring far less compute and data than pre‑training, which makes it attractive for domain‑specific adaptation.
Alibaba Cloud's AI platform PAI offers practical tooling for reinforcement learning, model distillation, data preprocessing, and supervised fine‑tuning (SFT), with clearly defined product capabilities and usage methods.
Reinforcement‑Learning Frameworks on PAI
Cosmos‑RL
Cosmos‑RL is an asynchronous, fault‑tolerant reinforcement‑learning framework for LLMs from NVIDIA. Compared with traditional colocated frameworks (e.g., VeRL, OpenRLHF), it improves both training efficiency and fault tolerance. It separates policy training from rollout inference through heterogeneous deployment and a controller‑based scheduler, achieving 2‑3× faster training while maintaining accuracy.
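The separation of rollout inference from policy training can be pictured as a producer/consumer pipeline. The sketch below is not Cosmos‑RL code: the names `rollout_worker`, `trainer`, and `run` are hypothetical, and two threads joined by a bounded queue stand in for separate rollout and training GPU pools.

```python
import queue
import threading


def rollout_worker(out_q, num_batches, get_policy_version):
    """Produce rollout batches, tagging each with the policy version seen at generation time."""
    for step in range(num_batches):
        batch = {
            "step": step,
            "policy_version": get_policy_version(),  # may lag behind the trainer
            "samples": [step] * 4,  # placeholder for generated trajectories
        }
        out_q.put(batch)  # blocks only when the buffer is full
    out_q.put(None)  # sentinel: no more batches


def trainer(in_q, state):
    """Consume batches as they arrive and bump the policy version after each update."""
    while True:
        batch = in_q.get()
        if batch is None:
            break
        state["consumed"].append(batch["step"])
        state["version"] += 1  # placeholder for a gradient step


def run(num_batches=8, buffer_size=2):
    """Run rollout and training concurrently, coupled only by a bounded queue."""
    buf = queue.Queue(maxsize=buffer_size)
    state = {"version": 0, "consumed": []}
    producer = threading.Thread(
        target=rollout_worker, args=(buf, num_batches, lambda: state["version"])
    )
    consumer = threading.Thread(target=trainer, args=(buf, state))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return state
```

The bounded buffer is the design choice worth noting: it lets rollout run ahead of training by a few batches, while capping how stale the generating policy can become.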
Its topology management rebuilds the communication network quickly when a node fails, so training continues without a restart. Backup controllers further improve stability and support dynamic scaling.
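As a rough illustration of re‑networking after a failure, the sketch below reassigns contiguous ranks to the surviving nodes and derives new ring neighbors. It assumes a simple ring layout; `rebuild_topology` and `ring_neighbors` are hypothetical helpers and do not reflect how Cosmos‑RL actually manages its topology.

```python
def rebuild_topology(nodes, failed):
    """Assign contiguous ranks to the nodes that are still healthy."""
    survivors = [n for n in nodes if n not in failed]
    if not survivors:
        raise RuntimeError("no healthy nodes left")
    return {node: rank for rank, node in enumerate(survivors)}


def ring_neighbors(ranks):
    """Map each node to its (previous, next) neighbor in a ring ordered by rank."""
    ordered = sorted(ranks, key=ranks.get)
    n = len(ordered)
    return {
        node: (ordered[(i - 1) % n], ordered[(i + 1) % n])
        for i, node in enumerate(ordered)
    }


# Node "b" drops out; the remaining three nodes form a smaller ring.
ranks = rebuild_topology(["a", "b", "c", "d"], failed={"b"})
neighbors = ring_neighbors(ranks)
```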
Ray
Ray is an open‑source distributed computing framework that bundles several AI libraries, including Ray Tune, Ray RLlib, Ray Serve, and RaySGD (since superseded by Ray Train), offering a comprehensive AI solution. It has been used for large‑scale model training, including the training of ChatGPT.
PAI‑DLC provides a native Ray experience: users can submit existing Ray scripts with a single click, benefiting from serverless execution, automatic cluster management, and fault‑tolerant engines (head‑node self‑healing, intelligent diagnostics, rapid log‑based failure resolution).
VeRL
VeRL, an open‑source reinforcement‑learning and large‑model alignment framework from ByteDance, adopts a hybrid programming model that decouples control and computation flows, supporting asynchronous control and distributed execution.
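The idea of decoupling control flow from computation flow can be sketched as a single controller function that spells out the RL dataflow, while worker groups decide how each stage is executed in parallel. This is an illustrative sketch only, not the VeRL API: `WorkerGroup`, `generate`, `score`, and `train_step` are hypothetical names, and a thread pool stands in for distributed workers.

```python
from concurrent.futures import ThreadPoolExecutor


class WorkerGroup:
    """Hypothetical stand-in for a group of distributed workers serving one role."""

    def __init__(self, name, fn, num_workers=2):
        self.name = name
        self.fn = fn
        self.pool = ThreadPoolExecutor(max_workers=num_workers)

    def run(self, items):
        """Computation flow: fan items out to workers and gather results in order."""
        return list(self.pool.map(self.fn, items))

    def shutdown(self):
        self.pool.shutdown()


def generate(prompt):
    """Placeholder for actor rollout."""
    return prompt + " -> answer"


def score(response):
    """Placeholder reward: the response length."""
    return float(len(response))


def train_step(prompts, actor, reward):
    """Control flow: the algorithm reads as a few sequential lines."""
    responses = actor.run(prompts)
    rewards = reward.run(responses)
    return sum(rewards) / len(rewards)


actor = WorkerGroup("actor", generate)
reward = WorkerGroup("reward", score)
mean_reward = train_step(["a", "bb"], actor, reward)
actor.shutdown()
reward.shutdown()
```

Changing the algorithm means editing `train_step`; changing the parallelism means editing `WorkerGroup`. Neither change touches the other.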
Performance Evaluation
In benchmarks with Qwen2.5‑32B‑Instruct on the GSM8K dataset, Cosmos‑RL reaches 2‑3× the throughput of VeRL as the GPU count increases.
PAI‑DLC Platform Overview
PAI‑DLC is a cloud‑native AI distributed training platform offering:
Powerful distributed computing: a unified scheduling engine with topology‑aware scheduling, FIFO and balanced queuing, and multi‑level quota sharing, achieving over 90% overall utilization.
Multi‑framework support: one‑click launch for Megatron, DeepSpeed, PyTorch, MPI, Slurm, and others, with no cluster setup required.
Enterprise‑grade fault tolerance: the AIMaster elastic fault‑tolerance engine, node self‑healing, and EasyCKPT for rapid checkpointing.
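The fault‑tolerance features above rest on a general pattern: checkpoint periodically, then resume from the latest checkpoint after a failure. The sketch below shows that pattern in plain Python. It does not use the AIMaster or EasyCKPT APIs; the function names, the JSON file format, and the `fail_at` parameter are assumptions made for illustration.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write atomically: dump to a temp file in the same directory, then rename."""
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dir_name)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # readers never observe a half-written file


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if not os.path.exists(path):
        return {"step": 0, "loss_sum": 0.0}
    with open(path) as f:
        return json.load(f)


def train(path, total_steps, ckpt_every=5, fail_at=None):
    """Resume from the last checkpoint and run until total_steps is reached."""
    state = load_checkpoint(path)
    for step in range(state["step"], total_steps):
        if fail_at is not None and step == fail_at:
            raise RuntimeError(f"simulated node failure at step {step}")
        state["loss_sum"] += 1.0  # placeholder for one training step
        state["step"] = step + 1
        if state["step"] % ckpt_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

With `ckpt_every=5`, a failure at step 12 loses only the two steps completed since the checkpoint at step 10. A second call to `train` with the same path picks up from there.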
Conclusion
Open‑source reinforcement‑learning frameworks provide flexible algorithms, rich toolsets, and active community ecosystems, enabling low‑cost experimentation and extending cloud platform capabilities. Alibaba Cloud PAI‑DLC addresses performance and stability bottlenecks at scale, offering a one‑stop service that lowers development barriers and accelerates large‑model applications, contributing to the advancement of AGI.
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
