Boost Post‑Training Efficiency with Cosmos‑RL, Ray, and VeRL on Alibaba PAI

This article introduces Alibaba Cloud's PAI platform and shows how the open‑source frameworks Cosmos‑RL, Ray, and VeRL accelerate post‑training for large language models, with higher throughput, better fault tolerance, and straightforward integration for AI developers.

Alibaba Cloud Big Data AI Platform

Introduction

Post‑training, the stage that follows large‑model pre‑training, can significantly improve model performance while needing far less compute and data than pre‑training, which makes it attractive for domain‑specific adaptation.

Alibaba Cloud’s AI platform, PAI, offers practical support for reinforcement learning, model distillation, data preprocessing, and supervised fine‑tuning (SFT), with clearly defined product capabilities and usage methods.

Reinforcement‑Learning Frameworks on PAI

Cosmos‑RL

Cosmos‑RL is an asynchronous, highly robust LLM reinforcement‑learning framework from NVIDIA. Compared with traditional colocated frameworks (e.g., VeRL, OpenRLHF), it improves training efficiency and fault tolerance by running policy training and rollout inference on separate, heterogeneously deployed resources coordinated by a controller‑based scheduler. The reported result is 2‑3× faster training with no loss of accuracy.
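
The separation of rollout and training can be pictured with a small producer/consumer sketch. This is not Cosmos‑RL's API: the names `rollout_worker`, `train`, and `run` are hypothetical, threads stand in for separate GPU pools, and a counter stands in for the optimizer step and weight sync. It only shows the idea that rollout workers keep producing samples while the trainer consumes them in batches.

```python
import queue
import threading


def rollout_worker(worker_id, policy_version, out_queue, num_samples):
    """Produce samples tagged with the policy version visible at generation time."""
    for i in range(num_samples):
        # A real worker would run LLM inference here; this only records metadata.
        out_queue.put({
            "worker": worker_id,
            "index": i,
            "policy_version": policy_version["value"],
        })


def train(out_queue, policy_version, total_samples, batch_size):
    """Consume samples in batches; each full batch counts as one training step."""
    steps = 0
    batch = []
    for _ in range(total_samples):
        batch.append(out_queue.get())  # blocks until a rollout sample arrives
        if len(batch) == batch_size:
            policy_version["value"] += 1  # stands in for optimizer step + weight sync
            steps += 1
            batch = []
    return steps


def run(num_workers=2, samples_per_worker=4, batch_size=4):
    """Start rollout threads and train concurrently on whatever they produce."""
    out_queue = queue.Queue()
    policy_version = {"value": 0}
    workers = [
        threading.Thread(
            target=rollout_worker,
            args=(w, policy_version, out_queue, samples_per_worker),
        )
        for w in range(num_workers)
    ]
    for t in workers:
        t.start()
    steps = train(out_queue, policy_version,
                  num_workers * samples_per_worker, batch_size)
    for t in workers:
        t.join()
    return steps, policy_version["value"]
```

Because samples carry the policy version they were generated with, a trainer in this style could also measure or bound staleness, which is the usual trade‑off of asynchronous designs.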

Its topology management rebuilds the communication topology quickly when a node fails, so training continues without a restart. Backup controllers further improve stability, and the same mechanism supports dynamic scaling.
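
A minimal sketch of the re‑networking idea, assuming a heartbeat‑based controller and a ring topology. Both are illustrative assumptions, not details taken from Cosmos‑RL, and `Controller`, `build_ring`, and the timeout value are hypothetical names and numbers. A node that stops sending heartbeats is dropped and the ring is rebuilt among the survivors; a newly heard node is added the same way.

```python
def build_ring(nodes):
    """Link each node to its successor in sorted order, wrapping at the end."""
    ordered = sorted(nodes)
    return {
        node: ordered[(i + 1) % len(ordered)]
        for i, node in enumerate(ordered)
    }


class Controller:
    """Tracks node liveness and rebuilds the ring when membership changes."""

    def __init__(self, nodes, heartbeat_timeout=30.0):
        self.timeout = heartbeat_timeout
        self.last_seen = {n: 0.0 for n in nodes}
        self.ring = build_ring(self.last_seen)

    def heartbeat(self, node, now):
        # Recording a heartbeat from an unknown node is how a new node joins.
        self.last_seen[node] = now

    def reconcile(self, now):
        """Drop nodes with stale heartbeats and rebuild the ring if needed."""
        alive = {n for n, t in self.last_seen.items() if now - t <= self.timeout}
        if alive != set(self.ring):
            self.last_seen = {n: self.last_seen[n] for n in alive}
            self.ring = build_ring(alive)
        return self.ring
```

For example, with nodes `a`, `b`, and `c`, if only `a` and `c` keep sending heartbeats, `reconcile` returns a two‑node ring and the remaining workers can keep training.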

Ray

Ray is an open‑source distributed computing framework with a set of AI libraries, including Ray Tune, RLlib, Ray Serve, and Ray Train (formerly RaySGD), that together cover hyperparameter tuning, reinforcement learning, model serving, and distributed training. It has been used for large‑scale model training, reportedly including the training of ChatGPT.

PAI‑DLC provides a native Ray experience: users can submit existing Ray scripts with a single click, benefiting from serverless execution, automatic cluster management, and fault‑tolerant engines (head‑node self‑healing, intelligent diagnostics, rapid log‑based failure resolution).

VeRL

VeRL, an open‑source reinforcement‑learning and large‑model alignment framework from ByteDance, adopts a hybrid programming model that decouples control and computation flows, supporting asynchronous control and distributed execution.
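
The split between control flow and computation flow can be sketched as follows. This is an illustration of the general pattern, not VeRL's API: `generate`, `score`, `update`, and `training_step` are hypothetical placeholders, and a thread pool stands in for distributed worker groups. The controller function reads like a single‑process script, while each stage fans out to workers.

```python
from concurrent.futures import ThreadPoolExecutor


def generate(prompt):
    """Placeholder for rollout: a real worker would sample from the policy."""
    return prompt + " -> response"


def score(response):
    """Placeholder reward: a real worker would call a reward model or verifier."""
    return float(len(response))


def update(policy_step, rewards):
    """Placeholder for the policy update; returns the new step counter."""
    return policy_step + 1


def training_step(prompts, policy_step, pool):
    """Control flow lives here; the computation of each stage runs on the pool."""
    responses = list(pool.map(generate, prompts))
    rewards = list(pool.map(score, responses))
    return update(policy_step, rewards), rewards


def run(prompts, num_steps=2, num_workers=4):
    """Drive several training steps from a single controller."""
    step = 0
    rewards = []
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for _ in range(num_steps):
            step, rewards = training_step(prompts, step, pool)
    return step, rewards
```

Keeping the algorithm in one controller function means the RL dataflow can be changed without touching how each stage is parallelized.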

Performance Evaluation

In benchmarks with Qwen2.5‑32B‑Instruct on the GSM8K dataset, Cosmos‑RL delivers 2‑3× the throughput of VeRL, and the advantage holds as the GPU count increases.

PAI‑DLC Platform Overview

PAI‑DLC is a cloud‑native AI distributed training platform offering:

Powerful distributed computing: a unified scheduling engine with topology‑aware scheduling, FIFO and balanced queuing, and multi‑level quota sharing, achieving >90% overall utilization.

Multi‑framework support: one‑click launch for Megatron, DeepSpeed, PyTorch, MPI, Slurm, and others, eliminating cluster setup.

Enterprise‑grade fault tolerance: the AIMaster elastic fault‑tolerance engine, node self‑healing, and EasyCKPT for rapid checkpointing.
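
The fault‑tolerance item above rests on a checkpoint‑and‑resume loop, which can be sketched in plain Python. This is not the EasyCKPT or AIMaster API: `save_checkpoint`, `load_checkpoint`, `train`, and the `fail_at` parameter are hypothetical, and a small JSON file stands in for model and optimizer state. The sketch writes checkpoints atomically and resumes from the last one after a simulated failure.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state atomically: dump to a temp file, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)  # readers never observe a half-written file


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if not os.path.exists(path):
        return {"step": 0, "loss_sum": 0.0}
    with open(path) as f:
        return json.load(f)


def train(path, total_steps, checkpoint_every, fail_at=None):
    """Resume from the last checkpoint and train up to total_steps."""
    state = load_checkpoint(path)
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated node failure")
        state["step"] += 1
        state["loss_sum"] += 1.0 / state["step"]  # stands in for real training work
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

After a failure, calling `train` again with the same path picks up at the last saved step, so at most `checkpoint_every` steps of work are repeated.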

Conclusion

Open‑source reinforcement‑learning frameworks provide flexible algorithms, rich toolsets, and active community ecosystems, enabling low‑cost experimentation and extending cloud platform capabilities. Alibaba Cloud PAI‑DLC addresses performance and stability bottlenecks at scale, offering a one‑stop service that lowers development barriers and accelerates large‑model applications, contributing to the advancement of AGI.
