
Boost Large Model Performance with PAI‑ChatLearn: A High‑Performance RL Framework

PAI‑ChatLearn is a flexible, easy‑to‑use, high‑efficiency reinforcement‑learning framework on Alibaba Cloud’s AI platform. It addresses the usability and performance challenges of post‑training large models with features such as Ray‑based scheduling, dynamic batch sizing, sequence packing, and MoE acceleration. This article also gives step‑by‑step guidance for running an RL task with Qwen3 on PAI‑DLC.

Alibaba Cloud Big Data AI Platform

Jul 9, 2025


Introduction

Post‑Training is a crucial stage in deploying large models because it can significantly improve performance and adapt models to specific domains. Compared with Pre‑Training, it requires less compute and less data, which makes it easier to iterate.

Alibaba Cloud’s AI platform PAI will systematically share technical practices in reinforcement learning, model distillation, data preprocessing, and SFT, demonstrating PAI’s capabilities throughout the Post‑Training pipeline.

Challenges of Reinforcement Learning in Post‑Training

Usability Challenges

Extensibility as new algorithms appear: the framework has to accommodate choices such as on‑policy vs. off‑policy training, PPO vs. GRPO, and whether a critic model is needed.

Rapid support for new model types, e.g., multimodal large models.

Performance Challenges

Sequential execution of RL inference and training leads to GPU idle time.

Uneven response lengths among workers cause GPU under‑utilization, especially for long‑tail sequences.

Scheduling and memory management for multiple models (policy, critic, reward) affect training efficiency.

Performance optimization for distributed MoE training/inference.

PAI‑ChatLearn Technical Features

PAI‑ChatLearn is a flexible, easy‑to‑use, and efficient large‑scale reinforcement‑learning framework developed in‑house on PAI.

Flexible and Easy‑to‑Use Framework

Uses Ray as the underlying scheduler with a highly customizable modular design, allowing custom synchronous/asynchronous scheduling, backend selection, and efficient parameter synchronization.

Supports multi‑Actor configurations for fine‑grained workload control, such as model colocation.

Integrates common training frameworks (Megatron‑Core, FSDP) and inference frameworks (vLLM, SGLang) with model‑specific parallel strategies to improve GPU utilization.

Supports a rich set of RL algorithms (RLHF, DPO, PPO, GRPO) and allows users to define custom computation graphs for data generation and training.
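The question of whether a critic model is needed comes down to how advantages are estimated. PPO typically learns a value function (the critic), while GRPO scores each response relative to the other responses sampled for the same prompt. The sketch below illustrates the group-relative step only. The function name `grpo_advantages`, the use of the population standard deviation, and the `eps` term are assumptions for illustration and are not taken from PAI‑ChatLearn.

```python
from statistics import mean, pstdev
from typing import List


def grpo_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize the rewards of one prompt's response group to zero mean.

    No critic is involved: the group mean acts as the baseline.
    Assumption: population std plus a small eps to avoid division by zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to the same prompt, scored 1.0 (correct) or 0.0 (wrong).
group_rewards = [1.0, 0.0, 0.0, 1.0]
advantages = grpo_advantages(group_rewards)
print(advantages)
```

When every response in a group receives the same reward, all advantages are zero and the group contributes no learning signal, which is one reason reward design matters for this family of algorithms.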

Extreme Computational Performance

PAI‑ChatLearn implements load‑balancing techniques such as Dynamic Batchsize, Sequence Packing, Sequence Parallel, and Partial Rollout, as well as MoE acceleration methods like GroupGemm and DeepEP, dramatically increasing GPU utilization for RL workloads.

Dynamic Batchsize + Sequence Packing

RL training samples generated by model rollouts have varying lengths. Traditional approaches pad all samples to the same length, wasting compute on padding tokens. PAI‑ChatLearn reorganizes samples within a minibatch so that lengths are similar, maximizing compute efficiency and throughput.
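To make the padding cost concrete, the following sketch compares fixed-size padded batches with micro-batches built under a token budget. It is an illustration under stated assumptions, not PAI‑ChatLearn's implementation: the helper names, the greedy first-fit-decreasing strategy, and the `max_tokens` budget are all hypothetical.

```python
from typing import List


def padded_tokens(lengths: List[int], batch_size: int) -> int:
    """Tokens processed when fixed-size batches are padded to their longest sample."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        total += max(batch) * len(batch)
    return total


def pack_by_token_budget(lengths: List[int], max_tokens: int) -> List[List[int]]:
    """Greedy first-fit-decreasing packing of sample indices into micro-batches.

    Each micro-batch holds a variable number of samples (dynamic batch size)
    that would be concatenated without padding (sequence packing), and its
    total length never exceeds max_tokens.
    """
    if lengths and max(lengths) > max_tokens:
        raise ValueError("a single sample exceeds the token budget")
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    bins: List[List[int]] = []
    loads: List[int] = []
    for idx in order:
        for b, load in enumerate(loads):
            if load + lengths[idx] <= max_tokens:
                bins[b].append(idx)
                loads[b] += lengths[idx]
                break
        else:
            # No existing micro-batch has room, so open a new one.
            bins.append([idx])
            loads.append(lengths[idx])
    return bins


lengths = [900, 120, 80, 700, 60, 300, 40, 1000]  # response lengths in tokens
packed = pack_by_token_budget(lengths, max_tokens=1024)
print("tokens with padding:", padded_tokens(lengths, batch_size=4))
print("real tokens:", sum(lengths))
print("micro-batches:", [[lengths[i] for i in b] for b in packed])
```

A production system would also need attention masks or position offsets so that packed samples do not attend to each other; that part is omitted here.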

Sequence Parallel

For extremely long sequences that cannot fit on a single GPU, PAI‑ChatLearn adopts Ulysses Sequence Parallel to evenly distribute a sample’s computation across multiple GPUs, preventing OOM and enabling continued training.
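The data movement behind Ulysses-style sequence parallelism can be simulated without GPUs. In the sketch below, each rank starts with a slice of the tokens and all attention heads; an all-to-all exchange then gives each rank every token for a subset of heads, so attention can be computed locally, and a second all-to-all restores the original layout. The function names are hypothetical, only the layout is modeled (not the attention math), and both the sequence length and head count are assumed to divide evenly by the number of ranks.

```python
from typing import List, Tuple

Entry = Tuple[int, int]  # (token_index, head_index)


def shard_by_sequence(seq_len: int, num_heads: int, world_size: int) -> List[List[Entry]]:
    """Initial layout: rank r holds a contiguous slice of tokens, with every head."""
    assert seq_len % world_size == 0
    chunk = seq_len // world_size
    return [
        [(t, h) for t in range(r * chunk, (r + 1) * chunk) for h in range(num_heads)]
        for r in range(world_size)
    ]


def seq_to_head_all_to_all(shards: List[List[Entry]], num_heads: int) -> List[List[Entry]]:
    """First all-to-all: afterwards each rank holds every token, but only its own heads."""
    world_size = len(shards)
    assert num_heads % world_size == 0
    heads_per_rank = num_heads // world_size
    out: List[List[Entry]] = [[] for _ in range(world_size)]
    for shard in shards:
        for t, h in shard:
            out[h // heads_per_rank].append((t, h))
    return [sorted(s) for s in out]


def head_to_seq_all_to_all(shards: List[List[Entry]], seq_len: int) -> List[List[Entry]]:
    """Second all-to-all (after attention): restore the sequence-sharded layout."""
    world_size = len(shards)
    chunk = seq_len // world_size
    out: List[List[Entry]] = [[] for _ in range(world_size)]
    for shard in shards:
        for t, h in shard:
            out[t // chunk].append((t, h))
    return [sorted(s) for s in out]


seq_len, num_heads, world_size = 8, 4, 2
by_seq = shard_by_sequence(seq_len, num_heads, world_size)
by_head = seq_to_head_all_to_all(by_seq, num_heads)
restored = head_to_seq_all_to_all(by_head, seq_len)
print("rank 0 tokens before exchange:", sorted({t for t, _ in by_seq[0]}))
print("rank 0 tokens during attention:", sorted({t for t, _ in by_head[0]}))
print("rank 0 heads during attention:", sorted({h for _, h in by_head[0]}))
```

The number of entries per rank stays constant across both exchanges, which is why the per-GPU activation footprint shrinks roughly in proportion to the number of ranks.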

GroupGemm

When training MoE models with a large number of experts, the traditional sequential MLP implementation becomes inefficient. PAI‑ChatLearn rewrites the MoE kernel to support a high‑performance GroupGemm operator, making MoE training feasible under FSDP.
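The layout change that a grouped GEMM relies on can be shown in plain Python. The baseline visits experts one by one and filters tokens with a mask; the grouped version sorts tokens by expert so that each expert owns one contiguous segment, described by a list of segment sizes. This is a sketch with hypothetical function names and tiny integer matrices: a real GroupGemm kernel would consume the sorted tokens and segment sizes in a single fused launch, which the loop over segments only stands in for.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(x: Vector, w: Matrix) -> Vector:
    """Compute x @ w for a row vector x and a (d_in x d_out) matrix w."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def moe_sequential(tokens: List[Vector], expert_ids: List[int], weights: List[Matrix]) -> List[Vector]:
    """Baseline: for each expert, scan all tokens and process the ones routed to it."""
    out: List[Vector] = [[] for _ in tokens]
    for e, w in enumerate(weights):
        for t, x in enumerate(tokens):
            if expert_ids[t] == e:
                out[t] = matvec(x, w)
    return out


def moe_grouped(
    tokens: List[Vector], expert_ids: List[int], weights: List[Matrix]
) -> Tuple[List[Vector], List[int]]:
    """Grouped layout: sort tokens by expert and walk contiguous segments once."""
    order = sorted(range(len(tokens)), key=lambda t: expert_ids[t])
    group_sizes = [0] * len(weights)
    for e in expert_ids:
        group_sizes[e] += 1
    out: List[Vector] = [[] for _ in tokens]
    start = 0
    for e, size in enumerate(group_sizes):
        for t in order[start:start + size]:
            out[t] = matvec(tokens[t], weights[e])  # results scattered back to original order
        start += size
    return out, group_sizes


tokens = [[1, 2], [3, 4], [5, 6], [7, 8]]
expert_ids = [1, 0, 1, 0]                      # router decision per token (top-1)
weights = [[[1, 0], [0, 1]], [[2, 0], [0, 2]]]  # expert 0: identity, expert 1: scale by 2
grouped_out, group_sizes = moe_grouped(tokens, expert_ids, weights)
print(grouped_out, group_sizes)
```

Both functions return the same outputs; the difference is that the grouped form touches each token once and exposes the per-expert segment sizes that a batched kernel needs.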

Partial Rollout

To keep a few long‑tail responses from stalling vLLM generation, PAI‑ChatLearn cuts off overly long sequences in the current rollout step and continues generating them in the next step.
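The scheduling effect of partial rollout can be simulated with a per-step token cap. In this sketch, each sequence generates at most `max_new_tokens` per rollout step; finished sequences are released, and unfinished ones are carried into the next step with their partial output kept. The `Rollout` class, the `target_len` field (the real length is unknown until the model emits an end-of-sequence token), and the cap of 256 are all assumptions for illustration, not PAI‑ChatLearn's API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Rollout:
    prompt_id: int
    target_len: int      # hypothetical: tokens until EOS, unknown in practice
    generated: int = 0   # tokens produced so far

    @property
    def done(self) -> bool:
        return self.generated >= self.target_len


def rollout_step(active: List[Rollout], max_new_tokens: int) -> Tuple[List[Rollout], List[Rollout]]:
    """Generate at most max_new_tokens per sequence; return (finished, carried_over)."""
    finished: List[Rollout] = []
    carried: List[Rollout] = []
    for r in active:
        r.generated = min(r.target_len, r.generated + max_new_tokens)
        (finished if r.done else carried).append(r)
    return finished, carried


pending = [Rollout(0, 50), Rollout(1, 120), Rollout(2, 900), Rollout(3, 80)]
completed_at: Dict[int, int] = {}
step = 0
while pending:
    step += 1
    finished, pending = rollout_step(pending, max_new_tokens=256)
    for r in finished:
        completed_at[r.prompt_id] = step
print(completed_at)
```

In a real pipeline, new prompts would be added to each step alongside the carried-over sequences, so the short responses from step one can move on to training while the long one is still being generated.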

Measured Results

Compared with open‑source frameworks, PAI‑ChatLearn shows clear advantages in both scale and performance.

Using the Qwen3 model, PAI‑ChatLearn achieves higher end‑to‑end acceleration than the open‑source VeRL framework.

For LLaMA2 Dense models, PAI‑ChatLearn also outperforms other open‑source solutions.

Using PAI‑ChatLearn on PAI

The PAI platform’s cloud‑native AI training module PAI‑DLC (Deep Learning Containers) provides a flexible, stable, easy‑to‑use, and high‑performance training environment that supports various algorithm frameworks and large‑scale distributed deep‑learning tasks.

PAI‑DLC allows one‑click submission of PAI‑ChatLearn RL tasks. The following example demonstrates the workflow with the Qwen3 model.

Prepare the Qwen3 Model

modelscope download --model Qwen/Qwen3-8B --local_dir Qwen3-8B

Prepare the Training Dataset

This example uses the MATH‑lighteval dataset, a math reasoning benchmark with a rule‑based reward function.

# Download dataset
mkdir -p dataset
modelscope download --dataset AI-ModelScope/MATH-lighteval --local_dir dataset/MATH-lighteval
# Preprocess dataset
python examples/fsdp/data/data_preprocess/math_lighteval.py --input_dir dataset/MATH-lighteval --local_dir dataset/MATH-lighteval

Submit the Training Task

After local debugging, configure a distributed multi‑GPU job in the DLC environment.

Image address:

dsw-registry-vpc.cn-wulanchabu.cr.aliyuncs.com/pai-training-algorithm/chatlearn:torch2.5.1-vllm0.6.6-ubuntu22.04-cuda12.6-py310

Start command:

cd /mnt/data/ChatLearn && bash examples/fsdp/scripts/train_grpo_qwen3.sh


For custom tasks, refer to examples/fsdp/models/rule_reward.py to implement a custom reward function.
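As a rough idea of what a rule-based reward for a math task can look like, the sketch below extracts the final answer from the last `\boxed{...}` in a response and compares it with the reference answer. It is a standalone illustration and does not reproduce the interface of `rule_reward.py`: the function names, the 0/1 scoring, and the whitespace-only normalization are assumptions, and it presumes the model is prompted to put its final answer in `\boxed{...}`.

```python
import re
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in text, handling nested braces."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    for c in text[start + len(marker):]:
        if c == "{":
            depth += 1
        elif c == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(c)
    return None  # braces never closed


def rule_reward(response: str, ground_truth: str) -> float:
    """Score 1.0 if the boxed answer matches the reference after stripping whitespace."""
    answer = extract_boxed(response)
    if answer is None:
        return 0.0

    def normalize(s: str) -> str:
        return re.sub(r"\s+", "", s)

    return 1.0 if normalize(answer) == normalize(ground_truth) else 0.0


print(rule_reward("Adding the halves gives \\boxed{\\frac{1}{2}}.", "\\frac{1}{2}"))
print(rule_reward("I think the answer is 3.", "3"))
```

Exact string matching is strict: mathematically equivalent forms such as `0.5` and `\frac{1}{2}` would score 0.0 here, so a practical reward usually adds answer normalization or symbolic comparison.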

For further details, see the official documentation links provided in the original article.

