How PAI-ChatLearn Accelerates Large‑Scale LLM Alignment Training
PAI-ChatLearn is an open‑source framework that abstracts and decouples alignment training for large language models, offering flexible resource scheduling, multi‑backend support, and significant speedups—up to 208% for 70B models—while supporting RLHF, DPO, and custom training flows.
Background
ChatGPT, developed by OpenAI, became popular due to its impressive conversational abilities, which rely on the RLHF (Reinforcement Learning from Human Feedback) training paradigm. Unlike traditional deep-learning training, which optimizes a single model, RLHF orchestrates several large models at once (typically policy, reference, reward, and value models) and moves large volumes of data between them, creating challenges for building an easy-to-use, efficient, and scalable training system.
Introducing PAI‑ChatLearn
PAI-ChatLearn is an open-source alignment training framework from Alibaba Cloud PAI. It abstracts and decouples the alignment training workflow, providing flexible resource allocation and model scheduling strategies. The framework supports RLHF, DPO, Online-DPO, and GRPO, and allows users to define custom model execution flows.
Compared with state-of-the-art alignment systems, it delivers a 115% speedup for 7B+7B (Policy+Reward) configurations and a 208% speedup for 70B+70B configurations, and supports even larger scales such as 300B+300B. It has also been used to train Qwen series models, achieving strong results on Qwen-Chat, Qwen2-Chat, and Qwen2-Math.
Main Features
Easy-to-use programming interface: Users only need to implement a few functions to construct models; the system handles resource scheduling, data flow, control flow, and distributed execution (see the sketch after this list).
Highly extensible training methods: Supports RLHF, DPO, Online‑DPO, GRPO and custom model flows.
Multiple distributed acceleration engines: Compatible with Megatron‑LM, DeepSpeed, vLLM, and can combine them for training and inference.
Flexible parallel strategies and resource allocation: Different models can use distinct parallel strategies; resources can be dedicated, shared, or partially reused to maximize efficiency.
High performance: Compared with state‑of‑the‑art systems, ChatLearn delivers 115% and 208% speedups for 7B and 70B scales respectively, and scales up to 300B+300B.
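To make the programming interface concrete, here is a minimal sketch in Python. It is modeled on patterns in the ChatLearn documentation (TorchModule, RLHFEngine, forward_step, set_dataset, learn), but the stub classes and placeholder logic are ours and exact names can vary between versions, so treat it as illustrative rather than copy-paste code.

```python
# Minimal sketch of a ChatLearn-style RLHF setup, modeled on the docs,
# with placeholder compute. Illustrative only, not copy-paste code.
import chatlearn
from chatlearn import RLHFEngine, TorchModule

class PolicyInference(TorchModule):
    """Rollout model: users implement a few hooks such as forward_step;
    scheduling, data flow, and distributed execution come from the framework."""
    def forward_step(self, data, iteration=0):
        data["response"] = [p + " ..." for p in data["prompt"]]  # stub generation
        return data

class ScoreModel(TorchModule):
    """Stub standing in for the reference/reward/value inference models."""
    def forward_step(self, data, iteration=0):
        data["score"] = [0.0] * len(data["prompt"])  # placeholder scoring
        return data

class TrainModel(TorchModule):
    """Stub standing in for the PPO policy/value trainers."""
    def train_step(self, data, iteration=0):
        pass  # a real trainer runs a Megatron/DeepSpeed optimizer step here

chatlearn.init()
engine = RLHFEngine(
    PolicyInference("policy"),
    ScoreModel("reference"), ScoreModel("reward"), ScoreModel("value"),
    TrainModel("ppo_policy"), TrainModel("ppo_value"),
)
engine.set_dataset([{"prompt": "Write a haiku about GPUs."}])
engine.learn()  # runs the full RLHF loop: rollout, scoring, PPO updates
```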
Technical Architecture
API: Provides abstractions for RLHF, DPO, Online‑DPO, GRPO and custom flows. Users inherit from MegatronModule, DeepSpeedModule, or VLLMModule to wrap different backends. Configuration is done via YAML files for hyper‑parameters and parallel strategies.
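For illustration, a configuration file might look like the sketch below. The key names are hypothetical stand-ins patterned after common Megatron-style options, not ChatLearn's exact schema; consult the documentation for the real fields.

```yaml
# Hypothetical config sketch: per-model parallel strategies plus run-level
# hyper-parameters. Key names are illustrative, not ChatLearn's exact schema.
models:
  policy:
    num_gpus: 8
    tensor_model_parallel_size: 4    # TP degree for this model only
    pipeline_model_parallel_size: 2  # PP degree for this model only
    generation_batch_size: 64
  reward:
    num_gpus: 4
    tensor_model_parallel_size: 4    # the reward model can parallelize differently
runtime:
  num_episode: 100           # outer alignment-training episodes
  sample_per_episode: 1024   # rollouts generated per episode
  train_micro_batch_size: 2
```

The point of per-model sections is that each model picks the parallel strategy that suits its size and role, rather than inheriting one global setting.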
Scheduler: Introduces the DistActor abstraction, extending Ray actors to support cross‑machine execution. It partitions cluster resources into Resource Groups and applies hardware‑aware affinity scheduling, enabling exclusive or shared resource usage.
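The snippet below is a minimal illustration of the underlying mechanism using plain Ray, not ChatLearn's actual DistActor code: a resource group is reserved via a Ray placement group, and model shards are pinned to its GPU bundles, which is what makes exclusive versus shared placement possible. The ModelShard class is hypothetical.

```python
# Illustrative sketch (plain Ray, not ChatLearn's DistActor): reserve a GPU
# resource group and pin model shards to specific bundles within it.
import ray
from ray.util.placement_group import placement_group
from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy

@ray.remote(num_gpus=1)
class ModelShard:  # hypothetical: one rank of one model
    def __init__(self, name, rank):
        self.name, self.rank = name, rank

    def ping(self):
        return f"{self.name}[{self.rank}] ready"

ray.init()
# Reserve a "resource group" of 4 GPUs. STRICT_PACK keeps all bundles on one
# node (affinity); SPREAD would distribute them across machines instead.
pg = placement_group([{"GPU": 1}] * 4, strategy="STRICT_PACK")
ray.get(pg.ready())

# Policy and reward each get two dedicated bundles (exclusive use); sharing
# would mean scheduling both models onto the same bundle indices.
shards = [
    ModelShard.options(
        scheduling_strategy=PlacementGroupSchedulingStrategy(
            placement_group=pg, placement_group_bundle_index=i)
    ).remote(name, rank)
    for i, (name, rank) in enumerate(
        [("policy", 0), ("policy", 1), ("reward", 0), ("reward", 1)])
]
print(ray.get([s.ping.remote() for s in shards]))
```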
Executor: Divides the alignment workflow into Environment, Trainer, and Evaluator modules, handling data transfer, parameter synchronization, and model evaluation.
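A conceptual sketch of how one episode flows through the three modules follows; all function and method names here are hypothetical, chosen only to show the hand-offs, not ChatLearn internals.

```python
# Conceptual sketch (hypothetical names, not ChatLearn internals) of one
# training episode flowing through Environment -> Trainer -> Evaluator.
def run_episode(policy, reference, reward, trainer, evaluator, prompts):
    # Environment: chain the inference models to turn prompts into scored
    # rollouts that become the trainer's input batch (data transfer).
    rollouts = [policy.generate(p) for p in prompts]
    samples = [
        {"rollout": r,
         "ref_logprobs": reference.logprobs(r),  # KL anchor for PPO
         "reward": reward.score(r)}
        for r in rollouts
    ]
    # Trainer: run the PPO-style update, then push fresh weights back to
    # the inference-side policy (parameter synchronization).
    trainer.train(samples)
    trainer.sync_parameters_to(policy)
    # Evaluator: measure the updated policy before the next episode.
    return evaluator.evaluate(policy)
```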
Backend: Thanks to the clean programming interface, users can easily plug in various backends for computation and algorithmic optimizations.
Optimization: Supports compute, memory, and communication optimizations such as paged attention, continuous batching, Efficient Memory Sharing (EMS), and grouped broadcast for efficient parameter synchronization.
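To make the parameter-synchronization idea concrete, here is a simplified sketch of a grouped broadcast. It assumes an initialized torch.distributed process group and parameters of a uniform dtype, and it illustrates the general technique (pack many tensors into one large message) rather than ChatLearn's actual implementation.

```python
# Simplified grouped-broadcast sketch: pack parameters into large contiguous
# buckets so each collective moves one big buffer instead of many small ones.
# Assumes dist.init_process_group(...) has already run and that all tensors
# in a bucket share a dtype. Not ChatLearn's actual implementation.
import torch
import torch.distributed as dist

def _flush(bucket, src_rank):
    # One broadcast for the whole bucket, then scatter back into each tensor.
    flat = torch.cat([p.detach().reshape(-1) for p in bucket])
    dist.broadcast(flat, src=src_rank)
    offset = 0
    for p in bucket:
        n = p.numel()
        p.data.copy_(flat[offset:offset + n].view_as(p))
        offset += n

def grouped_broadcast(params, src_rank=0, bucket_mb=128):
    bucket, size, limit = [], 0, bucket_mb * 1024 * 1024
    for p in params:
        bucket.append(p)
        size += p.numel() * p.element_size()
        if size >= limit:  # bucket full: send it and start a new one
            _flush(bucket, src_rank)
            bucket, size = [], 0
    if bucket:  # send the remainder
        _flush(bucket, src_rank)
```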
Performance and Results
Comparisons with DeepSpeed‑Chat and OpenRLHF show a 115% throughput increase for 7B+7B on 8 GPUs and a 208% increase for 70B+70B on 32 GPUs. Larger scales (e.g., 300B+300B) also benefit from the framework.
Qwen2‑72B trained with Online‑DPO using ChatLearn achieves leading performance among open‑source models.
Qwen2‑Math‑Instruct trained with GRPO also outperforms existing models.
Roadmap
Support models in the Megatron-Core (mcore) format.
Support MoE model alignment training.
Extend support to more model families.
Conclusion
PAI‑ChatLearn provides a flexible, high‑performance, and open‑source solution for large‑scale alignment training of LLMs. It abstracts the training workflow, offers resource‑aware scheduling, and delivers substantial speedups over existing SOTA systems. Ongoing development will add more model support, backend integrations, and further performance optimizations.
Open‑source repository: https://github.com/alibaba/ChatLearn
Documentation: Chinese (https://chatlearn.readthedocs.io/zh-cn/latest/) and English (https://chatlearn.readthedocs.io/en/latest/)
References
Qwen2 Technical Report: https://arxiv.org/pdf/2407.10671
Qwen2-Math: https://qwenlm.github.io/blog/qwen2-math/
Megatron‑LM: https://github.com/NVIDIA/Megatron-LM
DeepSpeed‑Chat: https://github.com/microsoft/DeepSpeedExamples/tree/master/applications/DeepSpeed-Chat
OpenRLHF: https://github.com/OpenRLHF/OpenRLHF
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba's leading cloud infrastructure, big-data and AI engineering capabilities, scenario-specific algorithms, and extensive industry experience to offer enterprises and developers a one-stop, cloud-native suite of big-data and AI capabilities. It improves AI development efficiency, enables large-scale AI deployment across industries, and drives business value.