Master Post-Training: Fine-Tune LLMs with SFT, DPO, and GRPO on Alibaba PAI

This article explains post‑training concepts, compares SFT, DPO, and GRPO fine‑tuning methods, and provides step‑by‑step guidance for using Alibaba Cloud's PAI platform—including Model Gallery and DSW—to fine‑tune large language models with code examples and practical tips.

Alibaba Cloud Big Data AI Platform

Jul 16, 2025

Introduction

Post‑training is a crucial stage in getting large models ready for deployment. It can improve performance substantially while requiring far less compute and data than pre‑training.

Common Fine‑tuning Methods

Model fine‑tuning adapts a pretrained LLM to specific tasks. Typical approaches include Supervised Fine‑Tuning (SFT), Direct Preference Optimization (DPO), and Group Relative Policy Optimization (GRPO).

SFT

SFT continues training a pretrained model on labeled task data. It can update all parameters (full fine‑tuning, FFT) or only a subset through parameter‑efficient fine‑tuning (PEFT) methods such as LoRA or QLoRA.

Full Fine‑tuning updates every parameter and is resource‑intensive.

PEFT updates only a small fraction of the parameters, which makes training faster and less resource‑hungry. LoRA freezes the pretrained weights and learns a low‑rank update for selected weight matrices, typically the self‑attention projections. QLoRA combines LoRA with a quantized base model (4‑bit or 8‑bit) to reduce memory usage further.
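To make the low‑rank idea concrete, here is a minimal pure‑Python sketch of a LoRA‑style update for a single weight matrix. The helper names, matrix sizes, and the alpha / r scaling convention are illustrative assumptions, not PAI's implementation.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def lora_effective_weight(W, A, B, alpha):
    """Return W + (alpha / r) * (B @ A), where r is the LoRA rank (rows of A)."""
    r = len(A)
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


def lora_param_counts(d_out, d_in, r):
    """Trainable parameters for one matrix: (full fine-tuning, LoRA)."""
    return d_out * d_in, r * (d_in + d_out)


# Toy example: a frozen 2x3 weight with a rank-1 adapter.
W = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # pretrained weight, never updated
A = [[0.1, 0.2, 0.3]]                    # trainable, shape (r, d_in)
B = [[0.0], [0.0]]                       # trainable, shape (d_out, r), zero init

# With B initialized to zero, the adapted weight equals the pretrained one.
W_eff = lora_effective_weight(W, A, B, alpha=2.0)

# Hypothetical 4096x4096 projection with rank 8.
full_params, lora_params = lora_param_counts(4096, 4096, 8)
print(full_params, lora_params)
```

Only A and B would receive gradients during training, which is where the memory and speed savings come from.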

DPO

DPO aligns model outputs with human preferences without training a separate reward model or running a reinforcement‑learning loop. It trains directly on pairs of preferred and rejected responses using a simple classification‑style loss, which makes it more stable and less computationally expensive than RLHF while still performing strongly.
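The classification loss can be sketched for a single preference pair as follows. The function name, argument names, and the default beta are illustrative assumptions; the inputs are the summed log‑probabilities of each response under the model being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair: -log(sigmoid(margin))."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) rewritten as softplus(-margin) for numerical stability.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Policy identical to the reference: the loss is log(2), the chance level.
baseline = dpo_loss(-10.0, -10.0, -10.0, -10.0)

# Policy has moved probability mass toward the chosen response: the loss drops.
improved = dpo_loss(-5.0, -15.0, -10.0, -10.0)
print(baseline, improved)
```

The loss falls as the policy raises the chosen response's likelihood relative to the reference and lowers the rejected one's, which is why no explicit reward model is needed.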

GRPO

GRPO samples a group of candidate answers for each prompt and scores each answer relative to the others, using the group's average reward as the baseline. This removes the need for a separate value model. The KL divergence from the reference model is added directly to the loss, which improves efficiency on tasks such as mathematical reasoning.
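The group‑based baseline can be illustrated with a short sketch that turns the rewards of one sampled group into relative advantages. Normalizing by the population standard deviation and the eps constant are assumptions made here; implementations differ in these details, and the KL term is omitted.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and spread of its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled answers to one prompt, scored 1.0 if correct and 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Answers that beat the group average get a positive advantage and the rest get a negative one. If every answer in a group receives the same reward, all advantages are zero and that group contributes no learning signal.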

Fine‑tuning Algorithm Comparison

The three algorithms differ in implementation difficulty and in the scenarios they suit. A typical workflow runs SFT first and then DPO, so that the model gains domain capability before it is aligned with preferences.

PAI Model Fine‑tuning Practice

Alibaba Cloud's AI platform, PAI, offers a full suite of fine‑tuning capabilities across its product lines. Two of them are covered here:

PAI‑Model Gallery

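PAI‑Model Gallery provides zero‑code fine‑tuning, model compression, evaluation, and deployment. Users select a base model, configure the training parameters, and submit a task. SFT training data is a JSON array of records with instruction and output fields, for example: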

[
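    {"instruction":"You are a cardiologist. Please give advice based on the patient's question: I have had high blood pressure for five or six years and I am tired of taking medicine every day. What can cure high blood pressure for good? What is the nemesis of high blood pressure?","output":"Patients with high blood pressure can eat plenty of fresh fruits and vegetables..."},
    {"instruction":"You are a respiratory physician. Please give advice based on the patient's question: How should a wind-cold type common cold with white phlegm be treated?","output":"For patients with a wind-cold type common cold who cough up white phlegm..."}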
]
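Before uploading a dataset in this format, it can help to check it locally. The following is a minimal sketch, assuming only the two fields shown in the sample above; the function name and the placeholder records are hypothetical, and the checks are not an official PAI validator.

```python
import json


def validate_sft_records(text):
    """Parse a JSON array of SFT records and return how many are valid.

    Raises ValueError if the top level is not a list or if any record
    lacks a non-empty string "instruction" or "output" field.
    """
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")
    for i, rec in enumerate(records):
        for key in ("instruction", "output"):
            value = rec.get(key) if isinstance(rec, dict) else None
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"record {i}: missing or empty '{key}'")
    return len(records)


# Placeholder records standing in for real training data.
sample = json.dumps([
    {"instruction": "question 1", "output": "answer 1"},
    {"instruction": "question 2", "output": "answer 2"},
])
print(validate_sft_records(sample))
```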

See the documentation for the available training hyperparameters.

PAI‑DSW

PAI‑DSW is an interactive cloud IDE for developers familiar with Python and notebooks. An example workflow for fine‑tuning Qwen2.5‑7B with Pai‑Megatron‑Patch:

cd /mnt/data/yy/qwen25_sft
mkdir qwen-ckpts
cd qwen-ckpts
git clone --recurse-submodules https://github.com/alibaba/Pai-Megatron-Patch.git
...
sh run_mcore_qwen.sh dsw 7B 1 8 1e-5 1e-6 128 128 bf16 1 1 1 true true true true false false 100 /mnt/data/.../mmap_qwen2_sft_datasets_text_document /mnt/data/.../mmap_qwen2_sft_datasets_text_document /mnt/data/.../Qwen2.5-7B-to-mcore 1000 100 /mnt/data/.../output_mcore_qwen2.5_finetune
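The long positional command above is easier to review when each value carries a label. The sketch below rebuilds the same command from labeled pairs. The labels are assumptions based on the argument order described in the Pai‑Megatron‑Patch Qwen2.5 example and may differ between versions, so check the header of run_mcore_qwen.sh before relying on them. The elided paths are kept exactly as they appear in the command.

```python
# (label, value) pairs in positional order. Labels are ASSUMED names, not
# flags accepted by the script; only the values are passed on the command line.
ARGS = [
    ("env", "dsw"),
    ("model_size", "7B"),
    ("micro_batch_size", "1"),
    ("global_batch_size", "8"),
    ("lr", "1e-5"),
    ("min_lr", "1e-6"),
    ("seq_len", "128"),
    ("pad_len", "128"),
    ("precision", "bf16"),
    ("tensor_parallel", "1"),
    ("pipeline_parallel", "1"),
    ("context_parallel", "1"),
    ("sequence_parallel", "true"),
    ("distributed_optimizer", "true"),
    ("flash_attention", "true"),
    ("sft", "true"),
    ("activation_checkpointing", "false"),
    ("optimizer_offload", "false"),
    ("save_interval", "100"),
    ("train_dataset", "/mnt/data/.../mmap_qwen2_sft_datasets_text_document"),
    ("valid_dataset", "/mnt/data/.../mmap_qwen2_sft_datasets_text_document"),
    ("pretrained_checkpoint", "/mnt/data/.../Qwen2.5-7B-to-mcore"),
    ("train_iters", "1000"),
    ("warmup_iters", "100"),
    ("output_dir", "/mnt/data/.../output_mcore_qwen2.5_finetune"),
]

# Join the values in order to reproduce the command line.
command = " ".join(["sh", "run_mcore_qwen.sh"] + [value for _, value in ARGS])
print(command)
```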

Training logs and resource monitoring can be viewed in the DSW interface. After training, the fine‑tuned model is stored in OSS and can be deployed as an online service with a single click.
