How to Efficiently Fine‑Tune Qwen LLMs on Alibaba Cloud PAI Lingjun

This guide walks you through setting up Alibaba Cloud PAI Lingjun resources, preparing Qwen‑7B/14B/72B models, preprocessing large‑scale WuDao data, configuring distributed training with Megatron‑LM, performing continued pre‑training and supervised fine‑tuning, and finally deploying the model as an online service via PAI‑EAS.

Alibaba Cloud Big Data AI Platform

Dec 5, 2023

Introduction

On December 1, 2023, Alibaba released four open‑source Qwen models (1.8B, 7B, 14B, 72B). The PAI Lingjun intelligent computing service provides heterogeneous compute and an AI engineering platform. This guide demonstrates how to use PAI Lingjun for efficient distributed continued pre‑training, instruction fine‑tuning, offline inference verification, and online service deployment of Qwen models.

Resource and Environment Configuration

Refer to the official documentation to create and manage PAI Lingjun resources.

Resource and configuration recommendations (model size → required training and inference resources):

7B: 8×V100‑32G or 1×A10‑22G, TP1 PP1

14B: 2×V100‑32G or 2×A10‑22G, TP2 PP1

72B: 6×V100‑32G + 2×gu7xf, TP8 PP2
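The TP and PP values above determine how many GPUs a single model replica occupies: Megatron‑LM splits each replica across TP × PP devices, and whatever factor of the total GPU count remains becomes data parallelism. A minimal sketch of that arithmetic follows; the function name is illustrative and not part of any PAI or Megatron API, and the 16‑GPU world size is only an example.

```python
def parallel_layout(world_size: int, tp: int, pp: int) -> dict:
    """Return GPUs per model replica and the implied data-parallel size."""
    gpus_per_replica = tp * pp
    if world_size % gpus_per_replica != 0:
        raise ValueError(
            f"world size {world_size} is not divisible by TP*PP = {gpus_per_replica}"
        )
    return {
        "gpus_per_replica": gpus_per_replica,
        "data_parallel": world_size // gpus_per_replica,
    }


if __name__ == "__main__":
    # TP/PP pairs come from the recommendations above; 16 GPUs is an example.
    for name, tp, pp in [("7B", 1, 1), ("14B", 2, 1), ("72B", 8, 2)]:
        print(name, parallel_layout(16, tp, pp))
```

With TP8 PP2, one replica of the 72B model already spans 16 GPUs, so a training world size for that layout has to be a multiple of 16.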

LLM Unified Image

Set the unified image URL in the custom image field:

pai-image-manage-registry.cn-wulanchabu.cr.aliyuncs.com/pai/pytorch-training:1.12-ubuntu20.04-py3.10-cuda11.3-megatron-patch-llm

PAI‑DSW Interactive Development

DSW provides Jupyter, WebIDE, and Terminal for data processing and single‑node multi‑GPU debugging. Create a DSW instance, mount /mnt/workspace/ for datasets and training code, and ensure the following resource limits:

Memory ≥ 1024 GB

CPU cores ≤ 96

Shared memory = memory

GPU cards ≥ 8

PAI‑DLC Distributed Task Configuration

DLC runs multi‑node, multi‑GPU jobs. Create a DLC job and fill in the job name, resource group, and unified image URL.

Model Preparation

Download Qwen models from ModelScope, HuggingFace, or OSS:

# Optional: point pip at the Aliyun PyPI mirror first
# pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
pip install modelscope

# Download the model and tokenizer files from ModelScope
from modelscope.hub.snapshot_download import snapshot_download
model_dir = snapshot_download('qwen/Qwen-7B', 'v1.1.4')
print(model_dir)

mkdir -p /mnt/workspace/qwen-ckpts/qwen-7b-hf
cp -r /root/.cache/modelscope/hub/qwen/Qwen-7B/* /mnt/workspace/qwen-ckpts/qwen-7b-hf
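Before moving on, it can help to confirm that the copy produced a usable HuggingFace‑style directory. The sketch below is a hypothetical helper, not part of ModelScope or Pai‑Megatron‑Patch; it assumes only that the directory should contain a config.json and at least one weight file with a .safetensors or .bin suffix.

```python
from pathlib import Path


def missing_files(model_dir, required=("config.json",)):
    """Return the required file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]


def weight_shards(model_dir):
    """List weight files (*.safetensors or *.bin) found directly in model_dir."""
    root = Path(model_dir)
    return sorted(
        p.name for p in root.iterdir() if p.suffix in {".safetensors", ".bin"}
    )


if __name__ == "__main__":
    target = "/mnt/workspace/qwen-ckpts/qwen-7b-hf"  # path used in this guide
    if Path(target).is_dir():
        print("missing:", missing_files(target))
        print("weights:", weight_shards(target))
    else:
        print(f"{target} not found; run the download and copy steps first")
```

An empty "missing" list and a non‑empty "weights" list suggest the checkpoint is ready for the later conversion and training steps.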

git clone https://github.com/alibaba/Pai-Megatron-Patch.git

Data Preparation (WuDao 2.0)

Download and extract the sample WuDao corpus:

wget https://atp-modelzoo.oss-cn-hangzhou.aliyuncs.com/release/datasets/WuDaoCorpus2.0_base_sample.tgz
tar zxvf WuDaoCorpus2.0_base_sample.tgz

Run the provided bash script to clean, merge, split, and compress the data into ZST files for efficient loading.

#!/bin/bash
set -ex
data_dir=/mnt/workspace/qwen-datasets/wudao_200g
# data cleaning
... (script omitted for brevity) ...
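The omitted script is not reproduced here. As a rough illustration of the split step only, the sketch below cuts a merged, line‑oriented file into contiguous shards in Python. The function name, the shard naming scheme, and the default of 10 shards are all assumptions made for this example; compressing each shard to .zst would still be done separately, for instance with the zstd command‑line tool.

```python
from pathlib import Path


def split_lines(src, out_dir, num_shards=10):
    """Split a line-oriented file into at most num_shards contiguous shards."""
    src, out_dir = Path(src), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # First pass: count lines so shards come out roughly equal in size.
    with src.open(encoding="utf-8") as f:
        total = sum(1 for _ in f)
    per_shard = max(1, -(-total // num_shards))  # ceiling division

    # Second pass: start a new shard file every per_shard lines.
    shard_paths = []
    out = None
    with src.open(encoding="utf-8") as f:
        for i, line in enumerate(f):
            if i % per_shard == 0:
                if out is not None:
                    out.close()
                path = out_dir / f"{i // per_shard:02d}.jsonl"
                shard_paths.append(path)
                out = path.open("w", encoding="utf-8")
            out.write(line)
    if out is not None:
        out.close()
    return shard_paths


if __name__ == "__main__":
    import tempfile

    tmp = Path(tempfile.mkdtemp())
    merged = tmp / "merged.jsonl"  # hypothetical merged corpus file
    merged.write_text("".join(f'{{"id": {i}}}\n' for i in range(25)), encoding="utf-8")
    for shard in split_lines(merged, tmp / "shards"):
        print(shard.name, len(shard.read_text(encoding="utf-8").splitlines()))
```

Contiguous shards keep the original document order, which makes it easy to confirm that no lines were lost by concatenating the shards back together.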

Megatron‑LM Data Preprocessing

Generate MMAP format datasets using the run_make_pretraining_dataset.sh script:

export dataset_dir=/mnt/workspace/qwen-datasets
export WORK_DIR=/mnt/workspace
cd ${WORK_DIR}/Pai-Megatron-Patch/toolkits/pretrain_data_preprocessing
bash run_make_pretraining_dataset.sh \
../../Megatron-LM-23.04 \
${WORK_DIR}/Pai-Megatron-Patch/ \
${dataset_dir}/cleaned_zst/ \
qwenbpe \
${dataset_dir}/wudao/ \
${WORK_DIR}/qwen-ckpts/qwen-7b-hf
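Megatron‑LM stores an MMAP dataset as two files that share one prefix: a .bin file holding token IDs and an .idx file holding the index. Training commands take the prefix without an extension, such as the wudao_qwenbpe_content_document prefix used in the pre‑training command later in this guide. The helper below is a hypothetical convenience check, not part of the toolkit, that both files exist and are non‑empty.

```python
from pathlib import Path


def mmap_dataset_files(prefix):
    """Return the (.bin, .idx) file pair that belongs to a dataset prefix."""
    return Path(f"{prefix}.bin"), Path(f"{prefix}.idx")


def mmap_dataset_ready(prefix):
    """True if both files exist and neither is empty."""
    return all(
        p.is_file() and p.stat().st_size > 0 for p in mmap_dataset_files(prefix)
    )


if __name__ == "__main__":
    # Prefix taken from the pre-training command in this guide.
    prefix = "/mnt/workspace/qwen-datasets/wudao/wudao_qwenbpe_content_document"
    print(prefix, "ready" if mmap_dataset_ready(prefix) else "not ready")
```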

Small‑Scale Preprocessed Data (Optional)

Download ready‑made small datasets for quick testing:

wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/qwen-datasets/alpaca_zh-qwen-train.json
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/qwen-datasets/alpaca_zh-qwen-valid.json
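It is worth looking at a few records before fine‑tuning on these files. The record schema is not described in this guide, so the sketch below makes no assumption about field names; it only assumes the file is either a JSON array or JSON Lines, and prints the keys of the first few records.

```python
import json
import os


def load_records(path, limit=3):
    """Load up to `limit` records from a JSON array file or a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        # Peek at the first non-whitespace character to pick a parser.
        first = f.read(1)
        while first and first.isspace():
            first = f.read(1)
        f.seek(0)
        if first == "[":
            return json.load(f)[:limit]
        records = []
        for line in f:
            if line.strip():
                records.append(json.loads(line))
                if len(records) >= limit:
                    break
        return records


if __name__ == "__main__":
    path = "alpaca_zh-qwen-train.json"  # file downloaded above
    if os.path.exists(path):
        for record in load_records(path):
            keys = sorted(record) if isinstance(record, dict) else type(record).__name__
            print(keys)
    else:
        print(f"{path} not found; download it first")
```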

Continued Pre‑Training

Run run_pretrain_megatron_qwen.sh on DSW or DLC. Example for DSW:

export WORK_DIR=/mnt/workspace
cd ${WORK_DIR}/Pai-Megatron-Patch/examples/qwen
sh run_pretrain_megatron_qwen.sh \
dsw \
${WORK_DIR}/Pai-Megatron-Patch \
7B \
1 \
8 \
1e-5 \
1e-6 \
2048 \
2048 \
85 \
fp16 \
1 \
1 \
sel \
true \
false \
false \
false \
100000 \
${WORK_DIR}/qwen-datasets/wudao/wudao_qwenbpe_content_document \
${WORK_DIR}/qwen-ckpts/qwen-7b-hf-to-megatron-tp1-pp1 \
100000000 \
10000 \
${WORK_DIR}/output_megatron_qwen/
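The positional arguments are not labeled in this guide. Assuming, from their order and magnitude, that 8 is the global batch size, 2048 is the sequence length, and 100000000 and 10000 are the training and warmup token budgets, the token budgets translate into optimizer steps as sketched below. Treat this mapping as an assumption and confirm it against the comments at the top of run_pretrain_megatron_qwen.sh.

```python
def iterations_from_tokens(num_tokens: int, global_batch_size: int, seq_len: int) -> int:
    """Number of full optimizer steps that fit in a token budget."""
    tokens_per_step = global_batch_size * seq_len
    return num_tokens // tokens_per_step


if __name__ == "__main__":
    # Values read off the command above; their meaning is assumed, see lead-in.
    global_batch_size, seq_len = 8, 2048
    print("train steps:", iterations_from_tokens(100_000_000, global_batch_size, seq_len))
    print("warmup steps:", iterations_from_tokens(10_000, global_batch_size, seq_len))
```

Under these assumptions each step consumes 8 × 2048 = 16,384 tokens, so a warmup budget smaller than that rounds down to zero full steps.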

Supervised Fine‑Tuning

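With the small‑scale JSON data in place, run run_finetune_megatron_qwen_withGA.sh. The training and validation arguments must point at your JSON files; the example below uses wudao_train.json and wudao_valid.json as file names, so adjust them to match the files you actually have: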

export WORK_DIR=/mnt/workspace
cd ${WORK_DIR}/Pai-Megatron-Patch/examples/qwen
sh run_finetune_megatron_qwen_withGA.sh \
dsw \
${WORK_DIR}/Pai-Megatron-Patch \
7B \
1 \
96 \
1e-5 \
1e-6 \
2048 \
2048 \
85 \
bf16 \
1 \
1 \
sel \
true \
false \
false \
false \
1000 \
${WORK_DIR}/qwen-datasets/wudao_train.json \
${WORK_DIR}/qwen-datasets/wudao_valid.json \
${WORK_DIR}/qwen-ckpts/qwen-7b-hf-to-megatron-tp1-pp1 \
2000 \
10 \
${WORK_DIR}/output_megatron_qwen/
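The "withGA" suffix in the script name suggests gradient accumulation. In Megatron‑LM, the number of micro‑batches accumulated per optimizer step is the global batch size divided by the micro‑batch size times the data‑parallel size. The sketch below works through that arithmetic, assuming that the 1 and 96 in the command above are the micro‑batch and global batch sizes and that the job runs on 8 GPUs; both are assumptions, not statements from the script.

```python
def grad_accum_steps(global_batch: int, micro_batch: int, world_size: int,
                     tp: int = 1, pp: int = 1) -> int:
    """Micro-batches accumulated per optimizer step on each data-parallel rank."""
    if world_size % (tp * pp) != 0:
        raise ValueError("world size must be divisible by TP*PP")
    data_parallel = world_size // (tp * pp)
    samples_per_pass = micro_batch * data_parallel
    if global_batch % samples_per_pass != 0:
        raise ValueError(
            "global batch must be divisible by micro batch * data-parallel size"
        )
    return global_batch // samples_per_pass


if __name__ == "__main__":
    # Assumed reading of the command above: micro batch 1, global batch 96, 8 GPUs.
    print(grad_accum_steps(global_batch=96, micro_batch=1, world_size=8))
```

If the global batch size is not a multiple of micro batch × data‑parallel size, the configuration is invalid, which is why the helper raises instead of rounding.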

Model Format Conversion

Convert the Megatron checkpoint to HuggingFace format for inference. Set RUN_DIR to the name of your training run's directory under checkpoint/ (do not name this variable PATH, which the shell reserves for executable lookup):

sh model_convertor.sh \
../../../Megatron-LM-main \
${WORK_DIR}/output_megatron_qwen/checkpoint/${RUN_DIR}/iter_0001000 \
/mnt/workspace/qwen-ckpts/qwen-7b-mg-to-hf-tp1-pp1/ \
1 \
1 \
qwen-7b \
0 \
true
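The iter_0001000 directory name follows Megatron‑LM's convention of zero‑padding the iteration number to seven digits, and Megatron also writes the most recent iteration to a latest_checkpointed_iteration.txt file in the checkpoint directory. The sketch below resolves the directory to convert from either an explicit iteration or that tracker file; the function name and the example run directory are hypothetical.

```python
from pathlib import Path


def iteration_dir(checkpoint_root, iteration=None):
    """Build the iter_XXXXXXX path, reading the tracker file if no iteration is given."""
    root = Path(checkpoint_root)
    if iteration is None:
        tracker = root / "latest_checkpointed_iteration.txt"
        # The tracker normally holds an integer; other contents raise ValueError here.
        iteration = int(tracker.read_text().strip())
    return root / f"iter_{iteration:07d}"


if __name__ == "__main__":
    # "checkpoint/my-run" is a placeholder, not a path produced by this guide.
    print(iteration_dir("checkpoint/my-run", 1000))
```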

Offline Inference

Use HuggingFace or Megatron‑LM pipelines. Example HuggingFace script:

#!/usr/bin/env python
# encoding=utf-8
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
# Path to the HuggingFace-format checkpoint
checkpoint = '/mnt/workspace/latest/qianwen/qwen-7b-hf'
# Qwen ships custom model and tokenizer code, so trust_remote_code is needed
tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", trust_remote_code=True)
# Prompt text: "Write a quicksort algorithm"
prompt = "Human:写一个快速排序算法"
inputs = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0]))
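For a decoder‑only model, generate returns the prompt tokens followed by the newly generated tokens, so decoding outputs[0] echoes the prompt. A small, hypothetical helper for keeping only the completion is sketched below; it works on any sliceable sequence, including a Python list or a one‑dimensional tensor.

```python
def completion_ids(output_ids, prompt_length: int):
    """Drop the echoed prompt tokens from one generated sequence."""
    return output_ids[prompt_length:]


if __name__ == "__main__":
    # Toy token IDs standing in for a real generation result.
    prompt = [101, 102, 103]
    generated = prompt + [7, 8, 9]
    print(completion_ids(generated, len(prompt)))
    # With the script above, the equivalent call would be:
    #   tokenizer.decode(completion_ids(outputs[0], inputs.shape[1]))
```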

Online Service Deployment (PAI‑EAS)

Upload the final model to OSS, create an EAS resource group, and deploy via the console or eascmd. Example console deployment uses the image

pai-image-manage-registry.cn-wulanchabu.cr.aliyuncs.com/pai/llm-inference:vllm-0.2.1-v4

and runs:

nohup python -m fastchat.serve.controller > tmp1.log 2>&1 & python -m fastchat.serve.gradio_web_server_pai --model-list-mode reload > tmp2.log 2>&1 & python -m fastchat.serve.vllm_worker --model-path /mnt/model/qwen_7b --tensor-parallel-size 1 --trust-remote-code

After deployment, access the WebUI to interact with the model.

Related Resources

Qwen model series: https://modelscope.cn/organization/qwen

PAI Lingjun service: https://www.aliyun.com/product/bigdata/learn/pai/lingjun

PAI‑Megatron‑Patch GitHub: https://github.com/alibaba/Pai-Megatron-Patch

PAI‑EAS product: https://www.aliyun.com/product/bigdata/learn/eas
