How to Efficiently Fine‑Tune Qwen LLMs on Alibaba Cloud PAI Lingjun
This guide walks you through setting up Alibaba Cloud PAI Lingjun resources, preparing Qwen‑7B/14B/72B models, preprocessing large‑scale WuDao data, configuring distributed training with Megatron‑LM, performing continued pre‑training and supervised fine‑tuning, and finally deploying the model as an online service via PAI‑EAS.
Introduction
On December 1, 2023, Alibaba expanded its open-source Qwen family to four model sizes (1.8B, 7B, 14B, and 72B). The PAI Lingjun intelligent computing service combines heterogeneous compute with an AI engineering platform. This walkthrough shows how to use PAI Lingjun for efficient distributed continued pre-training, instruction fine-tuning, offline inference verification, and online service deployment of Qwen models.
Resource and Environment Configuration
Refer to the official documentation to create and manage PAI Lingjun resources.
Recommended training/inference resources and Megatron parallelism by model size (TP = tensor-parallel degree, PP = pipeline-parallel degree):
7B: 8×V100‑32G or 1×A10‑22G, TP1 PP1
14B: 2×V100‑32G or 2×A10‑22G, TP2 PP1
72B: 6×V100‑32G + 2×gu7xf, TP8 PP2
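TP and PP determine how many GPUs hold one copy of the model; the remaining GPUs become data-parallel replicas, and Megatron splits each global batch into micro-batches, so the number of gradient-accumulation steps follows from the batch sizes. The sketch below works through that arithmetic. `parallel_layout` is an illustrative name, not a Pai-Megatron-Patch function. The first example assumes that the values 1 and 96 in the fine-tuning command later in this guide are the micro-batch and global batch sizes. The 32-GPU 72B job in the second example is hypothetical.

```python
def parallel_layout(world_size: int, tp: int, pp: int,
                    global_batch: int, micro_batch: int) -> dict:
    """Derive data-parallel size and gradient-accumulation steps, Megatron-style."""
    model_parallel = tp * pp  # GPUs that together hold one copy of the model
    if world_size % model_parallel:
        raise ValueError(f"{world_size} GPUs is not divisible by TP*PP={model_parallel}")
    dp = world_size // model_parallel  # number of model replicas (data-parallel size)
    per_pass = micro_batch * dp  # samples in one forward/backward pass across all replicas
    if global_batch % per_pass:
        raise ValueError(f"global batch {global_batch} is not divisible by micro_batch*DP={per_pass}")
    return {"model_parallel": model_parallel, "dp": dp,
            "grad_accum_steps": global_batch // per_pass}


if __name__ == "__main__":
    # One 8-GPU node, TP1 PP1, micro-batch 1, global batch 96
    print(parallel_layout(8, 1, 1, global_batch=96, micro_batch=1))
    # Hypothetical 32-GPU job for 72B with TP8 PP2
    print(parallel_layout(32, 8, 2, global_batch=32, micro_batch=1))
```

The first call reports 8 data-parallel replicas and 12 accumulation steps. The second reports 2 replicas and 16 steps. A TP/PP combination that does not divide the GPU count is rejected before any job is launched.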
LLM Unified Image
Set the unified image URL in the custom image field:
pai-image-manage-registry.cn-wulanchabu.cr.aliyuncs.com/pai/pytorch-training:1.12-ubuntu20.04-py3.10-cuda11.3-megatron-patch-llm
PAI‑DSW Interactive Development
DSW provides Jupyter, WebIDE, and Terminal for data processing and single‑node multi‑GPU debugging. Create a DSW instance, mount /mnt/workspace/ for datasets and training code, and ensure the following resource limits:
Memory ≥ 1024 GB
CPU cores ≤ 96
Shared memory = memory
GPU cards ≥ 8
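If you script instance creation, you can check these limits before submitting. The shared-memory rule matters because PyTorch DataLoader worker processes exchange batches through shared memory (/dev/shm). This is a minimal sketch. The `DswSpec` fields and `check_dsw_spec` are hypothetical and do not mirror any PAI API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DswSpec:
    memory_gb: int
    cpu_cores: int
    shm_gb: int
    gpus: int


def check_dsw_spec(spec: DswSpec) -> List[str]:
    """Return violations of the recommended DSW limits (empty list if none)."""
    problems = []
    if spec.memory_gb < 1024:
        problems.append(f"memory {spec.memory_gb} GB < 1024 GB")
    if spec.cpu_cores > 96:
        problems.append(f"{spec.cpu_cores} CPU cores > 96")
    if spec.shm_gb != spec.memory_gb:
        problems.append(f"shared memory {spec.shm_gb} GB != memory {spec.memory_gb} GB")
    if spec.gpus < 8:
        problems.append(f"{spec.gpus} GPUs < 8")
    return problems


if __name__ == "__main__":
    print(check_dsw_spec(DswSpec(memory_gb=1024, cpu_cores=96, shm_gb=1024, gpus=8)))
```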
PAI‑DLC Distributed Task Configuration
DLC runs multi-node, multi-GPU jobs. Create a DLC job and fill in the task name, resource group, and unified image URL.
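The same training scripts run on both DSW and DLC. The environment argument mainly changes where the distributed-launch settings come from. On DSW everything runs on one node. On DLC, each node reads the node count and its own rank from environment variables that the platform injects.

The sketch below assumes that DLC provides MASTER_ADDR, MASTER_PORT, WORLD_SIZE (counting nodes, not GPUs) and RANK (the node index). That is how the Pai-Megatron-Patch example scripts appear to read them in their dlc branch, but verify this against the script version you run. The flags are standard torchrun options. `torchrun_args` is an illustrative name.

```python
import os
from typing import List, Mapping


def torchrun_args(env_name: str, environ: Mapping[str, str] = os.environ,
                  gpus_per_node: int = 8) -> List[str]:
    """Build torchrun launch flags for a single-node (dsw) or multi-node (dlc) job."""
    if env_name == "dsw":
        nnodes, node_rank = 1, 0
        master_addr, master_port = "localhost", "29500"  # single node: any free port works
    elif env_name == "dlc":
        # Assumption: DLC injects these into every worker container,
        # with WORLD_SIZE = number of nodes and RANK = index of this node.
        nnodes = int(environ["WORLD_SIZE"])
        node_rank = int(environ["RANK"])
        master_addr = environ["MASTER_ADDR"]
        master_port = environ["MASTER_PORT"]
    else:
        raise ValueError(f"unknown environment: {env_name!r}")
    return [
        "torchrun",
        f"--nproc_per_node={gpus_per_node}",
        f"--nnodes={nnodes}",
        f"--node_rank={node_rank}",
        f"--master_addr={master_addr}",
        f"--master_port={master_port}",
    ]


if __name__ == "__main__":
    print(" ".join(torchrun_args("dsw")))
```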
Model Preparation
Download Qwen models from ModelScope, HuggingFace, or OSS:
```bash
# Optional: use the Aliyun PyPI mirror
# pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
pip install modelscope
```

```python
# Download the Qwen-7B model and tokenizer from ModelScope
from modelscope.hub.snapshot_download import snapshot_download

model_dir = snapshot_download('qwen/Qwen-7B', 'v1.1.4')
print(model_dir)
```

```bash
# Copy the downloaded snapshot into the shared workspace
mkdir -p /mnt/workspace/qwen-ckpts/qwen-7b-hf
cp -r /root/.cache/modelscope/hub/qwen/Qwen-7B/* /mnt/workspace/qwen-ckpts/qwen-7b-hf

# Fetch the training code into the workspace, where the later steps expect it
cd /mnt/workspace
git clone https://github.com/alibaba/Pai-Megatron-Patch.git
```

Data Preparation (WuDao 2.0)
Download and extract the sample WuDao corpus:
```bash
wget https://atp-modelzoo.oss-cn-hangzhou.aliyuncs.com/release/datasets/WuDaoCorpus2.0_base_sample.tgz
tar zxvf WuDaoCorpus2.0_base_sample.tgz
```

Then run the provided bash script to clean, merge, split, and compress the data into ZST files for efficient loading.
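The cleaning script itself is abbreviated in this guide. The stdlib-only sketch below illustrates what the clean, merge, and split stages do. It is not the omitted script.

It makes several assumptions:
- Each extracted WuDao file is a JSON array of records with a `content` field.
- Documents shorter than a threshold are dropped, and whitespace is collapsed.
- The output is JSON-lines shards whose records keep a `content` key, which is the key suggested by the later Megatron output name `wudao_qwenbpe_content_document`.

`MIN_CHARS` and `DOCS_PER_SHARD` are arbitrary, hypothetical values. Compression to .zst is left out because it needs the third-party zstandard package.

```python
import json
import re
from pathlib import Path
from typing import Iterator

MIN_CHARS = 50          # hypothetical threshold for dropping near-empty documents
DOCS_PER_SHARD = 10000  # hypothetical shard size


def iter_clean_docs(src_dir: Path) -> Iterator[dict]:
    """Yield cleaned {"content": ...} records from every *.json file in src_dir."""
    for path in sorted(src_dir.glob("*.json")):
        with path.open(encoding="utf-8") as f:
            records = json.load(f)  # assumption: one JSON array per file
        for rec in records:
            text = re.sub(r"\s+", " ", rec.get("content", "")).strip()
            if len(text) >= MIN_CHARS:
                yield {"content": text}


def write_shards(src_dir: Path, out_dir: Path, docs_per_shard: int = DOCS_PER_SHARD) -> int:
    """Merge all cleaned documents and split them into JSON-lines shards; return the shard count."""
    out_dir.mkdir(parents=True, exist_ok=True)
    shard, n_in_shard, n_shards = None, 0, 0
    for doc in iter_clean_docs(src_dir):
        if shard is None or n_in_shard == docs_per_shard:
            if shard is not None:
                shard.close()
            shard = (out_dir / f"part-{n_shards:05d}.jsonl").open("w", encoding="utf-8")
            n_shards, n_in_shard = n_shards + 1, 0
        shard.write(json.dumps(doc, ensure_ascii=False) + "\n")
        n_in_shard += 1
    if shard is not None:
        shard.close()
    return n_shards
```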
```bash
#!/bin/bash
set -ex
data_dir=/mnt/workspace/qwen-datasets/wudao_200g
# data cleaning
# ... (script omitted for brevity) ...
```

Megatron‑LM Data Preprocessing
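Megatron trains from an indexed binary dataset rather than raw JSON. Each document is tokenized once, the token IDs are stored back to back in a .bin file, and an .idx file records where each document starts. Training processes can then memory-map the data instead of parsing text.

The toy sketch below illustrates that idea with the standard library. Its file layout, a bare native uint32 token array plus a JSON list of offsets, is a simplification and is not compatible with Megatron's actual .bin/.idx format. `write_indexed` and `IndexedDataset` are illustrative names.

```python
import json
import mmap
from array import array
from pathlib import Path
from typing import List


def write_indexed(prefix: Path, docs: List[List[int]]) -> None:
    """Store token-ID documents contiguously (.bin) plus document boundaries (.idx)."""
    tokens, offsets = array("I"), [0]  # "I": native unsigned int, 4 bytes on common platforms
    for doc in docs:
        tokens.extend(doc)
        offsets.append(len(tokens))  # end of this document == start of the next
    with open(f"{prefix}.bin", "wb") as f:
        tokens.tofile(f)
    Path(f"{prefix}.idx").write_text(json.dumps(offsets))


class IndexedDataset:
    """Random access to documents through a memory-mapped token file."""

    def __init__(self, prefix: Path):
        self.offsets = json.loads(Path(f"{prefix}.idx").read_text())
        self._file = open(f"{prefix}.bin", "rb")
        self._mm = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._tokens = memoryview(self._mm).cast("I")  # zero-copy view of the file

    def __len__(self) -> int:
        return len(self.offsets) - 1

    def __getitem__(self, i: int) -> List[int]:
        return self._tokens[self.offsets[i]:self.offsets[i + 1]].tolist()

    def close(self) -> None:
        self._tokens.release()
        self._mm.close()
        self._file.close()
```

Only the pages that a lookup touches are read from disk. This is why the real format scales to corpora far larger than RAM.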
Generate MMAP format datasets using the run_make_pretraining_dataset.sh script:
```bash
export dataset_dir=/mnt/workspace/qwen-datasets
export WORK_DIR=/mnt/workspace
cd ${WORK_DIR}/Pai-Megatron-Patch/toolkits/pretrain_data_preprocessing
# Arguments, in order: Megatron-LM source tree, Pai-Megatron-Patch root,
# cleaned ZST input dir, tokenizer type, output dir, HF checkpoint (tokenizer files)
bash run_make_pretraining_dataset.sh \
  ../../Megatron-LM-23.04 \
  ${WORK_DIR}/Pai-Megatron-Patch/ \
  ${dataset_dir}/cleaned_zst/ \
  qwenbpe \
  ${dataset_dir}/wudao/ \
  ${WORK_DIR}/qwen-ckpts/qwen-7b-hf
```

Small‑Scale Preprocessed Data (Optional)
Download ready-made small JSON datasets (Alpaca-zh) for quick testing:
```bash
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/qwen-datasets/alpaca_zh-qwen-train.json
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/qwen-datasets/alpaca_zh-qwen-valid.json
```

Continued Pre‑Training
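The pre-training script sets the training length as a token budget, not an iteration count. This sketch rests on two assumptions:
- The script converts token budgets to iterations by integer division by the global batch size and the sequence length.
- In the DSW example in this section, the values 8, 2048, 100000000, and 10000 are the global batch size, sequence length, training tokens, and warm-up tokens. This reading comes from the script's usage header as we understand it; check it against your copy.

Under those assumptions, the resulting iteration counts can be checked in a few lines. `tokens_to_iters` is an illustrative name.

```python
def tokens_to_iters(tokens: int, global_batch: int, seq_len: int) -> int:
    """Iterations needed to consume `tokens` when each step sees global_batch * seq_len tokens."""
    return tokens // (global_batch * seq_len)


tokens_per_step = 8 * 2048
train_iters = tokens_to_iters(100_000_000, 8, 2048)
warmup_iters = tokens_to_iters(10_000, 8, 2048)
print(tokens_per_step, train_iters, warmup_iters)  # 16384 6103 0
```

With these example values, the run lasts about 6,103 steps, and the warm-up rounds down to zero steps. To get a real learning-rate warm-up, raise the warm-up token budget to at least one step's worth of tokens (16,384 here).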
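Run run_pretrain_megatron_qwen.sh on DSW or DLC; the first argument (dsw or dlc) selects the environment. The example starts from a Megatron-format checkpoint (qwen-7b-hf-to-megatron-tp1-pp1) converted from the HuggingFace weights beforehand, a step this guide does not show. Example for DSW: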
```bash
export WORK_DIR=/mnt/workspace
cd ${WORK_DIR}/Pai-Megatron-Patch/examples/qwen
sh run_pretrain_megatron_qwen.sh \
  dsw \
  ${WORK_DIR}/Pai-Megatron-Patch \
  7B \
  1 \
  8 \
  1e-5 \
  1e-6 \
  2048 \
  2048 \
  85 \
  fp16 \
  1 \
  1 \
  sel \
  true \
  false \
  false \
  false \
  100000 \
  ${WORK_DIR}/qwen-datasets/wudao/wudao_qwenbpe_content_document \
  ${WORK_DIR}/qwen-ckpts/qwen-7b-hf-to-megatron-tp1-pp1 \
  100000000 \
  10000 \
  ${WORK_DIR}/output_megatron_qwen/
```

Supervised Fine‑Tuning
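Before launching fine-tuning, check that the training and validation files parse and contain the expected fields. If you start from your own data, you also need to split it into the two files the script takes. This is a minimal sketch that assumes Alpaca-style records: a JSON array of objects with instruction, input, and output keys. Verify that schema against the downloaded alpaca_zh files before relying on it. The output names train.json and valid.json, the 5% validation fraction, and the fixed seed are arbitrary, hypothetical choices.

```python
import json
import random
from pathlib import Path
from typing import List, Tuple

REQUIRED_KEYS = {"instruction", "input", "output"}  # assumption: Alpaca-style schema


def load_sft_records(path: Path) -> List[dict]:
    """Load a JSON array of SFT records and check that each has the expected keys."""
    records = json.loads(path.read_text(encoding="utf-8"))
    for i, rec in enumerate(records):
        missing = REQUIRED_KEYS - set(rec)
        if missing:
            raise ValueError(f"{path}: record {i} is missing {sorted(missing)}")
    return records


def split_train_valid(records: List[dict], out_dir: Path, valid_fraction: float = 0.05,
                      seed: int = 1234) -> Tuple[int, int]:
    """Shuffle deterministically, write train.json and valid.json, and return their sizes."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_valid = max(1, int(len(shuffled) * valid_fraction))
    valid, train = shuffled[:n_valid], shuffled[n_valid:]
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, part in (("train.json", train), ("valid.json", valid)):
        (out_dir / name).write_text(json.dumps(part, ensure_ascii=False, indent=1), encoding="utf-8")
    return len(train), len(valid)
```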
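With the small-scale JSON data in place, run run_finetune_megatron_qwen_withGA.sh (the variant with gradient accumulation). Point the training and validation paths, wudao_train.json and wudao_valid.json in this example, at your own JSON files, such as the Alpaca-zh files downloaded earlier: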
```bash
export WORK_DIR=/mnt/workspace
cd ${WORK_DIR}/Pai-Megatron-Patch/examples/qwen
sh run_finetune_megatron_qwen_withGA.sh \
  dsw \
  ${WORK_DIR}/Pai-Megatron-Patch \
  7B \
  1 \
  96 \
  1e-5 \
  1e-6 \
  2048 \
  2048 \
  85 \
  bf16 \
  1 \
  1 \
  sel \
  true \
  false \
  false \
  false \
  1000 \
  ${WORK_DIR}/qwen-datasets/wudao_train.json \
  ${WORK_DIR}/qwen-datasets/wudao_valid.json \
  ${WORK_DIR}/qwen-ckpts/qwen-7b-hf-to-megatron-tp1-pp1 \
  2000 \
  10 \
  ${WORK_DIR}/output_megatron_qwen/
```

Model Format Conversion
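The conversion step needs the path of a single saved iteration. Megatron-LM saves each checkpoint in a directory named iter_ followed by a seven-digit, zero-padded iteration number (for example iter_0001000). It records the most recent iteration in latest_checkpointed_iteration.txt, which contains either that number or the word release. The helper below resolves the newest iteration directory under a run's checkpoint folder. `latest_iteration_dir` is an illustrative name.

```python
import re
from pathlib import Path


def latest_iteration_dir(ckpt_dir: Path) -> Path:
    """Return the newest iter_XXXXXXX (or release) directory under a Megatron checkpoint folder."""
    tracker = ckpt_dir / "latest_checkpointed_iteration.txt"
    if tracker.exists():
        value = tracker.read_text().strip()
        if value == "release":
            return ckpt_dir / "release"
        return ckpt_dir / f"iter_{int(value):07d}"
    # No tracker file: fall back to the highest-numbered iter_* directory
    iters = [p for p in ckpt_dir.iterdir()
             if p.is_dir() and re.fullmatch(r"iter_\d{7}", p.name)]
    if not iters:
        raise FileNotFoundError(f"no iter_* directories in {ckpt_dir}")
    return max(iters, key=lambda p: int(p.name[len("iter_"):]))
```

For example, calling `latest_iteration_dir(Path("/mnt/workspace/output_megatron_qwen/checkpoint") / run_name)` returns the path for the conversion command. Here `run_name` is a hypothetical variable holding the run directory's name.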
Convert Megatron checkpoints to HuggingFace format for inference:
```bash
# ${CKPT_NAME} is a placeholder for the run directory that training created under
# checkpoint/; do not use PATH for this, since ${PATH} expands to the shell's search path.
sh model_convertor.sh \
  ../../../Megatron-LM-main \
  ${WORK_DIR}/output_megatron_qwen/checkpoint/${CKPT_NAME}/iter_0001000 \
  /mnt/workspace/qwen-ckpts/qwen-7b-mg-to-hf-tp1-pp1/ \
  1 \
  1 \
  qwen-7b \
  0 \
  true
```

Offline Inference
Run inference with either the HuggingFace or the Megatron-LM pipeline. Example HuggingFace script:
```python
#!/usr/bin/env python
# encoding=utf-8
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Path to a HuggingFace-format Qwen checkpoint (for example, the converted output)
checkpoint = '/mnt/workspace/latest/qianwen/qwen-7b-hf'
tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
# device_map="auto" places the weights on the available GPUs
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", trust_remote_code=True)

prompt = "Human:写一个快速排序算法"  # "Human: Write a quicksort algorithm"
inputs = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0]))
```

Online Service Deployment (PAI‑EAS)
Upload the final model to OSS, create an EAS resource group, and deploy the service through the console or the eascmd client. The example console deployment uses the image

pai-image-manage-registry.cn-wulanchabu.cr.aliyuncs.com/pai/llm-inference:vllm-0.2.1-v4

and runs the following command, where --model-path points at the model directory mounted into the service (here /mnt/model/qwen_7b):

```bash
# Start the FastChat controller, the PAI Gradio web server, and a vLLM worker
nohup python -m fastchat.serve.controller > tmp1.log 2>&1 &
python -m fastchat.serve.gradio_web_server_pai --model-list-mode reload > tmp2.log 2>&1 &
python -m fastchat.serve.vllm_worker --model-path /mnt/model/qwen_7b --tensor-parallel-size 1 --trust-remote-code
```

After deployment, open the service's WebUI to chat with the model.
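The vLLM worker can take a while to load the weights, and the WebUI lists a model only after the worker has registered with the controller. If you try the same three commands on a DSW instance before deploying, a small polling helper can confirm that registration happened.

This sketch makes three assumptions, which you should check against the FastChat version in the image:
- The FastChat controller listens on its default port 21001.
- The controller answers POST /list_models with a JSON object containing a `models` list.
- The worker registers the model under the last component of --model-path (qwen_7b here).

`list_models` and `wait_for_model` are illustrative names.

```python
import json
import time
import urllib.request
from typing import Callable, List

CONTROLLER_URL = "http://localhost:21001"  # assumption: FastChat controller default port


def list_models(base_url: str = CONTROLLER_URL, timeout: float = 5.0) -> List[str]:
    """Ask the FastChat controller which models currently have registered workers."""
    req = urllib.request.Request(f"{base_url}/list_models", data=b"", method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp).get("models", [])


def wait_for_model(name: str, fetch: Callable[[], List[str]] = list_models,
                   attempts: int = 60, delay_s: float = 5.0) -> bool:
    """Poll until `name` is registered or the attempts run out."""
    for attempt in range(attempts):
        try:
            if name in fetch():
                return True
        except OSError:  # controller not reachable yet (URLError subclasses OSError)
            pass
        if attempt + 1 < attempts:
            time.sleep(delay_s)
    return False


if __name__ == "__main__":
    print("ready" if wait_for_model("qwen_7b") else "model did not register in time")
```

The `fetch` parameter makes the polling logic testable without a running controller.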
Related Resources
Qwen model series: https://modelscope.cn/organization/qwen
PAI Lingjun service: https://www.aliyun.com/product/bigdata/learn/pai/lingjun
PAI‑Megatron‑Patch GitHub: https://github.com/alibaba/Pai-Megatron-Patch
PAI‑EAS product: https://www.aliyun.com/product/bigdata/learn/eas
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.