Deploy and Fine‑Tune Alibaba’s Qwen‑72B‑Chat on PAI‑QuickStart
This guide explains how to meet runtime requirements, deploy Qwen‑72B‑Chat via the Alibaba Cloud PAI console, invoke it with cURL or Python SDK, and perform full‑parameter fine‑tuning using Megatron‑LM, providing a complete end‑to‑end workflow for large language model development.
Introduction
Qwen‑72B is a 72‑billion‑parameter large language model from Alibaba Cloud. Qwen‑72B‑Chat is the chat‑oriented version built with alignment mechanisms. Alibaba Cloud AI Platform PAI provides a full‑stack AI development service, and its QuickStart component bundles popular open‑source models with zero‑code or SDK workflows.
Runtime Requirements
The example runs only in the Ulanqab region on Lingjun clusters.
GPU: GU108 (80 GB) recommended. Inference requires at least 4 GPUs; full‑parameter fine‑tuning requires at least 4 machines (32 GPUs in total).
Refer to the official PAI Lingjun resource guide for provisioning.
Deploy Model via PAI Console
In the PAI console’s QuickStart entry, locate the Qwen‑72B‑Chat model card. On the “Model Deployment” page, select Lingjun resources and click Deploy; this creates a PAI‑EAS inference service.
After deployment, the service detail page shows the Endpoint and Token, which can be used to call the HTTP API. Example cURL commands for listing models, text generation, and chat are provided.
# Replace with your Endpoint and Token
export API_ENDPOINT="<ENDPOINT>"
export API_TOKEN="<TOKEN>"
# List models
curl $API_ENDPOINT/v1/models \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_TOKEN"

# Text generation
curl $API_ENDPOINT/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_TOKEN" \
  -d '{
    "model": "qwen-72b-chat",
    "prompt": "San Francisco is a",
    "max_tokens": 256,
    "temperature": 0,
    "stop": ["<|im_end|>", "<|im_start|>"]
  }'
# Chat
curl $API_ENDPOINT/v1/chat/completions \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-72b-chat",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "介绍一下上海的历史"}
    ],
    "stop": ["<|im_end|>", "<|im_start|>"]
  }'

Python SDK usage requires installing the OpenAI SDK and configuring the endpoint and token.
import openai

# Point the OpenAI SDK at the PAI-EAS service endpoint and token.
openai.api_key = "<TOKEN>"
openai.base_url = "<ENDPOINT>/v1"

completion = openai.chat.completions.create(
    model="qwen-72b-chat",
    temperature=0.0,
    top_p=0.8,
    messages=[{"role": "user", "content": "请介绍下你自己。"}],
    stop=["<|im_end|>", "<|im_start|>"],
)
print(completion.choices[0].message.content)

Model Fine‑Tuning
PAI‑QuickStart supports full‑parameter fine‑tuning of Qwen‑72B‑Chat using Megatron‑LM with techniques such as data parallelism, pipeline parallelism, and ZeRO offload. Users prepare training and validation JSON files (each entry contains an “instruction” and an “output” field), upload them to an OSS bucket, and configure hyper‑parameters (learning_rate, sequence_length, etc.) in the console.
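As a sketch of the expected data layout, a training file can be assembled as below. The field names “instruction” and “output” come from the description above; the sample records and the file name train.json are illustrative.

```python
import json

# Each record pairs an "instruction" (the prompt) with the desired "output".
# The sample entries here are placeholders, not real training data.
samples = [
    {"instruction": "介绍一下上海的历史", "output": "上海的历史可以追溯到……"},
    {"instruction": "What is Qwen-72B-Chat?",
     "output": "Qwen-72B-Chat is a chat-aligned large language model from Alibaba Cloud."},
]

# Write the training set as a JSON array; upload the resulting file to OSS afterwards.
with open("train.json", "w", encoding="utf-8") as f:
    json.dump(samples, f, ensure_ascii=False, indent=2)
```

A validation file follows the same format and is uploaded alongside the training file.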
After submitting the job, the training status and logs are viewable in the console. Checkpoints are saved to the specified OSS bucket, and any checkpoint can be selected for inference.
Using PAI Python SDK
The SDK allows deploying the model with a few lines of code and retrieving the service Endpoint and Token.
from pai.session import get_default_session
from pai.model import RegisteredModel
from pai.common.utils import random_str  # generates a random suffix for the service name

session = get_default_session()

# Retrieve the registered Qwen-72B-Chat model provided by PAI.
m = RegisteredModel(model_name="qwen-72b-chat", model_provider="pai")

# Deploy it as a PAI-EAS inference service on Lingjun resources.
predictor = m.deploy(
    service_name=f"qwen_72b_chat_{random_str(6)}",
    options={
        "metadata.quota_id": "<LingJunResourceQuotaId>",
        "metadata.quota_type": "Lingjun",
        "metadata.workspace_id": session.workspace_id,
    },
)

endpoint = predictor.internet_endpoint
token = predictor.access_token

Further SDK calls can list model inputs, set hyper‑parameters, and launch fine‑tuning jobs, following the same workflow described in the earlier sections.
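With the endpoint and token in hand, the service can also be invoked over plain HTTP. The sketch below uses only the Python standard library and mirrors the cURL chat example above; the function names are illustrative, not part of the PAI SDK.

```python
import json
import urllib.request

def build_chat_request(endpoint, token, messages, model="qwen-72b-chat"):
    """Assemble the URL, headers, and JSON body for the chat completions API."""
    url = f"{endpoint.rstrip('/')}/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,
        "messages": messages,
        "stop": ["<|im_end|>", "<|im_start|>"],
    }
    return url, headers, payload

def chat(endpoint, token, messages):
    """Send a chat request to the deployed service and return the reply text."""
    url, headers, payload = build_chat_request(endpoint, token, messages)
    req = urllib.request.Request(url, data=json.dumps(payload).encode("utf-8"),
                                 headers=headers)
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

For example, `chat(endpoint, token, [{"role": "user", "content": "介绍一下上海的历史"}])` sends the same request as the cURL chat example.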
Conclusion
Alibaba Cloud PAI‑QuickStart provides an out‑of‑the‑box experience for deploying and fine‑tuning Qwen‑72B‑Chat, streamlining the AI development workflow and enabling developers and enterprises to accelerate innovation with large language models.
References
PAI QuickStart Overview: https://help.aliyun.com/zh/pai/user-guide/quick-start-overview
Qwen model series: https://modelscope.cn/organization/qwen
PAI Python SDK: https://github.com/aliyun/pai-python-sdk
PAI Lingjun Intelligent Computing Service: https://www.aliyun.com/product/bigdata/learn/pailingjun
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.