Deploy and Fine‑Tune Alibaba’s Qwen‑72B‑Chat on PAI‑QuickStart

This guide explains how to meet the runtime requirements, deploy Qwen-72B-Chat via the Alibaba Cloud PAI console, invoke the service with cURL or the Python SDK, and perform full-parameter fine-tuning based on Megatron-LM, providing a complete end-to-end workflow for large language model development.

Introduction

Qwen-72B is a 72-billion-parameter large language model from Alibaba Cloud, and Qwen-72B-Chat is its chat-oriented version built on the base model with alignment techniques. Alibaba Cloud Platform for AI (PAI) provides a full-stack AI development service, and its QuickStart component bundles popular open-source models with zero-code and SDK-based workflows for deployment and fine-tuning.

Runtime Requirements

The example runs only in the Ulanqab region on Lingjun clusters.

GPU: GU108 (80 GB) is recommended. Inference requires at least 4 GPUs; full-parameter fine-tuning requires at least 4 machines (32 GPUs in total).

Refer to the official PAI Lingjun resource guide for provisioning.

Deploy Model via PAI Console

In the QuickStart section of the PAI console, locate the Qwen-72B-Chat model card (shown below). On the Model Deployment page, select Lingjun resources and click Deploy; this creates a PAI-EAS inference service.

Model card

After deployment completes, the service detail page shows the Endpoint and Token, which are used to call the HTTP API. Example cURL commands for listing models, text generation, and chat are shown below.

# Replace with your Endpoint and Token
export API_ENDPOINT="<ENDPOINT>"
export API_TOKEN="<TOKEN>"
# List models
curl $API_ENDPOINT/v1/models -H "Content-Type: application/json" -H "Authorization: Bearer $API_TOKEN"
# Text generation
curl $API_ENDPOINT/v1/completions -H "Content-Type: application/json" -H "Authorization: Bearer $API_TOKEN" -d '{
  "model": "qwen-72b-chat",
  "prompt": "San Francisco is a",
  "max_tokens": 256,
  "temperature": 0,
  "stop": ["<|im_end|>", "<|im_start|>"]
}'
# Chat
curl $API_ENDPOINT/v1/chat/completions -H "Authorization: Bearer $API_TOKEN" -H "Content-Type: application/json" -d '{
  "model": "qwen-72b-chat",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "介绍一下上海的历史"}
  ],
  "stop": ["<|im_end|>", "<|im_start|>"]
}'

To call the service from Python, install the OpenAI SDK (for example, pip install openai) and configure the service Endpoint and Token as the base URL and API key.

from openai import OpenAI

# Configure the client with the service Token as the API key and
# "<ENDPOINT>/v1" as the base URL (both shown on the service detail page)
client = OpenAI(api_key="<TOKEN>", base_url="<ENDPOINT>/v1")

completion = client.chat.completions.create(
    model="qwen-72b-chat",
    temperature=0.0,
    top_p=0.8,
    messages=[{"role": "user", "content": "Please introduce yourself."}],
    stop=["<|im_end|>", "<|im_start|>"],
)
print(completion.choices[0].message.content)
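
Whether the deployed service also honors the standard OpenAI streaming protocol is an assumption here rather than something stated above; if it does, responses can be consumed incrementally with the same client:

# Streaming sketch: assumes the service supports stream=True in the
# OpenAI-compatible API; verify against the service documentation.
stream = client.chat.completions.create(
    model="qwen-72b-chat",
    messages=[{"role": "user", "content": "Write a short poem about the sea."}],
    stream=True,
    stop=["<|im_end|>", "<|im_start|>"],
)
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)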

Model Fine‑Tuning

PAI-QuickStart supports full-parameter fine-tuning of Qwen-72B-Chat based on Megatron-LM, applying techniques such as data parallelism, pipeline parallelism, and ZeRO offloading. To fine-tune, users prepare training and validation files in JSON format (each entry contains an "instruction" field and an "output" field, as sketched below), upload them to an OSS bucket, and configure hyperparameters such as learning_rate and sequence_length in the console.
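
For reference, a minimal sketch of what such a data file might look like; the record contents and the train.json file name are illustrative assumptions, and only the "instruction" and "output" fields are stated requirements:

import json

# Illustrative records: only the "instruction" and "output" fields are
# required by the fine-tuning job; the texts themselves are placeholders.
train_records = [
    {
        "instruction": "Write a short introduction to the city of Hangzhou.",
        "output": "Hangzhou is the capital of Zhejiang Province, known for West Lake ...",
    },
    {
        "instruction": "Summarize the benefits of regular exercise.",
        "output": "Regular exercise improves cardiovascular health, mood, and sleep ...",
    },
]

# Write the records as a JSON array (whether the platform expects a JSON array
# or JSON Lines is not specified here; match the console's example data),
# then upload the file to your OSS bucket.
with open("train.json", "w", encoding="utf-8") as f:
    json.dump(train_records, f, ensure_ascii=False, indent=2)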

Fine-tuning hyper-parameters

After submitting the job, the training status and logs are viewable in the console. Checkpoints are saved to the specified OSS bucket, and any checkpoint can be selected for inference.

Training job view

Using PAI Python SDK

The SDK allows deploying the model with a few lines of code and retrieving the service Endpoint and Token.

import uuid

from pai.session import get_default_session
from pai.model import RegisteredModel

session = get_default_session()

# Retrieve the Qwen-72B-Chat model registered by PAI-QuickStart
m = RegisteredModel(model_name="qwen-72b-chat", model_provider="pai")

# Deploy the model to a PAI-EAS inference service on Lingjun resources
predictor = m.deploy(
    # Random suffix keeps the service name unique
    service_name=f"qwen_72b_chat_{uuid.uuid4().hex[:6]}",
    options={
        "metadata.quota_id": "<LingJunResourceQuotaId>",
        "metadata.quota_type": "Lingjun",
        "metadata.workspace_id": session.workspace_id,
    },
)

# The Endpoint and Token can then be used with the HTTP API or the OpenAI SDK
endpoint = predictor.internet_endpoint
token = predictor.access_token
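
The returned endpoint and token can be plugged straight into the OpenAI-compatible client shown earlier, for example:

from openai import OpenAI

# Reuse the Endpoint and Token obtained from the deployed predictor above
client = OpenAI(api_key=token, base_url=f"{endpoint}/v1")
resp = client.chat.completions.create(
    model="qwen-72b-chat",
    messages=[{"role": "user", "content": "Hello, who are you?"}],
    stop=["<|im_end|>", "<|im_start|>"],
)
print(resp.choices[0].message.content)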

The SDK can also be used to inspect the model's default training inputs, adjust hyperparameters, and submit the fine-tuning job described in the earlier sections.
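
A minimal sketch of that flow, assuming the RegisteredModel exposes get_estimator() and get_estimator_inputs() as in the PAI Python SDK QuickStart examples; the method names and hyperparameter values below are assumptions to verify against the current SDK documentation:

# Assumed API: obtain the fine-tuning algorithm (estimator) attached to the model
est = m.get_estimator()

# Assumed API: override selected hyperparameters; names and values are illustrative
est.set_hyperparameters(learning_rate=1e-5, sequence_length=2048)

# Assumed API: default inputs reference the model's bundled example data;
# replace them with your own OSS paths, e.g. {"train": "oss://<your-bucket>/train.json"}
inputs = m.get_estimator_inputs()

# Submit the fine-tuning job; progress and logs appear in the PAI console
est.fit(inputs=inputs)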

Conclusion

Alibaba Cloud PAI‑QuickStart provides an out‑of‑the‑box experience for deploying and fine‑tuning Qwen‑72B‑Chat, streamlining the AI development workflow and enabling developers and enterprises to accelerate innovation with large language models.

References

PAI QuickStart Overview: https://help.aliyun.com/zh/pai/user-guide/quick-start-overview

Qwen model series: https://modelscope.cn/organization/qwen

PAI Python SDK: https://github.com/aliyun/pai-python-sdk

PAI Lingjun Intelligent Computing Service: https://www.aliyun.com/product/bigdata/learn/pailingjun
