Fine‑Tune and Deploy Llama 2 on Alibaba Cloud PAI in Minutes

This guide walks you through using Meta's open‑source Llama 2 models on Alibaba Cloud's PAI platform, covering low‑code LoRA fine‑tuning, full‑parameter fine‑tuning with PAI‑DSW, and rapid WebUI deployment via PAI‑EAS, complete with step‑by‑step instructions, code snippets, and resource requirements.

Alibaba Cloud Big Data AI Platform

Best Practice 1: Low‑code LoRA fine‑tuning and deployment

Meta has released Llama 2 in 7B, 13B, and 70B sizes, free for both research and commercial use. Alibaba Cloud PAI supports these models with full‑parameter fine‑tuning, LoRA fine‑tuning, and inference services.

1. Enter PAI‑Quick‑Start

a. Log in to the PAI console: https://pai.console.aliyun.com/

b. Open the workspace and select “Quick‑Start”.

2. Choose the Llama 2 model

Select the “Generative AI – Large‑Language‑Model” category and pick llama-2-7b-chat-hf (or other sizes).

3. Deploy the model

On the model detail page, create an online inference service. The instance needs at least 64 GiB of memory and 24 GiB of GPU memory.
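As a rough sanity check on the GPU memory figure, the weights alone can be estimated from the parameter count and the bytes stored per parameter. The sketch below is a back-of-the-envelope calculation, not a PAI sizing rule. It assumes nominal parameter counts (7e9 and 13e9) and 2 bytes per parameter for fp16, and it ignores the KV cache, activations, and framework overhead, all of which need extra headroom.

```python
GIB = 1024 ** 3  # bytes per GiB


def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Estimate the memory needed for model weights only, in GiB."""
    return num_params * bytes_per_param / GIB


if __name__ == "__main__":
    # Nominal sizes; the real parameter counts differ slightly.
    for name, params in [("7B", 7e9), ("13B", 13e9)]:
        print(f"Llama 2 {name} in fp16: ~{weight_memory_gib(params):.1f} GiB of weights")
```

Under these assumptions the 7B weights take about 13 GiB, which leaves part of a 24 GiB GPU free for the KV cache and runtime overhead.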

4. Call the inference service

After deployment, use the WebUI to send prediction requests or call the API via the “Use via API” link.
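A call to the service from code might look roughly like the sketch below. Everything specific here is an assumption: the endpoint URL, the token, the use of an Authorization header, and the {"prompt": ...} payload shape are hypothetical placeholders. Copy the real endpoint and request format from the service's “Use via API” page. The function only builds the request, so it can be inspected before anything is sent.

```python
import json
import urllib.request


def build_request(endpoint: str, token: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a deployed service.

    The payload shape {"prompt": ...} is hypothetical; use the format
    shown on the service's "Use via API" page instead.
    """
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Authorization": token, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder endpoint and token; replace with your service's values.
    req = build_request("https://example.invalid/api/predict", "yourServiceToken", "Hello")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req, timeout=60).read()
```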

Best Practice 2: Full‑parameter fine‑tuning with PAI‑DSW

PAI‑DSW provides an interactive modeling environment for full‑parameter fine‑tuning of Llama‑2‑7B‑Chat.

1. Log in and download the model

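import os

# Region of the current DSW instance, read from the dsw_region environment variable.
dsw_region = os.environ.get("dsw_region")
# Internal OSS download links for the model archive, keyed by region.
url_link = {
    "cn-shanghai": "https://atp-modelzoo-sh.oss-cn-shanghai-internal.aliyuncs.com/release/tutorials/llama2/llama2-7b.tar.gz",
    "cn-hangzhou": "https://atp-modelzoo.oss-cn-hangzhou-internal.aliyuncs.com/release/tutorials/llama2/llama2-7b.tar.gz",
    "cn-shenzhen": "https://atp-modelzoo-sz.oss-cn-shenzhen-internal.aliyuncs.com/release/tutorials/llama2/llama2-7b.tar.gz",
    "cn-beijing": "https://atp-modelzoo-bj.oss-cn-beijing-internal.aliyuncs.com/release/tutorials/llama2/llama2-7b.tar.gz"
}
# Fail with a clear message instead of a bare KeyError for unlisted regions.
if dsw_region not in url_link:
    raise ValueError(f"No download link for region: {dsw_region}")
path = url_link[dsw_region]
os.environ['LINK_CHAT'] = path
# Notebook shell commands: download and unpack the model archive.
!wget $LINK_CHAT
!tar -zxvf llama2-7b.tar.gz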

2. Install the environment

! wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/llama2/ColossalAI.tar.gz
! tar -zxvf ColossalAI.tar.gz
! pip install ColossalAI/.
! pip install ColossalAI/applications/Chat/.
! pip install transformers==4.30.0
! pip install gradio==3.11
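After the installs finish, it can help to confirm that the pinned versions are the ones actually in the environment. The following is a minimal sketch using the standard library's importlib.metadata. The check_versions helper is a hypothetical name introduced here, and the prefix comparison in the demo is a deliberate simplification (3.11 is matched against 3.11.0).

```python
from importlib import metadata

# Versions pinned by the install commands above.
EXPECTED = {"transformers": "4.30.0", "gradio": "3.11"}


def check_versions(expected: dict) -> dict:
    """Map each package to (expected, installed); installed is None if missing."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


if __name__ == "__main__":
    for name, (wanted, installed) in check_versions(EXPECTED).items():
        # Simple prefix match, so "3.11" accepts "3.11.0".
        status = "ok" if installed and installed.startswith(wanted) else "check"
        print(f"{name}: expected {wanted}, installed {installed} -> {status}")
```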

3. Prepare data

Use the default instruction‑tuning dataset attached to the model card or upload your own JSON with instruction, output, and id fields.

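[
    {
        "instruction": "Does the following text belong to the World topic? Why do Americans rarely hold military parades?",
        "output": "Yes",
        "id": 0
    },
    {
        "instruction": "Does the following text belong to the World topic? Breaking: the timetable for official-vehicle reform in public institutions is out!",
        "output": "No",
        "id": 1
    }
]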
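Before uploading a custom dataset, a quick structural check can catch missing fields early. The sketch below validates the three fields shown in the sample above. The validate_records helper is hypothetical, and the type checks and the requirement that id values be unique are assumptions made here, not documented PAI constraints.

```python
import json

# Field names come from the sample above; the expected types are an assumption.
REQUIRED_FIELDS = {"instruction": str, "output": str, "id": int}


def validate_records(records) -> list:
    """Return a list of problems; an empty list means the data looks fine."""
    if not isinstance(records, list):
        return ["top-level JSON value must be a list"]
    problems = []
    seen_ids = set()
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append(f"record {index}: not an object")
            continue
        for field, expected_type in REQUIRED_FIELDS.items():
            if field not in record:
                problems.append(f"record {index}: missing '{field}'")
            elif not isinstance(record[field], expected_type):
                problems.append(f"record {index}: '{field}' should be {expected_type.__name__}")
        record_id = record.get("id")
        if isinstance(record_id, int):
            # Assumed rule: ids should not repeat within one file.
            if record_id in seen_ids:
                problems.append(f"record {index}: duplicate id {record_id}")
            seen_ids.add(record_id)
    return problems


if __name__ == "__main__":
    sample = '[{"instruction": "Is this text about sports?", "output": "No", "id": 0}]'
    print(validate_records(json.loads(sample)) or "no problems found")
```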

4. Submit the training job

Configure the dataset and use the default optimized hyper‑parameters. Monitor progress via the job detail page; successful jobs save the model to OSS.

5. Deploy the fine‑tuned model

If the trained model is not already in OSS, upload it with a script such as the following, then follow the same deployment steps as in Best Practice 1.

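# encoding=utf-8
import os
import oss2

# Placeholders: replace with your own credentials, endpoint, and paths.
AK = 'yourAccessKeyId'
SK = 'yourAccessKeySecret'
endpoint = 'yourEndpoint'
local_dir = 'your model output dir'
oss_prefix = 'your OSS upload path'  # destination prefix inside the bucket

auth = oss2.Auth(AK, SK)
bucket = oss2.Bucket(auth, endpoint, 'examplebucket')
for filename in os.listdir(local_dir):
    current_file_path = os.path.join(local_dir, filename)
    if not os.path.isfile(current_file_path):
        continue  # os.listdir also returns subdirectories; skip them
    # Give each file its own object key so uploads do not overwrite each other.
    file_path = oss_prefix.rstrip('/') + '/' + filename
    bucket.put_object_from_file(file_path, current_file_path)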
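The upload loop above only looks at the top level of the output directory. If the output contains subdirectories, one option is to plan the object keys first and upload afterwards. The sketch below is one way to do that under stated assumptions: plan_uploads is a hypothetical helper, and the directory and prefix in the demo are placeholders. It performs no network calls, so the mapping can be reviewed before anything is sent to OSS.

```python
import os


def plan_uploads(local_dir: str, oss_prefix: str) -> list:
    """Return sorted (local_path, object_key) pairs for every file under local_dir."""
    prefix = oss_prefix.strip("/")
    pairs = []
    for root, _dirs, files in os.walk(local_dir):
        for filename in files:
            local_path = os.path.join(root, filename)
            # Object keys always use forward slashes, whatever the local OS uses.
            relative = os.path.relpath(local_path, local_dir).replace(os.sep, "/")
            key = f"{prefix}/{relative}" if prefix else relative
            pairs.append((local_path, key))
    return sorted(pairs)


if __name__ == "__main__":
    # Placeholder directory and prefix; prints nothing if the directory is absent.
    for local_path, key in plan_uploads("./your_model_output_dir", "models/llama2-7b-finetuned"):
        print(local_path, "->", key)
        # bucket.put_object_from_file(key, local_path)  # with an oss2.Bucket as in the script above
```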

Best Practice 3: Quick WebUI deployment with PAI‑EAS

PAI‑EAS enables one‑click deployment of Llama 2 as an online service or AI‑Web application.

1. Open the PAI‑EAS model service page

Log in, navigate to Workspace → Model Deployment → Model Online Service (EAS).

2. Configure service parameters

Service name: for example, chatllm_llama2_13b

Deployment mode: Image‑based AI‑Web app

Image: chat-llm-webui (latest version)

Run command for 13B:

python webui/webui_server.py --listen --port=8000 --model-path=meta-llama/Llama-2-13b-chat-hf --precision=fp16

Run command for 7B:

python webui/webui_server.py --listen --port=8000 --model-path=meta-llama/Llama-2-7b-chat-hf

GPU instance: for example, ecs.gn6e-c12g1.3xlarge for 13B, or an A10/GU30 instance for 7B

System disk: 50 GB
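The two run commands differ only in the model path and the optional precision flag. When switching between model sizes, a small helper can assemble the command so the flags stay consistent. This is an illustrative sketch: build_run_command is a hypothetical helper, and it uses only the flags that appear in the commands above.

```python
from typing import Optional


def build_run_command(model_path: str, precision: Optional[str] = None, port: int = 8000) -> str:
    """Assemble the webui_server.py command line shown in this section."""
    parts = [
        "python",
        "webui/webui_server.py",
        "--listen",
        f"--port={port}",
        f"--model-path={model_path}",
    ]
    # The 13B example passes --precision=fp16; the 7B example omits the flag.
    if precision:
        parts.append(f"--precision={precision}")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_run_command("meta-llama/Llama-2-13b-chat-hf", precision="fp16"))
    print(build_run_command("meta-llama/Llama-2-7b-chat-hf"))
```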

3. Deploy and launch WebUI

After deployment, click “View Web App” to open the WebUI, then test the model with a prompt such as “Please provide a study plan for personal finance”.

These three best‑practice workflows demonstrate how to quickly fine‑tune, deploy, and serve Llama 2 models on Alibaba Cloud PAI, covering both low‑code LoRA and full‑parameter training scenarios.
