How to Fine‑Tune and Deploy Llama 2 on Alibaba Cloud PAI: Step‑by‑Step Guide

This guide walks you through using Alibaba Cloud's PAI platform to quickly fine‑tune Llama 2 with LoRA or full‑parameter methods, deploy the models as online inference services, and launch an interactive WebUI, covering preparation, data formatting, training jobs, and deployment details.

Alibaba Cloud Developer

Jul 26, 2023

Introduction

Meta has released Llama 2, an open‑source large language model available in 7B, 13B, and 70B parameter sizes, each with a chat‑optimized variant. The models are free for research and commercial use, although companies with more than 700 million monthly active users must request a license from Meta.

Best Practice 1: Low‑code LoRA fine‑tuning and deployment

Preparation

1. Log in to the PAI console and open the PAI‑Quick Start module.

2. Choose the large‑language‑model category and select llama‑2‑7b‑chat‑hf (or the equivalent 7B model).

Tip: Larger models generally perform better but require more compute and more training data.

Model inference

Deploy the selected model to PAI‑EAS. The service needs at least 64 GiB memory and 24 GiB GPU memory.
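
A rough back-of-the-envelope check suggests why 24 GiB of GPU memory is a sensible floor for the 7B model. The sketch below is an estimate only: it assumes the nominal parameter counts (7 billion and 13 billion) and 2 bytes per parameter for fp16, and it ignores the KV cache and activations, which need additional memory. The function name is illustrative.

```python
GIB = 1024 ** 3  # number of bytes in one GiB


def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Estimate the memory needed just to hold the model weights, in GiB.

    bytes_per_param defaults to 2, which corresponds to fp16 weights.
    """
    return num_params * bytes_per_param / GIB


if __name__ == "__main__":
    # Nominal parameter counts are an assumption; real checkpoints differ slightly.
    for name, params in [("7B", 7e9), ("13B", 13e9)]:
        print(f"Llama 2 {name}: ~{weight_memory_gib(params):.1f} GiB of fp16 weights")
```

Under these assumptions the 7B weights take roughly 13 GiB in fp16, which leaves headroom on a 24 GiB card, while the 13B weights alone slightly exceed 24 GiB and therefore call for a larger GPU.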

After deployment, you can use the WebUI to send prediction requests, or click “Use via API” for programmatic access.
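
For programmatic access, a request can be assembled along the following lines. This is a minimal sketch, not the service's documented API: the endpoint URL, the token, and the `{"prompt": ...}` payload shape are all placeholders, and the sketch assumes the service accepts an HTTP POST with the service token in the `Authorization` header. Check the “Use via API” panel of your own service for the actual request schema.

```python
import json
import urllib.request


def build_request(endpoint: str, token: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for an inference service.

    The JSON body {"prompt": ...} is a hypothetical payload shape.
    """
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Authorization": token, "Content-Type": "application/json"},
        method="POST",
    )


def call_service(endpoint: str, token: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the request and return the raw response body as text."""
    request = build_request(endpoint, token, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Separating `build_request` from `call_service` makes it possible to inspect the headers and body before any network traffic is sent.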

Best Practice 2: Full‑parameter fine‑tuning

Environment

Python ≥ 3.9; GPU A100 (80 GiB) is recommended.
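
Before starting a long training run, it can help to confirm that the environment meets these requirements. The sketch below checks the interpreter version and, if PyTorch happens to be installed, reports the first visible GPU. Using PyTorch here is an assumption; the helper names are illustrative.

```python
import sys

MIN_PYTHON = (3, 9)  # minimum version stated in this guide


def python_ok(version=sys.version_info) -> bool:
    """Return True if the (major, minor) version is at least MIN_PYTHON."""
    return tuple(version[:2]) >= MIN_PYTHON


def gpu_summary() -> str:
    """Describe the first CUDA device, or explain why it cannot be queried."""
    try:
        import torch  # assumption: PyTorch is the training framework
    except ImportError:
        return "PyTorch is not installed; cannot query the GPU"
    if not torch.cuda.is_available():
        return "No CUDA GPU is visible"
    props = torch.cuda.get_device_properties(0)
    return f"{props.name}: {props.total_memory / 1024 ** 3:.0f} GiB"


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    print("GPU:", gpu_summary())
```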

Data preparation

Training data must be in JSON format, each entry containing instruction, output, and id fields. Example:

[
    {"instruction": "Does the following text belong to the World topic? ...", "output": "Yes", "id": 0},
    {"instruction": "Does the following text belong to the World topic? ...", "output": "No", "id": 1}
]
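
A malformed dataset is easier to catch before upload than after a job fails. The following sketch checks only what the format description above states: the top-level value is a list, and every entry carries the instruction, output, and id fields. The function names are illustrative, and the training job may enforce additional rules not covered here.

```python
import json

REQUIRED_FIELDS = ("instruction", "output", "id")


def validate_records(records) -> list:
    """Return a list of problems; an empty list means the data looks valid."""
    if not isinstance(records, list):
        return ["top-level JSON value must be a list of entries"]
    problems = []
    for index, entry in enumerate(records):
        if not isinstance(entry, dict):
            problems.append(f"entry {index}: expected an object")
            continue
        for field in REQUIRED_FIELDS:
            if field not in entry:
                problems.append(f"entry {index}: missing field '{field}'")
    return problems


def validate_file(path: str) -> list:
    """Load a JSON file and validate its contents."""
    with open(path, encoding="utf-8") as handle:
        return validate_records(json.load(handle))
```

Calling `validate_file` on the training file (for example a hypothetical `train.json`) and checking for an empty result is enough for a quick pre-upload check.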

Upload the dataset to OSS; a validation set is also recommended for evaluating training progress.
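
If only a single labeled file is available, a validation set can be carved out of it before upload. The sketch below is one simple way to do this, assuming a random 90/10 split with a fixed seed; the split ratio, seed, and helper names are illustrative choices, not requirements of PAI.

```python
import json
import random


def split_dataset(records, val_fraction: float = 0.1, seed: int = 42):
    """Shuffle a copy of the records and return (train, validation) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    if len(shuffled) > 1:
        val_size = max(1, int(len(shuffled) * val_fraction))
    else:
        val_size = 0  # too few records to hold any out
    return shuffled[val_size:], shuffled[:val_size]


def write_json(records, path: str) -> None:
    """Write records as UTF-8 JSON, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(records, handle, ensure_ascii=False, indent=2)
```

The two resulting lists can be written out with `write_json` and uploaded to OSS as separate training and validation files.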

Training job submission

The Quick Start page provides default hyper‑parameters and resource configurations. You can modify them as needed and monitor the job status on the training job detail page.

Fine‑tuned model deployment

After training succeeds, upload the model files to OSS and deploy them as an online inference service using the same steps described in Best Practice 1.

Best Practice 3: Quick WebUI deployment

Service deployment

Deploy Llama‑2‑13B‑chat (or 7B) using the PAI‑EAS module with the chat‑llm‑webui image.

Service name (example): chatllm_llama2_13b

Deployment mode: image deployment of an AI‑Web application

Image: chat‑llm‑webui, version 1.0 (or the latest)

Run command for 13B:

python webui/webui_server.py --listen --port=8000 --model-path=meta-llama/Llama-2-13b-chat-hf --precision=fp16

Run command for 7B:

python webui/webui_server.py --listen --port=8000 --model-path=meta-llama/Llama-2-7b-chat-hf
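
When several services are deployed from a script, the two run commands can be generated from one helper so they do not drift apart. The sketch below uses only the flags shown in the commands above; the helper name is illustrative, and it makes no claim about other options the image may support.

```python
import shlex


def build_run_command(model_path: str, precision: str = None, port: int = 8000) -> str:
    """Assemble the WebUI run command as a single shell-safe string."""
    args = [
        "python",
        "webui/webui_server.py",
        "--listen",
        f"--port={port}",
        f"--model-path={model_path}",
    ]
    if precision:
        # Only the 13B command above passes an explicit precision flag.
        args.append(f"--precision={precision}")
    return shlex.join(args)


if __name__ == "__main__":
    print(build_run_command("meta-llama/Llama-2-13b-chat-hf", precision="fp16"))
    print(build_run_command("meta-llama/Llama-2-7b-chat-hf"))
```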

After deployment, open the WebUI and enter a prompt (e.g., “Please provide a study plan for learning personal finance”); the model will generate a response.

What’s More

This article focuses on the 7B and 13B models; future tutorials will cover fine‑tuning and deploying the 70B model.

A free trial of PAI‑EAS is available.

References

Llama 2: Inside the Model – https://ai.meta.com/llama/#inside-the-model

Llama 2 Community License Agreement – https://ai.meta.com/resources/models-and-libraries/llama-downloads/

HuggingFace Open LLM Leaderboard – https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard

Alibaba Cloud Machine Learning Platform PAI – https://www.aliyun.com/product/bigdata/learn
