How to Fine‑Tune and Deploy Mixtral 8x7B MOE Model on Alibaba Cloud PAI
This guide walks AI developers through downloading the Mixtral 8x7B MOE large language model, fine‑tuning it with Swift or DeepSpeed on Alibaba Cloud PAI‑DSW, testing inference with Transformers, and finally deploying the tuned model as an online service using PAI‑EAS.
Overview of Mixtral 8x7B MOE Model
Mixtral 8x7B is a decoder‑only sparse Mixture‑of‑Experts (MoE) LLM released by Mistral AI, with 46.7B total parameters. For each token, a router selects two of the eight experts in every layer, which gives performance comparable to Llama 2 70B and GPT‑3.5 while keeping inference cost low.
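To make the routing idea concrete, here is a minimal, illustrative top‑2 routing layer in plain PyTorch. It is a sketch of the mechanism only, not Mixtral's actual implementation; the class name, layer sizes, and the simple two‑layer experts are assumptions chosen for readability.
import torch
import torch.nn as nn

class Top2MoELayer(nn.Module):
    """Illustrative sparse MoE layer: each token is processed by its top-2 experts."""
    def __init__(self, hidden_size=4096, ffn_size=14336, num_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(hidden_size, num_experts, bias=False)   # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden_size, ffn_size), nn.SiLU(), nn.Linear(ffn_size, hidden_size))
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x):                       # x: (num_tokens, hidden_size)
        scores = self.router(x)                 # one score per expert for every token
        weights, chosen = scores.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)       # renormalize over the selected experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for k in range(self.top_k):
                mask = chosen[:, k] == e        # tokens that routed slot k to expert e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out
Only the two selected experts run for each token, which is why the active compute per token is far below the 46.7B total parameter count.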
Alibaba Cloud PAI Platform
PAI (Platform for AI) offers a full‑stack AI development environment, covering data annotation, model building, training, deployment, and inference optimization.
1. Lightweight Fine‑Tuning with PAI‑DSW (Swift)
Download the model, set up dependencies, and run a Swift LoRA fine‑tuning job.
!apt-get update
!echo y | apt-get install aria2
def aria2(url, filename, d):
    !aria2c --console-log-level=error -c -x 16 -s 16 {url} -o {filename} -d {d}
mixtral_url = "http://pai-vision-data-inner-wulanchabu.oss-cn-wulanchabu-internal.aliyuncs.com/mixtral/Mixtral-8x7B-Instruct-v0.1.tar"
aria2(mixtral_url, mixtral_url.split("/")[-1], "/root/")
!cd /root && mkdir -p AI-ModelScope
!cd /root && tar -xf Mixtral-8x7B-Instruct-v0.1.tar -C /root/AI-ModelScope
import os
os.environ['MODELSCOPE_CACHE'] = '/root'
!cd swift/examples/pytorch/llm && PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0,1 \
python llm_sft.py \
--model_id_or_path AI-ModelScope/Mixtral-8x7B-Instruct-v0.1 \
--model_revision master \
--sft_type lora \
--tuner_backend swift \
--dtype AUTO \
--output_dir /root/output \
--ddp_backend nccl \
--dataset alpaca-zh \
--train_dataset_sample 100 \
--num_train_epochs 2 \
--max_length 2048 \
--lora_rank 8 \
--lora_alpha 32 \
--lora_dropout_p 0.05 \
--lora_target_modules ALL \
--batch_size 1 \
--weight_decay 0.01 \
--learning_rate 1e-4 \
--gradient_accumulation_steps 16 \
--max_grad_norm 0.5 \
--warmup_ratio 0.03 \
--eval_steps 300 \
--save_steps 300 \
--save_total_limit 2 \
--logging_steps 10 \
--only_save_model true \
--gradient_checkpointing false
Merge the LoRA weights into the original checkpoint:
!swift merge-lora --ckpt_dir '/root/output/mistral-7b-moe-instruct/v3-20231215-111107/checkpoint-12'
Run an offline inference test with the transformers library:
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "/root/output/mistral-7b-moe-instruct/v3-20231215-111107/checkpoint-12-merged"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map='auto')
text = """[INST] <<SYS>>
You are a helpful, respectful and honest assistant. ...
[/INST]"""
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
2. Lightweight Fine‑Tuning with DeepSpeed
DeepSpeed can also be used for LoRA fine‑tuning on two 80 GB GPUs.
!apt-get update
!echo y | apt-get install aria2
def aria2(url, filename, d):
    !aria2c --console-log-level=error -c -x 16 -s 16 {url} -o {filename} -d {d}
mixtral_url = "http://pai-vision-data-inner-wulanchabu.oss-cn-wulanchabu-internal.aliyuncs.com/mixtral/Mixtral-8x7B-Instruct-v0.1.tar"
aria2(mixtral_url, mixtral_url.split("/")[-1], "/root/")
!cd /root && tar -xf Mixtral-8x7B-Instruct-v0.1.tar
!wget -c https://pai-quickstart-predeploy-hangzhou.oss-cn-hangzhou.aliyuncs.com/huggingface/datasets/llm_instruct/en_poetry_train_mixtral.json
!wget -c https://pai-quickstart-predeploy-hangzhou.oss-cn-hangzhou.aliyuncs.com/huggingface/datasets/llm_instruct/en_poetry_test_mixtral.json
!mkdir -p /root/output
!deepspeed /ml/code/train_sft.py \
--model_name_or_path /root/Mixtral-8x7B-Instruct-v0.1/ \
--train_path en_poetry_train_mixtral.json \
--valid_path en_poetry_test_mixtral.json \
--learning_rate 1e-5 \
--lora_dim 32 \
--max_seq_len 256 \
--model mixtral \
--num_train_epochs 1 \
--per_device_train_batch_size 8 \
--zero_stage 3 \
--gradient_checkpointing \
--print_loss \
--deepspeed \
--output_dir /root/output/ \
--offload
After training, copy the necessary tokenizer and generation config files to the output directory and test inference similarly to the Swift example.
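A minimal sketch of that post‑processing step is shown below. The file names are the usual Hugging Face tokenizer and generation‑config files and are assumptions; adjust them to whatever your checkpoint actually contains.
import os, shutil
from transformers import AutoModelForCausalLM, AutoTokenizer

src = "/root/Mixtral-8x7B-Instruct-v0.1"    # base model directory
dst = "/root/output"                        # fine-tuned checkpoint directory
for name in ["tokenizer.json", "tokenizer.model", "tokenizer_config.json",
             "special_tokens_map.json", "generation_config.json"]:
    path = os.path.join(src, name)
    if os.path.exists(path):                # copy only the files that are present
        shutil.copy(path, dst)

# Quick offline inference check, same pattern as the Swift example above.
tokenizer = AutoTokenizer.from_pretrained(dst)
model = AutoModelForCausalLM.from_pretrained(dst, device_map="auto")
inputs = tokenizer("[INST] Write a short poem about the sea. [/INST]", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))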
3. Deploying the Fine‑Tuned Model with PAI‑EAS
Install the PAI Python SDK, upload the checkpoint to OSS, and create an inference specification based on the provided Mixtral‑8x7B image.
!python -m pip install alipai --upgrade
import pai
from pai.session import get_default_session
from pai.common.oss_utils import upload
sess = get_default_session()
model_uri = upload(source_path="/root/output", oss_path="mixtral-7b-moe-instruct-ds")
from pai.model import RegisteredModel
infer_spec = RegisteredModel("Mixtral-8x7B-Instruct-v0.1", model_provider="pai").inference_spec
infer_spec.mount(model_uri, model_path="/ml/model")
from pai.model import Model
from pai.predictor import Predictor
m = Model(inference_spec=infer_spec)
predictor = m.deploy(
    service_name="mixtral_sdk_example_ds",
    options={
        "metadata.quota_id": "<ResourceGroupQuotaId>",
        "metadata.quota_type": "Lingjun",
        "metadata.workspace_id": sess.workspace_id,
    },
)
endpoint = predictor.internet_endpoint
token = predictor.access_token
Call the deployed service using OpenAI‑compatible REST APIs (curl examples shown in the original article).
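For reference, a Python equivalent of such a call is sketched below, reusing the endpoint and token obtained above. The /v1/chat/completions path and the model field assume the service exposes the OpenAI‑compatible API described for this image; adapt them if your deployment differs.
import requests

resp = requests.post(
    f"{endpoint}/v1/chat/completions",           # assumed OpenAI-compatible route
    headers={"Authorization": token, "Content-Type": "application/json"},
    json={
        "model": "Mixtral-8x7B-Instruct-v0.1",   # illustrative model name
        "messages": [{"role": "user", "content": "Write a haiku about autumn."}],
        "max_tokens": 256,
    },
)
print(resp.json())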
Additional Resources
Links to Alibaba Cloud PAI product pages, PAI‑DSW, PAI‑EAS, QuickStart guide, and the PAI Python SDK are provided for further exploration.
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.