Zero‑Code Fine‑Tuning Hundreds of Large Models with the LLaMA‑Factory MLU Image

This article provides a step‑by‑step guide to deploying the LLaMA‑Factory MLU image on Cambricon MLU hardware, covering environment checks, downloading the modified source package, configuring Python dependencies, and running both the Web UI and command‑line fine‑tuning for models such as Qwen2.5‑0.5B.

SuanNi

LLaMA‑Factory is an open‑source, low‑code fine‑tuning framework for large language models (LLMs) and vision‑language models (VLMs) that integrates the entire workflow from data preparation to deployment.

The Cambricon MLU platform now offers a pre‑built, MLU‑adapted image, llamafactory-mlu, enabling users to start fine‑tuning large models without writing code.

1. Environment Pre‑check

Hardware: Cambricon MLU370 series accelerator cards.

Driver: run cnmon to verify the card is visible.

Python version: 3.10 (required by the driver).

Underlying framework: official Cambricon PyTorch (torch_mlu). Verify with python -c "import torch_mlu" and confirm it completes without error.
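The pre-check above can be scripted so it is repeatable; a minimal sketch (the tool and package names follow the list above, and the script only reports status rather than failing):

```python
import importlib.util
import shutil
import sys

def precheck():
    """Report which environment prerequisites are satisfied."""
    return {
        # The image expects Python 3.10
        "python_3_10": sys.version_info[:2] == (3, 10),
        # cnmon is Cambricon's driver-level monitoring tool
        "cnmon_on_path": shutil.which("cnmon") is not None,
        # torch_mlu is the Cambricon PyTorch backend
        "torch_mlu_installed": importlib.util.find_spec("torch_mlu") is not None,
    }

if __name__ == "__main__":
    for name, ok in precheck().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```

If any entry prints MISSING, fix it before proceeding; the later steps assume all three checks pass.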

2. Download and Extract the Modified Package

cd /mnt/workspace  # switch to persistent storage
# 1. Download the tarball (using ghfast for acceleration)
wget https://ghfast.top/https://github.com/fzfz666/llamafactory-mlu/raw/main/LLaMA-Factory_mlu_Source_Only.tar.gz
# 2. Extract
tar -xzvf LLaMA-Factory_mlu_Source_Only.tar.gz
# 3. Enter the directory (you should see four folders ending with _mlu)
cd Cambricon_LLM_Env
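Before installing anything, it is worth confirming the extraction produced the expected layout. A small sketch, assuming the four folder names used in the next section:

```python
from pathlib import Path

# The four MLU-adapted source folders the tarball should contain
EXPECTED_DIRS = [
    "LLaMA-Factory_mlu",
    "transformers_mlu",
    "peft_mlu",
    "accelerate_mlu",
]

def missing_dirs(root="Cambricon_LLM_Env"):
    """Return the expected *_mlu folders that are absent under root."""
    base = Path(root)
    return [d for d in EXPECTED_DIRS if not (base / d).is_dir()]
```

An empty list means the extraction is complete; anything else means the download was truncated or the tarball was unpacked in the wrong directory.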

3. One‑Click Python Dependency Configuration

This step installs base dependencies and forces the MLU‑adapted source code to be used.

# Configure Alibaba mirror for faster downloads
pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
# Pin Gradio to <5.0 to avoid UI garbling
pip install "gradio<5.0.0"
# Install generic LLaMA‑Factory runtime dependencies (datasets, trl, rouge‑chinese, etc.)
cd LLaMA-Factory_mlu
pip install -e .[metrics]
cd ..
# Core step: redirect imports to MLU‑adapted packages
cd transformers_mlu && pip install -e . && cd ..
cd peft_mlu && pip install -e . && cd ..
cd accelerate_mlu && pip install -e . && cd ..
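To confirm the editable installs actually shadow the stock PyPI packages, you can inspect each package's resolved source path; on a correctly configured machine the paths should point into the *_mlu checkouts. A hedged sketch (the `_mlu` substring check assumes you extracted into folders named as above):

```python
import importlib.util

def package_origins(pkgs=("transformers", "peft", "accelerate")):
    """Map each package name to its resolved source file, or None if absent."""
    origins = {}
    for name in pkgs:
        spec = importlib.util.find_spec(name)
        origins[name] = spec.origin if spec else None
    return origins

def uses_mlu_sources(origins):
    # After `pip install -e`, each origin should live inside a *_mlu checkout
    return {name: bool(path and "_mlu" in path) for name, path in origins.items()}
```

If any entry resolves to a site-packages path without `_mlu` in it, a previously installed wheel is still winning; `pip uninstall` it and rerun the editable install.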

4. Run Fine‑Tuning (example with Qwen2.5‑0.5B)

4.1 Start the Web UI

cd LLaMA-Factory_mlu
GRADIO_SERVER_PORT=80 llamafactory-cli webui

Access the UI via the server IP in a browser.

4.2 Required Settings in the UI

Precision: fp16

FlashAttention: disabled

Model path: path to your Qwen2.5 checkpoint

4.3 Command‑Line Quick Validation (recommended)

cd LLaMA-Factory_mlu
CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage sft \
    --do_train True \
    --model_name_or_path /your/path/Qwen2.5-0.5B-Instruct \
    --finetuning_type lora \
    --template qwen \
    --dataset_dir data \
    --dataset alpaca_zh_demo \
    --cutoff_len 1024 \
    --learning_rate 5e-05 \
    --num_train_epochs 3.0 \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 5 \
    --save_steps 100 \
    --output_dir saves/qwen2.5_mlu_test \
    --fp16 True \
    --plot_loss True \
    --flash_attn disabled
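With per_device_train_batch_size 4 and gradient_accumulation_steps 4, each optimizer step consumes 16 examples on a single card. A small helper for estimating run length from these flags (the device count parameter is an assumption for multi-card setups; the command above uses one):

```python
import math

def optimizer_steps(n_examples, per_device_bs=4, grad_accum=4,
                    n_devices=1, epochs=3.0):
    """Estimate total optimizer steps for the run configured above."""
    # Effective batch per optimizer step: 4 * 4 * 1 = 16 here
    effective_batch = per_device_bs * grad_accum * n_devices
    steps_per_epoch = math.ceil(n_examples / effective_batch)
    return math.ceil(steps_per_epoch * epochs)
```

This is useful for sanity-checking --save_steps 100: if the dataset is small, a checkpoint may never be written mid-epoch, and only the final save will appear in saves/qwen2.5_mlu_test.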

That's it: once training starts and the loss curve begins to plot, the MLU setup is working end to end. Give it a try.

Tags: CLI, Python, LLM, Fine-tuning, LLaMA-Factory, Cambricon, MLU