Step‑by‑Step Guide to Using LLaMAFactory for Full‑Cycle Large‑Model Training (Part 9)

This article walks through the complete workflow of fine‑tuning a Qwen2.5‑0.5B model with LLaMAFactory, covering environment setup, model download, dataset preparation, configuration editing, training execution, LoRA weight merging, and deployment via vLLM, while highlighting the framework’s minimal‑code and broad model support.

Why Large‑Model Training Frameworks Exist

Traditional fine‑tuning requires separate code for each model, leading to fragmented tooling and a high entry barrier. Frameworks like LLaMAFactory provide a unified, efficient platform that lets users customize and train hundreds of models with minimal effort.

Core Advantages of LLaMAFactory

Near‑zero code: training is driven by simple configuration files and an optional WebUI.

Broad model and algorithm coverage: supports over 100 mainstream models (e.g., LLaMA, Qwen, ChatGLM) and state‑of‑the‑art fine‑tuning methods such as LoRA, QLoRA, GaLore, and DoRA.
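
For context, LoRA (Low‑Rank Adaptation) freezes the pretrained weight matrix W and learns only a small low‑rank update, which is why the adapter produced later in this guide is just ~20 MB:

W' = W + B·A,  with W (d×k) frozen, B (d×r) and A (r×k) trained, and rank r ≪ min(d, k)

Each adapted layer therefore trains only r·(d + k) parameters instead of d·k.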

1. Environment Setup

Two practical approaches are provided:

Use the Lab4AI large‑model lab platform, which ships a pre‑installed LLaMAFactory 0.9.4 (lf0.9.4) image with an H100 quota.

Manually install on a personal GPU server. Example commands:

git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics,deepspeed]"
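
Once installed, the CLI can be sanity‑checked (the version subcommand is documented in the project README):

llamafactory-cli version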

2. Model Download

For demonstration we use the ~900 MB Qwen2.5‑0.5B‑Instruct model. It is downloaded with ModelScope:

modelscope download --model Qwen/Qwen2.5-0.5B-Instruct --local_dir ./Qwen2_5_0_5
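
The same download can also be scripted with the ModelScope Python SDK. A minimal sketch, assuming a recent modelscope release that supports the local_dir argument:

from modelscope import snapshot_download

# Fetch Qwen2.5-0.5B-Instruct into the same directory used by the CLI example
snapshot_download("Qwen/Qwen2.5-0.5B-Instruct", local_dir="./Qwen2_5_0_5")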

3. Dataset Preparation

LLaMAFactory ships example datasets. The alpaca_zh_demo.json file (Alpaca‑format QA) lives under LLaMA-Factory/data and is registered in data/dataset_info.json. Users can replace it with their own data in the same format.
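
For reference, each Alpaca‑format record is a JSON object with instruction/input/output fields. A sample record:

{
  "instruction": "Classify the sentiment of this sentence.",
  "input": "I love this movie.",
  "output": "Positive"
}

A custom file is registered by adding an entry to data/dataset_info.json (my_dataset below is a placeholder name):

"my_dataset": {
  "file_name": "my_dataset.json"
}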

4. Training Configuration

Create a project folder (e.g., test_sft) and copy examples/train_lora/qwen3_lora_sft.yaml to test_qwen_sft.yaml. Edit the following key fields:

model_name_or_path: /workspace/Qwen2_5_0_5
dataset: alpaca_zh_demo
template: qwen  # replaces the original qwen3_nothink
output_dir: /workspace/test_sft/Qwen2_5_sft
num_train_epochs: 1  # one epoch is enough for a quick demo
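
Put together, the edited file might look roughly like the sketch below. Fields not listed above are assumed to carry over from the shipped example, and values such as lora_rank and learning_rate are illustrative defaults from the example configs, not tuned recommendations:

### model
model_name_or_path: /workspace/Qwen2_5_0_5
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all

### dataset
dataset: alpaca_zh_demo
template: qwen
cutoff_len: 2048

### output
output_dir: /workspace/test_sft/Qwen2_5_sft
logging_steps: 10
save_steps: 500

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 1
bf16: true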

5. Run Training

llamafactory-cli train /workspace/test_sft/test_qwen_sft.yaml

The command prints training logs and stores LoRA adapter weights and intermediate checkpoints in the specified output_dir.

6. Merge LoRA Weights

Because LoRA produces only a ~20 MB adapter, we merge it back into the base model for deployment. A merge configuration (derived from examples/merge_lora/qwen3_lora_sft.yaml) is copied to /workspace/test_sft/test_qwen_merge_sft.yaml and edited as follows (note: do not use quantized models when merging):

### Note: DO NOT use quantized model or quantization_bit when merging lora adapters
model_name_or_path: /workspace/Qwen2_5_0_5
adapter_name_or_path: /workspace/test_sft/Qwen2_5_sft
template: qwen
trust_remote_code: true
export_dir: /workspace/test_sft/Qwen2_5_sft_all
export_size: 5
export_device: cpu  # choices: [cpu, auto]
export_legacy_format: false

Then run the merge/export:

llamafactory-cli export /workspace/test_sft/test_qwen_merge_sft.yaml

The merged model is saved under /workspace/test_sft/Qwen2_5_sft_all with a size comparable to the original (~1 GB).
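
Before standing up a server, the merged weights can optionally be smoke‑tested with the framework's built‑in chat command. Passing arguments on the command line here is a shortcut; the official examples drive it with a YAML file instead:

llamafactory-cli chat --model_name_or_path /workspace/test_sft/Qwen2_5_sft_all --template qwen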

7. Deployment and Testing

Serve the merged model with vLLM:

vllm serve /workspace/test_sft/Qwen2_5_sft_all --served-model-name Qwen2_5 --max-model-len 8096 --gpu-memory-utilization 0.9 --port 6666
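
Because vLLM exposes an OpenAI‑compatible endpoint, a quick curl request verifies the server before any client code is written:

curl http://localhost:6666/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen2_5", "messages": [{"role": "user", "content": "Hello"}]}'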

Invoke the model via a Python client:

from openai import OpenAI

# vLLM's OpenAI-compatible server ignores the key, but the client requires one
client = OpenAI(base_url="http://localhost:6666/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen2_5",  # must match --served-model-name
    # Prompt (Chinese, matching the fine-tuning data): "Identify and explain
    # the two scientific theories in the given list: cell theory and heliocentrism."
    messages=[{"role": "user", "content": "识别并解释给定列表中的两个科学理论:细胞理论和日心说。"}]
)
print(response.choices[0].message.content)

The response demonstrates successful end‑to‑end fine‑tuning, weight merging, and serving of the Qwen2.5‑0.5B model using LLaMAFactory.

Conclusion

The guide shows that LLaMAFactory dramatically simplifies the full lifecycle of large‑model fine‑tuning—from environment preparation to deployment—by offering a near‑zero‑code workflow and extensive model compatibility.

Tags: LoRA · vLLM · AI model training · large model fine-tuning · LLaMAFactory · Qwen2.5-0.5B
Written by Fun with Large Models

Master's graduate from Beijing Institute of Technology, published four top‑journal papers, previously worked as a developer at ByteDance and Alibaba. Currently researching large models at a major state‑owned enterprise. Committed to sharing concise, practical AI large‑model development experience, believing that AI large models will become as essential as PCs in the future. Let's start experimenting now!
