Beginner’s Guide to Large Model Fine‑Tuning with Unsloth: Tips and Parameter Ranges
This article walks beginners through the entire fine‑tuning workflow for large language models using Unsloth, covering model and method selection, key hyper‑parameters, dataset formats, training scripts, evaluation strategies, and model‑saving options with concrete code examples.
Why Fine‑Tuning Matters
Large language models are now ubiquitous, but generic models often behave like “artificial idiots” in specialized domains. Fine‑tuning adapts a pre‑trained model to domain‑specific data, enabling knowledge updates, custom behavior, and task‑specific accuracy improvements.
Popular Fine‑Tuning Tools
Among the many tools, LlamaFactory and Unsloth are the most widely used. The author previously published tutorials for both.
Step 1 – Choose Model and Method
Even with ample GPU resources, start with an instruct model smaller than 14 B parameters (e.g., Qwen2.5‑7B‑Instruct) before scaling up to larger models such as Qwen2.5‑72B‑Instruct. The two main fine‑tuning methods are:
LoRA : adds a pair of small low‑rank trainable matrices (kept in 16‑bit) alongside the frozen weights, so the full model is never updated (see the toy sketch after this list).
QLoRA : combines LoRA with 4‑bit quantisation, reducing memory usage; Unsloth defaults to QLoRA.
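As a toy illustration of the low‑rank idea (plain NumPy, not Unsloth code; the sizes are made up for the example): instead of updating a full d × k weight matrix W, LoRA freezes it and trains two small matrices B (d × r) and A (r × k) whose product is added on top. QLoRA trains the same small matrices but stores the frozen W in 4‑bit.
import numpy as np
d, k, r = 4096, 4096, 16            # hidden sizes and LoRA rank (r << d, k)
W = np.random.randn(d, k)           # frozen pre-trained weight, never updated
B = np.zeros((d, r))                # trainable, initialised to zero
A = np.random.randn(r, k) * 0.01    # trainable
W_effective = W + B @ A             # what the layer effectively uses
print(W.size, A.size + B.size)      # 16,777,216 frozen weights vs. 131,072 trainable ones (~0.8 %)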
Step 2 – Model Settings
Three core parameters must be set:
max_seq_length – default 2048; larger values (4096, 8192) can be tried later.
dtype – defaults to None, which auto‑selects torch.float16, or torch.bfloat16 on A100/H100 GPUs for a wider numeric range.
load_in_4bit – default True, enabling 4‑bit quantisation (≈1 % accuracy loss, much lower resource use).
Set full_finetuning=True for full‑parameter training or load_in_8bit=True for 8‑bit quantisation instead.
Example code for loading a 4‑bit model:
from unsloth import FastLanguageModel
max_seq_length = 2048
dtype = None
load_in_4bit = True
model, tokenizer = FastLanguageModel.from_pretrained(
model_name="./QwQ-32B-unsloth-bnb-4bit",
max_seq_length=max_seq_length,
dtype=dtype,
load_in_4bit=load_in_4bit,
)
Step 3 – Prepare the Dataset
For continued pre‑training (CPT) only a text field is required:
[
{"text": "Pasta carbonara is a traditional Roman pasta dish..."}
]
Supervised fine‑tuning (SFT) uses the Alpaca‑style instruction format:
{
"instruction": "必选,任务描述",
"input": "可选,额外输入",
"output": "模型应生成的答案"
}Multi‑turn dialogue requires ShareGPT or ChatML formats; examples are provided in the source.
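As a rough sketch of turning an Alpaca‑style JSON file into the single text field consumed by the trainer in Step 4 (the file name alpaca_data.json and the prompt template are illustrative assumptions; the tokenizer is the one loaded in Step 2):
from datasets import load_dataset
alpaca_prompt = "### Instruction:\n{}\n\n### Input:\n{}\n\n### Response:\n{}"
def to_text(example):
    # Fill the template and append EOS so the model learns where to stop.
    text = alpaca_prompt.format(example["instruction"], example["input"], example["output"])
    return {"text": text + tokenizer.eos_token}
dataset = load_dataset("json", data_files="alpaca_data.json", split="train")
dataset = dataset.map(to_text)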
Step 4 – Training Hyper‑Parameters
Unsloth’s recommended LoRA settings (adjustable ranges):
r = 16 – rank of the LoRA matrices; try 8, 16, 32, 64, 128.
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"] – by default all attention and MLP projection modules.
lora_alpha = 16 – usually set equal to or a multiple of r.
lora_dropout = 0 – increase (up to 0.5) only if over‑fitting.
bias = "none" – helps avoid over‑fitting.
use_gradient_checkpointing = "unsloth" – saves ~30 % memory.
random_state = 3407 – ensures reproducibility.
use_rslora = False – advanced feature, keep False.
loftq_config = None – advanced initialization, usually disabled.
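Putting these settings together, a minimal sketch of attaching the LoRA adapters with Unsloth's get_peft_model (values mirror the defaults listed above; double‑check the signature against your installed Unsloth version):
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)
Only the adapter weights added here are trained; the base weights stay frozen (and 4‑bit when QLoRA is used).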
Overall training arguments (example values):
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported
trainer = SFTTrainer(
model=model,
tokenizer=tokenizer,
train_dataset=dataset,
dataset_text_field="text",
max_seq_length=max_seq_length,
dataset_num_proc=2,
args=TrainingArguments(
per_device_train_batch_size=2,
gradient_accumulation_steps=4,
num_train_epochs=3,
warmup_steps=5,
learning_rate=2e-4,
fp16=not is_bfloat16_supported(),
bf16=is_bfloat16_supported(),
logging_steps=10,
optim="adamw_8bit",
weight_decay=0.01,
lr_scheduler_type="linear",
seed=3407,
output_dir="outputs",
),
)
Key hyper‑parameters and suggested ranges:
learning_rate : 5e‑5 – 1e‑4 (a higher rate speeds up training but may cause over‑fitting).
num_train_epochs : Unsloth suggests 1‑3; the author experiments with 1‑50, noting >50 is rarely optimal.
per_device_train_batch_size : default 2; increase if GPU memory permits.
gradient_accumulation_steps : default 4; raise to simulate larger batches.
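As a quick sanity check, the effective batch size is the per‑device batch size times the accumulation steps (times the GPU count; the single‑GPU assumption below is mine):
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
num_gpus = 1                                   # assumption: single-GPU training
effective = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
print(effective)                               # 8 samples per optimizer update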
Over‑fitting vs. under‑fitting guidance:
To avoid over‑fitting: lower learning_rate, increase batch size, reduce epochs, switch dataset to ShareGPT format, raise dropout.
To avoid under‑fitting: raise learning_rate, increase epochs, increase r and lora_alpha (keep lora_alpha ≥ r).
Step 5 – Train and Evaluate
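Training is started with trainer.train(); a small sketch of kicking it off and inspecting the result (trainer is the SFTTrainer built in Step 4, the variable name trainer_stats is mine):
trainer_stats = trainer.train()      # logs the loss every logging_steps steps
print(trainer_stats.training_loss)   # final average training loss
print(trainer_stats.metrics)         # runtime, samples per second, etc.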
During training Unsloth prints loss values; aim for loss ≤ 0.5. If loss stays above 1, adjust parameters. A loss of 0 indicates severe over‑fitting – reduce epochs or learning rate.
Evaluation typically reserves 20 % of the data as a test set. Unsloth can evaluate every 100 steps by setting eval_steps=100 (with eval_strategy="steps") in TrainingArguments, or external tools such as EleutherAI's lm‑evaluation‑harness can be used.
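A minimal sketch of holding out 20 % of the data and evaluating every 100 steps (argument names follow Hugging Face TrainingArguments; all other settings stay as in Step 4):
split = dataset.train_test_split(test_size=0.2, seed=3407)   # 80/20 train–test split
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=split["train"],
    eval_dataset=split["test"],
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    args=TrainingArguments(
        # ...same training arguments as in Step 4, plus:
        eval_strategy="steps",   # named evaluation_strategy in older transformers releases
        eval_steps=100,          # run evaluation every 100 steps
        output_dir="outputs",
    ),
)
After training, a quick manual check can be run in Unsloth's inference mode (prompt_style and question below are the prompt template and test question assumed to be defined earlier in the full script):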
FastLanguageModel.for_inference(model) # enable inference mode
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")
outputs = model.generate(
input_ids=inputs.input_ids,
attention_mask=inputs.attention_mask,
max_new_tokens=1200,
use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])
Step 6 – Save the Model
Three saving options:
Save only the LoRA adapters (a few hundred MB):
model.save_pretrained("lora_model")
tokenizer.save_pretrained("lora_model")Save the full merged model in safetensors format (compatible with HuggingFace, ModelScope, vLLM):
model.save_pretrained_merged("model", tokenizer, save_method="merged_4bit") # 4‑bit
model.save_pretrained_merged("model", tokenizer, save_method="merged_16bit") # 16‑bitSave in GGUF format for Ollama deployment:
model.save_pretrained_gguf("dir", tokenizer, quantization_method="q4_k_m") # q4
model.save_pretrained_gguf("dir", tokenizer, quantization_method="q8_0") # q8
model.save_pretrained_gguf("dir", tokenizer, quantization_method="f16") # float16Conclusion
The author combines Unsloth’s official documentation with personal experience to present practical parameter ranges and step‑by‑step instructions, enabling beginners to quickly fine‑tune large models for specialized domains.