Beginner’s Guide to Large Model Fine‑Tuning with Unsloth: Tips and Parameter Ranges

This article walks beginners through the entire fine‑tuning workflow for large language models using Unsloth, covering model and method selection, key hyper‑parameters, dataset formats, training scripts, evaluation strategies, and model‑saving options with concrete code examples.


Why Fine‑Tuning Matters

Large language models are now ubiquitous, but generic models often behave like “artificial idiots” in specialized domains. Fine‑tuning adapts a pre‑trained model to domain‑specific data, enabling knowledge updates, custom behavior, and task‑specific accuracy improvements.

Popular Fine‑Tuning Tools

Among the many tools, LlamaFactory and Unsloth are the most widely used. The author previously published tutorials for both.

Step 1 – Choose Model and Method

Even with ample GPU resources, start with an instruct model smaller than 14 B parameters (e.g., Qwen2.5‑7B‑Instruct) before scaling up to larger models such as Qwen2.5‑72B‑Instruct. The two main fine‑tuning methods are:

LoRA : adds small low‑rank trainable matrices (stored in 16‑bit) while keeping the original model weights frozen.

QLoRA : combines LoRA with 4‑bit quantisation, reducing memory usage; Unsloth defaults to QLoRA.
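To make the LoRA idea concrete, here is a toy NumPy sketch (illustrative only, not Unsloth internals): the pre‑trained weight W is frozen, and only two small matrices A and B are trained, so the effective weight becomes W + (alpha / r) · (B @ A).

```python
import numpy as np

# Toy illustration of LoRA: the frozen weight W stays fixed; only the
# low-rank factors A and B are trained, and the effective weight is
# W + (alpha / r) * (B @ A).
d, r, alpha = 8, 2, 16          # hidden size, LoRA rank, scaling factor

rng = np.random.default_rng(0)
W = rng.normal(size=(d, d))     # frozen pre-trained weight
A = rng.normal(size=(r, d))     # trainable "down" projection
B = np.zeros((d, r))            # trainable "up" projection, zero-initialised

W_effective = W + (alpha / r) * (B @ A)

# B starts at zero, so before any training the model is unchanged:
print(np.allclose(W_effective, W))  # True
```

Note how few parameters are trainable: A and B together hold 2·r·d values versus d² for W, which is why LoRA fits on modest GPUs.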

Step 2 – Model Settings

Three core parameters must be set:

max_seq_length : default 2048; larger values (4096, 8192) can be tried later.

dtype : defaults to None, which selects torch.float16 automatically, or torch.bfloat16 on newer GPUs such as the A100/H100 for a wider numeric range.

load_in_4bit : default True, enabling 4‑bit quantisation (≈1 % accuracy loss for much lower resource use).

Beyond these, set full_finetuning=True for full‑parameter training or load_in_8bit=True for 8‑bit quantisation.

Example code for loading a 4‑bit model:

from unsloth import FastLanguageModel
max_seq_length = 2048
dtype = None
load_in_4bit = True
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="./QwQ-32B-unsloth-bnb-4bit",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)

Step 3 – Prepare the Dataset

For continued pre‑training (CPT) only a text field is required:

[
  {"text": "Pasta carbonara is a traditional Roman pasta dish..."}
]
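A file in this shape is plain JSON; the stdlib sketch below writes and reads back such a dataset (for real use, Hugging Face datasets' load_dataset("json", data_files=...) handles the loading):

```python
import json
import os
import tempfile

# Write and read back a minimal CPT dataset: a JSON list of {"text": ...}.
records = [{"text": "Pasta carbonara is a traditional Roman pasta dish..."}]

path = os.path.join(tempfile.mkdtemp(), "cpt.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False)

with open(path, encoding="utf-8") as f:
    dataset = json.load(f)

print(dataset[0]["text"][:15])  # Pasta carbonara
```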

Supervised fine‑tuning (SFT) uses the Alpaca‑style instruction format:

{
  "instruction": "Required: the task description",
  "input": "Optional: additional input",
  "output": "The answer the model should generate"
}
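Before training, each Alpaca‑style record is rendered into the single text field that SFTTrainer consumes. A minimal sketch (the prompt template below is a common Alpaca‑style layout, not necessarily the exact one Unsloth uses; the EOS token here is an assumption, read it from tokenizer.eos_token in practice):

```python
# A minimal sketch of turning Alpaca-style records into the single
# "text" field that SFTTrainer consumes.
alpaca_prompt = """### Instruction:
{}

### Input:
{}

### Response:
{}"""

def format_example(example, eos_token="</s>"):
    # Append the EOS token so the model learns where to stop generating.
    return {"text": alpaca_prompt.format(
        example["instruction"],
        example.get("input", ""),
        example["output"],
    ) + eos_token}

record = {
    "instruction": "Translate to French.",
    "input": "Good morning",
    "output": "Bonjour",
}
print(format_example(record)["text"])
```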

Multi‑turn dialogue requires ShareGPT or ChatML formats; examples are provided in the source.

Step 4 – Training Hyper‑Parameters

Unsloth’s recommended LoRA settings (with adjustable ranges):

r = 16 – rank of the LoRA matrices; try 8, 16, 32, 64, or 128.

target_modules = ["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"] – the default, covering all attention and MLP projection modules.

lora_alpha = 16 – usually set equal to r or a multiple of it.

lora_dropout = 0 – increase (up to 0.5) only if over‑fitting occurs.

bias = "none" – helps avoid over‑fitting.

use_gradient_checkpointing = "unsloth" – saves ~30 % memory.

random_state = 3407 – ensures reproducibility.

use_rslora = False – advanced feature; keep False.

loftq_config = None – advanced initialization; usually disabled.
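These settings are passed to FastLanguageModel.get_peft_model, which wraps the loaded base model with LoRA adapters. A configuration sketch of the wiring (model is the object returned by from_pretrained in Step 2):

```python
from unsloth import FastLanguageModel

# Attach LoRA adapters to the base model using the settings above.
model = FastLanguageModel.get_peft_model(
    model,  # the model loaded in Step 2
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)
```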

Overall training arguments (example values):

from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=3,
        warmup_steps=5,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)

Key hyper‑parameters and suggested ranges:

learning_rate : 5e‑5 – 1e‑4 (a higher rate speeds up training but may cause over‑fitting).

num_train_epochs : Unsloth suggests 1‑3; the author experiments with 1‑50, noting >50 is rarely optimal.

per_device_train_batch_size : default 2; increase if GPU memory permits.

gradient_accumulation_steps : default 4; raise to simulate larger batches.
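Note that per_device_train_batch_size and gradient_accumulation_steps multiply: the example values above give an effective batch of 2 × 4 = 8 samples per optimizer step. A quick helper to check this (the function name is illustrative):

```python
# Effective batch size = per-device batch * accumulation steps * number of GPUs.
def effective_batch_size(per_device_train_batch_size,
                         gradient_accumulation_steps,
                         num_gpus=1):
    return (per_device_train_batch_size
            * gradient_accumulation_steps
            * num_gpus)

print(effective_batch_size(2, 4))              # 8 on a single GPU
print(effective_batch_size(2, 4, num_gpus=2))  # 16 across two GPUs
```

Raising gradient_accumulation_steps is the cheap way to simulate a larger batch when GPU memory is tight.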

Over‑fitting vs. under‑fitting guidance:

To avoid over‑fitting: lower learning_rate, increase batch size, reduce epochs, switch dataset to ShareGPT format, raise dropout.

To avoid under‑fitting: raise learning_rate, increase epochs, increase r and lora_alpha (keep lora_alpha ≥ r).

Step 5 – Train and Evaluate

During training Unsloth prints loss values; aim for loss ≤ 0.5. If loss stays above 1, adjust parameters. A loss of 0 indicates severe over‑fitting – reduce epochs or learning rate.

Evaluation typically reserves 20 % of the data as a test set. Unsloth can evaluate every 100 steps by setting eval_steps=100 (with the evaluation strategy set to "steps") in TrainingArguments, or external tools such as EleutherAI’s lm‑evaluation‑harness can be used.
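Reserving the hold‑out set can be done with Hugging Face datasets' dataset.train_test_split(test_size=0.2); a dependency‑free sketch of the same 80/20 split:

```python
import random

# Shuffle deterministically, then hold out test_ratio of the data for
# evaluation; datasets' train_test_split(test_size=0.2) does the same job.
def split_dataset(examples, test_ratio=0.2, seed=3407):
    examples = list(examples)
    random.Random(seed).shuffle(examples)
    n_test = int(len(examples) * test_ratio)
    return examples[n_test:], examples[:n_test]  # (train, test)

data = [{"text": f"sample {i}"} for i in range(100)]
train, test = split_dataset(data)
print(len(train), len(test))  # 80 20
```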

A quick manual check after training (prompt_style and question are the prompt template and test question defined earlier for your dataset):

FastLanguageModel.for_inference(model)  # switch Unsloth to its faster inference mode
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])  # print only the model's answer

Step 6 – Save the Model

Three saving options:

Save only the LoRA adapters (typically a few hundred MB):

model.save_pretrained("lora_model")
tokenizer.save_pretrained("lora_model")

Save the full merged model in safetensors format (compatible with HuggingFace, ModelScope, vLLM):

model.save_pretrained_merged("model", tokenizer, save_method="merged_4bit")  # 4‑bit
model.save_pretrained_merged("model", tokenizer, save_method="merged_16bit") # 16‑bit

Save in GGUF format for Ollama deployment:

model.save_pretrained_gguf("dir", tokenizer, quantization_method="q4_k_m")  # q4
model.save_pretrained_gguf("dir", tokenizer, quantization_method="q8_0")   # q8
model.save_pretrained_gguf("dir", tokenizer, quantization_method="f16")    # float16
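A GGUF file exported this way can be served locally with Ollama via a Modelfile. A minimal sketch (the .gguf file name below is hypothetical; use whatever file save_pretrained_gguf actually wrote to the output directory):

```shell
# Point a Modelfile at the exported GGUF weights (file name is illustrative).
echo 'FROM ./dir/unsloth.Q4_K_M.gguf' > Modelfile

# Register the model with Ollama, then chat with it.
ollama create my-finetune -f Modelfile
ollama run my-finetune
```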

Conclusion

The author combines Unsloth’s official documentation with personal experience to present practical parameter ranges and step‑by‑step instructions, enabling beginners to quickly fine‑tune large models for specialized domains.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: LoRA, QLoRA, Parameter Tuning, Unsloth, Fine‑tuning
Written by

Fun with Large Models

Master's graduate from Beijing Institute of Technology, published four top‑journal papers, previously worked as a developer at ByteDance and Alibaba. Currently researching large models at a major state‑owned enterprise. Committed to sharing concise, practical AI large‑model development experience, believing that AI large models will become as essential as PCs in the future. Let's start experimenting now!
