
Fine-Tuning Large Language Models with LoRA: A Step-by-Step Guide and Code Example

This article demonstrates the before-and-after effects of fine-tuning a large language model and explains the concept through analogies. It then walks through the hardware setup, dataset preparation, LoRA configuration and training arguments, and provides complete Python code for a pure-framework fine-tuning workflow built on Transformers and PEFT.

Cognitive Technology Team

Feb 24, 2025


The article opens with side-by-side comparisons of a large model's output before and after fine-tuning, highlighting a change in tone and a shorter reasoning time.

What is model fine-tuning? The article compares it to giving a "top student" extra tutoring that turns a generalist into a specialist, using medical data as the example subject. Other analogies, such as a robot learning to draw cats wearing hats, illustrate the same process.

Everyday examples, such as adapting a smart speaker to understand a dialect or adding a food filter to a camera, show how fine-tuning changes specific capabilities.

Hardware Configuration

GPU: NVIDIA GeForce RTX 4060

CPU: Intel Core i7‑13700H

Memory: 16 GB RAM (8.8 GB of 15.7 GB available)
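A quick back-of-the-envelope estimate shows why LoRA makes fine-tuning feasible on a single consumer GPU. The sketch below is only an approximation. It assumes a hypothetical 1.5B-parameter base model and about 2.2M LoRA parameters, because the article does not name the model it used. It counts only weights and optimizer state, and ignores activations, the CUDA context and framework overhead.

```python
# Rough memory estimates for weights and optimizer state only; activations,
# the CUDA context and framework overhead are NOT included.
GIB = 1024 ** 3

def weights_gib(num_params, bytes_per_param):
    """Memory needed just to hold the weights at the given precision."""
    return num_params * bytes_per_param / GIB

def adamw_training_state_gib(num_trainable, bytes_per_param=16):
    """fp32 weight (4) + fp32 gradient (4) + AdamW's two fp32 moments (8) = 16 bytes."""
    return num_trainable * bytes_per_param / GIB

# Hypothetical sizes for illustration only (the article does not name its model).
base_params = 1_500_000_000   # assumed 1.5B-parameter base model
lora_params = 2_179_072       # assumed LoRA adapter size

frozen_fp16 = weights_gib(base_params, 2)            # frozen base weights in fp16
lora_state = adamw_training_state_gib(lora_params)   # only the adapters are trained
full_state = adamw_training_state_gib(base_params)   # full fine-tuning, for comparison

print(f"frozen fp16 base weights: {frozen_fp16:.2f} GiB")
print(f"LoRA weights+grads+AdamW: {lora_state:.3f} GiB")
print(f"full fine-tune state:     {full_state:.1f} GiB")
```

With LoRA, the trainable state is tiny next to the frozen fp16 weights. Full fine-tuning of the same model would need well over 16 GB for optimizer state alone.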

Fine‑Tuning Workflow

(1) Dataset Preparation

The dataset, medical-o1-reasoning-SFT, comes from the ModelScope community and is stored in JSON format with the fields Question, Complex_CoT and Response. The article stresses that complex chain-of-thought data is essential for teaching the model deep reasoning.
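The sketch below makes the field layout concrete by turning one record into the single string used for training. The record's text is invented and is not real dataset content. The prompt labels and the `<|endoftext|>` end marker are assumptions modeled on the training script's template, not a fixed dataset format.

```python
import json

def format_example(example, eos_token):
    """Concatenate question, chain-of-thought and answer into one training string.
    The labels and layout are illustrative, modeled on the training prompt."""
    instruction = (
        f"Diagnostic question: {example['Question']}\n"
        f"Detailed analysis: {example['Complex_CoT']}"
    )
    return f"{instruction}\n### Answer:\n{example['Response']}{eos_token}"

# A made-up record with the dataset's three fields (not real dataset content).
line = json.dumps({
    "Question": "A patient reports symptom X. What is the most likely cause?",
    "Complex_CoT": "Symptom X is commonly linked to condition Y, so Y is checked first.",
    "Response": "Condition Y is the most likely cause.",
})
record = json.loads(line)
text = format_example(record, eos_token="<|endoftext|>")
print(text)
```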

(2) Fine‑Tuning Code (pure‑framework implementation)

```shell
pip install torch transformers peft datasets matplotlib accelerate safetensors
```

```python
import os

import matplotlib.pyplot as plt
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainerCallback, TrainingArguments)

# Paths (replace with actual locations)
model_path = r"your_model_path"
data_path = r"your_dataset_path"
output_path = r"your_output_path"

# Ensure a GPU is available
assert torch.cuda.is_available(), "GPU is required for training!"
device = torch.device("cuda")

# Custom callback that records (step, loss) each time the Trainer logs
class LossCallback(TrainerCallback):
    def __init__(self):
        self.steps = []
        self.losses = []

    def on_log(self, args, state, control, logs=None, **kwargs):
        # logs can be None; the final summary log uses "train_loss", not "loss"
        if logs and "loss" in logs:
            self.steps.append(state.global_step)
            self.losses.append(logs["loss"])

# Data processing: build one training string per record, then tokenize it
def process_data(tokenizer):
    dataset = load_dataset("json", data_files=data_path, split="train[:1500]")

    def format_example(example):
        instruction = (f"Diagnostic question: {example['Question']}\n"
                       f"Detailed analysis: {example['Complex_CoT']}")
        # Append the tokenizer's own end-of-sequence token so the model learns where answers stop
        text = f"{instruction}\n### Answer:\n{example['Response']}{tokenizer.eos_token}"
        inputs = tokenizer(text, padding="max_length", truncation=True, max_length=512)
        return {"input_ids": inputs["input_ids"], "attention_mask": inputs["attention_mask"]}

    return dataset.map(format_example, remove_columns=dataset.column_names)

# LoRA configuration
peft_config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
                         lora_dropout=0.05, bias="none", task_type="CAUSAL_LM")

# Training arguments
training_args = TrainingArguments(
    output_dir=output_path,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    learning_rate=3e-4,
    fp16=True,
    logging_steps=20,
    save_strategy="no",
    report_to="none",
    optim="adamw_torch",
    dataloader_pin_memory=False,
    remove_unused_columns=False,
)

# Main training function
def main():
    os.makedirs(output_path, exist_ok=True)
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    tokenizer.pad_token = tokenizer.eos_token  # reuse EOS as the padding token
    model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16,
                                                 device_map={"": device})
    model = get_peft_model(model, peft_config)
    model.print_trainable_parameters()
    dataset = process_data(tokenizer)
    loss_callback = LossCallback()

    def data_collator(data):
        input_ids = torch.tensor([d["input_ids"] for d in data])
        attention_mask = torch.tensor([d["attention_mask"] for d in data])
        # Labels copy the inputs; padded positions are set to -100 so the loss ignores them
        labels = input_ids.clone()
        labels[attention_mask == 0] = -100
        return {"input_ids": input_ids.to(device),
                "attention_mask": attention_mask.to(device),
                "labels": labels.to(device)}

    trainer = Trainer(model=model, args=training_args, train_dataset=dataset,
                      data_collator=data_collator, callbacks=[loss_callback])
    print("Starting training...")
    trainer.train()
    trainer.model.save_pretrained(output_path)  # saves the LoRA adapter weights
    print(f"Model saved to: {output_path}")

    # Plot loss against the optimizer step at which it was logged
    plt.figure(figsize=(10, 6))
    plt.plot(loss_callback.steps, loss_callback.losses)
    plt.title("Training Loss Curve")
    plt.xlabel("Step")
    plt.ylabel("Loss")
    plt.savefig(os.path.join(output_path, "loss_curve.png"))
    print("Loss curve saved")

if __name__ == "__main__":
    main()
```

Explanation of Key Components

Libraries: PyTorch for tensor operations, Transformers for model/tokenizer handling, PEFT for parameter‑efficient fine‑tuning (LoRA), Datasets for loading JSON data, Matplotlib for visualizing loss.

Path and GPU checks: Users must replace placeholder paths and ensure CUDA is available.

LossCallback: Records the training loss each time the Trainer logs (every 20 steps here) so the curve can be plotted after training.

process_data: Loads the first 1500 records and concatenates each example's question, chain-of-thought analysis and answer, followed by an end-of-sequence token, into one string. It then tokenizes that string with padding and truncation to a fixed length of 512 tokens.
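A short plain-Python sketch shows what padding to `max_length` with truncation does to each example. The token IDs are hypothetical, and `max_length` is shrunk from 512 to 4 or 5 so the effect is visible. The sketch assumes right-side padding and truncation. In Transformers, that behavior comes from the tokenizer's `padding_side` and `truncation_side` settings rather than from anything the script sets explicitly.

```python
def pad_or_truncate(token_ids, max_length, pad_id):
    """Right-truncate to max_length, then right-pad; returns (ids, attention_mask)."""
    ids = list(token_ids[:max_length])
    mask = [1] * len(ids)              # 1 marks real tokens
    n_pad = max_length - len(ids)
    return ids + [pad_id] * n_pad, mask + [0] * n_pad   # 0 marks padding

PAD_ID = 0  # hypothetical pad token id

short_ids, short_mask = pad_or_truncate([5, 6, 7], max_length=5, pad_id=PAD_ID)
long_ids, long_mask = pad_or_truncate([1, 2, 3, 4, 5, 6], max_length=4, pad_id=PAD_ID)
print(short_ids, short_mask)
print(long_ids, long_mask)
```

Right-side truncation matters here because the answer comes last. If a record's question and chain-of-thought already fill 512 tokens, the record loses its answer and end-of-sequence token. It may be worth checking token lengths before training.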

LoRA configuration: Low-rank adaptation with rank r = 16 and α = 32, so the low-rank update is scaled by α/r = 2. It targets the query and value projection matrices (q_proj, v_proj), uses dropout 0.05 and no bias training, and sets task_type for causal language modeling.
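The rank and α settings are easier to read against the LoRA update itself. Each targeted weight W stays frozen, and the layer adds (α/r)·B·A·x, where A is r × d_in and B is d_out × r. The plain-Python sketch below illustrates the math; it is not PEFT's implementation. It shows the update, why training starts from the unchanged model (B starts at zero), and how few parameters r = 16 adds. The projection sizes in the count are hypothetical, chosen to resemble a small model with grouped-query attention, because the article does not state its model's dimensions.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(W, A, B, x, alpha, r):
    """h = W x + (alpha / r) * B (A x); W is frozen, only A and B are trained."""
    base = matvec(W, x)               # frozen pretrained projection
    update = matvec(B, matvec(A, x))  # rank-r path: d_in -> r -> d_out
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

def lora_params(d_in, d_out, r):
    """Trainable parameters for one adapted matrix: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)

# Tiny example: W is 2x3, rank r = 1.
W = [[1, 2, 3], [4, 5, 6]]
A = [[1, 0, 0]]
x = [1, 0, -1]
B_init = [[0], [0]]      # B starts at zero, so the output initially equals W x
B_trained = [[1], [0]]   # hypothetical values after some training

h_init = lora_forward(W, A, B_init, x, alpha=2, r=1)
h_trained = lora_forward(W, A, B_trained, x, alpha=2, r=1)

# Parameter count for r=16 on q_proj and v_proj, using hypothetical
# dimensions (hidden size 1536, value projection width 256, 28 layers).
r = 16
per_layer = lora_params(1536, 1536, r) + lora_params(1536, 256, r)
total = 28 * per_layer
print(h_init, h_trained, per_layer, total, 32 / r)
```

Under these assumed sizes, the adapters add about 2.2M trainable parameters. That is fewer than the roughly 2.36M weights in a single 1536 × 1536 q_proj matrix.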

TrainingArguments: A per-device batch size of 2 with 4 gradient-accumulation steps gives an effective batch size of 8. Training runs for three epochs at a learning rate of 3e-4 with mixed precision (fp16), logs every 20 steps, and saves no intermediate checkpoints.
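Gradient accumulation trades time for memory. The GPU only ever holds 2 examples at once, but the gradients from 4 micro-batches are summed before each optimizer step. The pure-Python sketch below uses a toy one-parameter regression, unrelated to the LLM itself. It shows that, with equal-sized micro-batches and a mean loss, averaging the micro-batch gradients reproduces the full batch-of-8 gradient.

```python
def grad_mse(w, batch):
    """Gradient of mean((w*x - y)**2) with respect to w over one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

per_device_train_batch_size = 2
gradient_accumulation_steps = 4
effective_batch = per_device_train_batch_size * gradient_accumulation_steps  # single GPU

# Toy 1-D regression data: 8 examples, exactly one effective batch.
data = [(float(i), 2.0 * i + 1.0) for i in range(effective_batch)]
w = 0.5

# Gradient if all 8 examples fit in memory at once.
full_batch_grad = grad_mse(w, data)

# Gradient accumulation: 4 micro-batches of 2, each scaled by 1/4 and summed
# before a single optimizer step.
micro_batches = [data[i:i + per_device_train_batch_size]
                 for i in range(0, len(data), per_device_train_batch_size)]
accumulated_grad = sum(grad_mse(w, mb) / gradient_accumulation_steps
                       for mb in micro_batches)

print(effective_batch, full_batch_grad, accumulated_grad)
```

With token-level losses and varying amounts of padding, the equivalence is only approximate, because micro-batches can contain different numbers of scored tokens. With 1,500 examples and an effective batch of 8, one epoch is roughly 187 optimizer steps. Three epochs with `logging_steps=20` therefore yield around 28 points on the loss curve.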

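data_collator: Stacks the tokenized examples into tensors on the GPU and builds the labels from the input IDs for causal LM training. The model shifts the labels internally, so each position learns to predict the next token.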
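Copying the input IDs into the labels unchanged means padding positions are scored as well. Because the pad token here is the EOS token, the model would then be trained to emit EOS repeatedly. A common refinement is to set the labels at padded positions (attention mask 0) to -100, which the causal LM loss in Transformers ignores. The plain-Python sketch below uses hypothetical token IDs to show that masking and the one-position shift the model applies internally.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss

def make_labels(input_ids, attention_mask):
    """Copy input_ids into labels, masking padded positions (attention_mask == 0)."""
    return [tok if keep == 1 else IGNORE_INDEX
            for tok, keep in zip(input_ids, attention_mask)]

def next_token_pairs(input_ids, labels):
    """(current token, target) pairs after the one-step shift of a causal LM;
    targets equal to IGNORE_INDEX are dropped because they add no loss."""
    return [(cur, tgt) for cur, tgt in zip(input_ids[:-1], labels[1:])
            if tgt != IGNORE_INDEX]

# Hypothetical IDs: 2 is the EOS token, reused as padding. The first 2 is the real
# end of the answer (attention 1); the last one is padding (attention 0).
input_ids = [11, 12, 13, 2, 2]
attention_mask = [1, 1, 1, 1, 0]

labels = make_labels(input_ids, attention_mask)
pairs = next_token_pairs(input_ids, labels)
print(labels)
print(pairs)
```

The genuine end-of-answer EOS keeps attention mask 1, so the model still learns when to stop, while the padding after it no longer contributes to the loss.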

Trainer: Handles the training loop, logging, and callback integration.

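Post-training: Saves the trained LoRA adapter weights (save_pretrained on a PEFT model stores the adapter, not a full copy of the base model) and writes the loss curve to a PNG file.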

The article concludes with acknowledgments to DeepSeek for code assistance and notes that the current fine‑tuning setup is basic, leaving room for dataset refinement, hyper‑parameter tuning, and code structure improvements.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
