Fine‑Tune DeepSeek‑R1 with Just a Few Lines of Code Using Unsloth

This guide walks through setting up an Anaconda environment, installing Unsloth, downloading the DeepSeek‑R1‑Distill‑Llama‑8B model, preparing a medical CoT dataset, configuring LoRA parameters, running a short fine‑tuning job, and evaluating the customized model with structured prompts.


1. Environment Setup

Create a new Anaconda virtual environment named unsloth_gpu with Python 3.12, activate it, and install the required packages:

conda create -n unsloth_gpu python=3.12
conda activate unsloth_gpu
pip install unsloth
pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git
pip install datasets

Verify the installation by importing torch and FastLanguageModel and checking torch.cuda.is_available().
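A minimal check (assuming a CUDA-capable GPU and a matching driver are present) might look like this:

import torch
from unsloth import FastLanguageModel  # import fails here if Unsloth is not installed correctly

print(torch.__version__)           # PyTorch version pulled in with Unsloth
print(torch.cuda.is_available())   # should print True on a working GPU setup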

2. Quick Start with Unsloth

Download the DeepSeek‑R1‑Distill‑Llama‑8B checkpoint via ModelScope:

pip install modelscope
modelscope download --model deepseek-ai/DeepSeek-R1-Distill-Llama-8B
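By default the ModelScope CLI downloads into its cache directory (typically ~/.cache/modelscope/hub). To place the checkpoint in the current working directory, matching the ./DeepSeek-R1-Distill-Llama-8B path used below, the CLI's --local_dir option can be used:

modelscope download --model deepseek-ai/DeepSeek-R1-Distill-Llama-8B --local_dir ./DeepSeek-R1-Distill-Llama-8B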

Load the model and tokenizer:

from unsloth import FastLanguageModel

max_seq_length = 2048
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="./DeepSeek-R1-Distill-Llama-8B",
    max_seq_length=max_seq_length,
    dtype=None,
    load_in_4bit=False
)
print(model)   # shows 32 Transformer decoder layers
print(tokenizer)

Run a simple inference example (prove √2 is irrational) to confirm the model works.
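Unsloth provides FastLanguageModel.for_inference to switch a freshly loaded model into its faster generation mode; calling it once before model.generate (for this quick check and for the structured prompt in the next section) is recommended:

FastLanguageModel.for_inference(model)  # enable Unsloth's optimized inference path before generation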

3. Structured Prompt for Inference

Define a chat template and feed a question through the tokenizer:

prompt_style_chat = """Please write an appropriate response that completes the current conversation task.

### Instruction:
You are a helpful assistant.

### Question:
{}

### Response:
<think>{}"""
question = "Prove that the square root of 2 is irrational."
inputs = tokenizer([prompt_style_chat.format(question, "")], return_tensors="pt").to("cuda")
outputs = model.generate(input_ids=inputs.input_ids, max_new_tokens=1200, use_cache=True)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])

4. Dataset Preparation

Download the medical CoT dataset medical-o1-reasoning-SFT from ModelScope and load the first 500 records for a quick demo:
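If the dataset is not already on disk, the ModelScope CLI can fetch it as well (the FreedomIntelligence dataset ID and local directory below are assumptions; adjust the ./medical_o1_sft.json path used in the next snippet to match what the download actually produces):

modelscope download --dataset FreedomIntelligence/medical-o1-reasoning-SFT --local_dir ./medical-o1-reasoning-SFT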

from datasets import load_dataset
dataset = load_dataset(path='json', data_files='./medical_o1_sft.json', split='train[0:500]')
print(dataset[0])

Each record contains Question, Complex_CoT, and Response fields.

Format the records into a training prompt:

train_prompt_style = """
Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step‑by‑step chain of thoughts.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.

### Question:
{}

### Response:
<think>{}
</think>
{}
"""

def formatting_prompts_func(examples):
    texts = []
    for q, cot, resp in zip(examples["Question"], examples["Complex_CoT"], examples["Response"]):
        text = train_prompt_style.format(q, cot, resp) + tokenizer.eos_token
        texts.append(text)
    return {"text": texts}

dataset = dataset.map(formatting_prompts_func, batched=True)
print(dataset['text'][0])

5. Fine‑Tuning with LoRA

Enable LoRA adapters on the model:

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
    use_rslora=False,
    loftq_config=None
)
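The returned object is a standard PEFT-wrapped model, so you can sanity-check how small the trainable portion is with PEFT's helper (the figures in the comment are rough estimates, assuming r=16 on all seven projection modules of the 8B model):

model.print_trainable_parameters()
# e.g. trainable params: ~42M || all params: ~8B || trainable%: ~0.5%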

Configure the trainer (SFTTrainer) with a small batch size and 60 training steps for the demo:

from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs"
    ),
)
trainer_stats = trainer.train()
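With per_device_train_batch_size=2 and gradient_accumulation_steps=4, each optimizer step sees an effective batch of 2 × 4 = 8 examples, so the 60-step demo run covers roughly 480 samples, about one pass over the 500-record subset.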

After training, switch the model back to inference mode:

FastLanguageModel.for_inference(model)

This switches the model into Unsloth's optimized generation mode; the LoRA adapters remain attached as separate weights and are only merged into the base model during the export step in Section 7.

6. Evaluation

Run the medical question used for the baseline comparison through the fine-tuned model. The generated answer now includes the expected reasoning about a normal post-void residual volume and the absence of involuntary detrusor contractions, matching the reference answer more closely than the original model's output did.
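A minimal evaluation sketch follows. It reuses the medical instruction from train_prompt_style but stops the prompt after <think>, so the model generates its own reasoning; medical_prompt_style and medical_question are illustrative names, and the exact question text from the original run is not reproduced here:

# Inference variant of the training prompt: ends after <think> so the model
# produces its own chain of thought followed by the final answer.
medical_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.

### Question:
{}

### Response:
<think>{}"""

medical_question = "..."  # placeholder for the medical question used in the original comparison
inputs = tokenizer([medical_prompt_style.format(medical_question, "")], return_tensors="pt").to("cuda")
outputs = model.generate(input_ids=inputs.input_ids, max_new_tokens=1200, use_cache=True)
print(tokenizer.batch_decode(outputs)[0].split("### Response:")[1])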

7. Model Merge and Export

Save the fine‑tuned model and tokenizer, then merge the LoRA parameters into a single checkpoint:

new_model_local = "DeepSeek-R1-Medical-COT-Tiny"
model.save_pretrained(new_model_local)
tokenizer.save_pretrained(new_model_local)
model.save_pretrained_merged(new_model_local, tokenizer, save_method="merged_16bit")
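To confirm the export worked, the merged checkpoint can be loaded back like any other local model (a quick sanity check, assuming the directory above was written successfully):

merged_model, merged_tokenizer = FastLanguageModel.from_pretrained(
    model_name="DeepSeek-R1-Medical-COT-Tiny",  # the merged checkpoint saved above
    max_seq_length=max_seq_length,
    dtype=None,
    load_in_4bit=False
)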

8. Conclusion

The tutorial demonstrates that with Unsloth, a few lines of Python code are sufficient to fine-tune a large language model (here, DeepSeek-R1-Distill-Llama-8B) on a domain-specific CoT dataset, obtain noticeable gains on that domain, and export a ready-to-use merged checkpoint. Compared with llama-factory, Unsloth offers clearer code structure, better training efficiency, and broader model support.
