Fine‑Tune DeepSeek‑R1 with Just a Few Lines of Code Using Unsloth
This guide walks through setting up an Anaconda environment, installing Unsloth, downloading the DeepSeek‑R1‑Distill‑Llama‑8B model, preparing a medical CoT dataset, configuring LoRA parameters, running a short fine‑tuning job, and evaluating the customized model with structured prompts.
1. Environment Setup
Create a new Anaconda virtual environment named unsloth_gpu with Python 3.12, activate it, and install the required packages:
conda create -n unsloth_gpu python=3.12
conda activate unsloth_gpu
pip install unsloth
pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git
pip install datasets
Verify the installation by importing torch and FastLanguageModel and checking torch.cuda.is_available().
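A minimal sanity check (pure Python; the package names match the install commands above) confirms the key packages are importable and reports whether CUDA is visible:

```python
# Check that the packages installed above are importable,
# and report CUDA availability if torch is present.
import importlib.util

required = ["torch", "unsloth", "datasets"]
missing = [name for name in required if importlib.util.find_spec(name) is None]
print("missing packages:", missing if missing else "none")

if importlib.util.find_spec("torch") is not None:
    import torch
    print("CUDA available:", torch.cuda.is_available())
```

If any package is reported missing or CUDA is unavailable, revisit the install steps before proceeding.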
2. Quick Start with Unsloth
Download the DeepSeek‑R1‑Distill‑Llama‑8B checkpoint via ModelScope:
pip install modelscope
modelscope download --model deepseek-ai/DeepSeek-R1-Distill-Llama-8B
Load the model and tokenizer:
from unsloth import FastLanguageModel

max_seq_length = 2048
model, tokenizer = FastLanguageModel.from_pretrained(
model_name="./DeepSeek-R1-Distill-Llama-8B",
max_seq_length=max_seq_length,
dtype=None,
load_in_4bit=False
)
print(model) # shows 32 Transformer decoder layers
print(tokenizer)
Run a simple inference example (e.g., proving that √2 is irrational) to confirm the model works.
3. Structured Prompt for Inference
Define a chat template and feed a question through the tokenizer:
prompt_style_chat = """Please write an appropriate response to complete the current dialogue task.
### Instruction:
You are a helpful assistant.
### Question:
{}
### Response:
<think>{}"""
question = "Prove that the square root of 2 is irrational"
inputs = tokenizer([prompt_style_chat.format(question, "")], return_tensors="pt").to("cuda")
outputs = model.generate(input_ids=inputs.input_ids, max_new_tokens=1200, use_cache=True)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])
4. Dataset Preparation
Download the medical CoT dataset medical-o1-reasoning-SFT from ModelScope and load the first 500 records for a quick demo:
from datasets import load_dataset
dataset = load_dataset(path='json', data_files='./medical_o1_sft.json', split='train[0:500]')
print(dataset[0])
Each record contains Question, Complex_CoT, and Response fields.
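For orientation, each record in medical_o1_sft.json is a JSON object with those three string fields. The values below are fabricated placeholders to illustrate the shape, not actual dataset content:

```python
import json

# Illustrative record shape (fabricated values, not from the real dataset)
record = {
    "Question": "A patient presents with fatigue and pallor. What is the likely diagnosis?",
    "Complex_CoT": "Fatigue plus pallor suggests anemia; hemoglobin and iron studies would narrow the cause...",
    "Response": "Iron-deficiency anemia is the most likely diagnosis, pending lab confirmation.",
}
print(json.dumps(record, indent=2))
```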
Format the records into a training prompt:
train_prompt_style = """
Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step‑by‑step chain of thoughts.
### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.
### Question:
{}
### Response:
<think>{}
</think>
{}
"""
def formatting_prompts_func(examples):
    texts = []
    for q, cot, resp in zip(examples["Question"], examples["Complex_CoT"], examples["Response"]):
        text = train_prompt_style.format(q, cot, resp) + tokenizer.eos_token
        texts.append(text)
    return {"text": texts}
dataset = dataset.map(formatting_prompts_func, batched=True)
print(dataset['text'][0])
5. Fine‑Tuning with LoRA
Enable LoRA adapters on the model:
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)
Configure the trainer (SFTTrainer) with a small batch size and 60 training steps for the demo:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)
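Before launching the run, it helps to check what these hyperparameters imply about data coverage (plain arithmetic, no GPU required):

```python
# Effective batch size and approximate data coverage for the demo settings above
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
max_steps = 60
dataset_size = 500  # the train[0:500] slice loaded earlier

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
examples_seen = effective_batch_size * max_steps
epochs = examples_seen / dataset_size

print("effective batch size:", effective_batch_size)  # 8
print("examples seen:", examples_seen)                # 480
print(f"approx. epochs over the slice: {epochs:.2f}")  # 0.96
```

So the 60-step demo makes just under one pass over the 500-record subset, which is enough to observe style and domain adaptation but not full convergence.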
trainer_stats = trainer.train()
After training, switch the model back to inference mode:
FastLanguageModel.for_inference(model)
This enables Unsloth's optimized inference path; the LoRA adapters remain attached to the model and are merged into the base weights only at export time (Section 7).
6. Evaluation
Run a representative medical question from the dataset through the fine‑tuned model. The generated answer now includes the expected reasoning about normal post‑void residual volume and the absence of involuntary detrusor contractions, matching the reference answer more closely than the original model's output.
7. Model Merge and Export
Save the fine‑tuned model and tokenizer, then merge the LoRA parameters into a single checkpoint:
new_model_local = "DeepSeek-R1-Medical-COT-Tiny"
model.save_pretrained(new_model_local)
tokenizer.save_pretrained(new_model_local)
model.save_pretrained_merged(new_model_local, tokenizer, save_method="merged_16bit")
8. Conclusion
The tutorial demonstrates that with Unsloth, a few lines of Python code are sufficient to fine‑tune a large language model (DeepSeek‑R1‑Distill‑Llama‑8B) on a domain‑specific CoT dataset, achieve noticeable performance gains, and export a ready‑to‑use model. Compared with LLaMA‑Factory, Unsloth offers a clearer code structure, better training efficiency, and broader model support.
Fun with Large Models
Master's graduate from Beijing Institute of Technology, published four top‑journal papers, previously worked as a developer at ByteDance and Alibaba. Currently researching large models at a major state‑owned enterprise. Committed to sharing concise, practical AI large‑model development experience, believing that AI large models will become as essential as PCs in the future. Let's start experimenting now!
