Distilling Claude Opus into Qwen3.6-27B – GGUF Lets You Run Locally on Consumer GPUs
Qwopus3.6-27B-v1-preview distills Claude Opus onto Qwen3.6-27B through supervised fine-tuning (SFT) with the Unsloth stack on a curated set of roughly 12K high-quality inference samples. The model was evaluated on agentic reasoning, front-end design, and Canvas/WebGL tasks using an RTX 5090, and can be deployed locally via llama.cpp GGUF quantizations, with detailed memory guidelines below.
Model Release
Qwopus3.6-27B-v1-preview is a distillation of Claude Opus onto the open-source Qwen3.6-27B model, produced by supervised fine-tuning (SFT) with the Unsloth training stack. The dataset consists of roughly 12K high-quality inference samples, drawn primarily from Kassadin88/Claude-Distillation-Dataset and supplemented with outputs from GLM-5.1, Kimi-K2.5, and Qwen3.5. The model is released under the Apache-2.0 license and is labeled a preview version.
Training Objectives
More structured reasoning processes.
Consistent answer style that does not drift in long texts.
Alignment of style across multiple source datasets.
Foundation for larger‑scale future versions.
Data Cleaning Process
The author filtered the raw distillation data using an 8B instruction model as a style filter: samples whose response style drifted from a unified tone were removed, leaving roughly 12K style-consistent entries. This reduce-rather-than-expand approach runs counter to the common practice of favoring ever-larger datasets.
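The post does not publish the filtering code. A minimal sketch of the idea, assuming the 8B judge is served behind a local OpenAI-compatible endpoint and scores each response for style consistency (the endpoint URL, rubric prompt, and 8/10 threshold are illustrative assumptions, not the author's actual setup):

import json
import requests

JUDGE_URL = "http://localhost:8081/v1/chat/completions"  # hypothetical local 8B judge

RUBRIC = (
    "Rate from 1 to 10 how closely the following response matches a calm, "
    "structured, step-by-step answer style. Reply with the number only.\n\n{response}"
)

def style_score(response_text: str) -> int:
    # One judge call per sample; temperature 0 for repeatable scores.
    resp = requests.post(JUDGE_URL, json={
        "messages": [{"role": "user", "content": RUBRIC.format(response=response_text)}],
        "temperature": 0,
        "max_tokens": 4,
    })
    try:
        return int(resp.json()["choices"][0]["message"]["content"].strip())
    except ValueError:
        return 0  # unparseable judgments count as rejects

# Keep only style-consistent records from a JSONL file of {"prompt", "response"} pairs.
with open("raw_distillation.jsonl") as fin, open("style_consistent.jsonl", "w") as fout:
    for line in fin:
        record = json.loads(line)
        if style_score(record["response"]) >= 8:
            fout.write(json.dumps(record, ensure_ascii=False) + "\n")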
Early Evaluation
Collaborator Kyle Hessling evaluated the model on a single RTX 5090 (32 GB) using llama.cpp with the GGUF‑quantized model. Sixteen prompts covering three scenarios—agentic reasoning, production‑grade front‑end design (a strength of Qwen3.6), and creative Canvas/WebGL tasks—were run and compared against the original Qwen3.6‑27B baseline.
Result screenshots show that the distilled model matches or exceeds the baseline on the selected prompts. Full evaluation report: https://huggingface.co/spaces/Jackrong/qwopus36-eval
Installation & Usage
The release provides GGUF files that can be used with llama.cpp or any GGUF‑compatible inference framework such as Ollama, LM Studio, or KoboldCpp.
Quantization options available in the repository:
Q2_K – 10.7 GB, extreme memory saving with noticeable quality loss.
Q3_K_L – memory‑friendly for 24 GB GPUs.
IQ4_XS – 15.2 GB, good quality‑to‑size ratio.
Higher‑level quantizations (Q4, Q5, Q6, Q8) – total repository size 162 GB, suitable for 40 GB+ or dual‑GPU setups.
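The exact file list may change between preview updates; a quick way to check what the repository currently ships is to list it programmatically. A minimal sketch using the huggingface_hub Python client (the repository ID is taken from the download command below):

from huggingface_hub import list_repo_files

# Print every GGUF quantization file currently in the release repo.
repo_id = "Jackrong/Qwopus3.6-27B-v1-preview-GGUF"
for name in list_repo_files(repo_id):
    if name.endswith(".gguf"):
        print(name)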
Example commands for IQ4_XS:
# Download the model file
huggingface-cli download Jackrong/Qwopus3.6-27B-v1-preview-GGUF \
Qwopus3.6-27B-v1-preview-IQ4_XS.gguf --local-dir ./qwopus
# Start the server
./llama-server \
-m ./qwopus/Qwopus3.6-27B-v1-preview-IQ4_XS.gguf \
-c 32768 \
--host 0.0.0.0 --port 8080

Memory guidelines (based on the dense 27B model):
IQ4_XS runs on a single 24 GB GPU (e.g., 4090, 5090, 3090) with moderate context length.
Q2_K fits into 16 GB GPUs, though quality loss is significant for the full 27 B model.
Higher‑quality quantizations (Q6, Q8) require 40 GB+ memory or dual‑GPU configurations.
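Once llama-server is running, it exposes an OpenAI-compatible HTTP API on the configured port. A minimal Python client sketch (the prompt and sampling parameters here are illustrative, not from the release notes):

import requests

# Query the local llama-server via its OpenAI-compatible chat endpoint.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "Summarize the difference between SFT and RLHF in three sentences."}
        ],
        "temperature": 0.7,
        "max_tokens": 512,
    },
    timeout=300,
)
print(resp.json()["choices"][0]["message"]["content"])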
Ollama users can wrap the GGUF file in a Modelfile and register it locally with ollama create, as sketched below.
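A minimal sketch, reusing the IQ4_XS file downloaded above (the local alias qwopus is arbitrary):

# Write a one-line Modelfile pointing at the downloaded GGUF
cat > Modelfile <<'EOF'
FROM ./qwopus/Qwopus3.6-27B-v1-preview-IQ4_XS.gguf
EOF

# Register the model with Ollama and chat with it
ollama create qwopus -f Modelfile
ollama run qwopus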
Warning: Qwen3.6‑27B includes a vision encoder, but the current GGUF repository contains only the pure language weights; visual support in llama.cpp must be verified independently.
Old Zhang's AI Learning
AI practitioner specializing in large-model evaluation and on-premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends, with daily original technical articles.