Old Zhang's AI Learning
Apr 19, 2026 · Artificial Intelligence

From Zero to Deployment: A Complete Qwen3.5 Fine‑Tuning Guide

This guide shows how to fine‑tune Qwen3.5 models—from 0.8B to 122B—using Unsloth Studio or pure code, covering text SFT, vision fine‑tuning, MoE models, reinforcement‑learning (GRPO), extensive GGUF quantization benchmarks, hardware requirements, export formats, and deployment tips.
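
As a taste of the workflow the guide covers, here is a minimal LoRA SFT sketch in the style of Unsloth's Colab notebooks; the model repo id and dataset are placeholders, not the guide's exact choices.

```python
# Minimal LoRA SFT sketch in the Unsloth notebook style.
# The repo id and dataset are placeholders, not the guide's exact choices.
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3.5-9B",   # hypothetical repo id
    max_seq_length=4096,
    load_in_4bit=True,                 # 4-bit base weights to fit modest GPUs
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16, lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Any instruction dataset works; flatten it into a single "text" column.
dataset = load_dataset("yahma/alpaca-cleaned", split="train[:1000]")
dataset = dataset.map(lambda ex: {
    "text": f"### Instruction:\n{ex['instruction']}\n\n### Response:\n{ex['output']}"
})

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=4096,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=60,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```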

Fine-tuning · LLM · Qwen3.5
12 min read
Old Zhang's AI Learning
Apr 14, 2026 · Artificial Intelligence

Qwen3.5-27B-DFlash Delivers Up to 5× Faster Inference Without Quality Loss

The DFlash approach replaces speculative decoding’s autoregressive drafter with a block diffusion model and injects target‑model hidden features into every KV‑cache layer, achieving up to 5× speed‑up for Qwen3.5‑27B on single‑GPU and 1.5–1.9× on high‑concurrency workloads while preserving output quality.
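
For background, draft-then-verify decoding lets a cheap drafter propose a block of tokens that the target model checks in a single forward pass; DFlash swaps the autoregressive drafter for a block diffusion model. A toy sketch of the greedy accept/reject rule (illustrative only, not the DFlash API):

```python
# Toy greedy accept/reject rule for draft-then-verify decoding.
# `target_preds` stands in for one batched target forward pass: the target's
# argmax token at each draft position plus one extra position. Illustrative
# names only; this is not the DFlash implementation.
def accept_block(draft, target_preds):
    accepted = []
    for tok, pred in zip(draft, target_preds):
        if tok == pred:
            accepted.append(tok)   # draft token matches the target: keep it
        else:
            accepted.append(pred)  # first mismatch: take the target's token, stop
            return accepted
    accepted.append(target_preds[len(draft)])  # whole block accepted: bonus token
    return accepted

# Example: the drafter proposed 4 tokens, the target agrees with the first 2.
print(accept_block([5, 9, 3, 7], [5, 9, 4, 7, 1]))  # -> [5, 9, 4]
```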

DFlash · Inference Acceleration · Qwen3.5
12 min read
Old Zhang's AI Learning
Apr 10, 2026 · Artificial Intelligence

How a 9B‑Parameter Qwen3.5 Model Achieves Fully Automated Data Analysis on a Consumer GPU

The open‑source CoPaw‑Flash‑9B‑DataAnalyst‑LoRA model, fine‑tuned via LoRA, can autonomously load, explore, statistically analyze, visualize, and generate structured reports for CSV/Excel/JSON datasets, achieving a 90% success rate with an average of 26 iteration rounds, and it runs on a single consumer‑grade GPU using vLLM and the Data Analyst framework.
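
For context, loading a LoRA adapter on top of its base model with vLLM's offline API looks roughly like this; both repo ids are placeholders, and the Data Analyst agent loop itself is not shown.

```python
# Sketch: serving a LoRA adapter with vLLM's offline API.
# Both repo ids are placeholders; the agent loop from the article is not shown.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="Qwen/Qwen3.5-9B", enable_lora=True)  # hypothetical base repo id
out = llm.generate(
    "Load sales.csv and report the top 5 products by revenue.",
    SamplingParams(max_tokens=512, temperature=0.0),
    lora_request=LoRARequest("data-analyst", 1, "CoPaw-Flash-9B-DataAnalyst-LoRA"),
)
print(out[0].outputs[0].text)
```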

Agent · Data Analyst · GPU
10 min read
Old Zhang's AI Learning
Apr 3, 2026 · Artificial Intelligence

Qwopus3.5‑v3: From Reason‑Then‑Act to Act‑Then‑Refine – Claude‑Opus Distillation Turns Qwen3.5 into a Tool‑Using Agent

The newly released Qwopus3.5‑v3 model combines higher‑quality reasoning chains, dedicated tool‑calling reinforcement learning, and an act‑then‑refine paradigm, delivering a 5‑point HumanEval boost, a 1.43‑point MMLU‑Pro gain, 31.7% faster inference and 24% lower token cost, while remaining runnable on a 3090 or a 16 GB MacBook, with easy deployment via GGUF, LM Studio, Ollama or llama.cpp.
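
As a taste of what "tool-using agent" means in practice, one tool-call round trip through Ollama's Python client might look like the sketch below; the model tag and the weather tool are illustrative, not from the article.

```python
# Sketch: one tool-calling round trip via the ollama Python package.
# The model tag "qwopus3.5-v3" and the weather tool are illustrative placeholders.
import ollama

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = ollama.chat(
    model="qwopus3.5-v3",  # placeholder tag
    messages=[{"role": "user", "content": "What's the weather in Hangzhou?"}],
    tools=tools,
)
# An act-then-refine agent would execute these calls and feed results back.
for call in resp.message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```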

Claude Opus · Distillation · HumanEval
12 min read
AI Engineering
Apr 1, 2026 · Artificial Intelligence

Holo3 AI Model Beats GPT‑5.4 at One‑Tenth the Cost for Computer Use

H Company’s new Holo3 series delivers a visual language model that outperforms GPT‑5.4 on the OSWorld‑Verified benchmark with a 78.85% score while costing only about one‑tenth as much, offering both a flagship API‑only version and an open‑source lightweight variant optimized for GUI agents.

AI benchmark · GUI agent · Holo3
4 min read
Old Zhang's AI Learning
Mar 31, 2026 · Artificial Intelligence

Turning a Bluetooth Speaker into a Smart Assistant with Qwen 3.5‑Omni

The author demonstrates a proof‑of‑concept that combines Qwen 3.5‑Omni's real‑time internet search and audio output with a locally hosted voice‑wake‑up model to transform a Bluetooth speaker into an always‑on smart assistant, while noting latency challenges and the potential of a sub‑10B open‑source alternative.

AI integration · Bluetooth · Large Language Model
2 min read
Old Zhang's AI Learning
Mar 28, 2026 · Artificial Intelligence

Qwen3.5-27B Outperforms the 397B Model in Tool Calling – Q6 Quantization Is Optimal

Using the open‑source ToolCall‑15 benchmark, the author shows that the 27‑billion‑parameter Qwen3.5 model consistently scores full marks while the 397‑billion‑parameter version fails on several tasks, and that the Q6 quantized variant offers the best trade‑off between size and tool‑calling accuracy.
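
To reproduce this kind of check yourself, a single tool-calling probe against a local OpenAI-compatible endpoint looks roughly like this; the endpoint, model name, and tool schema are illustrative, not the ToolCall-15 harness itself.

```python
# Sketch: one tool-calling probe against a local OpenAI-compatible server.
# Endpoint, model name, and tool schema are illustrative; the ToolCall-15
# harness and its scoring are not reproduced here.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")
resp = client.chat.completions.create(
    model="qwen3.5-27b-q6",  # placeholder model name
    messages=[{"role": "user", "content": "Convert 100 USD to EUR."}],
    tools=[{
        "type": "function",
        "function": {
            "name": "convert_currency",
            "parameters": {
                "type": "object",
                "properties": {
                    "amount": {"type": "number"},
                    "from": {"type": "string"},
                    "to": {"type": "string"},
                },
                "required": ["amount", "from", "to"],
            },
        },
    }],
)
# A "pass" means the model emitted a well-formed call with the right arguments.
print(resp.choices[0].message.tool_calls)
```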

AI · LLM benchmark · Qwen3.5
7 min read
Old Zhang's AI Learning
Mar 19, 2026 · Artificial Intelligence

Testing the Trending oMLX on Mac: Claude‑Opus‑4.6 Distilled and Qwen3.5‑9B Performance Review

The article evaluates oMLX, a Mac‑only LLM runtime built on Apple Silicon and MLX, walking through installation, UI features, memory usage, single‑request speed, benchmark results for Claude‑Opus‑4.6 and Qwen3.5‑9B, continuous‑batching gains, Claude Code optimizations, multi‑model support, and why it fails to run a 27B model.
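
oMLX sits on top of MLX; independent of the app itself, loading and generating with the underlying mlx-lm package looks like this (the repo id is a placeholder, and oMLX's own APIs are not shown).

```python
# Sketch: bare MLX inference via the mlx-lm package (pip install mlx-lm),
# independent of the oMLX app. The repo id is a placeholder.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen3.5-9B-4bit")  # hypothetical repo id
text = generate(model, tokenizer,
                prompt="Explain KV caching in one paragraph.",
                max_tokens=200)
print(text)
```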

Apple Silicon · Claude Opus · MLX
9 min read
Old Zhang's AI Learning
Mar 18, 2026 · Artificial Intelligence

Running Claude‑Opus‑4.6‑Distilled Qwen3.5 27B on a Single RTX 4090 with llama.cpp: 46 tokens/s Performance

The article details a hands‑on test of the Claude‑Opus‑4.6‑distilled Qwen3.5 27B model running on a single RTX 4090 via llama.cpp, showing a steady 46 tokens per second generation speed, a 64K context window, and a step‑by‑step Docker‑based setup while comparing it to GLM‑4.7‑Flash‑AWQ‑4bit and discussing llama.cpp’s limitations for multi‑GPU inference.
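
To get a comparable tokens-per-second number without the full Docker setup, a quick probe via llama-cpp-python might look like this; the GGUF path is a placeholder, and the article benchmarks the llama.cpp server directly.

```python
# Sketch: quick tokens/s probe with llama-cpp-python (pip install llama-cpp-python).
# The GGUF path is a placeholder; the article's numbers come from the llama.cpp server.
import time
from llama_cpp import Llama

llm = Llama(
    model_path="qwen3.5-27b-opus-distill.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,   # offload all layers to the GPU
    n_ctx=65536,       # the article reports a 64K context window
)

t0 = time.time()
out = llm("Write a haiku about GPUs.", max_tokens=128)
n = out["usage"]["completion_tokens"]
print(f"{n / (time.time() - t0):.1f} tokens/s")
```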

Claude Opus · Docker · LLM inference
5 min read
Old Zhang's AI Learning
Mar 16, 2026 · Artificial Intelligence

Testing Claude‑Opus‑4.6 Distilled Qwen3.5 9B Model Locally via LM Studio and Claude Code

The article evaluates the GGUF‑quantized Claude‑Opus‑4.6 distilled Qwen3.5 9B model on a 16 GB Mac Mini M4 using LM Studio, detailing model sizes, performance metrics, deployment steps, API integration with Claude Code, and concluding that while the 9B version is usable, its capabilities remain limited compared to larger models.
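
LM Studio exposes an OpenAI-compatible server (port 1234 by default), so any OpenAI client can script against the local model; the model identifier below is a placeholder for whatever LM Studio shows in its UI.

```python
# Sketch: querying LM Studio's local OpenAI-compatible server.
# Port 1234 is LM Studio's default; the model identifier is a placeholder.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
resp = client.chat.completions.create(
    model="qwen3.5-9b-opus-distill-gguf",  # placeholder identifier
    messages=[{"role": "user", "content": "Summarize what GGUF quantization does."}],
)
print(resp.choices[0].message.content)
```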

Claude Opus · GGUF · LM Studio
12 min read
Old Zhang's AI Learning
Mar 7, 2026 · Artificial Intelligence

vLLM 0.17.0 Release: Full Qwen 3.5 Support and Anthropic API Compatibility

The vLLM 0.17.0 release brings FlashAttention 4 integration, a mature Model Runner V2, complete Qwen 3.5 series support, a one‑click performance‑mode flag, Anthropic API compatibility, advanced weight‑offloading, broader hardware support beyond NVIDIA, ASR model integration, and detailed upgrade and installation guidance.
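
If the Anthropic compatibility works as summarized, the stock anthropic Python client should be able to target a local vLLM server; the base URL and model id below are assumptions based on this summary, not flags verified against the release.

```python
# Sketch: pointing the stock anthropic client at a local vLLM server,
# assuming the Anthropic-compatible endpoint described above.
# Base URL and model id are assumptions, not verified against the release.
import anthropic

client = anthropic.Anthropic(base_url="http://localhost:8000", api_key="none")
msg = client.messages.create(
    model="Qwen/Qwen3.5-27B",  # placeholder repo id
    max_tokens=256,
    messages=[{"role": "user", "content": "What's new in vLLM 0.17.0?"}],
)
print(msg.content[0].text)
```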

ASR · Anthropic API · FlashAttention
12 min read
Old Zhang's AI Learning
Mar 3, 2026 · Artificial Intelligence

How to Deploy and Fine‑Tune Qwen3.5 Small Models (0.8B‑9B) Locally

This guide walks you through deploying Qwen3.5's 0.8B, 2B, 4B and 9B models on CPUs or modest GPUs using Unsloth's GGUF quantization, explains hardware requirements, shows how to run them with llama.cpp, llama‑server, vLLM or SGLang, and provides a free Colab fine‑tuning workflow with export options.
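
The export step at the end of that Colab workflow is essentially a one-liner in Unsloth; the checkpoint directory and quantization preset below are examples, not the guide's exact values.

```python
# Sketch: loading a fine-tuned checkpoint and exporting it to GGUF with Unsloth.
# The checkpoint directory and quantization preset are examples.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained("outputs")  # fine-tuned checkpoint dir
model.save_pretrained_gguf(
    "qwen3.5-4b-gguf",             # output directory (example)
    tokenizer,
    quantization_method="q4_k_m",  # one of llama.cpp's K-quant presets
)
```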

AI Models · Fine-tuning · GGUF
19 min read
AI Engineering
Mar 3, 2026 · Artificial Intelligence

Alibaba Qwen‑3.5 Small Models: 0.8B Parameters Enable Video on Edge Devices

Alibaba released four Qwen‑3.5 models (0.8B‑9B) that use a Gated DeltaNet hybrid‑attention architecture and native multimodal training to achieve 262k‑token contexts, outperform larger rivals on visual, reasoning, and math benchmarks, and run video analysis on phones and laptops, though they still demand significant VRAM.

Edge AI · Gated DeltaNet · Multimodal AI
6 min read
Old Zhang's AI Learning
Mar 2, 2026 · Artificial Intelligence

Qwen3.5 Small Models Unveiled: From 0.8B to 9B with Full Capabilities

The article introduces the newly released Qwen3.5 small model series (0.8B, 2B, 4B, 9B), explains their shared Gated Delta Networks architecture, early multimodal token fusion, 201‑language support and up to 1 million‑token context, and presents benchmark data that show the 9B model rivaling much larger LLMs, followed by practical guidance on model selection and deployment.

Gated Delta Networks · Local Deployment · Qwen3.5
10 min read
Old Zhang's AI Learning
Mar 2, 2026 · Artificial Intelligence

Why the Qwen3.5 Series Makes Qwen3.5-27B the No‑Brainer Choice

The author reviews the Qwen3.5 model family, showing that the 27‑billion‑parameter dense Qwen3.5-27B offers the best balance of size, stability, low‑cost local deployment, and comprehensive capabilities, making it the default pick for most users.

AI benchmarking · Large Language Model · Local Deployment
6 min read