Tagged articles
5 articles
Old Zhang's AI Learning
Feb 17, 2026 · Artificial Intelligence

Running Qwen3.5 Locally: Step‑by‑Step Guide with Unsloth Dynamic Quantization

This article explains how to run the 397B Qwen3.5 model on a Mac using Unsloth Dynamic 2.0 quantization (2‑bit, 3‑bit, or 4‑bit). It outlines hardware requirements, provides compilation and download commands for llama.cpp, shows how to launch inference in thinking and non‑thinking modes, and compares deployment options such as llama‑server, Transformers, SGLang/vLLM, and MLX.

Dynamic Quantization · GGUF · LLM deployment
14 min read
Data Thinking Notes
Feb 20, 2025 · Artificial Intelligence

How to Deploy DeepSeek R1 671B Model Locally with Ollama: A Step‑by‑Step Guide

This article provides a comprehensive tutorial on locally deploying the 671‑billion‑parameter DeepSeek R1 model using Ollama, covering model selection, hardware requirements, dynamic quantization, detailed installation steps, performance observations, and practical recommendations for consumer‑grade hardware.

AI model optimization · DeepSeek · Dynamic Quantization
14 min read
Top Architect
Feb 20, 2025 · Artificial Intelligence

Deploying DeepSeek R1 671B Model Locally with Ollama and Dynamic Quantization

This guide explains how to download, quantize, and run the full‑size 671‑billion‑parameter DeepSeek R1 model on local hardware using Ollama, covering model selection, hardware requirements, step‑by‑step deployment commands, optional web UI setup, performance observations, and practical recommendations.

DeepSeek · Dynamic Quantization · Ollama
16 min read
DaTaobao Tech
Oct 16, 2024 · Artificial Intelligence

Dynamic Quantization and Matrix Multiplication Optimization in MNN CPU Backend

The article details MNN’s CPU backend dynamic quantization for Transformer‑type models, describing runtime int8 conversion, block‑wise matrix‑multiply optimizations using ARM SMMLA/SDOT and AVX‑512 VNNI, weight‑group and batch‑wise quantization techniques, and reports up to three‑fold speed‑ups on Snapdragon 8 Gen 3.

CPU optimization · Dynamic Quantization · INT8
19 min read