Tag

dynamic quantization

4 views collected around this technical thread.

Data Thinking Notes
Feb 20, 2025 · Artificial Intelligence

How to Deploy DeepSeek R1 671B Model Locally with Ollama: A Step‑by‑Step Guide

This article provides a comprehensive tutorial on locally deploying the 671‑billion‑parameter DeepSeek R1 model using Ollama, covering model selection, hardware requirements, dynamic quantization, detailed installation steps, performance observations, and practical recommendations for consumer‑grade hardware.

AI model optimization · DeepSeek · GPU inference
0 likes · 14 min read
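To see why dynamic quantization is the enabling trick for running a 671B‑parameter model on consumer hardware, a back‑of‑envelope weight‑memory estimate helps. The parameter count comes from the article title; the bit‑widths below are illustrative assumptions, not the article's exact quantization recipe.

```python
# Rough memory footprint of model weights at various quantization widths.
# 671e9 parameters is from the article title; bit-widths are illustrative.
PARAMS = 671e9

def weight_gb(bits_per_param: float) -> float:
    """Gigabytes to store the weights alone (no KV cache, no runtime overhead)."""
    return PARAMS * bits_per_param / 8 / 1e9

for label, bits in [("fp16", 16), ("int8", 8), ("int4", 4), ("~1.58-bit", 1.58)]:
    print(f"{label:>10}: {weight_gb(bits):7.1f} GB")
```

At fp16 the weights alone need well over a terabyte of memory, while aggressive sub‑2‑bit dynamic quantization brings them near the 130 GB range, which is what makes local deployment plausible at all.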
Top Architect
Feb 20, 2025 · Artificial Intelligence

Deploying DeepSeek R1 671B Model Locally with Ollama and Dynamic Quantization

This guide explains how to download, quantize, and run the full‑size 671‑billion‑parameter DeepSeek R1 model on local hardware using Ollama, covering model selection, hardware requirements, step‑by‑step deployment commands, optional web UI setup, performance observations, and practical recommendations.

AI · DeepSeek · Local Deployment
0 likes · 16 min read
Architecture Digest
Feb 6, 2025 · Artificial Intelligence

Deploying DeepSeek R1 671B Model Locally with Ollama and Dynamic Quantization

This guide explains how to deploy the full 671B DeepSeek R1 model on local hardware with Ollama, using dynamic quantization to shrink the model, and details hardware requirements, step‑by‑step installation, configuration, performance observations, and practical recommendations.

DeepSeek · GPU · LLM deployment
0 likes · 12 min read
DaTaobao Tech
Oct 16, 2024 · Artificial Intelligence

Dynamic Quantization and Matrix Multiplication Optimization in MNN CPU Backend

The article details MNN’s CPU backend dynamic quantization for Transformer‑type models, describing runtime int8 conversion, block‑wise matrix‑multiply optimizations using ARM SMMLA/SDOT and AVX‑512 VNNI, weight‑group and batch‑wise quantization techniques, and reports up to three‑fold speed‑ups on Snapdragon 8 Gen 3.

CPU Optimization · Int8 · LLM inference
0 likes · 19 min read
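The core idea the MNN article describes, converting activations to int8 at runtime with a scale computed from the current tensor, then dequantizing after the integer matmul, can be sketched in a few lines. This is a minimal pure‑Python illustration of symmetric per‑tensor dynamic quantization, not MNN's actual kernels (which use SMMLA/SDOT and AVX‑512 VNNI instructions).

```python
# Minimal sketch of runtime (dynamic) int8 quantization: the scale is derived
# from the live activation tensor, so no calibration data is needed.

def quantize_int8(xs):
    """Symmetric per-tensor quantization: floats -> (int8 values, scale)."""
    amax = max(abs(x) for x in xs) or 1.0
    scale = amax / 127.0
    q = [max(-127, min(127, round(x / scale))) for x in xs]
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to approximate floats."""
    return [v * scale for v in q]

acts = [0.3, -1.7, 0.05, 2.4]   # activations arriving at inference time
q, s = quantize_int8(acts)
recon = dequantize(q, s)
err = max(abs(a - r) for a, r in zip(acts, recon))
print(q, round(s, 5), round(err, 5))
```

In a real backend the int8 values feed an integer matrix multiply, and per‑block scales (the weight‑group and batch‑wise quantization the article covers) keep the rounding error bounded per block rather than per tensor.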