Tagged articles

local deployment

54 articles · Page 1 of 1

Jul 5, 2026 · Artificial Intelligence

How to Get 1.6 B Free Tokens Monthly with OmniRoute: A Local AI Gateway that Replaces GPT/Claude APIs

OmniRoute v3.8.43 is an MIT‑licensed, locally deployed AI gateway that aggregates 231 model providers and more than 50 free channels, yielding roughly 1.6 billion free tokens per month. It offers up to 95% token compression, auto‑fallback routing, and multi‑IDE support; the article also provides detailed deployment guides and risk warnings.

AI gateway · IDE integration · OmniRoute

0 likes · 16 min read

Old Zhang's AI Learning

Jun 15, 2026 · Artificial Intelligence

Reproducing Claude Fable 5 with Opus 4.8 and a Prompt: 90% Performance on Consumer GPUs

The article analyzes Claude Fable 5’s capabilities, dissects Anthropic’s official prompt guide, compares leaked system prompts, and demonstrates how to achieve roughly 90% of Fable 5’s performance on a consumer‑grade GPU using Opus 4.8 plus a custom prompt, while also presenting a local Gemma 4 12B coder alternative.

Claude Fable 5 · Gemma-4-12B · Opus 4.8

0 likes · 14 min read

Lao Guo's Learning Space

Jun 10, 2026 · Artificial Intelligence

2026 Top 10 Local LLMs Ranked by Real Downloads, GPU Fit, and License Risks

The article analyzes why local large‑language‑model deployment is essential for privacy, offline use, and cost control, then ranks the ten most popular models in 2026 using Ollama download counts, GitHub stars, benchmark scores, and hardware requirements. It closes with a GPU‑based selection guide, a deployment‑tool comparison, a license‑risk table, a decision tree, and quick‑start instructions.

GPU · LLM · Open Source

0 likes · 19 min read

AI Architecture Path

Jun 4, 2026 · Artificial Intelligence

Odysseus: Free Private AI Workstation That Earned 39K+ Stars in 3 Days

Facing costly AI subscriptions, fragmented workflows, and privacy worries, the open‑source Odysseus offers a self‑hosted AI suite with agents, auto‑modeling, deep research, blind model testing, and an integrated office package, plus detailed multi‑platform deployment guides and a candid risk assessment.

AI agents · Docker · Odysseus

0 likes · 10 min read

Old Zhang's AI Learning

Jun 1, 2026 · Artificial Intelligence

Opus‑Distilled Qwen3.5‑Coder Scores 100/100 Tool Calls, 1.4‑2.2× Faster with MTP, 128K Context on Consumer GPU

The article introduces Qwopus3.5‑4B‑Coder‑MTP‑GGUF, a 4‑billion‑parameter agent model fine‑tuned for code debugging, tool calling, and structured reasoning. It explains the model's novel Trace Inversion technique, high‑quality trajectory data, and Curriculum SFT training, then covers MTP acceleration, benchmark results, quantization options, and step‑by‑step local deployment instructions.

Agent · GGUF · MTP

0 likes · 10 min read

Machine Heart

May 30, 2026 · Artificial Intelligence

Syll: Open‑Source Multimodal AI Agent Framework for Secure, Trustworthy Automation

Current personal AI agents suffer from fragmented interfaces, high teaching barriers, opaque execution, and privacy concerns; Syll, an open‑source multimodal full‑interaction framework from Tsinghua and Jijiayi, unifies GUI, CLI, and MCP/API control, offers teach‑once skill generation, full audit trails, and a modular local architecture for secure, extensible automation.

Multimodal AI · Open Source · desktop automation

0 likes · 8 min read

Old Zhang's AI Learning

May 23, 2026 · Artificial Intelligence

The Underrated Lifesaving Template for Qwen Local Deployment

This article analyzes the hidden pitfalls of Qwen's official Jinja chat template, explains how the community‑maintained Qwen‑Fixed‑Chat‑Templates v19 fixes rendering errors, KV‑cache loss, token waste, and agent deadlocks, and provides step‑by‑step installation instructions for LM Studio, llama.cpp, vLLM, and MLX.

Agent Loop · Chat Template · KV cache

0 likes · 10 min read

Old Zhang's AI Learning

May 14, 2026 · Artificial Intelligence

Boost Qwen3.6 with MTP: 1.5× Faster Local Deployment for Claude Code

The article explains how to enable Multi‑Token Prediction (MTP) in Qwen3.6 using a specific llama.cpp PR, achieving up to 1.5× faster local inference, details compilation steps, optimal parameters, memory requirements, and how to integrate the accelerated model with Claude Code while avoiding common pitfalls.

Claude Code · LLM Acceleration · MTP

0 likes · 11 min read

Geek Labs

May 9, 2026 · Backend Development

How to Run Claude Code Locally for Free with the Open‑Source Free Claude Code Proxy

This guide introduces the open‑source Free Claude Code project, explains its FastAPI‑based proxy architecture that routes Claude Code requests to various backends such as NVIDIA NIM, OpenRouter, DeepSeek, LM Studio, llama.cpp, and Ollama, and provides step‑by‑step instructions for installation, configuration, and deployment on local machines.

AI assistant · Claude Code · FastAPI

0 likes · 6 min read

Old Zhang's AI Learning

Apr 22, 2026 · Artificial Intelligence

Qwen3.6-27B Open‑Source: How a 27B Dense Model Outperforms the 397B Giant

The newly released Qwen3.6-27B dense multimodal model, at just 27B parameters, surpasses the 397B flagship on most coding benchmarks, offers up to 1M tokens of context, supports FP8 quantization, and can be deployed locally via vLLM, SGLang, or Transformers on modest hardware.

27B · Dense Model · FP8

0 likes · 12 min read

Old Zhang's AI Learning

Apr 12, 2026 · Artificial Intelligence

How to Deploy MiniMax-M2.7 Quantized Models Locally on macOS and Linux

This guide explains the 22 GGUF quantized versions of MiniMax-M2.7 released by Unsloth, compares their accuracy and size, recommends the UD‑Q4_K_XL model for best quality‑to‑size trade‑off, and provides step‑by‑step instructions for local deployment via Unsloth Studio, llama.cpp, API server, or the MLX native solution, along with important pitfalls and performance‑tuning tips.

Dynamic 2.0 · MLX · MiniMax M2.7

0 likes · 14 min read

Old Zhang's AI Learning

Apr 4, 2026 · Artificial Intelligence

Deploy Gemma 4 Locally: Ollama, llama.cpp, MLX, vLLM + TurboQuant Optimization

The article reviews the four Gemma 4 model variants, analyzes their architecture and benchmark results versus Qwen3.5, and provides step‑by‑step instructions for local deployment using Ollama, llama.cpp, MLX and vLLM, while highlighting TurboQuant memory and weight compression techniques.

AI benchmarking · Gemma 4 · MLX

0 likes · 15 min read

Old Zhang's AI Learning

Apr 3, 2026 · Artificial Intelligence

Qwopus3.5‑v3: From Reason‑Then‑Act to Act‑Then‑Refine – Claude‑Opus Distillation Turns Qwen3.5 into a Tool‑Using Agent

The newly released Qwopus3.5‑v3 model combines higher‑quality reasoning chains, dedicated tool‑calling reinforcement learning, and an act‑then‑refine paradigm, delivering a 5‑point HumanEval boost, a 1.43‑point MMLU‑Pro gain, 31.7% faster inference and 24% lower token cost, while remaining runnable on a 3090 or a 16 GB MacBook, with easy deployment via GGUF, LM Studio, Ollama or llama.cpp.

Claude Opus · Distillation · HumanEval

0 likes · 12 min read

macrozheng

Mar 16, 2026 · Artificial Intelligence

How LLMFit Automates Hardware Compatibility Checks for Local Large‑Model Deployment

LLMFit, a Rust‑based terminal tool, automatically detects system hardware, recommends optimal quantization levels, and scores models across multiple dimensions, enabling developers to quickly identify and run large language models that suit their machines without trial‑and‑error.

CLI tool · LLM · Model Quantization

0 likes · 5 min read

Old Zhang's AI Learning

Mar 10, 2026 · Artificial Intelligence

Install AutoClaw in One Minute: Quick Setup for a Local AI Assistant

AutoClaw wraps the open‑source OpenClaw client, turning a half‑day installation into three simple steps—download, install, and auto‑configure—while adding seamless Feishu integration, support for GLM‑5 and pony‑alpha‑2 models, built‑in skills, and security recommendations for custom skill creation.

AI assistant · AutoClaw · Feishu

0 likes · 6 min read

Old Zhang's AI Learning

Mar 3, 2026 · Artificial Intelligence

How to Deploy and Fine‑Tune Qwen3.5 Small Models (0.8B‑9B) Locally

This guide walks you through deploying Qwen3.5's 0.8B, 2B, 4B and 9B models on CPUs or modest GPUs using Unsloth's GGUF quantization, explains hardware requirements, shows how to run them with llama.cpp, llama‑server, vLLM or SGLang, and provides a free Colab fine‑tuning workflow with export options.

AI models · GGUF · Qwen3.5

0 likes · 19 min read

Old Zhang's AI Learning

Mar 2, 2026 · Artificial Intelligence

Qwen3.5 Small Models Unveiled: From 0.8B to 9B with Full Capabilities

The article introduces the newly released Qwen3.5 small model series (0.8B, 2B, 4B, 9B), explains their shared Gated Delta Networks architecture, early multimodal token fusion, 201‑language support and up to 1 million‑token context, and presents benchmark data that show the 9B model rivaling much larger LLMs, followed by practical guidance on model selection and deployment.

Gated Delta Networks · Multimodal · Qwen3.5

0 likes · 10 min read

Old Zhang's AI Learning

Mar 2, 2026 · Artificial Intelligence

Why the Qwen3.5 Series Makes Qwen3.5-27B the No‑Brainer Choice

The author reviews the Qwen3.5 model family, showing that the 27‑billion‑parameter dense Qwen3.5-27B offers the best balance of size, stability, low‑cost local deployment, and comprehensive capabilities, making it the default pick for most users.

AI benchmarking · Large Language Model · Quantization

0 likes · 6 min read

Old Zhang's AI Learning

Feb 26, 2026 · Artificial Intelligence

Ultimate Guide to Local Deployment of Qwen3.5 Models (27B‑397B)

This guide reviews the Qwen3.5 model lineup, explains its hybrid reasoning and MoE architecture, presents benchmark comparisons with GPT‑5.2, Claude 4.5, and Gemini‑3 Pro, evaluates the quality loss from 4‑bit and 3‑bit quantization, outlines hardware requirements, and provides step‑by‑step deployment options using llama.cpp or llama‑server.

Large Language Model · MoE · Quantization

0 likes · 14 min read

AI Code to Success

Feb 24, 2026 · Artificial Intelligence

Why “Claw” Is the New Layer of AI Agents and What It Means for Software Development

The article analyzes Andrej Karpathy’s introduction of “Claw” as a new AI‑agent layer, explains its architecture, rapid industry adoption, the shift toward mutable codebases, local deployment benefits, and how these trends reshape software engineering principles in the AI era.

Claw architecture · MAML analogy · OpenClaw

0 likes · 9 min read

Old Zhang's AI Learning

Feb 23, 2026 · Artificial Intelligence

One-Click Tool to Determine Which Large Language Models Your PC Can Run Locally

The llmfit command‑line utility scans your CPU, RAM, GPU and VRAM, scores 157 models from over 30 providers, suggests the highest‑quality quantized version that fits, integrates with Ollama, and shows real‑world test results confirming its accuracy, though its model database is limited.

Large Language Models · Mixture of Experts · Ollama

0 likes · 6 min read

Old Zhang's AI Learning

Jan 29, 2026 · Artificial Intelligence

Deploying GLM‑4.7‑Flash Quantized Model Locally on a Single RTX 4090

This guide walks through downloading the AWQ‑4bit quantized GLM‑4.7‑Flash model, upgrading vLLM, building a custom Docker image, and launching the model on two RTX 4090 GPUs with tuned parameters to avoid OOM, while sharing practical tips and observed performance.

AWQ-4bit · Docker · GLM-4.7-Flash

0 likes · 7 min read

Programmer's Advance

Jan 21, 2026 · Artificial Intelligence

Why GLM‑4.7‑Flash Delivers 70B‑Level Performance with Only 30B Parameters

GLM‑4.7‑Flash, released by Zhipu AI on Jan 20, 2026, uses a Mixture‑of‑Experts (MoE) backbone and Multi‑head Latent Attention (MLA) to approach 70B‑model quality with just 30B total and 3B active parameters. It runs on a single 24 GB GPU or even a Mac, and remains fully open‑source and free to use.

AI model benchmark · GLM-4.7-Flash · Mixture of Experts

0 likes · 15 min read

Fun with Large Models

Jan 18, 2026 · Artificial Intelligence

Step‑by‑Step Guide to Deploying Large Language Models Locally with vLLM and Ollama

This article walks through two mainstream local deployment solutions (high‑performance vLLM for production Linux servers and lightweight Ollama for personal Windows machines), covering environment setup, model download, server launch, API testing, key configuration parameters, and the quantization technique that makes Ollama models compact.

GPU Optimization · Large Language Models · Model Quantization

0 likes · 18 min read

Rare Earth Juejin Tech Community

Oct 31, 2025 · Artificial Intelligence

Build a Private AI Knowledge Base with Ollama and FastGPT

This guide walks you through setting up a locally deployed AI system using Ollama and FastGPT, covering model selection, Docker deployment, configuration, knowledge‑base creation, and testing so your team can query internal documents securely and efficiently.

AI · Docker · FastGPT

0 likes · 25 min read

Eric Tech Circle

Sep 10, 2025 · Artificial Intelligence

Deploy High‑Performance Local LLMs with vLLM: A Step‑by‑Step Guide

This article walks through installing and configuring vLLM for local large language model inference, compares it with Ollama and LM Studio, details environment setup, model download, testing scripts, and shows how to expose an OpenAI‑compatible API for production use.

Inference Optimization · Large Language Model · ModelScope

0 likes · 11 min read

Dunmao Tech Hub

Sep 1, 2025 · Artificial Intelligence

Deploy DeepSeek‑r1 Locally with a One‑Click Ollama Script

This guide walks you through a Bash script that automatically checks for Ollama, installs it if missing, lets you choose a DeepSeek‑r1 model size, starts the Ollama service, and runs the selected model locally, complete with usage examples and a token‑cost note.

AI · DeepSeek · Model Deployment

0 likes · 7 min read

Mingyi World Elasticsearch

Aug 4, 2025 · Artificial Intelligence

Building Enterprise‑Grade Semantic Search with Ollama—No External APIs Required

This article walks through the complete design and implementation of a locally deployed, enterprise‑level semantic search system using Ollama for embedding generation and Easysearch for vector retrieval, covering problem analysis, architecture decisions, pipeline configuration, bulk indexing, and hybrid query execution.

Easysearch · Ollama · Search Engine

0 likes · 12 min read

Eric Tech Circle

Aug 3, 2025 · Artificial Intelligence

How to Deploy Qwen3‑Coder Locally and Boost Front‑End Development

This article explains the key improvements of Qwen3‑Coder, walks through two local deployment methods (LM Studio and Ollama), showcases front‑end coding examples, compares performance and hardware requirements, and offers practical recommendations for developers seeking an on‑premise AI coding assistant.

AI code generation · LM Studio · Ollama

0 likes · 7 min read

Full-Stack Cultivation Path

Jul 26, 2025 · Artificial Intelligence

Step-by-Step Local Deployment Guide for Coze Studio: Launch Your Low-Code AI Agent Development

This article provides a comprehensive, hands‑on tutorial for installing Ollama, Docker, and the open‑source Coze Studio on a local machine, configuring various LLM services such as Qwen 3, DeepSeek‑V3, and OpenRouter, and running the platform via Docker Compose to create and test AI agents.

Coze Studio · Docker · LLM

0 likes · 7 min read

Mingyi World Elasticsearch

Jul 23, 2025 · Artificial Intelligence

How to Build an Enhanced RAG Retrieval and AI Assistant for Youdao Cloud Notes

This article walks through retrieving ten years of Youdao Cloud Notes, selecting a RAG implementation (self‑built or using Coco AI), configuring cookies, loading the notes locally, and integrating a large language model to enable full‑text search and intelligent question‑answering.

AI assistant · Coco AI · RAG

0 likes · 8 min read

php Courses

May 22, 2025 · Backend Development

Building an Offline PHP API Self‑Service Terminal

This article explains how to design and implement a fully offline self‑service terminal using PHP, covering the reasons for a local solution, three‑tier architecture, UI, API layer, data storage options, security, performance optimizations, deployment strategies, and real‑world use cases.

Backend Development · Offline API · PHP

0 likes · 8 min read

Java Architecture Diary

May 19, 2025 · Artificial Intelligence

How Ollama 0.7 Unlocks Local Multimodal AI with One Command

Ollama 0.7 introduces a fully re‑engineered core that brings seamless multimodal model support, lists top visual models, showcases OCR and image analysis capabilities, explains technical breakthroughs, and provides a quick three‑step guide to deploy powerful local AI vision.

AI Engineering · AI models · Ollama

0 likes · 7 min read

Eric Tech Circle

May 6, 2025 · Artificial Intelligence

How to Deploy Qwen3-30B-A3B Locally and Unlock Its Full AI Potential

This article walks through the complete process of installing the Qwen3-30B-A3B large language model on a personal computer using LM Studio, evaluates its reasoning, creative, multilingual, and coding abilities with detailed prompts, and shares practical tips for optimizing local deployment and prompt design.

AI evaluation · LM Studio · Prompt Engineering

0 likes · 12 min read

Open Source Linux

Apr 14, 2025 · Artificial Intelligence

How to Deploy DeepSeek Locally: Step‑by‑Step Guide for Offline AI

This guide compares DeepSeek’s local and online versions, outlines hardware and privacy advantages of offline deployment, and provides a detailed step‑by‑step tutorial—including Ollama installation, model selection, command execution, and UI plugin setup—to help users run DeepSeek on their own machines.

AI model · DeepSeek · Ollama

0 likes · 6 min read

Architect's Alchemy Furnace

Mar 27, 2025 · Artificial Intelligence

Xinference vs Ollama: Which Open‑Source LLM Engine Fits Your Needs?

This article provides a comprehensive side‑by‑side comparison of the open‑source LLM serving tools Xinference and Ollama, examining their core goals, architecture, model support, deployment options, performance, ecosystem integration, typical use cases, future roadmap, and guidance on selecting the right solution for enterprise or personal projects.

Comparison · LLM · Open Source

0 likes · 7 min read

Java Architecture Diary

Mar 19, 2025 · Artificial Intelligence

Unlocking Google’s Gemma 3: Multimodal Power, 128k Context & Local Deployment Guide

This article introduces Google’s open‑source Gemma 3 model, highlighting its multimodal capabilities, massive 128k token context window, multilingual support, and provides step‑by‑step instructions for installing Ollama, pulling the model, and running local tests with code examples.

AI model · Gemma 3 · Large Language Model

0 likes · 7 min read

AI Product Manager Community

Mar 8, 2025 · Artificial Intelligence

Deploy OpenManus Locally and Let It Generate a Complete WeChat Mini‑Program

This article walks through installing OpenManus locally using Python 3.12, cloning its GitHub repository, configuring DeepSeek LLM credentials, launching the service, and prompting the agent to generate a full WeChat mini‑program, while sharing observations on performance, token cost, and limitations.

AI Agent · DeepSeek · LLM

0 likes · 5 min read

Java Architect Essentials

Mar 2, 2025 · Artificial Intelligence

Zero‑Code Local Deployment of DeepSeek LLM on Consumer GPUs Using Ollama

This guide explains why DeepSeek is a compelling GPT‑4‑level alternative, provides hardware recommendations for various model sizes, and walks through a three‑step Windows deployment using Ollama, including installation, environment configuration, model download, performance tuning, and common troubleshooting tips.

AI · DeepSeek · GPU

0 likes · 8 min read

Efficient Ops

Feb 25, 2025 · Artificial Intelligence

How to Deploy DeepSeek R1 Locally: A Step‑by‑Step Guide for AI Enthusiasts

This guide explains what DeepSeek R1 is, compares its full and distilled versions, details hardware requirements for Linux, Windows, and macOS, and provides step‑by‑step instructions for local deployment using Ollama, LM Studio, Docker, and visual interfaces like Open‑WebUI and Dify.

AI model · DeepSeek · Dify

0 likes · 9 min read

Tencent Cloud Developer

Feb 25, 2025 · Artificial Intelligence

Deploy DeepSeek AI: Cloud, Local, API – Full Step‑by‑Step Guide

This guide walks developers through the full lifecycle of using DeepSeek—choosing the right deployment method (API, local machine, or private cloud), selecting model sizes based on hardware, configuring Tencent Cloud services, building AI applications, and integrating the model into development tools and mini‑programs.

AI application development · AI model deployment · API integration

0 likes · 12 min read

Top Architect

Feb 20, 2025 · Artificial Intelligence

Deploying DeepSeek R1 671B Model Locally with Ollama and Dynamic Quantization

This guide explains how to download, quantize, and run the full‑size 671‑billion‑parameter DeepSeek R1 model on local hardware using Ollama, covering model selection, hardware requirements, step‑by‑step deployment commands, optional web UI setup, performance observations, and practical recommendations.

AI · DeepSeek · Dynamic Quantization

0 likes · 16 min read

Architects' Tech Alliance

Feb 18, 2025 · Artificial Intelligence

How to Distill DeepSeek LLMs into Lightweight Models for Local Deployment

This article explains DeepSeek's knowledge‑distillation approach for compressing large language models into small, efficient student models, details step‑by‑step local deployment requirements, performance optimizations, and highlights the cost, privacy, and application benefits of running the distilled model on‑premise.

AI inference · DeepSeek · Knowledge Distillation

0 likes · 10 min read

Cognitive Technology Team

Feb 12, 2025 · Artificial Intelligence

Step-by-Step Guide to Deploy DeepSeek AI Locally on macOS with Ollama and Chatbox AI

This article provides a comprehensive tutorial on installing Ollama, downloading and running the DeepSeek‑R1 model on a Mac, explains the benefits of local deployment for stability and privacy, and shows how to integrate the model with the Chatbox AI visual interface.

AI · DeepSeek · Ollama

0 likes · 5 min read

MaGe Linux Operations

Feb 7, 2025 · Artificial Intelligence

How to Deploy DeepSeek R1 Locally: A Step‑by‑Step AI Model Guide

This article walks you through everything you need to know about DeepSeek R1—including its different model sizes, hardware requirements, installation tools like Ollama, LM Studio and Docker, and how to set up a visual interface with Open‑WebUI or Dify—for offline, private, and cost‑effective AI inference.

AI · DeepSeek · Docker

0 likes · 15 min read

Java One

Feb 6, 2025 · Artificial Intelligence

Deploy DeepSeek‑R1 Locally on Your Laptop in Just 3 Minutes

This step‑by‑step guide shows non‑technical users how to install Ollama, pull the desired DeepSeek‑R1 model version, run it from the terminal, and optionally connect the free Chatbox desktop client for a visual chat interface, all without external network dependencies.

AI model · Chatbox · DeepSeek

0 likes · 6 min read

Top Architect

Feb 6, 2025 · Artificial Intelligence

Deploying DeepSeek R1 671B Model Locally with Ollama: Quantization, Hardware Requirements, and Step‑by‑Step Guide

This article provides a comprehensive tutorial on locally deploying the full‑size DeepSeek R1 671B model using Ollama, covering dynamic quantization options, hardware specifications, detailed installation commands, configuration files, performance observations, and practical recommendations for consumer‑grade systems.

AI · DeepSeek · GPU

0 likes · 14 min read

Architect's Alchemy Furnace

Feb 5, 2025 · Artificial Intelligence

Deploy DeepSeek R1 Locally with Ollama: Step‑by‑Step Guide for Windows & Linux

This article provides a comprehensive guide to locally deploying DeepSeek R1 models using Ollama on Windows and Linux, covering model variants, hardware requirements, installation steps, command‑line operations, visual client options, usage examples, performance tuning, and best‑practice recommendations for developers and enterprises.

AI model · DeepSeek · Docker

0 likes · 10 min read

Code Mala Tang

Feb 2, 2025 · Artificial Intelligence

How to Deploy DeepSeek AI Coding Assistant Locally: A Step‑by‑Step Guide

This guide walks you through the hardware and software prerequisites, Docker-based installation, environment configuration, model fine‑tuning, IDE integration, maintenance, and troubleshooting for running the DeepSeek AI programming assistant entirely on your own machine.

AI coding assistant · DeepSeek · Docker

0 likes · 12 min read

21CTO

Jul 7, 2024 · Artificial Intelligence

How to Build a Secure Local LLM Chatbot with Ollama, Python, and ChromaDB

This tutorial walks you through creating a privacy‑preserving, locally hosted large language model chatbot using Ollama, Python 3, and ChromaDB, covering RAG fundamentals, GPU selection, environment setup, and full source code for a Flask‑based application.

ChromaDB · LLM · Ollama

0 likes · 19 min read

21CTO

Apr 22, 2024 · Artificial Intelligence

Run Llama 3 Locally on PC/Mac: Ollama, LM Studio & GPT4All Guide

This guide walks you through three practical methods—using Ollama, LM Studio, and GPT4All—to install and run the open‑source Llama 3 model locally on Windows, macOS, or Ubuntu, including command‑line usage, Python integration, and prompt‑engineering techniques for formatted outputs.

GPT4All · LM Studio · Llama3

0 likes · 5 min read

Rare Earth Juejin Tech Community

Apr 1, 2024 · Artificial Intelligence

Deploying and Using Ollama Large Language Models Locally with Streamlit

This guide explains how to install Ollama, explore its supported open‑source LLMs, use its REST API for generation, chat, and embeddings, and build a Streamlit‑based web chat application that runs locally on your machine.

AI · Ollama · Python

0 likes · 6 min read

Ant R&D Efficiency

Sep 25, 2023 · Artificial Intelligence

Running LLaMA 7B Model Locally on a Single Machine

This guide shows how to download, convert, 4‑bit quantize, and run Meta’s 7‑billion‑parameter LLaMA model on a single 16‑inch Apple laptop using Python, torch, and the llama.cpp repository, demonstrating that the quantized model fits in memory and generates responses quickly, with optional scaling to larger models.

7B model · AI · LLaMA

0 likes · 5 min read

WeiLi Technology Team

May 8, 2023 · Artificial Intelligence

How to Run GPT‑2 Locally: Complete Setup and Code Adjustments

This guide explains the GPT‑2 background, required software, environment configuration, code modifications for TensorFlow 2.x, data download, execution commands, and sample test results, providing a full step‑by‑step process for local deployment of the model.

AI · GPT-2 · TensorFlow

0 likes · 7 min read
