Tagged articles
47 articles

Old Zhang's AI Learning
May 14, 2026 · Artificial Intelligence

Boost Qwen3.6 with MTP: 1.5× Faster Local Deployment for Claude Code

The article explains how to enable Multi‑Token Prediction (MTP) in Qwen3.6 using a specific llama.cpp PR to achieve up to 1.5× faster local inference. It details the compilation steps, optimal parameters, and memory requirements, and shows how to integrate the accelerated model with Claude Code while avoiding common pitfalls.

Claude Code · LLM acceleration · MTP
11 min read

Geek Labs
May 9, 2026 · Backend Development

How to Run Claude Code Locally for Free with the Open‑Source Free Claude Code Proxy

This guide introduces the open‑source Free Claude Code project, explains its FastAPI‑based proxy architecture that routes Claude Code requests to various backends such as NVIDIA NIM, OpenRouter, DeepSeek, LM Studio, llama.cpp, and Ollama, and provides step‑by‑step instructions for installation, configuration, and deployment on local machines.

AI Assistant · Claude Code · FastAPI
6 min read

Old Zhang's AI Learning
Apr 12, 2026 · Artificial Intelligence

How to Deploy MiniMax-M2.7 Quantized Models Locally on macOS and Linux

This guide explains the 22 GGUF quantized versions of MiniMax-M2.7 released by Unsloth, compares their accuracy and size, and recommends UD‑Q4_K_XL for the best quality‑to‑size trade‑off. It then gives step‑by‑step instructions for local deployment via Unsloth Studio, llama.cpp, an API server, or the native MLX solution, along with key pitfalls and performance‑tuning tips.

Dynamic 2.0 · MLX · MiniMax M2.7
14 min read

Old Zhang's AI Learning
Apr 3, 2026 · Artificial Intelligence

Qwopus3.5‑v3: From Reason‑Then‑Act to Act‑Then‑Refine – Claude‑Opus Distillation Turns Qwen3.5 into a Tool‑Using Agent

The newly released Qwopus3.5‑v3 model combines higher‑quality reasoning chains, dedicated tool‑calling reinforcement learning, and an act‑then‑refine paradigm. It delivers a 5‑point HumanEval boost, a 1.43‑point MMLU‑Pro gain, 31.7% faster inference, and 24% lower token cost, while still running on an RTX 3090 or a 16 GB MacBook and deploying easily via GGUF, LM Studio, Ollama, or llama.cpp.

Claude Opus · Distillation · HumanEval
12 min read

Old Zhang's AI Learning
Mar 10, 2026 · Artificial Intelligence

Install AutoClaw in One Minute: Quick Setup for a Local AI Assistant

AutoClaw wraps the open‑source OpenClaw client and turns a half‑day installation into three simple steps: download, install, and auto‑configure. It also adds seamless Feishu integration, support for the GLM‑5 and pony‑alpha‑2 models, built‑in skills, and security recommendations for creating custom skills.

AI Assistant · AutoClaw · Feishu
6 min read

Old Zhang's AI Learning
Mar 3, 2026 · Artificial Intelligence

How to Deploy and Fine‑Tune Qwen3.5 Small Models (0.8B‑9B) Locally

This guide walks you through deploying Qwen3.5's 0.8B, 2B, 4B and 9B models on CPUs or modest GPUs using Unsloth's GGUF quantization, explains hardware requirements, shows how to run them with llama.cpp, llama‑server, vLLM or SGLang, and provides a free Colab fine‑tuning workflow with export options.

AI models · Fine-tuning · GGUF
19 min read

Old Zhang's AI Learning
Mar 2, 2026 · Artificial Intelligence

Qwen3.5 Small Models Unveiled: From 0.8B to 9B with Full Capabilities

The article introduces the newly released Qwen3.5 small model series (0.8B, 2B, 4B, 9B) and explains their shared Gated Delta Networks architecture, early multimodal token fusion, support for 201 languages, and context windows of up to 1 million tokens. It then presents benchmark data showing the 9B model rivaling much larger LLMs, followed by practical guidance on model selection and deployment.

Benchmark · Gated Delta Networks · local deployment
10 min read

Old Zhang's AI Learning
Mar 2, 2026 · Artificial Intelligence

Why the Qwen3.5 Series Makes Qwen3.5-27B the No‑Brainer Choice

The author reviews the Qwen3.5 model family and argues that the dense 27‑billion‑parameter Qwen3.5-27B offers the best balance of size, stability, low‑cost local deployment, and overall capability, making it the default pick for most users.

AI benchmarking · RTX 4090 · large language model
6 min read

Old Zhang's AI Learning
Feb 26, 2026 · Artificial Intelligence

Ultimate Guide to Local Deployment of Qwen3.5 Models (27B‑397B)

This guide reviews the Qwen3.5 model lineup, explains its mixed‑inference and MoE architecture, presents benchmark comparisons with GPT‑5.2, Claude 4.5, and Gemini 3 Pro, evaluates 4‑bit and 3‑bit quantization loss, outlines hardware requirements, and provides step‑by‑step deployment options using llama.cpp or llama‑server.

Inference · MoE · large language model
14 min read

Programmer's Advance
Jan 21, 2026 · Artificial Intelligence

Why GLM‑4.7‑Flash Delivers 70B‑Level Performance with Only 30B Parameters

GLM‑4.7‑Flash, released by Zhipu AI on Jan 20, 2026, combines a Mixture‑of‑Experts (MoE) backbone with Multi‑head Latent Attention (MLA) to achieve near‑70B model quality with just 30B total and 3B active parameters. It runs on a single 24 GB GPU or even a Mac and is fully open‑source and free to use.

AI model benchmark · GLM-4.7-Flash · Mixture of Experts
15 min read

Fun with Large Models
Jan 18, 2026 · Artificial Intelligence

Step‑by‑Step Guide to Deploying Large Language Models Locally with vLLM and Ollama

This article walks through two mainstream local deployment options: high‑performance vLLM for production Linux servers and lightweight Ollama for personal Windows machines. It covers environment setup, model download, server launch, API testing, key configuration parameters, and the quantization technique that keeps Ollama models compact.

GPU Optimization · Model Quantization · Ollama
18 min read

Eric Tech Circle
Sep 10, 2025 · Artificial Intelligence

Deploy High‑Performance Local LLMs with vLLM: A Step‑by‑Step Guide

This article walks through installing and configuring vLLM for local large language model inference and compares it with Ollama and LM Studio. It details environment setup, model download, and testing scripts, and shows how to expose an OpenAI‑compatible API for production use.

Inference Optimization · ModelScope · OpenAI API
11 min read

Dunmao Tech Hub
Sep 1, 2025 · Artificial Intelligence

Deploy DeepSeek‑r1 Locally with a One‑Click Ollama Script

This guide walks you through a Bash script that automatically checks for Ollama, installs it if missing, lets you choose a DeepSeek‑r1 model size, starts the Ollama service, and runs the selected model locally, complete with usage examples and a token‑cost note.

AI · DeepSeek · Model Deployment
7 min read

Mingyi World Elasticsearch
Aug 4, 2025 · Artificial Intelligence

Building Enterprise‑Grade Semantic Search with Ollama—No External APIs Required

This article walks through the complete design and implementation of a locally deployed, enterprise‑level semantic search system using Ollama for embedding generation and Easysearch for vector retrieval, covering problem analysis, architecture decisions, pipeline configuration, bulk indexing, and hybrid query execution.

Easysearch · Ollama · local deployment
12 min read

Eric Tech Circle
Aug 3, 2025 · Artificial Intelligence

How to Deploy Qwen3‑Coder Locally and Boost Front‑End Development

This article explains the key improvements of Qwen3‑Coder, walks through two local deployment methods (LM Studio and Ollama), showcases front‑end coding examples, compares performance and hardware requirements, and offers practical recommendations for developers seeking an on‑premise AI coding assistant.

AI code generation · LM Studio · Ollama
7 min read

php Courses
May 22, 2025 · Backend Development

Building an Offline PHP API Self‑Service Terminal

This article explains how to design and implement a fully offline self‑service terminal using PHP, covering the reasons for a local solution, three‑tier architecture, UI, API layer, data storage options, security, performance optimizations, deployment strategies, and real‑world use cases.

Backend Development · Edge Computing · Offline API
8 min read

Java Architecture Diary
May 19, 2025 · Artificial Intelligence

How Ollama 0.7 Unlocks Local Multimodal AI with One Command

Ollama 0.7 introduces a fully re‑engineered core that brings seamless multimodal model support. The article lists top vision models, showcases OCR and image‑analysis capabilities, explains the technical breakthroughs, and provides a quick three‑step guide to deploying capable vision models locally.

AI Engineering · AI models · Ollama
7 min read

Eric Tech Circle
May 6, 2025 · Artificial Intelligence

How to Deploy Qwen3-30B-A3B Locally and Unlock Its Full AI Potential

This article walks through the complete process of installing the Qwen3-30B-A3B large language model on a personal computer using LM Studio, evaluates its reasoning, creative, multilingual, and coding abilities with detailed prompts, and shares practical tips for optimizing local deployment and prompt design.

AI Evaluation · LM Studio · Prompt engineering
12 min read

Open Source Linux
Apr 14, 2025 · Artificial Intelligence

How to Deploy DeepSeek Locally: Step‑by‑Step Guide for Offline AI

This guide compares DeepSeek’s local and online versions, outlines hardware and privacy advantages of offline deployment, and provides a detailed step‑by‑step tutorial—including Ollama installation, model selection, command execution, and UI plugin setup—to help users run DeepSeek on their own machines.

AI model · DeepSeek · Ollama
6 min read

Architect's Alchemy Furnace
Mar 27, 2025 · Artificial Intelligence

Xinference vs Ollama: Which Open‑Source LLM Engine Fits Your Needs?

This article provides a comprehensive side‑by‑side comparison of the open‑source LLM serving tools Xinference and Ollama, examining their core goals, architecture, model support, deployment options, performance, ecosystem integration, typical use cases, future roadmap, and guidance on selecting the right solution for enterprise or personal projects.

Comparison · LLM · Model Serving
7 min read

Efficient Ops
Feb 25, 2025 · Artificial Intelligence

How to Deploy DeepSeek R1 Locally: A Step‑by‑Step Guide for AI Enthusiasts

This guide explains what DeepSeek R1 is, compares its full and distilled versions, details hardware requirements for Linux, Windows, and macOS, and provides step‑by‑step instructions for local deployment using Ollama, LM Studio, Docker, and visual interfaces like Open‑WebUI and Dify.

AI model · DeepSeek · Dify
9 min read

Tencent Cloud Developer
Feb 25, 2025 · Artificial Intelligence

Deploy DeepSeek AI: Cloud, Local, API – Full Step‑by‑Step Guide

This guide walks developers through the full lifecycle of using DeepSeek—choosing the right deployment method (API, local machine, or private cloud), selecting model sizes based on hardware, configuring Tencent Cloud services, building AI applications, and integrating the model into development tools and mini‑programs.

AI Model Deployment · AI application development · Cloud Native
12 min read

Top Architect
Feb 20, 2025 · Artificial Intelligence

Deploying DeepSeek R1 671B Model Locally with Ollama and Dynamic Quantization

This guide explains how to download, quantize, and run the full‑size 671‑billion‑parameter DeepSeek R1 model on local hardware using Ollama, covering model selection, hardware requirements, step‑by‑step deployment commands, optional web UI setup, performance observations, and practical recommendations.

AI · DeepSeek · Dynamic Quantization
16 min read

Architects' Tech Alliance
Feb 18, 2025 · Artificial Intelligence

How to Distill DeepSeek LLMs into Lightweight Models for Local Deployment

This article explains DeepSeek's knowledge‑distillation approach for compressing large language models into small, efficient student models, details step‑by‑step local deployment requirements, performance optimizations, and highlights the cost, privacy, and application benefits of running the distilled model on‑premise.

AI inference · DeepSeek · LLM
10 min read

MaGe Linux Operations
Feb 7, 2025 · Artificial Intelligence

How to Deploy DeepSeek R1 Locally: A Step‑by‑Step AI Model Guide

This article covers everything you need to know about DeepSeek R1, including its model sizes, hardware requirements, installation tools such as Ollama, LM Studio, and Docker, and how to set up a visual interface with Open‑WebUI or Dify for offline, private, and cost‑effective AI inference.

AI · DeepSeek · Docker
15 min read

Java One
Feb 6, 2025 · Artificial Intelligence

Deploy DeepSeek‑R1 Locally on Your Laptop in Just 3 Minutes

This step‑by‑step guide shows non‑technical users how to install Ollama, pull the desired DeepSeek‑R1 model version, run it from the terminal, and optionally connect the free Chatbox desktop client for a visual chat interface, all without external network dependencies.

AI model · Chatbox · DeepSeek
6 min read

Top Architect
Feb 6, 2025 · Artificial Intelligence

Deploying DeepSeek R1 671B Model Locally with Ollama: Quantization, Hardware Requirements, and Step‑by‑Step Guide

This article provides a comprehensive tutorial on locally deploying the full‑size DeepSeek R1 671B model using Ollama, covering dynamic quantization options, hardware specifications, detailed installation commands, configuration files, performance observations, and practical recommendations for consumer‑grade systems.

AI · DeepSeek · GPU
14 min read

Architect's Alchemy Furnace
Feb 5, 2025 · Artificial Intelligence

Deploy DeepSeek R1 Locally with Ollama: Step‑by‑Step Guide for Windows & Linux

This article provides a comprehensive guide to locally deploying DeepSeek R1 models using Ollama on Windows and Linux, covering model variants, hardware requirements, installation steps, command‑line operations, visual client options, usage examples, performance tuning, and best‑practice recommendations for developers and enterprises.

AI model · DeepSeek · Docker
10 min read

Code Mala Tang
Feb 2, 2025 · Artificial Intelligence

How to Deploy DeepSeek AI Coding Assistant Locally: A Step‑by‑Step Guide

This guide walks you through the hardware and software prerequisites, Docker-based installation, environment configuration, model fine‑tuning, IDE integration, maintenance, and troubleshooting for running the DeepSeek AI programming assistant entirely on your own machine.

AI coding assistant · DeepSeek · Docker
12 min read

21CTO
Jul 7, 2024 · Artificial Intelligence

How to Build a Secure Local LLM Chatbot with Ollama, Python, and ChromaDB

This tutorial walks you through creating a privacy‑preserving, locally hosted large language model chatbot using Ollama, Python 3, and ChromaDB, covering RAG fundamentals, GPU selection, environment setup, and full source code for a Flask‑based application.

ChromaDB · LLM · Ollama
19 min read

21CTO
Apr 22, 2024 · Artificial Intelligence

Run Llama 3 Locally on PC/Mac: Ollama, LM Studio & GPT4All Guide

This guide walks you through three practical methods—using Ollama, LM Studio, and GPT4All—to install and run the open‑source Llama 3 model locally on Windows, macOS, or Ubuntu, including command‑line usage, Python integration, and prompt‑engineering techniques for formatted outputs.

GPT4All · LM Studio · Llama3
5 min read

Ant R&D Efficiency
Sep 25, 2023 · Artificial Intelligence

Running LLaMA 7B Model Locally on a Single Machine

This guide shows how to download, convert, 4‑bit quantize, and run Meta’s 7‑billion‑parameter LLaMA model on a single 16‑inch Apple laptop using Python, PyTorch, and the llama.cpp repository. It demonstrates that the quantized model fits in memory and generates responses quickly, and notes that the approach can scale to larger models.

7B model · AI · LLaMA
5 min read

WeiLi Technology Team
May 8, 2023 · Artificial Intelligence

How to Run GPT‑2 Locally: Complete Setup and Code Adjustments

This guide covers GPT‑2's background, the required software, environment configuration, code modifications for TensorFlow 2.x, data download, execution commands, and sample test results, providing a complete step‑by‑step process for deploying the model locally.

AI · GPT-2 · TensorFlow
7 min read