Tagged articles
21 articles
DataFunTalk
Feb 26, 2026 · Artificial Intelligence

How RAG Can Overcome Large‑Model Pitfalls in Enterprise Knowledge Work

This article explains the challenges large language models face in real‑world applications, introduces Retrieval‑Augmented Generation (RAG) as a solution, and details a modular RAG architecture, its components, and practical techniques for document parsing, query rewriting, hybrid retrieval, ranking, and answer generation in an enterprise setting.

Document Parsing · LLM deployment · RAG
0 likes · 22 min read
Old Zhang's AI Learning
Feb 17, 2026 · Artificial Intelligence

Running Qwen3.5 Locally: Step‑by‑Step Guide with Unsloth Dynamic Quantization

This article explains how to run the 397B Qwen3.5 model on a Mac by using Unsloth Dynamic 2.0 quantization (2‑bit, 3‑bit, or 4‑bit), outlines hardware requirements, provides compilation and download commands for llama.cpp, shows how to launch inference in thinking and non‑thinking modes, and compares several deployment options such as llama‑server, Transformers, SGLang/vLLM, and MLX.

Dynamic Quantization · GGUF · LLM deployment
0 likes · 14 min read
Raymond Ops
Aug 26, 2025 · Artificial Intelligence

How to Deploy DeepSeek R1 Locally: Versions, Hardware, and UI Tools

This guide explains DeepSeek R1’s model variants, hardware requirements, local installation steps using Ollama, LM Studio or Docker, and how to add visual interfaces like Open‑WebUI and Dify for a complete on‑premise AI solution.

DeepSeek · Dify · Hardware Requirements
0 likes · 14 min read
Architects' Tech Alliance
Apr 13, 2025 · Artificial Intelligence

Deploying DeepSeek LLMs On-Premises: Step‑by‑Step Guide and Hardware Sizing

This article provides a comprehensive technical guide for privately deploying DeepSeek large language models, covering model and runtime parameter selection, hardware sizing calculations, software stack preparation, inference service setup, performance tuning, and security monitoring considerations.

AI hardware sizing · DeepSeek · Inference Optimization
0 likes · 14 min read
Qborfy AI
Mar 27, 2025 · Artificial Intelligence

How to Deploy DeepSeek‑R1 Locally with Ollama and Dify: A Step‑by‑Step Guide

This article walks through the entire process of deploying the DeepSeek‑R1 large language model on a personal machine, covering hardware requirements, Ollama installation, model download, service startup, remote access configuration, and visual UI integration with Dify, complete with concrete commands and screenshots.

AI · DeepSeek · Docker
0 likes · 9 min read
AIWalker
Feb 27, 2025 · Artificial Intelligence

Step-by-Step Guide to Deploying, Testing, and Optimizing DeepSeek‑R1: A Complete Tutorial

This article provides a comprehensive, hands‑on guide for installing and configuring DeepSeek‑R1 with Ollama and vLLM, setting up multi‑node multi‑GPU environments, running performance benchmarks, optimizing runtime parameters, and even generating a full PyTorch distributed‑training script.

DeepSeek-R1 · Distributed Training · GPU Optimization
0 likes · 39 min read
Data Thinking Notes
Feb 20, 2025 · Artificial Intelligence

How to Deploy DeepSeek R1 671B Model Locally with Ollama: A Step‑by‑Step Guide

This article provides a comprehensive tutorial on locally deploying the 671‑billion‑parameter DeepSeek R1 model using Ollama, covering model selection, hardware requirements, dynamic quantization, detailed installation steps, performance observations, and practical recommendations for consumer‑grade hardware.

AI model optimization · DeepSeek · Dynamic Quantization
0 likes · 14 min read
21CTO
Feb 16, 2025 · Artificial Intelligence

How to Deploy Your Own DeepSeek LLM Locally: Step-by-Step Guide

This guide walks you through setting up a local DeepSeek large language model, covering environment preparation, model acquisition, dependency installation, FastAPI service creation, Docker containerization, optional front‑end interface, performance tuning, and common troubleshooting steps.

AI model · DeepSeek · Docker
0 likes · 7 min read
JD Cloud Developers
Feb 12, 2025 · Artificial Intelligence

Deploy a Private DeepSeek Large‑Model on JD Cloud with Ollama

This guide explains why you might deploy a private DeepSeek large model, compares the full and distilled versions, and shows how to purchase a JD Cloud computer, install Ollama, run the model, and integrate a local knowledge base using Cherry Studio, Page Assist, and AnythingLLM.

AI model · DeepSeek · JD Cloud
0 likes · 17 min read
JD Tech Talk
Feb 10, 2025 · Artificial Intelligence

Deploy DeepSeek on JD Cloud GPU and Chat with It via Ollama & Chatbox

This guide walks you through preparing a JD Cloud GPU instance, installing NVIDIA drivers, deploying Ollama, downloading and running the DeepSeek LLM, configuring the Chatbox graphical client for interactive queries, and optionally feeding local documents into AnythingLLM for a private knowledge base.

AnythingLLM · Chatbox · DeepSeek
0 likes · 17 min read
JavaEdge
Nov 20, 2024 · Artificial Intelligence

7 Proven Strategies to Simplify Large Language Model Deployment

The article explains why deploying large language models is challenging and presents seven practical techniques—including defining deployment boundaries, model quantization, inference optimization, infrastructure consolidation, model replacement planning, GPU utilization, and using smaller models—to make LLM deployment more efficient and cost‑effective.

GPU Optimization · LLM deployment · Model Scaling
0 likes · 24 min read
JavaEdge
Oct 14, 2024 · Artificial Intelligence

Deploying LLMs with LangServe: A Complete Guide from Setup to Client Calls

This article introduces LangServe, explains its key features for LLM deployment, walks through environment setup, shows how to build a FastAPI‑based REST API with code examples, demonstrates testing via Postman and remote client calls, and summarizes its benefits for AI model serving.

AI model serving · FastAPI · LLM deployment
0 likes · 9 min read
Architect's Alchemy Furnace
Jul 3, 2024 · Artificial Intelligence

Deploy ChatGLM3‑6B with FastGPT, One‑API, and M3E on Linux

This guide walks you through deploying the ChatGLM3‑6B large language model locally, adding the M3E vector embedding model, setting up One‑API and FastGPT with Docker, configuring environments, fine‑tuning with LoRA, and testing the integrated knowledge‑base Q&A system.

ChatGLM3 · Docker · FastGPT
0 likes · 15 min read
21CTO
Apr 23, 2024 · Artificial Intelligence

Deploy Large Language Models with vLLM and Quantization for Low Latency

This guide explains how to deploy open‑source large language models using vLLM, benchmark latency and throughput, and apply 8‑bit and 4‑bit quantization techniques such as bitsandbytes and NF4 to achieve faster inference on limited‑GPU hardware.

LLM deployment · Python · large language models
0 likes · 13 min read
DataFunTalk
Jan 4, 2024 · Artificial Intelligence

Using OpenLLM to Quickly Build and Deploy Large Language Model Applications

This presentation explains how OpenLLM, an open‑source LLM framework, together with BentoML, addresses the challenges of deploying large language models by offering model switching, memory optimizations, multi‑GPU support, observability, and easy containerized deployment for production AI applications.

AI Optimization · BentoML · LLM deployment
0 likes · 18 min read