Build a Free Private AI with DeepSeek, Ollama, and Local Knowledge Base

This guide explains how to locally deploy the open‑source DeepSeek model using Ollama, enhance interaction with Chatbox and Page Assist, and connect a local knowledge base via AnythingLLM's RAG architecture, providing step‑by‑step instructions, hardware requirements, and API examples for a self‑hosted AI system.

Liangxu Linux

Why Deploy Locally?

Running large language models on your own hardware offers several advantages: no per‑request fees, data that never leaves your machine, no usage limits or service‑side content filters, no dependency on a network connection, and freedom to customize the model and prompts. With sufficient hardware, a local model can also respond faster than a shared hosted service.

DeepSeek Model Variants

DeepSeek provides a full‑size model (DeepSeek‑R1) with around 671 billion parameters, requiring at least 1 TB of RAM and dual H100 80 GB GPUs, which is impractical for most users. The distilled versions are much smaller, at 1.5 B to 70 B parameters, so the smaller ones run on ordinary consumer hardware while the largest still need a high‑end machine. The available variants are DeepSeek‑R1‑Distill‑Qwen in 1.5B, 7B, 14B, and 32B sizes, and DeepSeek‑R1‑Distill‑Llama in 8B and 70B sizes.
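
The link between parameter count and hardware can be made concrete with a back‑of‑the‑envelope calculation. The sketch below is only an estimate under a stated assumption: it counts the memory needed to hold the weights (parameters × bits per weight ÷ 8) and ignores the KV cache, activations, and runtime overhead, so real usage will be higher. The function name and the 16‑bit and 4‑bit precisions are illustrative choices, not figures from DeepSeek.

```javascript
// Rough estimate of the memory needed just to hold model weights.
// Assumption: weights dominate; KV cache, activations, and runtime
// overhead are ignored, so actual requirements are higher.
function estimateWeightMemoryGB(paramsBillions, bitsPerWeight) {
  const bytesPerWeight = bitsPerWeight / 8;
  // (paramsBillions * 1e9 params) * bytesPerWeight / 1e9 bytes per GB
  return paramsBillions * bytesPerWeight;
}

const variants = [
  { name: 'DeepSeek-R1 (full)', paramsBillions: 671 },
  { name: 'DeepSeek-R1-Distill-Qwen-32B', paramsBillions: 32 },
  { name: 'DeepSeek-R1-Distill-Qwen-7B', paramsBillions: 7 },
  { name: 'DeepSeek-R1-Distill-Qwen-1.5B', paramsBillions: 1.5 },
];

for (const v of variants) {
  const fp16 = estimateWeightMemoryGB(v.paramsBillions, 16);
  const q4 = estimateWeightMemoryGB(v.paramsBillions, 4);
  console.log(`${v.name}: ~${fp16} GB at 16-bit, ~${q4} GB at 4-bit`);
}
```

Under these assumptions a 7 B model quantized to 4 bits needs only a few gigabytes for its weights, which is why the distilled variants are the practical choice for a desktop or laptop.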

In DeepSeek's published benchmarks, the 32 B distilled model comes close to the full model on reasoning tasks. On AIME 2024, for example, it scores 72.6% Pass@1, compared with 79.8% for the full DeepSeek‑R1.

Deploying DeepSeek with Ollama

Ollama is an open‑source framework for running LLMs locally. It installs from native packages or a Docker image and exposes a local REST API, including OpenAI‑compatible endpoints.

Download the installer from https://ollama.com/download (macOS, Windows, Linux).

Verify the installation with ollama --version.

Ollama runs a local server on port 11434. Running ollama run <model-name> downloads the model on first use and then opens an interactive prompt; ollama pull <model-name> only downloads it.
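
Once the server is running, any program can talk to it over HTTP. The following is a minimal sketch, assuming Node.js 18 or later (for the built‑in fetch), the default port 11434, and a model that has already been pulled. The model tag deepseek-r1:7b and the helper names buildChatRequest and askOllama are illustrative assumptions, not part of the original article.

```javascript
// Minimal sketch: send one prompt to a local Ollama server.
// Assumptions: Node.js 18+ (global fetch), default port 11434,
// and a model tag such as 'deepseek-r1:7b' already pulled.
const OLLAMA_URL = 'http://localhost:11434';

// Build the JSON body for Ollama's /api/chat endpoint.
function buildChatRequest(model, prompt) {
  return {
    model,
    messages: [{ role: 'user', content: prompt }],
    stream: false, // ask for a single JSON object instead of a stream
  };
}

// Send the prompt and return the assistant's reply text.
async function askOllama(model, prompt) {
  const res = await fetch(`${OLLAMA_URL}/api/chat`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildChatRequest(model, prompt)),
  });
  if (!res.ok) {
    throw new Error(`Ollama returned HTTP ${res.status}`);
  }
  const data = await res.json();
  return data.message.content;
}

// Usage (requires a running Ollama server):
// askOllama('deepseek-r1:7b', 'Explain RAG in one sentence.')
//   .then(console.log)
//   .catch((err) => console.error(err.message));
```

Setting stream to false keeps the example short; interactive clients usually leave streaming on so that tokens appear as they are generated.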

The platform supports many models, including Llama, Qwen, Gemma, Mistral, Phi, GLM, CodeLlama, and LLaVA. At the time of writing, DeepSeek was the most downloaded model on Ollama, with over 12 million pulls.

Chatbox + Ollama – Friendly Interaction

Chatbox is an open‑source desktop client that connects to Ollama’s API, offering a graphical chat interface without requiring command‑line usage. It auto‑detects installed Ollama models and lets you select the local endpoint (http://localhost:11434).
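
How Chatbox discovers models internally is not described here, but one way any client could do it is to ask Ollama for its list of installed models. The sketch below assumes Node.js 18 or later and the default endpoint; the helper names modelNames and listLocalModels are hypothetical.

```javascript
// Extract model names from the JSON returned by Ollama's GET /api/tags.
function modelNames(tagsResponse) {
  const models = Array.isArray(tagsResponse?.models) ? tagsResponse.models : [];
  return models.map((m) => m.name);
}

// Fetch the list of locally installed models from a running Ollama server.
async function listLocalModels(baseUrl = 'http://localhost:11434') {
  const res = await fetch(`${baseUrl}/api/tags`);
  if (!res.ok) {
    throw new Error(`Ollama returned HTTP ${res.status}`);
  }
  return modelNames(await res.json());
}

// Usage (requires a running Ollama server):
// listLocalModels().then((names) => console.log(names));
```

If the call fails with a connection error, the Ollama server is probably not running, which is also the most common reason a desktop client shows an empty model list.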

Page Assist – Adding Web Search to Local Models

Page Assist is a browser extension that provides a sidebar UI for local LLMs. It forwards queries to the Ollama server and can optionally run a web search, passing the results to the model and showing them alongside its response. Model inference stays on your machine, although the search queries themselves are sent to the search engine.

AnythingLLM + Ollama – Building a Local Knowledge Base (RAG)

AnythingLLM implements a Retrieval‑Augmented Generation (RAG) pipeline:

Embedding: Converts documents into vector representations.

Vector Database: Stores embeddings for fast similarity search (LanceDB is used by default).

LLM: Generates answers using retrieved context.
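
The three stages above can be illustrated with a toy, in‑memory version. This is a sketch of the general RAG idea, not AnythingLLM's or LanceDB's actual implementation: the embed function is a hypothetical stand‑in that counts words from a tiny vocabulary (real systems use a neural embedding model), the "vector database" is a plain array, and the final step only builds the prompt that would be sent to the LLM.

```javascript
// Toy RAG retrieval: embed -> store -> search -> build prompt.
// embed() is a hypothetical stand-in (word counts over a tiny vocabulary);
// real systems use a neural embedding model.
const VOCAB = ['price', 'battery', 'warranty', 'filter', 'vacuum'];

function embed(text) {
  const words = text.toLowerCase().match(/[a-z]+/g) || [];
  return VOCAB.map((term) => words.filter((w) => w === term).length);
}

function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// "Vector database": an in-memory array of { text, vector } records.
function buildIndex(chunks) {
  return chunks.map((text) => ({ text, vector: embed(text) }));
}

// Return the topK chunks most similar to the question.
function retrieve(index, question, topK = 1) {
  const q = embed(question);
  return index
    .map((rec) => ({ text: rec.text, score: cosineSimilarity(q, rec.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Combine the retrieved chunks and the question into one prompt for the LLM.
function buildPrompt(question, hits) {
  const context = hits.map((h) => `- ${h.text}`).join('\n');
  return `Answer using only this context:\n${context}\n\nQuestion: ${question}`;
}

const index = buildIndex([
  'The vacuum price is listed in the product catalog.',
  'Battery life depends on the selected power mode.',
  'The warranty covers two years of normal use.',
]);
const question = 'How long is the warranty?';
const hits = retrieve(index, question);
console.log(buildPrompt(question, hits));
```

The model never sees the whole document set, only the few chunks that score highest against the question, which is what lets a small local model answer from a large knowledge base.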

Installation steps:

Download the appropriate installer from the AnythingLLM website and run it.

Select Ollama as the model provider; the app automatically detects installed models.

Choose an embedding model (default local model) and LanceDB as the vector store.

Create a workspace, upload documents (TXT, Markdown, PDF, etc.), and optionally add web URLs.

After indexing, queries are answered using the combined knowledge of the local model and the uploaded data.
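
Indexing typically begins by splitting each uploaded document into smaller chunks, so that each embedding covers a focused passage. The sketch below shows one common approach, fixed‑size character chunks with overlap. The function name chunkText and the default sizes are illustrative assumptions, not AnythingLLM's actual defaults or splitter.

```javascript
// Split text into fixed-size character chunks with overlap.
// chunkSize and overlap are illustrative values, not AnythingLLM defaults.
function chunkText(text, chunkSize = 200, overlap = 40) {
  if (overlap >= chunkSize) {
    throw new Error('overlap must be smaller than chunkSize');
  }
  const chunks = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current chunk reaches the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

console.log(chunkText('abcdefghij', 4, 2));
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.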

AnythingLLM API Example (Node.js)

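// Requires the axios package and an ES module context
// (a .mjs file, or "type": "module" in package.json).
import axios from 'axios';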

class AnythingLLMClient {
  constructor(config) {
    this.baseURL = config.baseURL || 'http://localhost:3001';
    this.apiToken = config.apiToken;
    this.workspaceId = config.workspaceId;
    this.threadId = config.threadId;
    this.userId = config.userId || 1;
    this.axiosInstance = axios.create({
      baseURL: this.baseURL,
      headers: {
        accept: 'application/json',
        Authorization: `Bearer ${this.apiToken}`,
        'Content-Type': 'application/json'
      }
    });
  }

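  // Send one message to the configured workspace thread and return the parsed JSON body.
  async chat(message) {
    try {
      const url = `/api/v1/workspace/${this.workspaceId}/thread/${this.threadId}/chat`;
      const response = await this.axiosInstance.post(url, {
        message,
        mode: 'chat',
        userId: this.userId
      });
      return response.data;
    } catch (error) {
      // Rethrow so the caller does not mistake a failed request for an empty result.
      throw new Error(`Request error: ${error.message}`);
    }
  }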
}

async function main() {
  try {
    const client = new AnythingLLMClient({
      baseURL: 'http://127.0.0.1:3001',
      // Example values; replace with your own API key, workspace, and thread identifiers.
      apiToken: 'Z6CZPBR-8EP4DFB-PNF9MA0-E647JN9',
      workspaceId: '13cd9433-918f-4425-b03f-63140fb5c2d6',
      threadId: 'df4615cc-5873-4dcf-b8ea-ca777355e71a',
      userId: 1
    });
    console.log('Sending query...');
    const response = await client.chat('Price of the Dyson V10 Fluffy cordless handheld vacuum cleaner');
    console.log('\nResponse:');
    console.log(JSON.stringify(response, null, 2));
  } catch (error) {
    console.error('Error:', error.message);
  }
}

main();

The API returns the model’s answer, retrieved documents, token usage, and performance metrics, enabling programmatic integration of the local AI system.
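
A caller usually wants only a few of those fields. The helper below is a sketch of how the response might be reduced to an answer, a list of source titles, and the metrics object. The field names textResponse, sources, metrics, and error are assumptions that should be checked against the API documentation of your AnythingLLM version, and summarizeChatResponse is a hypothetical name; missing fields are tolerated rather than treated as errors.

```javascript
// Reduce a chat response to the fields most callers need.
// Assumed field names (textResponse, sources, metrics, error) should be
// verified against your AnythingLLM instance; missing fields are tolerated.
function summarizeChatResponse(data) {
  if (!data || typeof data !== 'object') {
    throw new Error('Empty or invalid response');
  }
  if (data.error) {
    throw new Error(`API error: ${data.error}`);
  }
  const sources = Array.isArray(data.sources) ? data.sources : [];
  return {
    answer: data.textResponse ?? '',
    sourceTitles: sources.map((s) => s.title ?? '(untitled)'),
    metrics: data.metrics ?? null,
  };
}

// Example with a hand-written object standing in for a real response.
const sample = {
  textResponse: 'The price is listed in catalog.pdf.',
  sources: [{ title: 'catalog.pdf' }, {}],
  error: null,
};
console.log(summarizeChatResponse(sample));
```

Listing the source titles next to the answer makes it easy to see which uploaded documents the reply was drawn from.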

Conclusion

By combining Ollama, Chatbox, Page Assist, and AnythingLLM, you can create a fully offline, privacy‑preserving AI assistant with a custom knowledge base, avoiding cloud costs, latency, and data leakage.
