Build a Private Knowledge Base from Scratch with DeepSeek V4 and AnythingLLM

This guide walks you through creating a fully local, zero‑cloud RAG knowledge base using DeepSeek V4, AnythingLLM, and the BGE‑M3 embedding model, covering component choices, step‑by‑step installation, advanced tuning, troubleshooting, use‑case scenarios, and cost estimation.

Why a RAG Knowledge Base Is Needed

Documents often reside on local drives and cannot be queried directly. Retrieval‑Augmented Generation (RAG) enables an AI model to read those documents and answer natural‑language questions.

Recommended Component Stack (2026)

Large Model : DeepSeek V4 (run via Ollama locally) or DeepSeek API – provides question understanding and answer generation.

Frontend : AnythingLLM – handles document management, RAG pipeline, and chat UI.

Vector Store : LanceDB (built‑in) – stores vector indexes of embedded documents.

Embedding Model : BGE‑M3 (via Ollama) – converts text to dense vectors for semantic search.

Tool Selection Rationale

DeepSeek V4

Released April 2026 in two variants:

V4‑Pro : 1.6 T total parameters, 49 B activated per token, 1 M‑token context window.

V4‑Flash : 284 B total parameters, 13 B activated per token, faster and lighter.

Key advantages for knowledge‑base use:

Ultra‑long context (1 M tokens) allows feeding whole chapters, reducing reliance on small retrieved fragments.

Strong comprehension – ranks first on open‑source Agent benchmarks and handles vague queries accurately.

AnythingLLM

Full‑stack open‑source RAG system (≈50 K GitHub stars) with the following strengths:

Graphical UI, no coding required.

Supports multiple LLM providers (Ollama, OpenAI‑compatible, Claude, local APIs).

Handles PDF, Word, Excel, TXT, Markdown, source code, URLs, and YouTube subtitles.

Workspace isolation for separate projects or departments.

Enterprise‑grade features: multi‑user, permission management, API endpoints, Agent tools.

All data stays on the local machine.

30‑Minute End‑to‑End Setup

Step 1: Install Ollama (≈10 min)

Download from https://ollama.com and run:

# Pull DeepSeek V4‑Flash (recommended for local use)
ollama pull deepseek-v4:flash
# Pull the Chinese‑optimized embedding model BGE‑M3
ollama pull bge-m3
# Verify installation
ollama list

Hardware hints: V4‑Flash (13 B active parameters) needs ≥16 GB RAM; GPU acceleration improves inference speed. If you use the DeepSeek API instead, only BGE‑M3 needs to run locally (~1.1 GB RAM).
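If you prefer to script the verification, Ollama exposes a local HTTP API. A minimal Python sketch, assuming the default port 11434 and the model tags pulled above:

import requests

# List the models Ollama has available locally (GET /api/tags)
resp = requests.get("http://localhost:11434/api/tags")
names = [m["name"] for m in resp.json().get("models", [])]
for wanted in ("deepseek-v4:flash", "bge-m3"):
    # Ollama may append ":latest" to a tag, so match on the prefix
    status = "OK" if any(n.startswith(wanted) for n in names) else "MISSING"
    print(f"{wanted}: {status}")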

Step 2: Install AnythingLLM (≈5 min)

Download the desktop client for macOS/Windows/Linux from https://anythingllm.com/desktop, install, and launch.

Step 3: Configure the Large Model (≈5 min)

In AnythingLLM settings → LLM Configuration, choose one of:

Option A – Local Ollama (offline)

LLM Provider: Ollama
Base URL: http://localhost:11434
Model Name: deepseek-v4:flash

Option B – DeepSeek API (faster)

LLM Provider: OpenAI Compatible
Base URL: https://api.deepseek.com
API Key: YOUR_API_KEY
Model Name: deepseek-v4-flash
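Before wiring the key into AnythingLLM, you can verify it against DeepSeek's OpenAI‑compatible endpoint directly. A hedged Python sketch (the deepseek-v4-flash tag is taken from the config above; check DeepSeek's docs for the exact current identifier):

import requests

resp = requests.post(
    "https://api.deepseek.com/chat/completions",   # OpenAI-compatible endpoint
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "deepseek-v4-flash",              # model tag as configured above
        "messages": [{"role": "user", "content": "Reply with OK."}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])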

Step 4: Configure the Embedding Model (critical)

Set the embedding provider to Ollama and model to BGE‑M3:

Embedding Provider: Ollama
Base URL: http://localhost:11434
Embedding Model: bge-m3

Why BGE‑M3? It scores 63.0 on the MTEB benchmark (2nd place on the Chinese leaderboard), uses ~1.1 GB of memory, and supports dense, sparse, and multi‑vector retrieval, which improves keyword precision in technical documents.
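To sanity‑check that embeddings are actually being produced, you can call the model through Ollama's embeddings endpoint. A minimal sketch, assuming the default port:

import requests

# Embed a test string with BGE-M3 via Ollama (POST /api/embeddings)
resp = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "bge-m3", "prompt": "annual leave policy"},
)
vec = resp.json()["embedding"]
print(len(vec))  # BGE-M3's dense vectors are 1024-dimensional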

Step 5: Create Workspaces (≈2 min)

Create separate workspaces for each project or purpose (e.g., Company Policies, Technical Docs, Industry Reports, Personal Notes). Each workspace stores its own vector index.

Step 6: Upload Documents (≈3 min)

Supported formats:

Documents: PDF, Word (.docx), TXT, Markdown

Data: CSV, Excel (.xlsx)

Code: Python, JavaScript, TypeScript, etc.

Web: Paste a URL to fetch full text

Video: YouTube links (auto‑extract subtitles)

After upload, click “Move to Workspace” then “Save and Embed” to vectorize the files.

Step 7: Test the Chat Interface

Ask natural‑language questions such as:

“How many days of annual leave does the company grant?”

“What was the YoY revenue growth in Q3 last year?”

“Where is the authentication module API spec?”

The AI retrieves relevant passages, generates answers, and shows source references.
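Beyond the chat UI, AnythingLLM also exposes a developer API, which is handy for scripted smoke tests. A hedged sketch, assuming the documented v1 endpoint on the desktop default port 3001 and a hypothetical workspace slug company-policies (generate an API key in AnythingLLM's settings first):

import requests

# Query a workspace programmatically; the endpoint and response fields follow
# AnythingLLM's v1 developer API and may vary by version
resp = requests.post(
    "http://localhost:3001/api/v1/workspace/company-policies/chat",
    headers={"Authorization": "Bearer YOUR_ANYTHINGLLM_API_KEY"},
    json={"message": "How many days of annual leave does the company grant?", "mode": "chat"},
)
data = resp.json()
print(data.get("textResponse"))  # generated answer
print(data.get("sources"))       # retrieved source references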

Advanced Tuning – Boost Accuracy by 20‑30 %

1. Chunking Strategy

Adjust chunk size and overlap in AnythingLLM’s advanced settings. Recommended values:

General documents (policies, reports) : Chunk 1000, Overlap 200, split by sentences.

Technical docs (API manuals) : Chunk 1500, Overlap 300, split by Markdown headings.

Code repositories : Chunk 2000, Overlap 500, split at function or class boundaries.

Chunks that are too small fragment the context; chunks that are too large dilute retrieval relevance.
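To make the size/overlap trade‑off concrete, here is a minimal character‑based chunker sketch (AnythingLLM's real splitter also respects sentence and heading boundaries, so treat this purely as an illustration):

# Fixed-size chunking with overlap: consecutive chunks share `overlap` characters
def chunk(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

pieces = chunk("lorem ipsum " * 500, size=1000, overlap=200)
print(len(pieces), "chunks;", len(pieces[0]), "chars in the first")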

2. Hybrid Search

Enable hybrid (semantic + keyword) search with the following JSON snippet:

{
  "searchMode": "hybrid",
  "semanticWeight": 0.7,
  "keywordWeight": 0.3
}

This combines meaning‑based retrieval with exact keyword matching, essential for product codes or internal identifiers.
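Under the hood, hybrid retrieval blends two scores per chunk. A simplified Python sketch of the 0.7/0.3 weighting above (production systems typically use BM25 rather than raw word overlap for the keyword side):

import math

def cosine(a: list[float], b: list[float]) -> float:
    # Semantic similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def keyword_score(query: str, doc: str) -> float:
    # Fraction of query words that appear verbatim in the document
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def hybrid_score(q_vec, d_vec, query, doc, w_sem=0.7, w_kw=0.3) -> float:
    return w_sem * cosine(q_vec, d_vec) + w_kw * keyword_score(query, doc)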

3. Reranker

After an initial top‑20 retrieval, apply a cross‑encoder reranker to select the top‑5 most relevant chunks before passing them to the LLM:

{
  "reranker": "cross-encoder",
  "rerankerTopN": 5,
  "initialRetrieve": 20
}

Empirical tests on professional documents show a 20‑30 % increase in answer accuracy.
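A hedged sketch of the retrieve‑20‑then‑rerank‑to‑5 flow, using the sentence-transformers CrossEncoder class with a common public reranker model (not necessarily the one AnythingLLM ships with):

from sentence_transformers import CrossEncoder

# A widely used public cross-encoder; swap in your preferred reranker
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, chunks: list[str], top_n: int = 5) -> list[str]:
    # Score every (query, chunk) pair jointly, then keep the best top_n
    scores = reranker.predict([(query, c) for c in chunks])
    ranked = sorted(zip(chunks, scores), key=lambda p: p[1], reverse=True)
    return [c for c, _ in ranked[:top_n]]

# top20 = the 20 chunks from the initial vector search, then:
# best5 = rerank(query, top20)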

Common Pitfalls & Solutions

Problem 1: “No relevant information found”

Verify the embedding model is set to BGE‑M3 and points to the correct Ollama port.

Ensure all documents show status “Embedded”.

Check that chunk size is not too small.

Make the query more specific (e.g., replace “find the rule” with “find the travel‑expense policy”).

Problem 2: Poor table extraction from PDFs

Convert PDF to Word before uploading to preserve tables.

Or extract tables to CSV and upload separately.

Scanned image PDFs require OCR preprocessing.

Problem 3: Slow Ollama model response

Switch to DeepSeek API for faster inference.

Reduce the num_ctx setting if it was increased, as a larger context window slows processing.

Enable GPU acceleration (verify with ollama ps).

Problem 4: Poor Chinese document retrieval

Use BGE‑M3 for embeddings; alternatives like nomic-embed-text degrade Chinese performance by >20 %.

If documents were previously embedded with a weaker model, delete them, switch the model, and re‑embed.

Real‑World Scenarios

Scenario A – Enterprise Compliance Knowledge Base

Upload all policy PDFs, Word files, and FAQs.

Name workspace Compliance‑2026.

Configure Chunk 1000 and Hybrid Search.

Result: Employees can ask “What content may be published externally?” and receive precise policy excerpts.

Scenario B – R&D Technical Documentation

Upload API specs, architecture docs, meeting minutes, and code comments (Markdown).

Name workspace R&D‑Knowledge‑PROJECT.

Configure Chunk 1500, Markdown heading split, and Reranker.

Result: Query “What is the timeout setting for the order‑payment API?” returns the exact document segment.

Scenario C – Industry Research Library

Upload market‑research PDFs and personal notes.

Name workspace Industry‑Research‑AI‑Track.

Configure Chunk 1500 and Hybrid Search.

Result: Ask “What was global AI investment in 2025?” and get a sourced answer compiled from multiple reports.

Impact of DeepSeek V4’s Long Context

Traditional RAG retrieves 3‑5 chunks (Top‑K). With a 1 M‑token window, Top‑K can be raised to 20‑50, allowing the model to see larger portions of a document at once. This improves:

Cross‑chapter comparative questions.

Long‑form summarization of reports.

Multi‑document contrast queries.
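A quick budget check shows why an aggressive Top‑K is safe here (the chunk size matches the technical‑docs recommendation above):

top_k = 50
chunk_tokens = 1500                  # per-chunk token budget
budget = top_k * chunk_tokens
print(budget, "tokens retrieved =", f"{budget / 1_000_000:.1%} of a 1M-token window")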

Cost Estimation

Pure Local (Ollama + BGE‑M3) : ~0 CNY/month (electricity ignored).

DeepSeek API + Local Embedding : 10‑30 CNY/month.

Heavy DeepSeek API usage : 50‑200 CNY/month for multi‑user, high‑concurrency teams.

For a 5‑person team with 20 queries per person per day (≈5 000 tokens each), the monthly API cost is roughly 30‑80 CNY – far cheaper than enterprise SaaS knowledge‑base services that cost tens of thousands of yuan per year.
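The arithmetic behind that estimate, with the per‑million‑token price as an explicit, labeled assumption (DeepSeek's actual pricing varies by model and cache hits, so check the official price page):

people, queries_per_day, tokens_per_query, days = 5, 20, 5_000, 30
monthly_tokens = people * queries_per_day * tokens_per_query * days  # 15,000,000 tokens
price_per_m_tokens_cny = 2.0  # ASSUMED blended price in CNY per million tokens
print(monthly_tokens / 1e6 * price_per_m_tokens_cny, "CNY/month at the assumed price")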
