Build a Private Knowledge Base from Scratch with DeepSeek V4 and AnythingLLM
This guide walks you through creating a fully local, zero‑cloud RAG knowledge base using DeepSeek V4, AnythingLLM, and the BGE‑M3 embedding model, covering component choices, step‑by‑step installation, advanced tuning, troubleshooting, use‑case scenarios, and cost estimation.
Why a RAG Knowledge Base Is Needed
Documents often reside on local drives and cannot be queried directly. Retrieval‑Augmented Generation (RAG) enables an AI model to read those documents and answer natural‑language questions.
Recommended Component Stack (2026)
Large Model: DeepSeek V4 (run via Ollama locally) or DeepSeek API – provides question understanding and answer generation.
Frontend: AnythingLLM – handles document management, the RAG pipeline, and the chat UI.
Vector Store: LanceDB (built‑in) – stores vector indexes of embedded documents.
Embedding Model: BGE‑M3 (via Ollama) – converts text to dense vectors for semantic search.
Tool Selection Rationale
DeepSeek V4
Released April 2026 in two variants:
V4‑Pro: 1.6 T total parameters, 49 B activated per token, 1 M token context window.
V4‑Flash: 284 B total parameters, 13 B activated per token, faster and lighter.
Key advantages for knowledge‑base use:
Ultra‑long context (1 M tokens) allows feeding whole chapters, reducing reliance on small retrieved fragments.
Strong comprehension – ranks first on open‑source Agent benchmarks, handling vague queries accurately.
AnythingLLM
Full‑stack open‑source RAG system (≈50 K GitHub stars) with the following strengths:
Graphical UI, no coding required.
Supports multiple LLM providers (Ollama, OpenAI‑compatible, Claude, local APIs).
Handles PDF, Word, Excel, TXT, Markdown, source code, URLs, and YouTube subtitles.
Workspace isolation for separate projects or departments.
Enterprise‑grade features: multi‑user, permission management, API endpoints, Agent tools.
All data stays on the local machine.
30‑Minute End‑to‑End Setup
Step 1: Install Ollama (≈10 min)
Download from https://ollama.com and run:
# Pull DeepSeek V4‑Flash (recommended for local use)
ollama pull deepseek-v4:flash
# Pull the Chinese‑optimized embedding model BGE‑M3
ollama pull bge-m3
# Verify installation
ollama list
Hardware hints: V4‑Flash (13 B activated) needs at least 16 GB of RAM, and GPU acceleration improves inference speed. If you use the DeepSeek API instead of a local model, only BGE‑M3 needs to run locally (~1.1 GB RAM).
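If you prefer to verify the setup programmatically, the short Python sketch below queries Ollama's local REST API (the same http://localhost:11434 endpoint AnythingLLM will connect to later) and checks that both models are present. It assumes Ollama is running on the default port and that the requests library is installed.
# check_ollama.py: minimal sketch to confirm Ollama is serving and both models are pulled
import requests

resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()
models = [m["name"] for m in resp.json().get("models", [])]
print("Installed models:", models)

# Expect entries like deepseek-v4:flash and bge-m3:latest in the list
for required in ("deepseek-v4:flash", "bge-m3"):
    if not any(name.startswith(required) for name in models):
        print(f"Missing model: {required} (run `ollama pull {required}`)")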
Step 2: Install AnythingLLM (≈5 min)
Download the desktop client for macOS/Windows/Linux from https://anythingllm.com/desktop, install, and launch.
Step 3: Configure the Large Model (≈5 min)
In AnythingLLM settings → LLM Configuration, choose one of:
Option A – Local Ollama (offline)
LLM Provider: Ollama
Base URL: http://localhost:11434
Model Name: deepseek-v4:flash
Option B – DeepSeek API (faster)
LLM Provider: OpenAI Compatible
Base URL: https://api.deepseek.com
API Key: YOUR_API_KEY
Model Name: deepseek-v4-flash
Step 4: Configure the Embedding Model (critical)
Set the embedding provider to Ollama and model to BGE‑M3:
Embedding Provider: Ollama
Base URL: http://localhost:11434
Embedding Model: bge-m3
Why BGE‑M3? It scores 63.0 on the MTEB benchmark (2nd for Chinese), uses ~1.1 GB of memory, and supports dense + sparse + multi‑vector retrieval, which improves keyword precision in technical documents.
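Before embedding a large document set, it is worth confirming that BGE‑M3 actually returns vectors. The Python sketch below calls Ollama's standard embeddings endpoint directly; the sample question is a placeholder.
# embed_check.py: request one BGE-M3 embedding from the local Ollama instance
import requests

payload = {"model": "bge-m3", "prompt": "How many days of annual leave does the company grant?"}
resp = requests.post("http://localhost:11434/api/embeddings", json=payload, timeout=30)
resp.raise_for_status()
vector = resp.json()["embedding"]
print("Embedding dimension:", len(vector))  # BGE-M3 dense vectors are 1024-dimensional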
Step 5: Create Workspaces (≈2 min)
Create separate workspaces for each project or purpose (e.g., Company Policies, Technical Docs, Industry Reports, Personal Notes). Each workspace stores its own vector index.
Step 6: Upload Documents (≈3 min)
Supported formats:
Documents: PDF, Word (.docx), TXT, Markdown
Data: CSV, Excel (.xlsx)
Code: Python, JavaScript, TypeScript, etc.
Web: Paste a URL to fetch full text
Video: YouTube links (auto‑extract subtitles)
After upload, click “Move to Workspace” then “Save and Embed” to vectorize the files.
Step 7: Test the Chat Interface
Ask natural‑language questions such as:
“How many days of annual leave does the company grant?”
“What was the YoY revenue growth in Q3 last year?”
“Where is the authentication module API spec?”
The AI retrieves relevant passages, generates answers, and shows source references.
Advanced Tuning – Boost Accuracy by 30 %+
1. Chunking Strategy
Adjust chunk size and overlap in AnythingLLM’s advanced settings. Recommended values:
General documents (policies, reports): Chunk 1000, Overlap 200, split by sentences.
Technical docs (API manuals): Chunk 1500, Overlap 300, split by Markdown headings.
Code repositories: Chunk 2000, Overlap 500, split at function or class boundaries.
Chunks that are too small break up context; chunks that are too large dilute relevance.
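To make the chunk/overlap trade‑off concrete, here is a minimal, illustrative Python splitter. It is not AnythingLLM's internal implementation; it only shows how a 1000‑character window with a 200‑character overlap slides across a document.
# chunk_sketch.py: illustrative sliding-window splitter (not AnythingLLM's internal code)
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into chunks that each share `overlap` characters with the previous chunk."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece:
            chunks.append(piece)
    return chunks

doc = "x" * 2400  # placeholder for a 2,400-character policy document
# Produces 3 overlapping chunks covering characters 0-1000, 800-1800, and 1600-2400
print(len(chunk_text(doc, chunk_size=1000, overlap=200)))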
2. Hybrid Search
Enable hybrid (semantic + keyword) search with the following JSON snippet:
{
"searchMode": "hybrid",
"semanticWeight": 0.7,
"keywordWeight": 0.3
}
This combines meaning‑based retrieval with exact keyword matching, which is essential for product codes or internal identifiers.
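Conceptually, hybrid search fuses two ranked lists using the weights above. The Python sketch below only illustrates the weighted‑sum idea; the scores are placeholders, and the real fusion happens inside AnythingLLM and LanceDB, which may use a different formula.
# hybrid_sketch.py: weighted fusion of semantic and keyword scores (illustrative only)
def hybrid_score(semantic: float, keyword: float,
                 semantic_weight: float = 0.7, keyword_weight: float = 0.3) -> float:
    """Assumes both scores have been normalized to the 0-1 range before mixing."""
    return semantic_weight * semantic + keyword_weight * keyword

# A chunk containing the exact product code (keyword 1.0) but weaker meaning overlap (0.4)
# still outranks a fluent chunk without the code (semantic 0.7, keyword 0.0): 0.58 vs 0.49
print(hybrid_score(0.4, 1.0), hybrid_score(0.7, 0.0))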
3. Reranker
After an initial top‑20 retrieval, apply a cross‑encoder reranker to select the top‑5 most relevant chunks before passing them to the LLM:
{
"reranker": "cross-encoder",
"rerankerTopN": 5,
"initialRetrieve": 20
}
Empirical tests on professional documents show a 20‑30 % increase in answer accuracy.
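The snippet above is only configuration; under the hood, a cross‑encoder scores each (query, chunk) pair jointly and the best chunks are kept. A standalone sketch of that second stage, using the open‑source sentence-transformers library and a generic MS MARCO cross‑encoder, looks roughly like this. The model name and chunk texts are illustrative, and AnythingLLM's bundled reranker may use a different model.
# rerank_sketch.py: score the top-20 retrieved chunks with a cross-encoder, keep the best 5
from sentence_transformers import CrossEncoder

query = "What is the timeout setting for the order-payment API?"
retrieved_chunks = ["chunk text 1", "chunk text 2", "chunk text 3"]  # the 20 chunks from vector search

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = model.predict([(query, chunk) for chunk in retrieved_chunks])

# Sort chunks by cross-encoder score and keep the 5 most relevant for the LLM prompt
top5 = [chunk for _, chunk in sorted(zip(scores, retrieved_chunks), reverse=True)[:5]]
print(top5)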
Common Pitfalls & Solutions
Problem 1: “No relevant information found”
Verify the embedding model is set to BGE‑M3 and points to the correct Ollama port.
Ensure all documents show status “Embedded”.
Check that chunk size is not too small.
Make the query more specific (e.g., replace “find the rule” with “find the travel‑expense policy”).
Problem 2: Poor table extraction from PDFs
Convert PDF to Word before uploading to preserve tables.
Or extract tables to CSV and upload separately.
Scanned image PDFs require OCR preprocessing.
Problem 3: Slow Ollama model response
Switch to DeepSeek API for faster inference.
Reduce the num_ctx setting if it was increased, as a larger context window slows processing.
Enable GPU acceleration (verify with ollama ps).
Problem 4: Poor Chinese document retrieval
Use BGE‑M3 for embeddings; alternatives like nomic-embed-text degrade Chinese performance by >20 %.
If documents were previously embedded with a weaker model, delete them, switch the model, and re‑embed.
Real‑World Scenarios
Scenario A – Enterprise Compliance Knowledge Base
Upload all policy PDFs, Word files, and FAQs.
Name workspace Compliance‑2026.
Configure Chunk 1000 and Hybrid Search.
Result: Employees can ask “What content may be published externally?” and receive precise policy excerpts.
Scenario B – R&D Technical Documentation
Upload API specs, architecture docs, meeting minutes, and code comments (Markdown).
Name workspace R&D‑Knowledge‑PROJECT.
Configure Chunk 1500, Markdown heading split, and Reranker.
Result: Query “What is the timeout setting for the order‑payment API?” returns the exact document segment.
Scenario C – Industry Research Library
Upload market‑research PDFs and personal notes.
Name workspace Industry‑Research‑AI‑Track.
Configure Chunk 1500 and Hybrid Search.
Result: Ask “What was global AI investment in 2025?” and get a sourced answer compiled from multiple reports.
Impact of DeepSeek V4’s Long Context
Traditional RAG retrieves 3‑5 chunks (Top‑K). With a 1 M‑token window, Top‑K can be raised to 20‑50, allowing the model to see larger portions of a document at once. This improves:
Cross‑chapter comparative questions.
Long‑form summarization of reports.
Multi‑document contrast queries.
Cost Estimation
Pure Local (Ollama + BGE‑M3): ~0 CNY/month (electricity ignored).
DeepSeek API + Local Embedding: 10‑30 CNY/month.
Heavy DeepSeek API usage: 50‑200 CNY/month for multi‑user, high‑concurrency teams.
For a 5‑person team with 20 queries per person per day (≈5 000 tokens each), monthly API cost is roughly 30‑80 CNY, far cheaper than enterprise SaaS knowledge‑base services costing tens of thousands of yuan per year.
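As a rough back‑of‑the‑envelope check of that figure, the calculation below assumes 22 working days per month and a blended price of 4 CNY per million tokens. The unit price is an assumption, so verify it against DeepSeek's current price list before budgeting.
# cost_sketch.py: rough monthly API cost estimate; the unit price is an assumed placeholder
users = 5
queries_per_user_per_day = 20
tokens_per_query = 5_000              # prompt + retrieved context + answer, per the estimate above
working_days_per_month = 22

monthly_tokens = users * queries_per_user_per_day * tokens_per_query * working_days_per_month
price_per_million_tokens_cny = 4.0    # ASSUMPTION: blended input/output price, check current pricing

monthly_cost_cny = monthly_tokens / 1_000_000 * price_per_million_tokens_cny
print(monthly_tokens, round(monthly_cost_cny, 1))  # 11,000,000 tokens, roughly 44 CNY per month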