Build a Fast, Zero‑Cost Local AI Knowledge Base with Ollama, Cherry Studio, and Qwen‑3
This guide walks you through building a high‑performance local AI knowledge base using Ollama, Cherry Studio, and the Qwen‑3 model, covering RAG fundamentals, model selection, document preparation, system configuration, and step‑by‑step UI operations for non‑programmers.
Background and Motivation
Previous articles on local AI knowledge bases highlighted issues with early small‑parameter models, slow response times, and unsatisfactory embedding quality. To address these problems, this tutorial presents a new solution that combines Ollama, Cherry Studio, the Qwen‑3 model, and MCP tools, targeting users without programming experience.
RAG Overview
RAG (Retrieval‑Augmented Generation) first retrieves relevant content from an external knowledge base and then lets a large language model generate answers, improving accuracy and contextual relevance.
The RAG data flow: the user's query is embedded, the most similar chunks are retrieved from the vector database (and optionally reranked), and the retrieved context plus the query are passed to the language model for generation.
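As a minimal sketch of that flow, assuming toy hand-made "embeddings" in place of a real embedding model:

```python
import math

# Toy knowledge base: each chunk maps to a hand-made "embedding".
# A real system would produce these vectors with an embedding model.
CHUNKS = {
    "Ollama runs LLMs locally.":        [0.9, 0.1, 0.0],
    "Cherry Studio is a desktop UI.":   [0.1, 0.9, 0.0],
    "RAG retrieves before generating.": [0.0, 0.2, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, k=2):
    """Fetch the k chunks most similar to the query embedding."""
    ranked = sorted(CHUNKS, key=lambda c: cosine(query_vec, CHUNKS[c]), reverse=True)
    return ranked[:k]

def build_prompt(query, context):
    """Splice the retrieved context into the prompt sent to the LLM."""
    joined = "\n".join(context)
    return f"Answer using this context:\n{joined}\n\nQuestion: {query}"

query_vec = [0.0, 0.1, 1.0]   # pretend-embedding of a question about RAG
top = retrieve(query_vec)
print(top[0])                 # → "RAG retrieves before generating."
```

The generation step then sends `build_prompt(...)` to the model; retrieval quality, not the model, usually decides whether the answer is grounded.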
Core Components of a Local Knowledge Base
High‑quality documentation: Structured, information‑rich documents form the foundation of reliable answers.
Embedding model: Converts text chunks into dense vectors stored in a vector database for semantic similarity search.
Reranker model: Rescores the initial similarity results more precisely to improve retrieval accuracy.
Large language model: Acts as the “brain,” interpreting retrieved content and generating coherent answers.
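The embedding and reranker stages form a two-pass pipeline: fast vector search recalls rough candidates, then the reranker rescores query–chunk pairs more carefully. A sketch with a stand-in scorer (a real system would call a cross-encoder reranker such as a bge model):

```python
# Stage 1 output: rough candidates with vector-similarity scores (recall-oriented).
candidates = [
    ("How to install MCP tools in Cursor", 0.71),
    ("Cursor keyboard shortcuts overview", 0.69),
    ("MCP server configuration reference", 0.66),
]

def rerank_score(query, chunk):
    """Stand-in for a cross-encoder: fraction of query words found in the chunk."""
    q, c = set(query.lower().split()), set(chunk.lower().split())
    return len(q & c) / len(q)

# Stage 2: rescore each query-chunk pair precisely (precision-oriented).
query = "configure an MCP server"
reranked = sorted(candidates, key=lambda pair: rerank_score(query, pair[0]), reverse=True)
print(reranked[0][0])   # the configuration reference now ranks first
```

Note that the reranker reordered the list: the chunk that vector search ranked last is the best match for this query.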
Model Selection
For personal, lightweight knowledge bases, the qwen3-8b model offers strong performance despite its modest size; community benchmarks shared on Reddit report good scores in RAG scenarios.
After installing Qwen‑3 and an embedding model through Ollama (e.g. `ollama pull qwen3:8b`), run `ollama list` in the terminal to confirm the models are installed.
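If you prefer to check programmatically, Ollama also exposes a local REST API whose `GET /api/tags` endpoint lists installed models (the server is assumed to be running on the default port 11434; the sample payload below is abbreviated):

```python
import json
import urllib.request

def installed_models(payload: dict) -> list:
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in payload.get("models", [])]

def list_local_models(base_url="http://localhost:11434"):
    """Query a running Ollama server for its installed models."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return installed_models(json.load(resp))

# Abbreviated example of the /api/tags response shape:
sample = {"models": [{"name": "qwen3:8b"}, {"name": "bge-m3:latest"}]}
print(installed_models(sample))   # → ['qwen3:8b', 'bge-m3:latest']
```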
Note for non‑programmers: If you are unfamiliar with terminal commands, you can skip this step and use the simpler method described later.
Preparing Knowledge‑Base Documents
High‑quality documents are essential. In addition to manual curation, you can use AI to crawl web content and store it as Markdown, which is AI‑friendly.
Example prompt template (replace the URL with the target page):
Use the sequential‑thinking tool to: Fetch the webpage https://docs.cursor.com/context/model-context-protocol and retrieve the content of each link. Save each page’s content as an individual Markdown file in a docs folder using the filesystem tool.
Detailed MCP configuration is covered in a separate video; it is omitted here.
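Chunking strategy also affects retrieval quality. Cherry Studio handles chunking for you, but as a rough illustration of one common approach over crawled Markdown files, splitting at section headings (the function name is my own):

```python
def chunk_markdown(text: str) -> list:
    """Split a Markdown document into chunks at '## ' section headings."""
    chunks, current = [], []
    for line in text.splitlines():
        if line.startswith("## ") and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return chunks

doc = "# MCP\nIntro text.\n## Setup\nInstall steps.\n## Usage\nExamples."
print(len(chunk_markdown(doc)))   # → 3  (intro, Setup, Usage)
```

Heading-based chunks keep each section's context together, which tends to embed better than fixed-size character windows that cut sentences in half.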
Implementation Steps
01 Add Models
In Cherry Studio's model settings, add the inference model, embedding model, and reranker model from both the local Ollama provider and the SiliconFlow platform.
The SiliconFlow platform provides free access to qwen3-8b and bge models, offering faster token throughput (~50 tokens/s) compared to Ollama (~10 tokens/s).
02 Create Knowledge Base
Enter a name, select the embedding and reranker models, and then import documents. The application supports bulk directory import, automatically embedding and storing files.
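Under the hood, bulk import amounts to walking the directory, embedding each file, and storing the vectors. A minimal sketch, with a stub standing in for the real embedding model call:

```python
from pathlib import Path

def embed(text: str) -> list:
    """Stand-in embedding: a real system would call the embedding model here."""
    return [float(len(text)), float(text.count(" "))]

def import_directory(docs_dir: str) -> dict:
    """Embed every Markdown file under docs_dir into an in-memory 'vector store'."""
    store = {}
    for path in sorted(Path(docs_dir).rglob("*.md")):
        store[str(path)] = embed(path.read_text(encoding="utf-8"))
    return store
```

Cherry Studio does exactly this kind of walk-and-embed pass when you point it at a folder, persisting the vectors so later queries only need the similarity search.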
03 Create Agent
Provide an agent name, set a prompt, and link the previously created knowledge base. Example core prompt (translated to English):
You are an expert assistant familiar with the Cursor AI editor, proficient at answering questions related to Cursor usage, configuration, plugins, and coding.
When a user asks a question about Cursor, first search the associated knowledge base for relevant information.
If the knowledge base does not contain an answer, provide a response based on your expertise.

If you enable network‑capable MCP tools, the agent can fall back to online queries when the local knowledge base lacks information, reducing hallucinations.
Set the model temperature to 0 for deterministic, low‑creativity output, which reduces the chance of fabricated answers.
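If you later call the model through Ollama's API directly rather than through Cherry Studio, the same setting goes into the request's `options` field (the endpoint and option name come from Ollama's API; the prompt is illustrative):

```python
import json

# Request body for POST http://localhost:11434/api/generate
payload = {
    "model": "qwen3:8b",
    "prompt": "How do I configure MCP in Cursor?",
    "stream": False,
    "options": {"temperature": 0},   # deterministic, low-creativity sampling
}
print(json.dumps(payload, indent=2))
```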
04 Test in Chat Window
Add the agent to the chat window and start asking questions.
You can now query the knowledge base freely.
You can also switch to the local Ollama model to compare results.
Use Cases
Personal knowledge management – consolidate notes and documents for quick retrieval.
Professional learning – import books and papers for deep Q&A.
Work efficiency – rapidly locate project docs, meeting minutes, etc.
Content creation – generate writing material or ideas from existing knowledge.
Advantages Over Cloud AI Services
Data security: Sensitive information never leaves the local machine.
Offline capability: Works without an internet connection.
No usage limits: Unlimited queries and token generation when running locally.
Full control: System configuration can be freely adjusted.
Conclusion
The presented method yields a functional, responsive local AI knowledge base suitable for personal knowledge management, professional study, and productivity enhancement. Future improvements include larger models, refined chunking strategies, and better prompt engineering as open‑source models evolve.
Eric Tech Circle
Backend team lead & architect with 10+ years experience, full‑stack engineer, sharing insights and solo development practice.