Build a Fast, Zero‑Cost Local AI Knowledge Base with Ollama, Cherry Studio, and Qwen‑3

This guide walks you through building a high‑performance local AI knowledge base using Ollama, Cherry Studio, and the Qwen‑3 model, covering RAG fundamentals, model selection, document preparation, system configuration, and step‑by‑step UI operations for non‑programmers.

Background and Motivation

Previous articles on local AI knowledge bases highlighted the problems of early small‑parameter models: slow responses and unsatisfactory embedding quality. To address these problems, this tutorial presents a new solution that combines Ollama, Cherry Studio, the Qwen‑3 model, and MCP tools, aimed at users without programming experience.

RAG Overview

RAG (Retrieval‑Augmented Generation) first retrieves relevant content from an external knowledge base and then has a large language model generate an answer grounded in that retrieved content, improving accuracy and contextual relevance.

The data flow of RAG is illustrated below:

[Figure: RAG data flow diagram]

Core Components of a Local Knowledge Base

High‑quality documentation: Structured, information‑rich documents form the foundation of reliable answers.

Embedding model: Converts text chunks into dense vectors stored in a vector database for semantic similarity search.

Reranker model: Rescores the initial similarity hits with more precise scoring to improve retrieval accuracy.

Large language model: Acts as the “brain,” interpreting the retrieved content and generating coherent answers (see the sketch after this list for how the four parts fit together).
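
To make the division of labor concrete, here is a minimal illustrative sketch in Python. The embed function is a toy stand‑in (a hash‑seeded random vector), and the chunk texts are invented for illustration; in the real setup the embedding model, reranker, and qwen3-8b do this work behind Cherry Studio’s UI:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy embedding: a pseudo-random unit vector per text.
    In the real pipeline this is the embedding model's job."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=64)
    return v / np.linalg.norm(v)

# 1. High-quality documents, already split into chunks.
chunks = [
    "Cursor reads MCP server definitions from its settings.",
    "Qwen-3 is the chat model that writes the final answer.",
    "Rerankers rescore candidate chunks for better precision.",
]
index = np.stack([embed(c) for c in chunks])   # the "vector database"

# 2. Embedding retrieval: cosine similarity of the query vs. every chunk.
query = "How do I configure MCP in Cursor?"
scores = index @ embed(query)                  # unit vectors -> cosine
candidates = scores.argsort()[::-1][:2]        # coarse top-k hits

# 3. Reranking: a real reranker would rescore (query, chunk) pairs here.
# 4. Generation: the retrieved chunks are packed into the LLM's prompt.
context = "\n".join(chunks[i] for i in candidates)
prompt = f"Answer using this context:\n{context}\n\nQuestion: {query}"
print(prompt)  # qwen3-8b would receive this prompt and write the answer
```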

Model Selection

For a personal, lightweight knowledge base, the qwen3-8b model offers strong performance despite its modest size; community benchmarks shared on Reddit show it scoring well in RAG scenarios.

[Figure: Qwen‑3 performance chart]

After installing Qwen‑3 and its embedding model via Ollama, you can list installed models in the terminal.
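
For readers comfortable with a terminal, pulling the models and verifying the install looks roughly like this (a sketch; the exact tags available in the Ollama library may differ, and bge-m3 is just one embedding option):

```bash
# Pull the chat model and an embedding model from the Ollama library
ollama pull qwen3:8b
ollama pull bge-m3        # embedding model; adjust the tag to your choice

# Verify what is installed
ollama list
```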

[Figure: Ollama model list]

Note for non‑programmers: If you are unfamiliar with terminal commands, you can skip this step and use the simpler method described later.

Preparing Knowledge‑Base Documents

High‑quality documents are essential. In addition to manual curation, you can use AI to crawl web content and store it as Markdown, which is AI‑friendly.

Example prompt template (replace the URL with the target page):

Use the sequential‑thinking tool to fetch the webpage https://docs.cursor.com/context/model-context-protocol and retrieve the content of each link on it. Save each page’s content as an individual Markdown file in a docs folder using the filesystem tool.

Detailed MCP configuration is covered in a separate video; it is omitted here.

[Figure: MCP configuration screenshot]
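
For reference, a typical configuration for the tools used in the prompt above looks like the snippet below. This is a sketch based on the standard community MCP servers (sequential‑thinking and filesystem via npx, fetch via uvx); the /path/to/docs entry is a placeholder you should change to your own docs directory:

```json
{
  "mcpServers": {
    "sequential-thinking": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-sequential-thinking"]
    },
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/docs"]
    },
    "fetch": {
      "command": "uvx",
      "args": ["mcp-server-fetch"]
    }
  }
}
```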

Implementation Steps

01 Add Models

In Cherry Studio, add the inference model, embedding model, and reranker model from both the Ollama and SiliconFlow providers.

[Figure: Add model UI]
[Figure: Model configuration UI]

The SiliconFlow platform provides free access to qwen3-8b and bge models, offering faster token throughput (~50 tokens/s) compared to Ollama (~10 tokens/s).

[Figure: SiliconFlow model list]
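
If you want to sanity‑check your SiliconFlow key before wiring it into Cherry Studio, the platform exposes an OpenAI‑compatible API; a quick test might look like this (a sketch assuming SiliconFlow’s documented base URL and model ID, both of which may change):

```bash
curl https://api.siliconflow.cn/v1/chat/completions \
  -H "Authorization: Bearer $SILICONFLOW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen/Qwen3-8B",
        "messages": [{"role": "user", "content": "Say hello"}]
      }'
```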

02 Create Knowledge Base

Enter a name, select the embedding and reranker models, and then import documents. Cherry Studio supports bulk directory import, automatically embedding and storing the files.

[Figure: Create knowledge base UI]

03 Create Agent

Provide an agent name, set a prompt, and link the previously created knowledge base. Example core prompt (translated to English):

You are an expert assistant familiar with the Cursor AI editor, proficient at answering questions related to Cursor usage, configuration, plugins, and coding.

When a user asks a question about Cursor, first search the associated knowledge base for relevant information.

If the knowledge base does not contain an answer, then provide a response based on your expertise.

If you enable network‑capable MCP tools, the agent can fall back to online queries when the local knowledge base lacks information, reducing hallucinations.

[Figure: Agent configuration UI]

Set the model temperature to 0 to make answers deterministic and keep them close to the retrieved content; this reduces creative drift, though it does not by itself guarantee factual accuracy.

[Figure: Temperature setting]

04 Test in Chat Window

Add the agent to the chat window and start asking questions.

[Figure: Chat window]

You can now query the knowledge base freely.

[Figure: Chat query example]
[Figure: Chat response example]

You can also switch to the local Ollama model to compare results.

[Figure: Model switch UI]

Use Cases

Personal knowledge management – consolidate notes and documents for quick retrieval.

Professional learning – import books and papers for deep Q&A.

Work efficiency – rapidly locate project docs, meeting minutes, etc.

Content creation – generate writing material or ideas from existing knowledge.

Advantages Over Cloud AI Services

Data security: Sensitive information never leaves the local machine.

Offline capability: Works without an internet connection.

No usage limits: Unlimited queries and token generation.

Full control: System configuration can be freely adjusted.

Conclusion

The presented method yields a functional, responsive local AI knowledge base suitable for personal knowledge management, professional study, and productivity enhancement. Future improvements include larger models, refined chunking strategies, and better prompt engineering as open‑source models evolve.

Tags: AI, RAG, Knowledge Base, Ollama, local LLM

Written by Eric Tech Circle, backend team lead & architect with 10+ years of experience, full‑stack engineer, sharing insights and solo development practice.