Build a Private AI Knowledge Assistant with n8n: Zero‑Code RAG in 30 Minutes
This guide shows how to create a fully local Retrieval‑Augmented Generation (RAG) system using n8n, Docker, Ollama and the free Qwen3 embedding model, enabling secure, up‑to‑date AI assistants that answer enterprise questions without exposing any proprietary data.
Many large language models (LLMs) cannot answer company‑specific questions because they lack access to internal documents and risk exposing sensitive data when connected to the cloud. A Retrieval‑Augmented Generation (RAG) pipeline solves this by first searching a private knowledge base and then feeding the retrieved context to the LLM.
RAG’s three‑step workflow
Question vectorisation: the query is converted into a high‑dimensional embedding.
Intelligent retrieval: the embedding is used to find the most similar document chunks in a vector database.
Fusion generation: the retrieved text and the original question are passed to the LLM, which generates a grounded answer.
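The three steps above can be sketched in a few lines of Python. The `embed()` below is a toy bag‑of‑words stand‑in for a real embedding model such as Qwen3, and the documents and question are invented examples:

```python
from collections import Counter
import math

def embed(text):
    # Step 1: "vectorise" the text. A toy bag-of-words count here;
    # a real pipeline would call an embedding model instead.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Invented internal "documents" forming the private knowledge base
docs = [
    "Refunds are processed within 14 days of purchase.",
    "The VPN client must be updated every quarter.",
]
index = [(d, embed(d)) for d in docs]  # pre-computed document vectors

def retrieve(question, k=1):
    # Step 2: find the chunks most similar to the question
    q = embed(question)
    ranked = sorted(index, key=lambda dv: cosine(q, dv[1]), reverse=True)
    return [d for d, _ in ranked][:k]

question = "How long do refunds take?"
context = retrieve(question)[0]
# Step 3: fuse retrieved context and question into the LLM prompt
prompt = f"Context: {context}\nQuestion: {question}"
```

A production system replaces `embed()` with a model call and the list scan with a vector database query, but the control flow stays the same.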
Environment preparation (≈5 minutes)
Deploy n8n with Docker:
docker volume create n8n_data
docker run -it --rm \
--name n8n \
-p 5678:5678 \
-e GENERIC_TIMEZONE=Asia/Shanghai \
-e N8N_ENFORCE_SETTINGS_FILE_PERMISSIONS=true \
-v n8n_data:/home/node/.n8n \
docker.n8n.io/n8nio/n8n:1.94.1
After a few seconds, open http://localhost:5678 to see the n8n UI.
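If you prefer Docker Compose, the same deployment can be expressed as a compose file (a sketch equivalent to the docker run command above; the service name is our choice):

```yaml
# docker-compose.yml — same image, port, env vars and volume as the run command
services:
  n8n:
    image: docker.n8n.io/n8nio/n8n:1.94.1
    ports:
      - "5678:5678"
    environment:
      - GENERIC_TIMEZONE=Asia/Shanghai
      - N8N_ENFORCE_SETTINGS_FILE_PERMISSIONS=true
    volumes:
      - n8n_data:/home/node/.n8n

volumes:
  n8n_data:
```

Run it with `docker compose up -d`; the named volume keeps workflows and credentials across container restarts.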
Qwen3 embedding model
Download the free Qwen3‑Embedding‑4B model (reportedly around 95 % retrieval accuracy on MTEB) from Hugging Face:
https://huggingface.co/Qwen/Qwen3-Embedding-4B-GGUF
Ollama integration (model serving)
Create a model file pointing to the downloaded GGUF and register it with Ollama:
# Create model file
echo "FROM /path/to/Qwen3-Embedding-4B-Q4_K_M.gguf" > Modelfile
# Build the model
ollama create qwen3-embedding -f Modelfile
# Verify
ollama list
Test the embedding endpoint:
curl http://localhost:11434/api/embed -d '{
"model": "qwen3-embedding",
"input": "test text embedding"
}'
Document processing workflow (n8n)
1️⃣ Form Submission – users upload files. 2️⃣ Document Loader – automatically parses PDF, Word, TXT, etc. 3️⃣ Splitters – use the Recursive Text Splitter (chunk size 1000‑2000 characters, overlap 200) to keep semantic integrity. 4️⃣ Embedding – send chunks to the Qwen3 embedding model via Ollama. 5️⃣ Simple Vector Store – store vectors in memory (for demo; production should use a persistent vector DB).
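The chunking step can be sketched as a fixed window with overlap (a simplification: n8n's Recursive Text Splitter additionally prefers paragraph and sentence boundaries, which this toy version skips; sizes are in characters):

```python
def split_text(text, chunk_size=1000, overlap=200):
    # Slide a chunk_size window over the text, advancing by
    # chunk_size - overlap so consecutive chunks share context.
    step = chunk_size - overlap
    return [text[start:start + chunk_size]
            for start in range(0, len(text), step)]

# Synthetic 2500-character document for illustration
doc = "".join(str(i % 10) for i in range(2500))
chunks = split_text(doc)
# consecutive chunks share their last/first 200 characters
```

The overlap means a sentence falling on a chunk boundary still appears whole in at least one chunk, which keeps retrieval from losing context at the seams.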
The data flow can be summarised as:
Form Submission → Document Loader → Splitters → Embedding → Vector Store
Intelligent Q&A workflow
1️⃣ Chat Trigger – receives user questions. 2️⃣ AI Agent – decides whether to query the knowledge base. 3️⃣ Vector Store tool – retrieves relevant chunks. 4️⃣ Chat Model – combines retrieved context with the LLM (e.g., a local Ollama model) to generate the answer.
Prompt used for the assistant:
<code>You are a company knowledge assistant that answers questions based on internal documents.
Core principles:
1. Prefer retrieved document content.
2. If no relevant info is found, state that clearly and suggest contacting the relevant department.
3. Provide accurate, concise answers with citations.
4. Include source references when possible.
Answer format:
- Core answer
- Detailed explanation
- Source citation
- Suggested next steps</code>
Deploying the chatbot to a website
Enable the public chat endpoint in n8n and embed the following snippet on any site (Hugo, WordPress, React, plain HTML):
<code><link href="https://cdn.jsdelivr.net/npm/@n8n/chat/dist/style.css" rel="stylesheet" />
<script type="module">
import { createChat } from 'https://cdn.jsdelivr.net/npm/@n8n/chat/dist/chat.bundle.es.js';
createChat({ webhookUrl: 'YOUR_PRODUCTION_WEBHOOK_URL' });
</script></code>
After adding the code to the page footer, an always‑available AI chat widget appears in the bottom‑right corner.
Conclusion
This tutorial demonstrates a complete, locally hosted RAG system that protects data privacy, stays up‑to‑date, and can be extended from a simple document Q&A bot to more sophisticated enterprise assistants.
References
RAG: https://cloud.google.com/use-cases/retrieval-augmented-generation
n8n chat: https://www.npmjs.com/package/@n8n/chat
Hugo: https://github.com/gohugoio/hugo
Qwen3 Embedding 4B + n8n demo: https://www.youtube.com/watch?v=abv1DsIhNfA
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.