Build a Private AI Knowledge Assistant with n8n: Zero‑Code RAG in 30 Minutes

This guide shows how to create a fully local Retrieval‑Augmented Generation (RAG) system using n8n, Docker, Ollama and the free Qwen3 embedding model, enabling secure, up‑to‑date AI assistants that answer enterprise questions without exposing any proprietary data.


Off-the-shelf large language models (LLMs) cannot answer company-specific questions because they have no access to internal documents, and sending those documents to a cloud API risks exposing sensitive data. A Retrieval-Augmented Generation (RAG) pipeline solves this by first searching a private knowledge base and then feeding the retrieved context to the LLM.

RAG’s three‑step workflow

Question vectorisation: the query is converted into a high‑dimensional embedding.

Intelligent retrieval: the embedding is used to find the most similar document chunks in a vector database.

Fusion generation: the retrieved text and the original question are passed to the LLM, which generates a grounded answer.
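
To make the three steps concrete, here is a minimal Python sketch of the loop against a local Ollama server. It assumes Ollama is listening on localhost:11434, that the embedding model has been registered as qwen3-embedding (done later in this guide), and that a general chat model such as llama3 has been pulled separately; the function names and the llama3 model are illustrative, not part of the n8n workflow itself.

import requests

OLLAMA = "http://localhost:11434"

def embed(texts):
    # Step 1: vectorisation via Ollama's /api/embed endpoint.
    r = requests.post(f"{OLLAMA}/api/embed",
                      json={"model": "qwen3-embedding", "input": texts})
    r.raise_for_status()
    return r.json()["embeddings"]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / ((sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5))

def answer(question, chunks, top_k=3):
    # Step 2: retrieval - rank stored chunks by similarity to the question.
    chunk_vecs = embed(chunks)
    q_vec = embed([question])[0]
    ranked = sorted(zip(chunks, chunk_vecs),
                    key=lambda pair: cosine(q_vec, pair[1]), reverse=True)
    context = "\n\n".join(chunk for chunk, _ in ranked[:top_k])
    # Step 3: fusion generation - the LLM answers from the retrieved context.
    prompt = (f"Answer the question using only this context:\n{context}\n\n"
              f"Question: {question}")
    r = requests.post(f"{OLLAMA}/api/generate",
                      json={"model": "llama3", "prompt": prompt, "stream": False})
    r.raise_for_status()
    return r.json()["response"]

n8n wires these same steps together visually; the rest of this guide builds that workflow without writing any code.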

Environment preparation (≈5 minutes)

Deploy n8n with Docker:

docker volume create n8n_data

docker run -it --rm \
    --name n8n \
    -p 5678:5678 \
    -e GENERIC_TIMEZONE=Asia/Shanghai \
    -e N8N_ENFORCE_SETTINGS_FILE_PERMISSIONS=true \
    -v n8n_data:/home/node/.n8n \
    docker.n8n.io/n8nio/n8n:1.94.1

After a few seconds, open http://localhost:5678 to see the n8n UI.

Qwen3 embedding model

Download the free Qwen3‑Embedding‑4B model, a strong performer on the MTEB retrieval benchmarks, from Hugging Face:

https://huggingface.co/Qwen/Qwen3-Embedding-4B-GGUF
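
If you prefer scripting the download over clicking through the website, a sketch along these lines works with the huggingface_hub package (pip install huggingface_hub). The quantised filename below is an assumption taken from the Modelfile example in the next step, so check the repository's file list for the variant you actually want.

from huggingface_hub import hf_hub_download

# Downloads the GGUF file into the local Hugging Face cache and prints its path,
# which is what the Ollama Modelfile in the next step needs to point at.
path = hf_hub_download(
    repo_id="Qwen/Qwen3-Embedding-4B-GGUF",
    filename="Qwen3-Embedding-4B-Q4_K_M.gguf",  # assumed quantisation; pick yours
)
print(path)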

Ollama integration (model serving)

Create a model file pointing to the downloaded GGUF and register it with Ollama:

# Create model file
echo "FROM /path/to/Qwen3-Embedding-4B-Q4_K_M.gguf" > Modelfile
# Build the model
ollama create qwen3-embedding -f Modelfile
# Verify
ollama list

Test the embedding endpoint:

curl http://localhost:11434/api/embed -d '{
    "model": "qwen3-embedding",
    "input": "test text embedding"
}'
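
The same check can be run from Python if that is handier than curl; this short sketch assumes the qwen3-embedding model registered above and simply prints the length of the returned vector (consult the Qwen3‑Embedding‑4B model card for the expected dimensionality).

import requests

resp = requests.post(
    "http://localhost:11434/api/embed",
    json={"model": "qwen3-embedding", "input": "test text embedding"},
)
resp.raise_for_status()
vectors = resp.json()["embeddings"]  # /api/embed returns one vector per input
print(len(vectors[0]))               # embedding dimensionality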

Document processing workflow (n8n)

1️⃣ Form Submission – users upload files.
2️⃣ Document Loader – automatically parses PDF, Word, TXT, etc.
3️⃣ Splitters – use the Recursive Text Splitter (chunk size 1000–2000 characters, overlap 200) to keep semantic integrity; a rough character-level approximation is sketched after the data-flow summary below.
4️⃣ Embedding – send chunks to the Qwen3 embedding model via Ollama.
5️⃣ Simple Vector Store – store vectors in memory (fine for a demo; production should use a persistent vector database).

The data flow can be summarised as:

Form Submission → Document Loader → Splitters → Embedding → Vector Store
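
For intuition about what the splitter step produces, here is a rough character-level sketch of chunking with overlap. It deliberately ignores paragraph and sentence boundaries, which the n8n Recursive Text Splitter respects, so treat it only as an illustration of the size and overlap parameters; the input filename is hypothetical.

def split_text(text, chunk_size=1000, overlap=200):
    # Slide a window of chunk_size characters, stepping back by `overlap`
    # each time so neighbouring chunks share context.
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

document = open("employee_handbook.txt", encoding="utf-8").read()  # hypothetical file
for i, chunk in enumerate(split_text(document)):
    print(i, len(chunk))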

Intelligent Q&A workflow

1️⃣ Chat Trigger – receives user questions.
2️⃣ AI Agent – decides whether to query the knowledge base.
3️⃣ Vector Store tool – retrieves relevant chunks.
4️⃣ Chat Model – combines the retrieved context with the LLM (e.g., a local Ollama model) to generate the answer.

Prompt used for the assistant:

You are a company knowledge assistant that answers questions based on internal documents.
Core principles:
1. Prefer retrieved document content.
2. If no relevant information is found, state that clearly and suggest contacting the relevant department.
3. Provide accurate, concise answers with citations.
4. Include source references when possible.
Answer format:
- Core answer
- Detailed explanation
- Source citation
- Suggested next steps

Deploying the chatbot to a website

Enable the public chat endpoint in n8n and embed the following snippet on any site (Hugo, WordPress, React, plain HTML):

<link href="https://cdn.jsdelivr.net/npm/@n8n/chat/dist/style.css" rel="stylesheet" />
<script type="module">
    import { createChat } from 'https://cdn.jsdelivr.net/npm/@n8n/chat/dist/chat.bundle.es.js';

    createChat({
        webhookUrl: 'YOUR_PRODUCTION_WEBHOOK_URL'
    });
</script>

After adding the code to the page footer, an always-available AI chat widget appears in the bottom-right corner.

Conclusion

This tutorial demonstrates a complete, locally hosted RAG system that protects data privacy, stays up to date, and can be extended from a simple document Q&A bot into more sophisticated enterprise assistants.

References

RAG: https://cloud.google.com/use-cases/retrieval-augmented-generation
n8n chat: https://www.npmjs.com/package/@n8n/chat
Hugo: https://github.com/gohugoio/hugo
Qwen3 Embedding 4B + n8n demo: https://www.youtube.com/watch?v=abv1DsIhNfA

Tags: Docker, RAG, Embedding, AI assistant, Ollama, Vector Store, n8n
Written by Ops Development Stories

Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, the DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.
