How RAG Gives Large Language Models Their Own Knowledge Base – Illustrated with Easysearch
The article explains why Retrieval‑Augmented Generation (RAG) is needed to overcome large language models' knowledge cut‑off and hallucination issues, details the offline indexing and online retrieval‑generation workflow, compares RAG with fine‑tuning, and shows how Easysearch’s hybrid search makes an effective RAG backbone.
Large language models such as ChatGPT and Claude are powerful but suffer from two critical weaknesses in production: their knowledge is frozen at the training cut‑off date, and they hallucinate when asked about information they were never trained on. This makes them unreliable for questions about enterprise‑specific code or the latest features of Elasticsearch 9.x and Easysearch 2.x.
Why RAG?
RAG (Retrieval‑Augmented Generation) addresses these pain points by turning a "closed‑book" model into an "open‑book" one. Instead of relying solely on memorized parameters, the model consults a real‑time external knowledge base before answering.
Key differences between traditional LLMs (closed‑book) and RAG (open‑book) include knowledge source (trained parameters vs. live external index), timeliness (fixed vs. up‑to‑date), private data access, and hallucination risk (high vs. greatly reduced).
What is RAG?
RAG combines two steps: offline knowledge base construction and online retrieval‑generation.
How RAG Works – Detailed Process
1. Offline Phase – Building the Knowledge Base
Chunking: Split long documents into semantically complete chunks, typically 500 characters with a 50‑character overlap to preserve context.
Embedding: Use an embedding model (e.g., BAAI/bge-small-zh-v1.5) to convert each chunk into a high‑dimensional vector.
Store in Vector‑Enabled DB: Write the vectors into a database that supports vector search, such as Easysearch.
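The chunking step above can be sketched in a few lines of Python. This is a minimal illustration that assumes plain fixed-size character windows (500 characters, 50 overlapping); the function name chunk_text is hypothetical, and a production splitter would usually also respect sentence or paragraph boundaries to keep chunks semantically complete. Embedding and indexing are not shown.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text,
        # so we do not emit a trailing chunk made only of overlap.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With the defaults, each chunk begins with the last 50 characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk whenever it is shorter than the overlap.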
2. Online Phase – Retrieval
When a user asks a question, the system converts the query into a vector (a "magnet") and retrieves the nearest knowledge chunks based on semantic similarity rather than exact keyword match.
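To make "nearest by semantic similarity" concrete, here is a small sketch of brute-force cosine-similarity ranking in plain Python. The three-dimensional vectors and chunk IDs are invented toy data, and the helper names are hypothetical; a real deployment would use vectors from the embedding model and let the search engine perform an approximate nearest-neighbor lookup instead of scanning every chunk.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float],
          chunk_vecs: Dict[str, Sequence[float]],
          k: int = 3) -> List[Tuple[str, float]]:
    """Return the k chunk IDs most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), chunk_id)
              for chunk_id, vec in chunk_vecs.items()]
    scored.sort(reverse=True)
    return [(chunk_id, score) for score, chunk_id in scored[:k]]


# Toy data: the query points in roughly the same direction as the
# "sharding" and "replicas" chunks and away from "billing".
toy_chunks = {
    "sharding": [0.9, 0.1, 0.0],
    "billing": [0.0, 0.2, 0.9],
    "replicas": [0.8, 0.3, 0.1],
}
toy_query = [1.0, 0.2, 0.0]
print(top_k(toy_query, toy_chunks, k=2))
```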
3. Online Phase – Generation
The retrieved chunks are combined with the original user query into a prompt and fed to the LLM, instructing it to answer using the provided references.
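A minimal sketch of that prompt assembly follows. The instruction wording and the function name build_prompt are assumptions made for illustration, not a template prescribed by any particular LLM or by Easysearch; numbering the references is one simple way to let the model cite which chunk it used.

```python
from typing import List


def build_prompt(question: str, chunks: List[str]) -> str:
    """Combine retrieved chunks and the user question into one prompt string."""
    # Number each chunk so the answer can point back to its source.
    references = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the references below. "
        "If the references do not contain the answer, say you do not know.\n\n"
        f"References:\n{references}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The resulting string would then be sent to whichever LLM the application uses; that call is left out because it depends on the provider.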
RAG vs. Fine‑Tuning
Fine‑tuning changes the model's internal weights to adapt to specific data, which is time‑consuming and costly. RAG, by contrast, updates knowledge simply by refreshing the external index.
Knowledge Update: Real‑time with RAG; requires retraining with fine‑tuning.
Cost: Low for RAG (only embedding compute); high for fine‑tuning (GPU‑intensive).
Explainability: High for RAG (answers traceable to sources); low for fine‑tuning (black‑box).
Suitable Scenarios: Knowledge‑intensive, frequently updated corpora for RAG; fixed‑format, domain‑specific tasks for fine‑tuning.
Why Easysearch Is an Ideal RAG Backbone
Easysearch (like Elasticsearch) serves as the "library manager" in the RAG pipeline, offering hybrid search that combines precise BM25 keyword matching with semantic vector search. This offsets the lower precision of pure vector search while retaining the flexibility of semantic retrieval.
Hybrid search in Easysearch is configured with a concise JSON request that blends knn_nearest_neighbors (vector) and match (BM25) clauses, eliminating the need to maintain separate full‑text and vector databases.
POST /my-index/_search
{
  "size": 10,
  "query": {
    "bool": {
      "must": [
        {
          "knn_nearest_neighbors": {
            "field": "embedding",
            "vec": { "values": [0.12, -0.03, ...] },
            "model": "lsh",
            "similarity": "cosine",
            "candidates": 100
          }
        }
      ],
      "should": [
        { "match": { "content": { "query": "分布式搜索引擎", "boost": 2.0 } } }
      ]
    }
  }
}
This unified approach drastically reduces operational overhead compared to managing separate full‑text and vector search stacks.
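In application code, the same request body would normally be assembled from the user's query text and its embedding rather than written by hand. The sketch below mirrors the field names and parameters of the request above; the function name build_hybrid_query is hypothetical, and sending the body to the cluster (for example with an HTTP client) is left out.

```python
import json
from typing import Any, Dict, List


def build_hybrid_query(query_text: str,
                       query_vector: List[float],
                       size: int = 10,
                       candidates: int = 100,
                       boost: float = 2.0) -> Dict[str, Any]:
    """Build a search body pairing a vector clause (must) with a BM25 clause (should)."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Vector clause: every hit must be a nearest-neighbor candidate.
                "must": [
                    {
                        "knn_nearest_neighbors": {
                            "field": "embedding",
                            "vec": {"values": query_vector},
                            "model": "lsh",
                            "similarity": "cosine",
                            "candidates": candidates,
                        }
                    }
                ],
                # Keyword clause: a BM25 match raises the score but is optional.
                "should": [
                    {"match": {"content": {"query": query_text, "boost": boost}}}
                ],
            }
        },
    }


# The two-element vector is a placeholder; a real one comes from the embedding model.
body = build_hybrid_query("分布式搜索引擎", [0.12, -0.03])
print(json.dumps(body, ensure_ascii=False, indent=2))
```

Because the vector clause sits under must and the keyword clause under should, documents are first required to be semantic neighbors, and an exact keyword hit then lifts them in the ranking.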
Decision Guidance
For most enterprise Q&A scenarios, the recommendation is to adopt RAG directly. In highly specialized verticals (e.g., medical diagnosis), a hybrid of RAG plus fine‑tuning may be considered.
Conclusion
RAG enhances LLMs without altering their core intelligence, adding an "eye" that can see private enterprise data, thereby reducing hallucinations and making AI deployment viable in production. Search engines are not obsolete; they have evolved into the foundational infrastructure that powers modern AI applications.
Mingyi World Elasticsearch
The leading WeChat public account for Elasticsearch fundamentals, advanced topics, and hands‑on practice. Join us to dive deep into the ELK Stack (Elasticsearch, Logstash, Kibana, Beats).
