What Is Retrieval‑Augmented Generation (RAG) and How Does It Boost AI Accuracy?
This article explains Retrieval‑Augmented Generation (RAG), an AI framework that combines external knowledge retrieval with large language models, covering its motivations, data preparation, chunking strategies, vectorization, storage, query processing, retrieval, reranking, prompt engineering, and LLM generation, plus practical optimization tips.
What Is RAG?
RAG (Retrieval‑Augmented Generation) is an AI framework that combines traditional information retrieval systems (e.g., databases) with generative large language models (LLMs). Instead of relying solely on the knowledge stored during LLM training, RAG first “looks up” external data sources and then generates answers based on that retrieved context, improving accuracy.
Key Problems Addressed by RAG
Knowledge freshness: Retrieval supplies information newer than the model's training-data cutoff.
Hallucination: Grounding answers in retrieved sources, with citations, reduces fabricated content.
Information security: Sensitive data stays in a controlled external knowledge base rather than being baked into model weights, lowering privacy risk.
Vertical domain knowledge: Specialized domain information is integrated without retraining or fine-tuning the model.
RAG Core Workflow
2.1 Knowledge Preparation Stage
2.1.1 Data Pre‑processing
Input: Raw documents (Markdown, PDF, HTML, etc.)
Operations:
Extract plain text (e.g., parse Markdown headings and paragraphs).
Handle special formats such as code blocks, tables, images, and videos.
2.1.2 Metadata Extraction
Metadata describes the document (source URL, filename, creation time, author, type) and improves retrieval quality by enabling filtering and source attribution.
Source: URL, file path, or database record.
Creation time: Document creation or last-update timestamp.
Author: Document author or editor.
Document type: News article, academic paper, blog post, etc.
2.1.3 Chunking (Content Segmentation)
Chunking splits long documents into smaller pieces to fit vector model token limits and improve retrieval precision. Common strategies:
Size‑based chunking: Fixed character count; simple, but may cut across semantic units.
Paragraph‑based chunking: Keeps paragraphs whole; preserves semantics but yields uneven chunk sizes.
Semantic chunking: Uses text similarity to form coherent segments; computationally more expensive but best preserves meaning.
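The size‑based strategy above can be sketched in a few lines. This is a minimal illustration, not the article's actual implementation; the function name, default sizes, and the overlap parameter are assumptions:

```python
def chunk_by_size(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into fixed-size chunks; adjacent chunks share `overlap`
    characters so that sentences cut at a boundary still appear whole in
    at least one chunk. Sizes here are illustrative, not recommendations."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        # Step back by `overlap` so the next chunk repeats the tail of this one.
        start = end - overlap
    return chunks
```

The overlap is what the article later calls an "overlapping window": it trades a little storage for robustness against mid-sentence cuts.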
Example of size‑based chunking:
First segment: "# ROMA Framework Introduction. ROMA is a fully in-house front-end development framework built on a custom DSL (the Jue language), a cross-platform solution in which one codebase runs on four platforms: iOS, Android, Harmony, and Web. ROMA's Chinese name is 罗码 (Luómǎ)."
Example of semantic chunking:
First segment: "# ROMA Framework Introduction. ROMA is a fully in-house front-end development framework built on a custom DSL (the Jue language), a cross-platform solution in which one codebase runs on four platforms: iOS, Android, Harmony, and Web."
2.1.4 Vectorization
Each chunk is encoded into a dense numeric vector (an embedding), enabling fast similarity comparison. Example vector representation:
{
  "chunk_id": "doc1_chunk1",
  "text": "# What is ROMA? ROMA is a fully in-house front-end development framework…",
  "vector": [0.041, -0.018, 0.063, …],
  "metadata": {
    "source": "roma_introduction.md",
    "title": "ROMA Framework Introduction"
  }
}
2.1.5 Storing Vectors in a Vector Database
Vectors and metadata are persisted in a vector database with an index for efficient similarity search. Common vector databases include Milvus, Pinecone, Weaviate, Qdrant, and FAISS-based stores.
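The vectorization and storage steps can be sketched together with an in-memory toy. Everything here is a stand-in: `toy_embed` hashes words instead of running a real embedding model, and `VectorStore` imitates only the add/search surface of an actual vector database:

```python
import math

def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Hypothetical stand-in for an embedding model: a hashed bag of words,
    L2-normalized so a dot product equals cosine similarity."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

class VectorStore:
    """In-memory imitation of a vector database: stores vectors alongside
    the chunk text and metadata, and searches by cosine similarity."""
    def __init__(self):
        self.vectors = []
        self.records = []

    def add(self, chunk_id: str, text: str, metadata: dict) -> None:
        self.vectors.append(toy_embed(text))
        self.records.append({"chunk_id": chunk_id, "text": text, "metadata": metadata})

    def search(self, query: str, top_k: int = 3):
        q = toy_embed(query)
        scored = [(sum(a * b for a, b in zip(v, q)), rec)
                  for v, rec in zip(self.vectors, self.records)]
        scored.sort(key=lambda sr: sr[0], reverse=True)
        return [(rec, score) for score, rec in scored[:top_k]]
```

A real system would replace `toy_embed` with a model call and `VectorStore` with one of the databases named above, which add persistence and approximate-nearest-neighbor indexes.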
2.2 Question‑Answering Stage
2.2.1 Query Pre‑processing
Intent detection: Classify the question type (fact lookup, recommendation, chit-chat).
Query cleaning & standardization: Same cleaning steps as applied to documents.
Query enhancement: Expand the query with synonyms or retrieve related context.
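Intent detection and query enhancement can be sketched with simple rules. The keyword lists and synonym table below are illustrative placeholders; production systems typically use a trained classifier and a learned or curated expansion source:

```python
def detect_intent(query: str) -> str:
    """Naive keyword-based intent routing (keyword lists are illustrative)."""
    q = query.lower()
    if any(w in q for w in ("recommend", "suggest", "best")):
        return "recommendation"
    if any(w in q for w in ("how", "what", "why", "when", "who")):
        return "fact"
    return "chit-chat"

# Hypothetical synonym table; a real system would use a thesaurus or model.
SYNONYMS = {"framework": ["library", "toolkit"], "error": ["exception", "bug"]}

def expand_query(query: str) -> str:
    """Append known synonyms to broaden recall at retrieval time."""
    extra = [s for w in query.lower().split() for s in SYNONYMS.get(w, [])]
    return query if not extra else query + " " + " ".join(extra)
```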
2.2.2 Retrieval (Recall)
Vector search: Compute cosine similarity between the query vector and stored chunk vectors.
Keyword search: Traditional inverted-index matching.
Hybrid search: Combine both and fuse the scores for broader coverage.
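Keyword recall and hybrid fusion can be sketched as follows. The scoring formulas and the `alpha` weight are assumptions for illustration; real systems often use BM25 for the keyword side and tuned or learned fusion weights:

```python
from collections import defaultdict

def build_inverted_index(docs: dict[str, str]) -> dict[str, set]:
    """Map each token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def keyword_search(index, query: str) -> dict[str, float]:
    """Score each doc by the fraction of query tokens it contains
    (a toy substitute for BM25-style ranking)."""
    tokens = query.lower().split()
    hits = defaultdict(int)
    for t in tokens:
        for doc_id in index.get(t, ()):
            hits[doc_id] += 1
    return {d: n / len(tokens) for d, n in hits.items()}

def hybrid_rank(vector_scores: dict[str, float],
                keyword_scores: dict[str, float],
                alpha: float = 0.7) -> list[tuple[str, float]]:
    """Weighted fusion of both signals; alpha is a tunable assumption."""
    doc_ids = set(vector_scores) | set(keyword_scores)
    fused = {d: alpha * vector_scores.get(d, 0.0)
                + (1 - alpha) * keyword_scores.get(d, 0.0)
             for d in doc_ids}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```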
Example query vector:
{
  "vector": [0.052, -0.021, 0.075, …],
  "top_k": 3,
  "score_threshold": 0.8,
  "filter": { "doc_type": "technical documentation" }
}
2.2.3 Reranking
Reranking refines the initial candidate set: a dedicated relevance model (often a cross-encoder) rescores each query-chunk pair, the scores are normalized to [0, 1], and the result is merged with the original vector similarity.
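The normalize-and-merge step can be sketched as below. The 50/50 merge weight and min-max normalization are illustrative choices, not the article's prescribed values:

```python
def normalize(scores: list[float]) -> list[float]:
    """Min-max normalize scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def rerank(candidates: list[tuple[str, float]],
           rerank_scores: list[float],
           weight: float = 0.5) -> list[tuple[str, float]]:
    """Merge normalized reranker scores with the original vector similarity.
    `candidates` is a list of (doc_id, vector_score); `weight` is a tunable
    assumption controlling how much the reranker overrides the retriever."""
    norm = normalize(rerank_scores)
    merged = [(doc_id, weight * r + (1 - weight) * v)
              for (doc_id, v), r in zip(candidates, norm)]
    return sorted(merged, key=lambda kv: kv[1], reverse=True)
```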
2.2.4 Information Integration & Prompt Engineering
Format retrieved snippets, truncate or summarize to fit LLM context, and build a prompt template that specifies answer scope, source citation, and refusal rules.
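The integration step can be sketched as a small prompt builder. The character-based budget is a simplification (real systems budget by tokens), and the template wording merely echoes the example that follows:

```python
def build_prompt(question: str, snippets: list[str],
                 max_context_chars: int = 1000) -> str:
    """Format retrieved snippets as numbered references and assemble a
    grounded prompt, dropping snippets that overflow the context budget."""
    context_lines, used = [], 0
    for i, snippet in enumerate(snippets, 1):
        line = f"[Doc{i}] {snippet}"
        if used + len(line) > max_context_chars:
            break  # crude truncation; summarization is another option
        context_lines.append(line)
        used += len(line)
    context = "\n".join(context_lines)
    return (
        "You are a ROMA framework expert. Based on the following context, "
        "answer the question.\n"
        f"Reference info:\n{context}\n"
        f"Question: {question}\n"
        "If the context does not contain the answer, say you cannot answer."
    )
```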
Prompt template:
You are a ROMA framework expert. Based on the following context, answer the question.
Reference info:
[Doc1] What is ROMA? ROMA is a fully in-house front-end development framework…
Requirements:
1. Explain step‑by‑step with code examples.
2. Cite source version.
3. If information is missing, state that you cannot answer.
2.2.5 LLM Generation
The final prompt is sent to an LLM (e.g., GPT‑4, Claude) to generate the answer.
Optimizations across all stages include mixed chunking strategies, overlapping windows, and metadata‑driven filtering.
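Metadata-driven filtering, one of the optimizations above, can be sketched as a post-retrieval filter (the record shape mirrors the earlier JSON examples; the helper name is illustrative):

```python
def filter_by_metadata(results: list[dict], filters: dict) -> list[dict]:
    """Keep only retrieved records whose metadata matches every filter
    key/value pair, e.g. {"doc_type": "technical documentation"}."""
    return [r for r in results
            if all(r.get("metadata", {}).get(k) == v for k, v in filters.items())]
```

In practice the same filter is usually pushed down into the vector database query itself (as in the `filter` field of the earlier query example) so non-matching chunks never leave the store.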
JoyAgent Example
JoyAgent (AutoBots) demonstrates a complete RAG pipeline from document ingestion to answer generation.
JD Tech