What Is Retrieval‑Augmented Generation (RAG) and How Does It Work?

This article explains Retrieval‑Augmented Generation (RAG), an AI framework that combines traditional information retrieval with large language models. It covers the core workflow, from knowledge preparation, data cleaning, and metadata extraction through query preprocessing, vector retrieval, reranking, and information integration to final LLM generation, and reviews common embedding models and vector databases.

JD Cloud Developers

1. What Is RAG?

RAG (Retrieval‑Augmented Generation) is an AI framework that combines traditional information retrieval (e.g., databases) with generative large language models (LLMs). Instead of relying solely on the LLM’s internal knowledge, it first "looks up" external resources and generates answers based on them.

RAG addresses several key challenges of deploying large models:

Knowledge freshness: overcomes the cutoff date of the model's training data.

Hallucination reduction: lowers the probability of fabricated answers and lets responses cite their sources.

Information security: keeps proprietary data in external knowledge bases instead of the model's training set, reducing the risk of privacy leakage.

Domain-specific knowledge: integrates vertical expertise without retraining the model.

2. RAG Core Workflow

2.1 Knowledge Preparation Phase

1. Data preprocessing

1. Document parsing

Input : raw documents (Markdown, PDF, HTML).

Operation : extract plain text, handle special formats (code blocks, tables, images, videos).

[Heading] What is ROMA?
[Paragraph] ROMA is a fully in-house cross-platform solution for front-end development, based on a custom DSL (the Jue language); a single codebase runs on iOS, Android, Harmony, and Web.
[Paragraph] The Chinese name of the ROMA framework is 罗码 (Luómǎ).

Document parsing must consider various content types and may use third‑party tools or vision/semantic models.
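As a minimal sketch of this parsing step, the following uses Python's standard-library `html.parser` to extract heading and paragraph text from HTML into labeled blocks like the example above (the `BlockExtractor` class and its labels are illustrative, not a specific library's API):

```python
from html.parser import HTMLParser

class BlockExtractor(HTMLParser):
    """Minimal sketch: collect heading and paragraph text, skipping other markup."""
    def __init__(self):
        super().__init__()
        self.blocks = []       # list of (label, text) tuples
        self._current = None   # label of the block being collected, if any

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self._current = "[Heading]"
        elif tag == "p":
            self._current = "[Paragraph]"

    def handle_endtag(self, tag):
        if tag in ("h1", "h2", "h3", "p"):
            self._current = None

    def handle_data(self, data):
        text = data.strip()
        if self._current and text:
            self.blocks.append((self._current, text))

parser = BlockExtractor()
parser.feed("<h1>What is ROMA?</h1><p>ROMA is a cross-platform framework.</p>")
# parser.blocks now holds the labeled plain-text blocks
```

A production parser would also need to handle code blocks, tables, and embedded media, often via dedicated third-party tools.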

2. Data cleaning and standardization

The goal is to improve text quality and consistency so that vector representations are more accurate: remove noise and standardize timestamps, units, and similar fields, using tools such as NLTK or spaCy.

Input: "ROMA框架" (ROMA framework)
Processed: "ROMA框架"
Input: "Today's outdoor temperature is 35°C, and the weather is sunny."
Processed: "The outdoor temperature on 2025-07-17 is 35°C, and the weather is sunny."
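A toy version of the date-standardization idea above can be sketched in a few lines, assuming the cleaning pipeline knows the document's reference date (the `clean_text` function and its literal `"today"` substitution are illustrative simplifications):

```python
import re
from datetime import date

def clean_text(text: str, today: date) -> str:
    """Sketch of cleaning: collapse whitespace and replace the relative
    word 'today' with an absolute date so embeddings stay unambiguous."""
    text = re.sub(r"\s+", " ", text).strip()          # normalize whitespace
    text = text.replace("today", today.isoformat())   # anchor relative dates
    return text

cleaned = clean_text("The outdoor  temperature today is 35°C.", date(2025, 7, 17))
```

Real pipelines would use proper temporal-expression tagging rather than string replacement, but the principle is the same: resolve relative references before embedding.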

3. Metadata extraction

Metadata (source, creation time, author, document type, etc.) enriches retrieval quality.

Retrieval enhancement: filter by time, author, topic; improve relevance with vector similarity + metadata.

Context enrichment: include source, date, relationships.

Common extraction methods include regex/HTML parsers, NLP (NER, keyword extraction), machine‑learning models, and external APIs.

2.2 Question‑Answering Phase

1. Query preprocessing

Intent detection, query cleaning, and augmentation (e.g., synonym generation) prepare the question for retrieval.
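A minimal sketch of this step, assuming a hand-built synonym table as a stand-in for a real query-augmentation model (both the function and the table are hypothetical):

```python
def preprocess_query(query: str, synonyms: dict) -> list:
    """Sketch: normalize the query and expand it with synonym variants."""
    base = " ".join(query.lower().split())  # trim and collapse whitespace
    variants = [base]
    for word, alts in synonyms.items():
        if word in base:
            variants += [base.replace(word, alt) for alt in alts]
    return variants

queries = preprocess_query("  How do I DEPLOY the app? ", {"deploy": ["release", "ship"]})
# the original plus two synonym-expanded variants, all normalized
```

Each variant can then be embedded and retrieved separately, with results merged downstream.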

2. Retrieval (Recall)

1. Vectorization

Encode the processed query with the same embedding model used for the corpus.

{"vector": [0.052, -0.021, 0.075, ...], "top_k": 3, "score_threshold": 0.8, "filter": {"doc_type": "technical documentation"}}

2. Retrieval

Similarity search (cosine), keyword search, or hybrid retrieval returns the most relevant chunks.
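The cosine-similarity search can be sketched in plain Python over a toy in-memory corpus (the chunk IDs and two-dimensional vectors are illustrative; real systems use a vector database and hundreds of dimensions):

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def top_k(query, corpus, k=3):
    """Rank corpus chunks by cosine similarity to the query vector."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in corpus.items()]
    return sorted(scored, key=lambda p: p[1], reverse=True)[:k]

corpus = {"chunk-a": [1.0, 0.0], "chunk-b": [0.0, 1.0], "chunk-c": [0.7, 0.7]}
results = top_k([1.0, 0.1], corpus, k=2)
# chunk-a ranks first (nearly parallel to the query), chunk-c second
```

A vector database performs the same ranking with approximate-nearest-neighbor indexes so it scales past brute-force comparison.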

3. Reranking

A reranking model re‑scores the initial results to improve relevance and handle synonyms or polysemy.
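The control flow of reranking can be sketched with a toy word-overlap scorer standing in for a real cross-encoder model (the scoring function here is deliberately naive and only illustrates where the model plugs in):

```python
def overlap_score(query, passage):
    """Toy relevance score: fraction of query words found in the passage.
    A real system would call a cross-encoder reranking model here."""
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0

def rerank(query, candidates):
    """Re-score the retriever's candidates and return them best-first."""
    return sorted(candidates, key=lambda c: overlap_score(query, c), reverse=True)

ranked = rerank("what is the ROMA framework",
                ["ROMA is a cross-platform framework", "Unrelated release notes"])
# the on-topic passage moves to the front
```

Because the reranker sees the query and passage together, it can resolve synonyms and polysemy that pure vector similarity misses.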

3. Information Integration

Format retrieved results, build prompt templates, and optionally truncate or summarize long texts to fit the LLM’s context window.

Prompt template:
You are a ROMA framework expert. Answer based on the following context...
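Assembling such a prompt with a context-window budget can be sketched as follows (the character budget stands in for a real token count, and the function name is illustrative):

```python
def build_prompt(question, chunks, max_chars=2000):
    """Sketch: assemble retrieved chunks into a prompt, truncating the
    context so the whole prompt fits the model's window."""
    context = ""
    for chunk in chunks:
        if len(context) + len(chunk) > max_chars:
            break  # drop chunks that would overflow the budget
        context += chunk + "\n"
    return (
        "You are a ROMA framework expert. Answer based on the following context.\n\n"
        f"Context:\n{context}\nQuestion: {question}\nAnswer:"
    )

prompt = build_prompt("What is ROMA?", ["ROMA is a cross-platform framework."])
```

In practice a tokenizer-based count, or summarization of overflowing chunks, replaces the crude character cutoff.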

4. LLM Generation

Send the prompt to an LLM (e.g., GPT‑4, Claude) and obtain the final answer.
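The full question-answering phase then reduces to a short piece of glue code; in this sketch, `call_llm` is a placeholder to be replaced with your provider's actual SDK call:

```python
def call_llm(prompt):
    """Placeholder for a real API call (e.g. an OpenAI or Claude client);
    swap in your provider's SDK here."""
    return f"[model answer for a {len(prompt)}-char prompt]"

def answer(question, retrieved_chunks):
    """End-to-end glue: format the context, send the prompt, return the answer."""
    context = "\n".join(retrieved_chunks)
    prompt = (
        "Answer based on the following context.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)

reply = answer("What is ROMA?", ["ROMA is a cross-platform framework."])
```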

Embedding Models

Common models include all-MiniLM-L6-v2 (384-dim, efficient), text-embedding-ada-002 (1536-dim, high performance), BERT variants (768-dim), and BGE (768-dim, top-ranked on MTEB).

Vector Databases

Popular options: ChromaDB (lightweight, Python‑friendly), FAISS (billion‑scale, high performance), Milvus (distributed, enterprise‑grade), Pinecone (managed SaaS), and Elasticsearch (full‑text + vector support).

Optimization Tips

Hybrid chunking: combine paragraph‑level splitting with size‑based adjustments.

Overlap tuning: dynamically adjust overlapping regions to improve recall.

Mixed retrieval: fuse vector, keyword, and hybrid results.

Reranking normalization: map scores to [0, 1] for fair combination.
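The score-normalization tip can be sketched as min-max scaling followed by a weighted fusion of vector and keyword scores (the weight `w` and the sample scores are illustrative assumptions):

```python
def minmax(scores):
    """Map raw scores to [0, 1] so different retrievers can be compared fairly."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def fuse(vector_scores, keyword_scores, w=0.5):
    """Weighted fusion of normalized vector and keyword scores."""
    v, k = minmax(vector_scores), minmax(keyword_scores)
    return [w * a + (1 - w) * b for a, b in zip(v, k)]

# cosine similarities in [0, 1] vs. unbounded BM25-style keyword scores
fused = fuse([0.91, 0.85, 0.60], [12.0, 3.0, 7.5])
# the first document leads on both signals, so it stays on top after fusion
```

Without normalization, the retriever with the larger raw score range would dominate the combined ranking regardless of actual relevance.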

Tags: Artificial Intelligence, LLM, RAG, Retrieval-Augmented Generation
Written by

JD Cloud Developers

JD Cloud Developers (Developer of JD Technology) is a JD Technology Group platform offering technical sharing and communication for AI, cloud computing, IoT and related developers. It publishes JD product technical information, industry content, and tech event news. Embrace technology and partner with developers to envision the future.
