Retrieval Augmented Generation (RAG): Concepts, Workflow, and LangChain Implementation
This article outlines common LLM issues (hallucination, outdated knowledge, and data privacy), explains Retrieval-Augmented Generation and its two-stage workflow of data preparation and query-time retrieval, walks through a complete LangChain implementation, and contrasts RAG with fine-tuning as complementary strategies for up-to-date, grounded responses.
Introduction
With the rapid development of large language models (LLMs), Retrieval Augmented Generation (RAG) has become a key technique for improving model reliability, reducing hallucinations, and ensuring data security. This article first outlines the main challenges of LLMs and then explains how RAG addresses them.
LLM Problems
Hallucination : Because generation is probabilistic, the model can fabricate plausible-sounding but false information when no real answer exists.
Timeliness : Models trained on static data (e.g., up to 2021) cannot answer up‑to‑date queries such as today’s movie listings.
Data Security : Uploading proprietary documents to public LLM services raises privacy concerns.
RAG mitigates these issues by retrieving external knowledge at query time, providing more accurate and current responses.
What is RAG?
RAG (Retrieval Augmented Generation) combines information retrieval with LLM prompting. The retrieved documents are injected into the prompt as context, allowing the model to generate answers grounded in up‑to‑date data.
RAG Workflow
The process consists of two main stages:
Data Preparation : Extraction → Chunking → Embedding → Storage.
Retrieval & Generation : Query embedding → Similarity search → Context injection → LLM answer generation.
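The two stages above can be sketched end to end with a toy in-memory example. Everything here is illustrative: token sets stand in for embeddings, and a crude overlap score stands in for a vector-database similarity search.

```python
# Toy sketch of the two RAG stages; all logic is illustrative and
# stands in for real extraction, embedding, and vector-store calls.

# Stage 1: data preparation (extract -> chunk -> "embed" -> store).
documents = [
    "RAG retrieves external documents at query time.",
    "Fine-tuning bakes knowledge into model weights.",
]
# Token sets act as stand-in vectors.
store = [(doc, set(doc.lower().split())) for doc in documents]

# Stage 2: query time (embed query -> similarity search -> context injection).
query = "when does RAG fetch documents"
q_tokens = set(query.lower().split())
# Score each stored document by token overlap with the query.
context = max(store, key=lambda item: len(q_tokens & item[1]))[0]

prompt = f"Answer using this context:\n{context}\n\nQuestion: {query}"
print(prompt)
```

The real pipeline later in this article replaces each stand-in with a proper component: an embedding model, a vector store, and an LLM.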
Data Preparation Stage
This offline stage converts private data into vector embeddings and stores them in a vector database.
1. Data Extraction : Convert PDFs, Word, markdown, databases, APIs, etc., into a unified text format.
2. Chunking : Split documents into semantically coherent chunks (e.g., 500 characters with 10‑character overlap).
3. Embedding : Transform text chunks into dense vectors using models such as moka-ai/m3e-base or other HuggingFace embeddings.
4. Vector Store : Persist vectors in databases like FAISS (local), Chroma, Elasticsearch, Milvus, etc.
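As a concrete illustration of step 2, fixed-size chunking with overlap can be sketched in plain Python. This is a deliberate simplification of what splitters such as LangChain's CharacterTextSplitter do; the sizes match the 500/10 example above.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 10) -> list[str]:
    # Slide a window of chunk_size characters over the text, stepping
    # forward by chunk_size - overlap so adjacent chunks share a margin.
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# A 1200-character text yields two full chunks plus a shorter tail.
chunks = chunk_text("a" * 1200, chunk_size=500, overlap=10)
print(len(chunks), [len(c) for c in chunks])  # 3 [500, 500, 220]
```

The overlap keeps a sentence that straddles a boundary from being cut off from its context in both chunks.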
Application Stage
During query time, the system retrieves relevant chunks and feeds them to the LLM.
1. Data Retrieval : Similarity search (cosine, Euclidean) or full‑text search retrieves top‑k relevant documents.
2. Prompt Injection : The retrieved context is concatenated with a task description and the user question.
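The similarity search in step 1 boils down to ranking stored vectors against the query vector. A minimal cosine-similarity top-k sketch over plain Python lists (in practice the vector database performs this search):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # cos(a, b) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 2) -> list[int]:
    # Rank document vectors by cosine similarity to the query, descending,
    # and return the indices of the k best matches.
    scored = sorted(enumerate(doc_vecs),
                    key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [idx for idx, _ in scored[:k]]

doc_vecs = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]
print(top_k([1.0, 0.1], doc_vecs, k=2))  # -> [0, 1]
```

With normalized embeddings (as configured later via `normalize_embeddings: True`), cosine similarity and Euclidean distance produce the same ranking.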
Example prompt:
prompt = f"""
Give the answer to the user query delimited by triple backticks ```{query}```
using the information given in context delimited by triple backticks ```{context}```.
If there is no relevant information in the provided context, try to answer yourself,
but tell user that you did not have any relevant context to base your answer on.
Be concise and output the answer of size less than 80 tokens.
"""Practical Example with LangChain
The following code demonstrates a complete RAG pipeline using LangChain.
Environment Setup
# Prepare the environment: install the required dependencies
pip install langchain sentence_transformers chromadb
Load Local Data
from langchain.document_loaders import TextLoader
loader = TextLoader("./data/paul_graham_essay.txt")
documents = loader.load()
Document Splitting
# Document splitting
from langchain.text_splitter import CharacterTextSplitter
# Create the splitter
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=10)
# Split the documents
documents = text_splitter.split_documents(documents)
Embedding
from langchain.embeddings import HuggingFaceBgeEmbeddings
from langchain.vectorstores import Chroma
# embedding model: m3e-base
model_name = "moka-ai/m3e-base"
model_kwargs = {'device': 'cpu'}
encode_kwargs = {'normalize_embeddings': True}
embedding = HuggingFaceBgeEmbeddings(
model_name=model_name,
model_kwargs=model_kwargs,
encode_kwargs=encode_kwargs
)
Persist Vectors
# Specifying persist_directory stores the embeddings on disk.
persist_directory = 'db'
db = Chroma.from_documents(documents, embedding, persist_directory=persist_directory)
Retriever
retriever = db.as_retriever()
Prompt Template
from langchain.prompts import ChatPromptTemplate
template = """You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question.
If you don't know the answer, just say that you don't know.
Use three sentences maximum and keep the answer concise.
Question: {question}
Context: {context}
Answer:"""
prompt = ChatPromptTemplate.from_template(template)
RAG Chain Construction
from langchain_community.chat_models import ChatOllama
from langchain.schema.runnable import RunnablePassthrough
from langchain.schema.output_parser import StrOutputParser
llm = ChatOllama(model='llama3')
rag_chain = (
{"context": retriever, "question": RunnablePassthrough()}
| prompt
| llm
| StrOutputParser()
)
query = "What did the author do growing up?"
response = rag_chain.invoke(query)
print(response)
Running the above pipeline with a locally hosted llama3 model yields an answer such as:
Before college, Paul Graham worked on writing and programming outside school. He didn't write essays, but instead focused on writing short stories. His stories were not very good, having little plot and just characters with strong feelings.
RAG vs. Fine‑Tuning
RAG is comparable to an open‑book exam: the model can look up a reference book at inference time. Fine‑tuning is akin to memorizing knowledge through extensive training. Both techniques can complement each other—RAG provides up‑to‑date factual grounding, while fine‑tuning improves style, domain adaptation, and instruction following.
Conclusion
The article introduced the challenges of LLMs, explained the concept and workflow of Retrieval Augmented Generation, and provided a concrete LangChain implementation. It also compared RAG with fine‑tuning to help practitioners choose the appropriate strategy for their use cases.
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.