Retrieval‑Augmented Generation (RAG) with LangChain: Concepts and Python Implementation
Retrieval‑Augmented Generation (RAG) with LangChain lets developers enhance large language models by embedding the user's query, fetching relevant documents from a vector store, inserting that context into a prompt template, and generating a concise, source‑grounded answer. The approach provides up‑to‑date knowledge at low cost while reducing hallucinations and avoiding expensive fine‑tuning.
RAG (Retrieval‑Augmented Generation) is a technique that enhances large language models (LLMs) with external knowledge sources to improve answer accuracy and reduce hallucinations.
The article first explains why fine‑tuning is costly and how RAG offers a low‑cost, fast alternative. It then describes the three‑step RAG workflow: retrieval, augmentation, and generation.
Retrieval: the user's query is embedded and matched against a vector database to fetch the most relevant documents.
Augmentation: the retrieved context is inserted into a prompt template.
Generation: the prompt with context is fed to the LLM to produce the final answer.
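As a minimal sketch of those three steps, here is a toy pipeline using bag‑of‑words similarity in place of real embeddings and stopping short of the actual LLM call (all function names are illustrative, not from any library):

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; a real system uses a neural embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, documents, top_k=2):
    # Retrieval: embed the query and rank documents by similarity.
    qv = embed(question)
    return sorted(documents, key=lambda d: cosine(qv, embed(d)), reverse=True)[:top_k]

def build_prompt(question, context_docs):
    # Augmentation: insert the retrieved context into a prompt template.
    context = "\n".join(context_docs)
    return f"Context: {context}\nQuestion: {question}\nAnswer:"

docs = [
    "The Eiffel Tower is located in Paris.",
    "Python is a popular programming language.",
]
question = "Where is the Eiffel Tower?"
prompt = build_prompt(question, retrieve(question, docs, top_k=1))
# Generation: `prompt` would now be sent to an LLM to produce the answer.
```

The point of the sketch is only the data flow: the query selects context, and the context is baked into the prompt before the model ever sees the question.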
Implementation using LangChain (an open‑source AI application framework) and Python is demonstrated. The steps include:
Loading raw data (text, PDF, CSV, etc.) with appropriate document loaders.
Chunking documents into smaller pieces using CharacterTextSplitter.
Embedding chunks with OpenAI embeddings and storing them in a Weaviate vector store.
Creating a retriever from the vector store.
Defining a prompt template and assembling a RAG chain that connects the retriever, prompt, and an OpenAI chat model.
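The effect of the chunk_size and chunk_overlap parameters in the chunking step can be illustrated with a plain‑Python sliding window (a simplified sketch; CharacterTextSplitter itself splits on separators rather than raw character offsets):

```python
def chunk_text(text, chunk_size=500, chunk_overlap=50):
    # Slide a window of chunk_size characters; each step advances by
    # chunk_size - chunk_overlap, so adjacent chunks share boundary context.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]

sample = "x" * 1200
pieces = chunk_text(sample)
# 1200 characters with a 450-character step yields chunks starting at 0, 450, 900.
```

The overlap means the last 50 characters of one chunk reappear at the start of the next, which helps the retriever when a relevant sentence straddles a chunk boundary.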
Key code snippets:
```python
import requests
from langchain.document_loaders import TextLoader

# Download and load the raw text document.
url = "https://raw.githubusercontent.com/langchain-ai/langchain/master/docs/docs/modules/state_of_the_union.txt"
res = requests.get(url)
with open("state_of_the_union.txt", "w") as f:
    f.write(res.text)
loader = TextLoader("./state_of_the_union.txt")
documents = loader.load()

# Split the document into overlapping chunks.
from langchain.text_splitter import CharacterTextSplitter
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = text_splitter.split_documents(documents)

# Embed the chunks and store them in an embedded Weaviate instance.
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Weaviate
import weaviate
from weaviate.embedded import EmbeddedOptions

client = weaviate.Client(embedded_options=EmbeddedOptions())
vectorstore = Weaviate.from_documents(
    client=client,
    documents=chunks,
    embedding=OpenAIEmbeddings(),
    by_text=False,
)
retriever = vectorstore.as_retriever()

# Define the prompt template that receives the retrieved context.
from langchain.prompts import ChatPromptTemplate

template = """You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question.
If you don't know the answer, just say that you don't know.
Use three sentences maximum and keep the answer concise.
Question: {question}
Context: {context}
Answer:"""
prompt = ChatPromptTemplate.from_template(template)

# Assemble the RAG chain: retriever -> prompt -> LLM -> string output.
from langchain.chat_models import ChatOpenAI
from langchain.schema.runnable import RunnablePassthrough
from langchain.schema.output_parser import StrOutputParser

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

query = "What did the president say about Justice Breyer"
rag_chain.invoke(query)
```

The example query returns a concise answer derived from the retrieved context, demonstrating how RAG can provide up‑to‑date, verifiable information while reducing hallucinations and training costs.
Benefits of using RAG with LLMs include:
Access to the latest and most accurate content with source traceability.
Reduced risk of hallucinations and leakage of sensitive data.
Lower financial overhead by avoiding frequent model retraining.
iKang Technology Team
The iKang tech team shares their technical and practical experiences in medical‑health projects.