Understanding Large Language Models, Retrieval‑Augmented Generation, and AI Agents: Concepts, Engineering Practices, and Applications
This article explains the fundamentals and engineering practices of large language models (LLMs), retrieval‑augmented generation (RAG), and AI agents; compares small and large embedding models; provides Python code for vector‑database RAG with Chroma; and discusses integration, use cases, and future challenges in AI development.
Introduction
The rapid development of artificial‑intelligence technologies such as large language models (LLMs), retrieval‑augmented generation (RAG), and agents has reshaped how we interact with machines and build AI‑enabled applications. Understanding the definitions and relationships among these three concepts is essential for AI‑oriented programming.
LLM, RAG and Agent Overview
| Component | Definition | Role |
| --- | --- | --- |
| LLM | Large language models (e.g., the GPT series, BERT) trained on massive text corpora to generate coherent text, understand language, and answer questions. | Provides the core language understanding and generation capability. |
| RAG | Combines traditional information retrieval with generative models: relevant passages are first retrieved from a knowledge base, then the LLM generates an answer from them. | Extends LLMs with up‑to‑date, domain‑specific knowledge. |
| Agent | Programs or devices that perceive their environment and act autonomously, often orchestrating LLM and RAG to accomplish tasks. | Integrates LLM and RAG at the application layer to perform decision‑making and execution. |
From a hierarchical perspective, LLMs form the foundation, RAG builds on LLMs to provide more accurate outputs, and agents combine both with perception and planning to execute complex tasks.
LLM Engineering Practices
OpenAI’s GPT series (GPT‑3.5, GPT‑4) are transformer‑based models that use self‑attention to capture long‑range dependencies. Three subscription tiers (Free, Plus, Team) are available, with ChatGPT Plus offering GPT‑4 capabilities.
Chat (Conversation)
Chat with GPT‑4 supports text, image generation via DALL‑E, and real‑time web search for up‑to‑date information.
GPTs (Plugins)
Custom GPTs let users combine their own instructions, knowledge bases, or APIs with a pre‑trained LLM. The workflow includes:

1. Describe the desired assistant in the GPT Builder wizard.
2. Provide a name, logo, and brief prompts.
3. Define Instructions (the system prompt) covering purpose, context, input constraints, and output format.
4. Upload static knowledge files (documents, tables, images) for "static" enhancement.
5. Add Actions that call external APIs (described by an OpenAPI 3 spec) for "dynamic" enhancement.
6. Publish the custom GPT to the GPT Store.
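To make the Actions step concrete, here is a minimal OpenAPI 3 description, written as a Python dict to stay in this article's language (Actions normally accept the same structure as JSON or YAML). The endpoint, server URL, and parameter names are invented for illustration; only the overall OpenAPI shape is the point.

```python
# A minimal, hypothetical OpenAPI 3 spec for a weather-lookup Action.
# Every concrete value here (URL, path, parameter) is made up.
weather_action_spec = {
    "openapi": "3.0.0",
    "info": {"title": "Weather Lookup", "version": "1.0.0"},
    "servers": [{"url": "https://api.example.com"}],
    "paths": {
        "/weather": {
            "get": {
                "operationId": "getWeather",  # the name the GPT calls the tool by
                "summary": "Current weather for a city",
                "parameters": [
                    {
                        "name": "city",
                        "in": "query",
                        "required": True,
                        "schema": {"type": "string"},
                    }
                ],
                "responses": {"200": {"description": "Weather report as JSON"}},
            }
        }
    },
}
```

The `operationId` is what ties the spec to the model's tool call: when the GPT decides the user's request needs live weather data, it invokes `getWeather` with a `city` argument and the platform issues the HTTP request.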
RAG Technology and Vector Databases
RAG first retrieves relevant chunks from a knowledge base (usually a vector database) and then feeds them to an LLM for answer generation. The core workflow is illustrated in the figure below (omitted). Typical use cases include Q&A, recommendation and data analysis.
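The retrieve-then-generate step boils down to prompt assembly: retrieved chunks are placed into the model's context ahead of the question. The sketch below (the `build_rag_prompt` helper and the sample chunks are invented for illustration) shows that second half of the workflow; the resulting string would then be sent to the LLM.

```python
def build_rag_prompt(question, retrieved_chunks):
    """Assemble an augmented prompt: retrieved context first, question last."""
    context = "\n\n".join(
        f"[{i + 1}] {chunk}" for i, chunk in enumerate(retrieved_chunks)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Chunks that a vector-database lookup might have returned (sample data).
chunks = [
    "Seafood should not be combined with milk.",
    "Vitamin C reacts with minerals in seafood.",
]
prompt = build_rag_prompt("Can I drink milk after eating seafood?", chunks)
# `prompt` is what gets passed to the LLM for answer generation.
print(prompt)
```

Numbering the chunks (`[1]`, `[2]`) also lets the model cite which retrieved passage supported each part of its answer.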
Vector databases store high‑dimensional embeddings and support fast similarity search via approximate‑nearest‑neighbor (ANN) algorithms. Advantages:

- They handle high‑dimensional vector data that relational databases cannot search efficiently.
- They provide sub‑second retrieval speeds.
- They match by semantic (vector‑based) similarity rather than by keywords.
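The difference from keyword matching can be sketched in a few lines of plain Python. The 3‑dimensional "embeddings" below are hand‑made toy values (real embeddings have hundreds or thousands of dimensions produced by a model); cosine similarity then ranks documents by proximity in that space rather than by word overlap.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy hand-made embeddings; a real system would call an embedding model.
docs = {
    "milk and seafood warning": [0.9, 0.1, 0.0],
    "classic Chinese novels":   [0.0, 0.2, 0.9],
    "dairy dietary advice":     [0.8, 0.3, 0.1],
}
# Pretend this vector embeds "can I drink milk after seafood?"
query = [0.85, 0.2, 0.05]

ranked = sorted(docs, key=lambda d: cosine_similarity(docs[d], query), reverse=True)
print(ranked)  # the two dietary documents rank above the unrelated one
```

Note that "milk and seafood warning" and the query share no keywords in this toy setup; they match only because their vectors point in similar directions, which is exactly the property a vector database indexes.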
Embedding models range from "small" (e.g., bert‑base‑chinese, roughly 400 MB) to "large" hosted models (e.g., OpenAI's text‑embedding‑ada‑002). Small models can run locally and suit domain‑specific tasks; large models generally deliver higher accuracy on longer texts.
Python Example: RAG with Chroma
The following Python code demonstrates how to create a Chroma collection, load documents, generate embeddings (both small and large models), and query the collection.
```python
import chromadb

base_path = "/dev/chromadbDemo/"

# Create (or reopen) a persistent Chroma client backed by local storage.
chroma_client = chromadb.PersistentClient(path=base_path + "chromadata")
print("Database started: " + str(chroma_client))

# --- Prepare data ---
file_path_hlm = base_path + "book_HLM.txt"
file_path_jpm = base_path + "book_JPM.txt"
file_path_shz = base_path + "book_SHZ.txt"
file_path_zhw = base_path + "book_ZHW.txt"

def read_file(path):
    """Read a UTF-8 text file, closing the handle when done."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()

# Four full-length books plus four short Chinese sentences giving dietary
# advice about combining seafood with milk, fruit, or lemon.
docs = [
    read_file(file_path_hlm),
    read_file(file_path_jpm),
    read_file(file_path_shz),
    read_file(file_path_zhw),
    "不可以,早晨喝牛奶不科学",
    "吃了海鲜后是不能再喝牛奶的,因为牛奶中含得有维生素C,如果海鲜喝牛奶一起服用会对人体造成一定的伤害",
    "吃海鲜是不可以吃柠檬的因为其中的维生素C会和海鲜中的矿物质形成砷",
    "吃海鲜是不能同时喝牛奶吃水果,这个至少间隔6小时以上才可以",
]

# Per-document metadata, aligned with `docs` by position.
metas = [
    {"source": file_path_hlm, "uris": file_path_hlm, "author": "曹雪芹"},
    {"source": file_path_jpm, "uris": file_path_jpm, "author": "兰陵笑笑生"},
    {"source": file_path_shz, "uris": file_path_shz, "author": "施耐庵"},
    {"source": file_path_zhw, "uris": file_path_zhw, "author": "托尔金"},
    {"source": "my_source1"},
    {"source": "my_source2"},
    {"source": "my_source3"},
    {"source": "my_source4"},
]

ids = ["id-hlm", "id-jpm", "id-shz", "id-zhw", "id1", "id2", "id3", "id4"]
```

Utility functions for embedding, inserting data, and querying are defined next (code omitted for brevity). Two query examples illustrate the difference between a small model (bert‑base‑chinese) and a large model (text‑embedding‑ada‑002):
- With the small model, distances come back in the hundreds, and results on long texts are often noisy.
- With the large model, distances are below 1, and the retrieved documents are semantically much closer to the query.

A likely reason for the scale gap is normalization: text‑embedding‑ada‑002 vectors are normalized to unit length, which bounds the distances between them, while raw BERT embeddings are unnormalized, so their distances grow with vector magnitude.
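The omitted utility functions might look roughly like the sketch below. To keep it self-contained, a deterministic toy embedder and an in-memory class stand in for the real embedding models and the Chroma collection (the `add`/`query` keyword arguments mirror Chroma's collection API); the names `embed`, `ToyCollection`, `insert_docs`, and `query_docs` are invented here, not the article's actual code.

```python
import hashlib
import math

def embed(text, dim=8):
    """Toy deterministic embedder: hashes text into a unit vector.
    A real system would call bert-base-chinese or text-embedding-ada-002."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    vec = [b / 255.0 for b in digest[:dim]]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

class ToyCollection:
    """In-memory stand-in mimicking Chroma's collection.add / collection.query."""
    def __init__(self):
        self._rows = []  # (id, embedding, document, metadata)

    def add(self, ids, embeddings, documents, metadatas):
        self._rows.extend(zip(ids, embeddings, documents, metadatas))

    def query(self, query_embeddings, n_results=2):
        q = query_embeddings[0]
        dist = lambda v: math.sqrt(sum((a - b) ** 2 for a, b in zip(q, v)))
        hits = sorted(self._rows, key=lambda row: dist(row[1]))[:n_results]
        return {
            "ids": [[h[0] for h in hits]],
            "documents": [[h[2] for h in hits]],
            "distances": [[dist(h[1]) for h in hits]],
        }

def insert_docs(collection, ids, docs, metas):
    """Embed every document and store it alongside its id and metadata."""
    collection.add(ids=ids,
                   embeddings=[embed(d) for d in docs],
                   documents=docs,
                   metadatas=metas)

def query_docs(collection, question, n_results=2):
    """Embed the question and return the nearest stored documents."""
    return collection.query(query_embeddings=[embed(question)],
                            n_results=n_results)
```

Against a real Chroma collection, `insert_docs` and `query_docs` would be called the same way; only the `embed` function changes, which is exactly where the small-versus-large model choice enters.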
Integration of LLM, RAG and Agents
Frameworks such as LangChain enable developers to compose LLMs, RAG pipelines and agents into unified workflows. Chains connect processing steps, while agents decide which tools (search, weather API, Wikipedia, etc.) to invoke based on the task.
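Without pinning down LangChain's actual API, the agent's tool-selection loop can be sketched in plain Python: a registry of tools and a router that decides which one to invoke. Everything here is a hypothetical stand-in; in particular, the keyword-based `route` function is a deliberately naive placeholder for what a real agent does with an LLM call.

```python
# Hypothetical tools; a real agent would wrap a search engine, a weather API, etc.
def search_tool(query):
    return f"search results for: {query}"

def weather_tool(query):
    return f"weather report for: {query}"

TOOLS = {
    "search": search_tool,
    "weather": weather_tool,
}

def route(task):
    """Naive keyword router standing in for the LLM's tool-choice step."""
    if "weather" in task.lower():
        return "weather"
    return "search"

def run_agent(task):
    tool_name = route(task)               # decide which tool fits the task
    observation = TOOLS[tool_name](task)  # execute the chosen tool
    return f"[{tool_name}] {observation}" # this would feed back into the LLM

print(run_agent("What is the weather in Beijing?"))
```

Chains in this picture are simply fixed sequences of such steps, while an agent re-runs the decide-execute loop, feeding each observation back to the model until the task is done.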
Use Cases
- Intelligent customer service: LLM for understanding, RAG for factual retrieval, agent for dialogue management.
- Personalized education platforms: LLM generates content, RAG fetches domain resources, agent adapts the learning path.
- Complex decision support (finance, healthcare, research): LLM processes language, RAG supplies up‑to‑date data, agent synthesizes recommendations.
- Supply‑chain logistics: agents optimize inventory (WMS) and routing (TMS) using RAG‑enhanced insights.
Future Outlook and Challenges
AI systems will become more self‑adaptive, integrating embodied intelligence, while facing ethical concerns such as transparency, bias, and privacy. The democratization of AI tools (e.g., Copilot, Midjourney, JoyCoder) will reshape developer roles, shifting focus from routine coding to prompt engineering, supervision and system integration.
Overall, mastering LLM fundamentals, RAG techniques, vector‑database selection, and agent orchestration is crucial for building robust, scalable AI applications.
JD Tech
Official JD technology sharing platform. All the cutting‑edge JD tech, innovative insights, and open‑source solutions you’re looking for, all in one place.