30 Essential AI Agent Concepts: From LLMs to Multi‑Agent Systems
This comprehensive guide systematically explains thirty core terms of AI agents—covering foundational large language models, fine‑tuning techniques, multimodal vision‑language models, agent architectures such as ReAct and CoT, tool‑calling protocols, retrieval‑augmented generation, workflow orchestration, and emerging product forms like autonomous and embodied agents—while detailing the reasoning, trade‑offs, and concrete examples that shape modern agent engineering.
Overview
Since 2023 the rapid improvement of large language models (LLMs) has turned AI agents from a research curiosity into a production‑ready technology. The article organizes thirty essential terms into five layers—foundation models, agent architecture, tool & communication, engineering practice, and product form—to give readers a complete mental model of how agents work from the bottom up.
Foundation Model Layer
1. LLM (Large Language Model)
LLMs are built on the Transformer architecture (Vaswani et al., 2017) and generate text by predicting the next token with a self‑attention mechanism (Attention(Q,K,V)=softmax(QKᵀ/√dₖ)V). As models scale past tens of billions of parameters, emergent abilities such as zero‑shot learning and chain‑of‑thought reasoning appear (Wei et al., 2022). Representative models include GPT‑4o (multimodal, fast), Claude 4 series (200 K context, safety‑aligned), Gemini 2.5 (native multimodal), Llama 3 (8 B‑405 B open‑source benchmark), and DeepSeek‑V3 (671 B MoE, low inference cost).
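The attention formula above can be made concrete with a toy, pure‑Python sketch—deliberately tiny and unbatched, nothing like a production kernel—where Q, K, and V are short lists of small vectors:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V.

    Q, K, V are lists of vectors (lists of floats); toy sizes only.
    """
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted mix of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

# One query attending over two key/value pairs.
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(attention(Q, K, V))
```

The query aligns with the first key, so the output is pulled toward the first value vector—exactly the "soft lookup" behaviour the formula describes.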
In agents the LLM provides natural‑language understanding, reasoning, tool‑selection, and context management. Model choice balances capability, latency, and cost; high‑frequency decision loops favor fast models like GPT‑4o‑mini or Claude Haiku, while deep reasoning tasks prefer stronger models such as Claude Opus or o3.
2. Fine‑tuning
Three main paradigms adapt a generic LLM to a specific domain:
Full Fine‑tuning : updates all parameters; yields the best performance but requires thousands of high‑quality examples and multiple high‑end GPUs.
Instruction Tuning : trains on (instruction, input, output) triples; the key technique behind InstructGPT and Alpaca.
Alignment Tuning : uses RLHF or Direct Preference Optimization (DPO) to align outputs with human preferences (SFT → reward model → PPO for RLHF; DPO skips the reward model).
Fine‑tuned agents can produce structured outputs (e.g., JSON tool‑call arguments) or follow domain‑specific APIs, dramatically improving task success—for example a customer‑service agent fine‑tuned for refund workflows.
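The DPO objective mentioned above reduces to a one‑line loss per preference pair. A minimal sketch, using made‑up scalar log‑probabilities in place of real model outputs (the function names and values here are illustrative, not any library's API):

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Direct Preference Optimization loss for one (chosen, rejected) pair.

    logp_* are summed token log-probs of the chosen (w) and rejected (l)
    answers under the policy being trained; ref_logp_* are the same
    quantities under the frozen reference model.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# If the policy prefers the chosen answer more than the reference does,
# the margin is positive and the loss drops below -log(0.5) ≈ 0.693.
print(dpo_loss(logp_w=-10.0, logp_l=-12.0, ref_logp_w=-11.0, ref_logp_l=-11.0))
```

Because the loss needs only log‑probabilities from the policy and a frozen reference, no separate reward model or PPO loop is required—which is exactly why DPO is simpler to run than RLHF.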
3. LoRA (Low‑Rank Adaptation)
LoRA (Hu et al., 2021) freezes the pretrained weights W and learns two small matrices A and B such that ΔW=BA, reducing trainable parameters from d·k to r·(d+k). For Llama 2‑7B, full fine‑tuning needs ~56 GB of GPU memory, while LoRA (r=16) needs only ~16 GB. Variants include QLoRA (4‑bit quantization, 65 B model on a single 48 GB GPU), DoRA (direction‑amplitude factorisation), and LoRA+ (different learning rates for A and B).
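The parameter arithmetic behind LoRA is easy to verify directly. A small sketch computing ΔW=BA in pure Python and the d·k vs r·(d+k) savings for one 4096×4096 attention matrix (the dimensions are typical of a 7 B model, used here only for illustration):

```python
def lora_delta(A, B):
    """ΔW = B·A with B of shape (d, r) and A of shape (r, k); pure-Python matmul."""
    d, r, k = len(B), len(A), len(A[0])
    return [[sum(B[i][t] * A[t][j] for t in range(r)) for j in range(k)] for i in range(d)]

d, k, r = 4096, 4096, 16
full = d * k        # trainable params when fine-tuning this matrix fully
lora = r * (d + k)  # trainable params with LoRA adapters A and B
print(full, lora, full // lora)
```

For these dimensions LoRA trains 128× fewer parameters for this single matrix—only A and B are updated, while W stays frozen and ΔW is merged (or applied on the fly) at inference.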
4. MoE (Mixture of Experts)
MoE models consist of many expert sub‑networks and a gating network that activates the top‑K experts per token. Mixtral 8×7B, for instance, has eight 7 B experts (total 46.7 B parameters) but only two are active during inference (~12.9 B active parameters). Load‑balancing auxiliary loss, expert offloading, and fine‑grained expert splitting (DeepSeek‑V2/V3) keep memory usage low while allowing specialist behaviour for language, code, or math tasks.
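The gating mechanism can be sketched in a few lines: softmax the gate's logits, keep the top‑K experts, renormalise their weights, and run only those experts. This toy version uses scalar "experts" purely for illustration:

```python
import math

def top_k_route(gate_logits, k=2):
    """Softmax the gate logits, keep the top-k experts, renormalise their weights."""
    m = max(gate_logits)
    exps = [math.exp(g - m) for g in gate_logits]
    s = sum(exps)
    probs = [e / s for e in exps]
    top = sorted(range(len(probs)), key=lambda i: -probs[i])[:k]
    z = sum(probs[i] for i in top)
    return [(i, probs[i] / z) for i in top]

def moe_forward(x, experts, gate_logits, k=2):
    """Weighted sum of the k active experts' outputs; the others never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_logits, k))

# Eight toy experts; only two execute per token, as in Mixtral 8x7B.
experts = [lambda x, s=s: s * x for s in range(8)]
print(moe_forward(2.0, experts, gate_logits=[0, 0, 5.0, 0, 0, 4.0, 0, 0]))
```

Because only the routed experts execute, compute per token scales with K rather than with the total expert count—the source of MoE's favourable cost/capacity trade‑off.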
5. Multimodal & Vision‑Language Models (VLM)
Multimodal models extend LLMs with visual encoders (e.g., ViT‑L/14) and projection layers to fuse image tokens with text. Early‑fusion examples are GPT‑4o/4V and Gemini, while late‑fusion is represented by Flamingo. Product‑level VLMs include GPT‑4o (native multimodal), Claude 3.5 Sonnet (strong OCR), Qwen‑VL series (multiple images), and LLaVA (open‑source benchmark). In agents, VLMs enable GUI agents that “see” screen elements and embodied agents that perceive physical environments.
Agent Architecture Layer
7. Agent Core
Wooldridge & Jennings (1995) characterised agents by autonomy, reactivity, proactiveness, and social ability; modern LLM agents realise these properties through five modules: Perception, Planning, Memory, Action, and Reflection. Traditional AI is passive (input → output), whereas agents run an active loop—continuously sensing, reasoning, acting, and learning until a goal is reached.
9. ReAct (Reasoning + Acting)
ReAct (Yao et al., 2023) interleaves Thought, Action, and Observation. Example:
Thought: I need today’s weather in Beijing
Action: search("Beijing today weather")
Observation: Beijing is sunny, 15‑25°C
Thought: Now I can answer the user
Action: finish("Beijing is sunny, 15‑25°C, suitable for travel.")

This loop provides explanations for actions and feeds real‑world results back into reasoning, outperforming pure Chain‑of‑Thought on benchmarks like HotpotQA.
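The Thought→Action→Observation cycle above is straightforward to implement as a loop. A minimal sketch: `llm` stands in for a model call that returns the next step, and `search` is a stubbed tool (both are placeholders, not real APIs):

```python
def search(query):
    """Stand-in tool; a real agent would call a web-search API here."""
    return "Beijing is sunny, 15-25°C"

TOOLS = {"search": search}

def react_loop(llm, task, max_steps=5):
    """Thought→Action→Observation loop; `llm` maps the transcript so far
    to a (thought, action, argument) triple."""
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        thought, action, arg = llm(transcript)
        transcript += f"Thought: {thought}\nAction: {action}({arg!r})\n"
        if action == "finish":
            return arg                     # terminal action: answer the user
        observation = TOOLS[action](arg)   # run the tool...
        transcript += f"Observation: {observation}\n"  # ...and feed results back
    return None

# Scripted "LLM" that replays the weather example from the text.
steps = iter([
    ("I need today's weather in Beijing", "search", "Beijing today weather"),
    ("Now I can answer the user", "finish", "Beijing is sunny, 15-25°C, suitable for travel."),
])
print(react_loop(lambda transcript: next(steps), "What's the weather in Beijing?"))
```

The key detail is that each observation is appended to the transcript the model sees next, so tool results genuinely steer subsequent reasoning rather than being bolted on afterwards.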
10. Chain‑of‑Thought (CoT)
CoT (Wei et al., 2022) prompts LLMs to generate intermediate reasoning steps before the final answer, raising PaLM‑540B accuracy on GSM8K from 17.9 % to 56.9 %; the zero‑shot variant simply appends “Let’s think step by step” to the prompt (Kojima et al., 2022).
11. Planning
Agents decompose goals into sub‑tasks, analyse dependencies, evaluate required tools, and define fallback paths. Planning algorithms include HTN (Hierarchical Task Network), LLM‑as‑Planner (e.g., HuggingGPT), ReWOO (full plan before execution), and LATS (Monte‑Carlo Tree Search).
12. Reflection
Reflection lets agents assess their own actions. The Reflexion framework (Shinn et al., 2023) records failures, generates reflective text, stores it in memory, and re‑uses it on similar future tasks.
13. Memory
Memory is split into Sensory (LLM context window), Working (dialogue history + summary), and Long‑term (vector DB or file storage). Implementations manage sliding windows, summarisation, vector‑DB retrieval, and knowledge‑graph integration. Challenges include forgetting, interference, retrieval precision, and privacy‑preserving encryption.
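The sliding‑window‑plus‑summary pattern for working memory can be sketched in a short class. Here the summariser is a trivial string join; in practice it would be an LLM summarisation call (the class name and interface are illustrative, not any framework's API):

```python
class WorkingMemory:
    """Sliding window over recent turns plus a rolling summary of evicted ones."""

    def __init__(self, window=4, summarize=lambda turns: "; ".join(turns)):
        self.window = window
        self.summarize = summarize  # in practice, an LLM summarisation call
        self.turns = []
        self.summary = ""

    def add(self, turn):
        self.turns.append(turn)
        if len(self.turns) > self.window:
            # Evict the oldest turns and fold them into the rolling summary.
            evicted = self.turns[: -self.window]
            self.turns = self.turns[-self.window:]
            merged = ([self.summary] if self.summary else []) + evicted
            self.summary = self.summarize(merged)

    def context(self):
        """What gets injected into the LLM prompt: summary + recent turns."""
        parts = ([f"Summary: {self.summary}"] if self.summary else []) + self.turns
        return "\n".join(parts)

mem = WorkingMemory(window=2)
for t in ["user: hi", "agent: hello", "user: book a flight", "agent: where to?"]:
    mem.add(t)
print(mem.context())
```

Older turns survive only in compressed form, which bounds prompt size while keeping long‑range facts retrievable—long‑term memory would additionally persist evicted content to a vector DB.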
14‑15. Supervisor & Worker Agents
Supervisor agents route user requests to specialised Worker agents (e.g., coder, tester, reviewer, deployer). Routing strategies range from static rule‑based maps to LLM‑driven routing, bidding, and dynamic delegation. Open‑source implementations include LangGraph Supervisor, CrewAI, and Microsoft AutoGen.
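At its simplest, the Supervisor pattern is a routing function in front of a worker registry. A static rule‑based sketch (worker names and rules are invented for illustration; an LLM‑driven supervisor would replace the keyword match with a model call returning the worker name):

```python
WORKERS = {
    "coder": lambda task: f"[coder] wrote code for: {task}",
    "tester": lambda task: f"[tester] wrote tests for: {task}",
    "reviewer": lambda task: f"[reviewer] reviewed: {task}",
}

RULES = [("test", "tester"), ("review", "reviewer")]  # checked in order

def supervise(task):
    """Static rule-based routing: first matching keyword wins,
    otherwise fall through to the default worker."""
    for keyword, worker in RULES:
        if keyword in task.lower():
            return WORKERS[worker](task)
    return WORKERS["coder"](task)  # default route

print(supervise("Review the login PR"))
print(supervise("Implement pagination"))
```

The dynamic strategies listed above differ only in how the worker name is chosen—by an LLM, by bidding, or by delegation—while the dispatch skeleton stays the same.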
Tool & Communication Layer
16. Tool Calling
Agents generate structured JSON tool‑call requests (function name + arguments). The workflow: user intent → LLM decides tool → LLM emits JSON → framework executes → result fed back → final response.
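The execution half of that workflow—parse the model's JSON, dispatch to the registered function, return the result for the next model turn—fits in a few lines. A sketch with a stubbed tool (the tool name and registry shape are illustrative):

```python
import json

def get_weather(city: str) -> str:
    """Stand-in implementation; a real tool would hit a weather API."""
    return f"{city}: sunny, 22°C"

REGISTRY = {"get_weather": get_weather}

def execute_tool_call(raw: str) -> str:
    """Parse the model's JSON tool call, run the named tool with its
    arguments, and return the result to feed back into the conversation."""
    call = json.loads(raw)
    fn = REGISTRY[call["name"]]
    return fn(**call["arguments"])

# What the LLM emits (a string), and what the framework does with it.
llm_output = '{"name": "get_weather", "arguments": {"city": "Beijing"}}'
print(execute_tool_call(llm_output))
```

Note the separation of concerns: the model only *proposes* the call as structured text; the framework validates and executes it, which is where permission checks and sandboxing belong.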
17. Function Calling
OpenAI’s Function Calling (June 2023) standardises the JSON schema for tool invocation, separating intent generation from actual execution.
18. MCP (Model Context Protocol)
Anthropic’s MCP (Nov 2024) defines a client‑server protocol that unifies tool, resource, and prompt exposure, reducing the integration matrix from M×N to M+N. Supported transports include stdio and HTTP + SSE.
19. A2A (Agent‑to‑Agent)
Google’s A2A (Apr 2025) enables agents to discover each other via Agent Cards (JSON capability descriptors) and exchange Tasks (messages, artifacts, status). It complements MCP by handling inter‑agent collaboration.
20. Plugins
ChatGPT Plugins introduced OpenAPI‑based extensions, the precursor to Function Calling and MCP. Although the Plugin Store closed in 2024, the concept of “LLM as a platform” persists in GPTs and Claude Projects.
21. Vector DB
Vector databases (Pinecone, Weaviate, Milvus, Qdrant, Chroma, FAISS) store embeddings for similarity search. Retrieval‑augmented generation (RAG) relies on these stores for long‑term memory and knowledge‑base access.
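The core operation these databases accelerate—cosine‑similarity top‑k search over embeddings—is simple to state in pure Python (toy three‑dimensional vectors here; real embeddings have hundreds or thousands of dimensions and are indexed with ANN structures like HNSW):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec, store, k=2):
    """store: list of (text, embedding) pairs. Returns the k nearest texts."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

store = [
    ("refund policy", [0.9, 0.1, 0.0]),
    ("shipping times", [0.1, 0.9, 0.0]),
    ("return window", [0.8, 0.2, 0.1]),
]
print(top_k([1.0, 0.0, 0.0], store, k=2))
```

Production stores replace the exhaustive `sorted` with approximate nearest‑neighbour indexes so search stays fast at millions of vectors, but the contract—embed, store, retrieve by similarity—is the same.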
Engineering Practice Layer
22. RAG (Retrieval‑Augmented Generation)
Standard RAG pipeline: user query → query rewrite → vector retrieval → re‑ranking → prompt injection → LLM generation. Advanced variants include Agentic RAG (agents decide when/what to retrieve), Graph RAG (knowledge‑graph retrieval), Corrective RAG (post‑retrieval quality check), and Self‑RAG (LLM self‑evaluates retrieval relevance).
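The prompt‑injection step of the pipeline above can be sketched end to end. Here `retrieve` is a keyword stub standing in for real vector retrieval and re‑ranking, and the knowledge snippets are invented for illustration:

```python
def retrieve(query, k=2):
    """Stand-in for vector retrieval + re-ranking; returns top-k chunks."""
    knowledge = {
        "refunds": "Refunds are issued within 14 days of return receipt.",
        "shipping": "Standard shipping takes 3-5 business days.",
    }
    return [text for key, text in knowledge.items() if key in query.lower()][:k]

def build_prompt(query):
    """Prompt-injection step: ground the LLM on retrieved context only."""
    context = "\n".join(f"- {c}" for c in retrieve(query))
    return (
        "Answer using ONLY the context below. If the answer is not there, say so.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

print(build_prompt("How long do refunds take?"))
```

The "ONLY the context" instruction is the grounding constraint that distinguishes RAG from open‑ended generation; Agentic and Self‑RAG variants wrap this same step with decisions about *whether* and *what* to retrieve.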
23. Prompt Engineering
Key techniques: role prompting, few‑shot examples, chain‑of‑thought, structured output (JSON/XML), and constraint statements (“don’t hallucinate”). Frameworks such as CRISPE (Capacity‑Role‑Insight‑Statement‑Personality‑Experiment) and RISEN (Role‑Instructions‑Steps‑End‑Goal‑Narrowing) help organise prompts for agents.
24. Workflow Orchestration
Agents can be composed as Chains, Parallel branches, Conditional branches, Loops, or Human‑in‑the‑Loop steps. Popular orchestrators include LangChain/LangGraph, LlamaIndex (RAG‑focused), CrewAI (role‑based multi‑agent), AutoGen (flexible dialogue topologies), Semantic Kernel, Prefect, and Airflow. Engineering concerns cover observability, fault tolerance, versioning, and token‑cost control.
25. Development Frameworks
Comparative matrix:
LangChain/LangGraph – richest ecosystem, general‑purpose.
CrewAI – role‑play multi‑agent, easy to grasp.
AutoGen – flexible dialogue topologies, research‑grade.
LlamaIndex – data‑centric RAG pipelines.
Haystack – production‑grade NLP pipelines.
Semantic Kernel – enterprise‑focused (.NET/Python).
Smolagents – lightweight code‑agent prototypes.
Selection guidance: quick prototyping with LangChain + OpenAI; production with LangGraph + observability; multi‑agent with CrewAI or AutoGen; knowledge‑base apps with LlamaIndex + vector DB.
26. Code Interpreter
Agents can execute sandboxed Python code (e.g., OpenAI’s Code Interpreter) to perform data analysis, generate plots, or manipulate files. Security measures include CPU/memory/time limits, network isolation, filesystem sandboxing, and library whitelisting.
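A minimal sketch of two of those measures—process isolation and a wall‑clock limit—using only the standard library. This is a demonstration of the idea, not a production sandbox, which would add rlimits, network isolation, and containerisation:

```python
import subprocess
import sys

def run_sandboxed(code: str, timeout_s: float = 2.0) -> str:
    """Run untrusted code in a child interpreter with a wall-clock limit.

    Demonstrates process isolation + timeout only; production sandboxes
    add CPU/memory limits, network isolation, and filesystem jails.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated interpreter mode
            capture_output=True, text=True, timeout=timeout_s,
        )
        return proc.stdout if proc.returncode == 0 else f"error: {proc.stderr.strip()}"
    except subprocess.TimeoutExpired:
        return "error: execution timed out"

print(run_sandboxed("print(sum(range(10)))"))  # normal run
print(run_sandboxed("while True: pass"))       # killed by the timeout
```

Running the code in a separate process means a crash, hang, or `sys.exit` in the untrusted snippet cannot take down the agent itself—the minimum bar for any code‑interpreter tool.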
27. Orchestrator
Orchestrators coordinate multiple agents, manage state, route tasks, schedule resources, aggregate results, and recover from failures. Architectural patterns include centralized (Supervisor), decentralized (peer‑to‑peer), event‑driven (Kafka/RabbitMQ), and state‑machine orchestration. Unlike deterministic micro‑service orchestration, agents introduce uncertainty (model hallucinations, format errors) that must be mitigated with retries, fallbacks, and validation.
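The retry/validate/fallback mitigation mentioned above is a small reusable pattern. A sketch with a simulated flaky model step (the function names are illustrative):

```python
def with_retries(call, validate, fallback, max_attempts=3):
    """Run a nondeterministic agent step: validate its output, retry on
    failure, and fall back when all attempts are exhausted."""
    for _ in range(max_attempts):
        try:
            result = call()
            if validate(result):
                return result
        except Exception:
            pass  # treat exceptions like invalid output and retry
    return fallback()

# Simulated flaky LLM step: malformed output twice, then valid JSON-ish text.
attempts = iter(["oops", "", '{"status": "ok"}'])
result = with_retries(
    call=lambda: next(attempts),
    validate=lambda r: r.startswith("{"),
    fallback=lambda: '{"status": "fallback"}',
)
print(result)
```

The validator is where orchestrating agents differs most from orchestrating microservices: a call can "succeed" transport‑wise yet return unparseable or hallucinated content, so output validation must be explicit rather than implied by an HTTP status code.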
Product Form Layer
28. Multi‑Agent Systems
Collaboration modes: hierarchical (Supervisor → Worker), debate (agents argue), voting (independent solutions, majority vote), pipeline (sequential processing). Notable systems: MetaGPT (software‑company role simulation), ChatDev (agent‑driven software development), CrewAI (simple multi‑agent), AutoGen (flexible multi‑agent dialogue).
29. Embodied Agents
Embodied agents close the perception‑action loop in the physical world. Stack: high‑level LLM planning → task decomposition → low‑level motion planning → actuator control, with feedback from cameras and sensors. Research prototypes include SayCan, PaLM‑E, VoxPoser, and RT‑2. Core challenges are sim‑to‑real transfer gaps and strict safety constraints.
30. H2A (Human‑to‑Agent) Interaction
Interaction paradigms evolve from command‑style, to dialogue‑style, to delegation (high‑level goals), to collaborative (human and agent co‑act). Design principles emphasise controllability, transparency, intervene‑ability, and progressive trust. Product examples: Claude Code (plan review & permission system), Cursor/Copilot (IDE‑assistant), Devin (full‑process software engineer), OpenAI Operator (browser agent with confirmation prompts). Future direction: humans become supervisors and coaches rather than direct operators.
Conclusion
The AI agent ecosystem is transitioning from proof‑of‑concept to large‑scale engineering. Mastering the thirty core concepts provides a systematic framework—from foundational model capabilities, through architectural patterns and tool protocols, to practical engineering pipelines and emerging product forms. The pace of change is extreme; standards such as MCP and A2A have become de‑facto within months, making continuous learning essential for anyone building or deploying AI agents.