Designing Generative AI Agents: Models, Tools, Extensions, Function Calls, and Data Storage
The article explains how generative AI agents combine language models, tool integration, self‑guided planning, prompt‑engineering frameworks, extensions, function calls, and vector‑based data storage to create adaptable, retrieval‑augmented systems that can interact with real‑world APIs and perform complex tasks.
Humans excel at complex pattern‑recognition tasks and often rely on external tools such as books, search engines, or calculators to augment their prior knowledge before reaching conclusions.
Similarly, generative AI models can be trained to use tools that provide real‑time information or suggest real‑world actions, such as retrieving a customer's purchase history from a database to generate personalized recommendations or invoking APIs to send emails or execute financial transactions.
For this capability, a model must not only have access to a set of external tools but also be able to plan and execute tasks autonomously. This combination of reasoning, logic, and external information access gives rise to the concept of an agent: a program that extends beyond the core generative model.
1. Model
In the context of agents, the model refers to the language model (LM) that serves as the central decision‑maker. An agent may employ one or multiple LMs of any size, provided they can follow instruction‑based reasoning frameworks such as ReAct, Chain‑of‑Thought (CoT), or Tree‑of‑Thoughts (ToT). Models can be general‑purpose, multimodal, or fine‑tuned for specific agent architectures. Ideally, the chosen model has been trained on data that reflects the tools the agent will use.
2. Tools
Tools come in various forms and complexities, often aligning with common web API methods (GET, POST, PATCH, DELETE). They enable agents to update databases, fetch weather data, or retrieve other real‑world information, thereby supporting advanced systems such as Retrieval‑Augmented Generation (RAG) that extend the agent’s capabilities beyond the base model.
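The idea of tools that mirror web API methods can be sketched as a small registry. This is a minimal illustration, not a reference implementation: the `Tool` class, the `get_weather` and `update_weather` functions, and the in-memory `_WEATHER` dictionary are all hypothetical stand-ins for real external systems.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Tool:
    """Hypothetical tool record: a name, an HTTP-style verb, and a callable."""
    name: str
    method: str          # the web API method this tool corresponds to
    description: str     # text an agent could show to the model
    run: Callable[..., Any]


# In-memory stand-in for an external weather service (assumption).
_WEATHER: Dict[str, str] = {"Paris": "18C, cloudy"}


def get_weather(city: str) -> str:
    """Read-only lookup, analogous to a GET request."""
    return _WEATHER.get(city, "unknown")


def update_weather(city: str, report: str) -> str:
    """State-changing call, analogous to a PATCH request."""
    _WEATHER[city] = report
    return "ok"


TOOLS: Dict[str, Tool] = {
    tool.name: tool
    for tool in [
        Tool("get_weather", "GET", "Read the weather for a city.", get_weather),
        Tool("update_weather", "PATCH", "Overwrite the weather for a city.", update_weather),
    ]
}


def call_tool(name: str, **kwargs: Any) -> Any:
    """Dispatch a tool call by name, rejecting tools that are not registered."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name].run(**kwargs)
```

An agent would typically expose the `description` fields to the model and route the model's chosen tool name through `call_tool`.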
3. Agent vs. Model Differences
Knowledge
Model: Knowledge limited to training data.
Agent: Extends knowledge by connecting to external systems via tools.

Session handling
Model: Performs a single inference per user query; no built‑in conversation history.
Agent: Manages conversation history and can perform multi‑turn reasoning based on orchestrated decisions.

Tools
Model: No native tool implementation.
Agent: Tools are native components of the agent architecture.

Logic layer
Model: No native logical layer; prompts are simple or use reasoning frameworks.
Agent: Uses native cognitive architectures like CoT, ReAct, or frameworks such as LangChain.
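The difference in conversation handling can be illustrated with a short sketch. Everything here is hypothetical: `stateless_model` is a toy function that only reports how many lines of context it received, and `Agent` is a minimal wrapper that replays the accumulated history on every turn.

```python
from typing import Callable, List


def stateless_model(prompt: str) -> str:
    """Toy stand-in for an LM: it only sees the prompt passed to this call."""
    return f"seen {len(prompt.splitlines())} line(s)"


class Agent:
    """Minimal wrapper that keeps conversation history around a stateless model."""

    def __init__(self, model: Callable[[str], str]) -> None:
        self.model = model
        self.history: List[str] = []

    def ask(self, user_message: str) -> str:
        # Record the new turn, then send the whole history to the model.
        self.history.append(f"User: {user_message}")
        reply = self.model("\n".join(self.history))
        self.history.append(f"Agent: {reply}")
        return reply


agent = Agent(stateless_model)
first = agent.ask("hello")
second = agent.ask("and again")
```

On the second turn the model receives three lines (two user turns and one agent reply), whereas calling `stateless_model` directly would show it only the latest message.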
4. Common Prompt‑Engineering Frameworks
ReAct provides a thinking‑process strategy that lets a language model reason about a query and take actions, improving human‑AI interaction and benchmark performance. CoT introduces intermediate reasoning steps, with variants such as self‑consistency, active prompting, and multimodal CoT. Tree‑of‑Thoughts (ToT) generalizes CoT for exploratory or strategic planning tasks.
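A ReAct‑style loop alternates between model output (a thought plus an action) and an observation returned by a tool. The sketch below assumes a simple text protocol, `Action: tool[input]` and `Final Answer: ...`, which is an illustrative convention rather than a fixed standard. The `scripted_model` function is a hypothetical stand‑in that replays canned replies instead of calling a real LM.

```python
import re
from typing import Callable, Dict

# Assumed text protocol for this sketch.
ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*?)\]")
FINAL_RE = re.compile(r"Final Answer:\s*(.*)")


def react_loop(
    model: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 5,
) -> str:
    """Alternate model replies and tool observations until a final answer appears."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = model(transcript)
        transcript += reply + "\n"
        final = FINAL_RE.search(reply)
        if final:
            return final.group(1).strip()
        action = ACTION_RE.search(reply)
        if action is None:
            raise ValueError("model produced neither an action nor a final answer")
        name, arg = action.group(1), action.group(2)
        # Feed the tool result back so the next model call can reason over it.
        transcript += f"Observation: {tools[name](arg)}\n"
    raise RuntimeError("no final answer within max_steps")


def scripted_model(transcript: str) -> str:
    """Hypothetical model: acts first, then answers once it has an observation."""
    if "Observation:" not in transcript:
        return "Thought: I need to compute this.\nAction: calculator[6*7]"
    return "Thought: I have the result.\nFinal Answer: 42"


def calculator(expr: str) -> str:
    """Toy tool that multiplies two integers written as 'a*b'."""
    left, right = expr.split("*")
    return str(int(left) * int(right))


answer = react_loop(scripted_model, {"calculator": calculator}, "What is 6 times 7?")
```

The `max_steps` bound is a design choice for the sketch: it keeps a model that never emits a final answer from looping indefinitely.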
5. Extensions (Custom Plugins)
Extensions act as standardized bridges between APIs and agents, allowing seamless execution of API calls without exposing the underlying implementation. They enable agents to learn from examples how to invoke specific endpoints and which parameters are required, supporting dynamic selection of the most appropriate extension for a given user query.
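One way to picture an extension is as a record that bundles an API call with example queries and its required parameters. In the sketch below, the `Extension` class, the two stub extensions, and the word‑overlap scoring in `select_extension` are all assumptions made for illustration. In a real agent the model itself would usually pick the extension based on the examples.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Extension:
    """Hypothetical extension: an API wrapper plus examples that teach its use."""
    name: str
    examples: List[str]
    required_params: List[str]
    invoke: Callable[[Dict[str, str]], str]


def _words(text: str) -> Set[str]:
    return set(text.lower().split())


def select_extension(query: str, extensions: List[Extension]) -> Extension:
    """Pick the extension whose examples share the most words with the query."""
    query_words = _words(query)

    def score(ext: Extension) -> int:
        return max((len(query_words & _words(ex)) for ex in ext.examples), default=0)

    return max(extensions, key=score)


def run_extension(ext: Extension, params: Dict[str, str]) -> str:
    """Validate required parameters, then execute the call on the agent side."""
    missing = [p for p in ext.required_params if p not in params]
    if missing:
        raise ValueError(f"missing parameters: {missing}")
    return ext.invoke(params)


EXTENSIONS = [
    Extension(
        name="flights",
        examples=["book a flight from Austin to Zurich", "find flights to Tokyo"],
        required_params=["origin", "destination"],
        invoke=lambda p: f"flights from {p['origin']} to {p['destination']} (stub)",
    ),
    Extension(
        name="weather",
        examples=["what is the weather in Paris", "weather forecast for Rome"],
        required_params=["city"],
        invoke=lambda p: f"weather report for {p['city']} (stub)",
    ),
]

chosen = select_extension("what is the weather in Oslo", EXTENSIONS)
report = run_extension(chosen, {"city": "Oslo"})
```

The caller never sees how `invoke` reaches the underlying API, which mirrors the point that extensions hide the implementation behind a standard interface.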
6. Function Calls
Function calls resemble extensions but differ in execution location: the model outputs a function name and parameters, which are then executed on the client side rather than within the agent. This approach is useful when direct API access is restricted, when additional data‑transformation logic is needed, or when developers want to stub APIs during iterative development.
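The client‑side execution pattern can be sketched as follows. The JSON shape (`name` plus `args`), the `display_cities` function, and `fake_model_output` are hypothetical; a real model would produce the structured output, and the payload format depends on the provider.

```python
import json
from typing import Any, Callable, Dict, List


def fake_model_output() -> str:
    """Stand-in for a model reply: a function name and arguments, nothing executed."""
    return json.dumps(
        {"name": "display_cities", "args": {"cities": ["Aspen", "Vail"], "preference": "skiing"}}
    )


def display_cities(cities: List[str], preference: str) -> str:
    """Client-side function; it could call a private API or be a stub during development."""
    return f"{preference}: " + ", ".join(cities)


# Only functions registered here can be triggered by model output.
CLIENT_FUNCTIONS: Dict[str, Callable[..., Any]] = {"display_cities": display_cities}


def execute_function_call(raw: str, registry: Dict[str, Callable[..., Any]]) -> Any:
    """Parse the model's structured output and run the named function on the client."""
    call = json.loads(raw)
    name, args = call["name"], call.get("args", {})
    if name not in registry:
        raise KeyError(f"function not allowed: {name}")
    return registry[name](**args)


result = execute_function_call(fake_model_output(), CLIENT_FUNCTIONS)
```

Because the client owns the registry, it decides which functions are callable and can add authentication or data transformation before or after the call.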
7. Data Storage
Data storage lets developers provide raw documents to an agent. The documents are converted into embeddings and stored in a vector database. The agent can then retrieve the most relevant entries to inform subsequent actions or responses, enabling Retrieval‑Augmented Generation without costly re‑training or fine‑tuning.
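The retrieval step can be sketched with a toy in‑memory store. The bag‑of‑words `embed` function below is a deliberate simplification standing in for a learned embedding model, and `VectorStore` is a hypothetical class, not the API of any particular vector database.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would use a learned embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class VectorStore:
    """Hypothetical in-memory store holding (document, embedding) pairs."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, Counter]] = []

    def add(self, doc: str) -> None:
        self._items.append((doc, embed(doc)))

    def search(self, query: str, k: int = 1) -> List[str]:
        """Return the k documents most similar to the query."""
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]


store = VectorStore()
store.add("refund policy: refunds are issued within 14 days")
store.add("shipping policy: orders ship within 2 business days")

question = "how many days until orders ship"
context = store.search(question, k=1)
# The retrieved text is placed in the prompt so the model can answer from it.
prompt = f"Context: {context[0]}\nQuestion: {question}"
```

The model's weights are never touched: new knowledge enters only through the retrieved context in the prompt.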
Summary Comparison
Execution
Extension: Agent‑side execution
Function Call: Client‑side execution
Data Storage: Agent‑side execution

Use Cases
Extension: Developers want agents to control API interactions; useful for multi‑hop planning and local pre‑built extensions (e.g., Vertex Search, code interpreter).
Function Call: Security or authentication limits prevent direct API calls; time‑ordering constraints; APIs not publicly exposed.
Data Storage: Developers need RAG with website content, PDFs, Word, CSV, spreadsheets, or unstructured data formats.