Understanding AI Agents: From Chatting to Getting Things Done

The article explains the four essential components of AI Agents—brain, memory, tool, and planning layers—illustrates their implementation with Python code, compares planning strategies, shares a real-world OOM fault‑diagnosis case, and lists common pitfalls to help newcomers build functional agents.

Architect's Ambition

Why Simple Prompts Aren’t Enough

A DevOps colleague complained that asking GPT to troubleshoot production incidents only yields generic advice like "check logs" or "monitor memory"—it cannot actually perform the investigation. This gap motivates a deeper look at AI Agents, which require more than a well‑crafted prompt.

Four Core Modules of an AI Agent

1. Brain Layer – The LLM Foundation

Not every large model can serve as an agent. Tests with a 7B open‑source model failed on simple tasks such as querying user orders. An agent‑ready model must have:

Tool-calling ability: correctly decides when to use a tool and generates accurate parameters.

Long-context understanding: retains the overall goal and comprehends lengthy tool outputs.

Experience tip: in the author's experience with internal business scenarios, closed-source models such as GPT-4o or Doubao 4.0 achieve at least 30% higher tool-calling accuracy than open-source alternatives. For on-premises deployment, choose a model with at least 34B parameters and fine-tune it for tool use.
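To compare candidate models on tool-calling ability, one option is to score each model's calls against a small set of labeled cases. The following is a minimal sketch under that assumption; the `ToolCall` type, the `tool_call_accuracy` function, and the sample cases are all hypothetical and not taken from any library.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToolCall:
    """One tool invocation: which tool was chosen and with which arguments."""
    name: str
    arguments: dict


def tool_call_accuracy(expected: List[ToolCall],
                       predicted: List[Optional[ToolCall]]) -> float:
    """Fraction of cases with both the right tool and the right arguments.

    A prediction of None means the model answered without calling a tool.
    """
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    if not expected:
        return 0.0
    hits = sum(
        1
        for want, got in zip(expected, predicted)
        if got is not None
        and got.name == want.name
        and got.arguments == want.arguments
    )
    return hits / len(expected)


# Hypothetical labeled cases and model outputs, for illustration only.
expected = [
    ToolCall("query_orders", {"user_id": 42}),
    ToolCall("query_orders", {"user_id": 7}),
    ToolCall("send_notification", {"channel": "ops"}),
    ToolCall("query_orders", {"user_id": 9}),
]
predicted = [
    ToolCall("query_orders", {"user_id": 42}),   # correct
    ToolCall("query_orders", {"user_id": "7"}),  # right tool, wrong argument type
    None,                                        # no tool call at all
    ToolCall("query_orders", {"user_id": 9}),    # correct
]
accuracy = tool_call_accuracy(expected, predicted)
```

Exact-match scoring is deliberately strict: a call with the right tool but a wrongly typed argument still fails in production, so it counts as a miss here.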

2. Memory Layer – Avoid Forgetting Mid‑Task

A standard LLM chat remembers only the current conversation; once the window is closed, everything is gone. An agent's memory is split into:

Short-term memory: stores the current task's progress, intermediate results, and tool outputs, analogous to human working memory.

Long-term memory: keeps knowledge-base content, historical execution experience, and user preferences, typically implemented with a vector database.
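Short-term memory can be as simple as an in-process record of the goal and the steps taken so far. The sketch below assumes that design; `ShortTermMemory`, `StepRecord`, and the sample step names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StepRecord:
    action: str  # what the agent did, e.g. the name of a tool it called
    result: str  # what came back from that action


@dataclass
class ShortTermMemory:
    """Working memory for a single task: the goal plus every step so far."""
    goal: str
    steps: List[StepRecord] = field(default_factory=list)

    def record(self, action: str, result: str) -> None:
        self.steps.append(StepRecord(action, result))

    def render(self) -> str:
        """Format the memory as text that can be placed in the next prompt."""
        lines = [f"Goal: {self.goal}"]
        for number, step in enumerate(self.steps, start=1):
            lines.append(f"Step {number}: {step.action} -> {step.result}")
        return "\n".join(lines)


memory = ShortTermMemory(goal="Diagnose OOM in the order service")
memory.record("fetch_jvm_stack", "heap dominated by export buffers")
memory.record("fetch_commits", "1 recent commit touching the export feature")
snapshot = memory.render()
```

Because the goal is always the first rendered line, the model sees it again on every turn, even after several long tool outputs.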

Example long‑term memory implementation (Python, LangChain, Chroma):

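# Import paths below match older LangChain releases; newer releases move
# these classes into packages such as langchain_community and langchain_openai.
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load the knowledge file and split it into overlapping chunks.
loader = TextLoader("internal_knowledge.md")
documents = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = text_splitter.split_documents(documents)

# Embed the chunks and store them in a Chroma collection on disk.
embeddings = OpenAIEmbeddings()  # reads OPENAI_API_KEY from the environment
db = Chroma.from_documents(texts, embeddings, persist_directory="./memory_db")
db.persist()  # explicit flush, needed on older Chroma versions

# Retrieve the three chunks most similar to the question.
query = "How do I troubleshoot a production OOM failure?"
docs = db.similarity_search(query, k=3)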

This code stores operation manuals and incident records in a vector store, allowing the agent to retrieve relevant knowledge without retraining.

3. Tool Layer – Giving the Model Hands

The model itself cannot interact with the external world; it must invoke tools. Tools are wrapped as interfaces that describe their purpose, required parameters, and return format. Common categories:

System tools – execute shell commands, run code, read/write files.

Business tools – query databases, call internal APIs, send notifications.

General tools – web search, calculator, document parsing.
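A tool interface of this kind can be sketched as a small wrapper that carries the description and parameter schema alongside the function itself. The `Tool` class, the `dispatch` helper, and the `add` calculator tool below are hypothetical; only the shape of the dictionary returned by `spec()` follows the OpenAI function-calling format used later in this article.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Tool:
    name: str
    description: str          # purpose, shown to the model
    parameters: dict          # JSON Schema describing the arguments
    func: Callable[..., str]  # implementation; returns text for the model

    def spec(self) -> dict:
        """Describe this tool in the OpenAI function-calling format."""
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.parameters,
            },
        }


def dispatch(tools: Dict[str, Tool], name: str, raw_arguments: str) -> str:
    """Run the named tool, returning an error string instead of raising."""
    tool = tools.get(name)
    if tool is None:
        return f"error: unknown tool {name!r}"
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError:
        return "error: arguments are not valid JSON"
    if not isinstance(args, dict):
        return "error: arguments must be a JSON object"
    missing = [p for p in tool.parameters.get("required", []) if p not in args]
    if missing:
        return f"error: missing required parameters {missing}"
    return tool.func(**args)


# A hypothetical calculator tool, standing in for the "general tools" category.
add_tool = Tool(
    name="add",
    description="Add two numbers and return the sum as text.",
    parameters={
        "type": "object",
        "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
        "required": ["a", "b"],
    },
    func=lambda a, b: str(a + b),
)
tools = {add_tool.name: add_tool}
result = dispatch(tools, "add", '{"a": 2, "b": 3}')
```

Returning error strings rather than raising lets the agent feed the failure back to the model, which can then correct its own call.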

Pitfalls: overloading an agent with too many tools confuses the model; keeping each agent to five tools or fewer yields better accuracy. Also truncate tool results, because long outputs can exhaust the model's context window and cause the agent to lose track of the original goal.
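Truncation can be sketched as a helper that keeps the start and the end of a long output and marks what was dropped. This is an illustrative assumption rather than a prescribed method: `truncate_output` and its default limits are hypothetical, and character counts are only a rough proxy for tokens.

```python
def truncate_output(text: str, max_chars: int = 2000, keep_tail: int = 500) -> str:
    """Shorten a tool output to roughly max_chars, keeping its head and tail.

    The tail is kept because logs and stack traces often end with the
    most relevant lines.
    """
    if not 0 <= keep_tail <= max_chars:
        raise ValueError("keep_tail must be between 0 and max_chars")
    if len(text) <= max_chars:
        return text
    head = text[: max_chars - keep_tail]
    tail = text[-keep_tail:] if keep_tail else ""
    omitted = len(text) - max_chars
    return f"{head}\n... [{omitted} characters omitted] ...\n{tail}"


long_log = "a" * 3000 + "END"
short = truncate_output(long_log, max_chars=100, keep_tail=20)
```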

4. Planning Layer – Think Before Acting

Complex tasks must be broken into smaller steps. For an OOM investigation, the agent needs to collect JVM stack traces, fetch recent code commits, compare anomalies, reproduce the issue, and finally generate a remediation report.

Three mainstream planning approaches:

ReAct: interleaves reasoning and action one step at a time, which suits simple tasks with few steps. Easy to implement but can drift on complex workflows.

Chain of Thought (CoT): as used here, the model enumerates the full sequence of steps first and then executes them (a plan-then-execute style). Works well for medium-complexity tasks; requires the model to have strong planning capability.

Tree of Thoughts (ToT): generates multiple possible execution paths, evaluates them, and selects the best. High accuracy for exploratory tasks but slower and resource-intensive.

Practical advice: use ReAct for straightforward Ops or Customer‑Service agents, CoT for more sophisticated R&D assistants, and reserve ToT for rare, highly complex scenarios.
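The ReAct pattern reduces to a loop: decide, act, observe, repeat. The sketch below replaces the LLM with a scripted `decide` function so that it runs offline; `react_loop`, `scripted_decide`, and the `lookup_memory_usage` tool are hypothetical names, and in a real agent `decide` would be a model call.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A decision is ("act", tool_name, tool_input) or ("finish", answer, None).
Decision = Tuple[str, str, Optional[str]]
History = List[Tuple[str, str, str]]  # (tool_name, tool_input, observation)


def react_loop(
    decide: Callable[[str, History], Decision],
    tools: Dict[str, Callable[[str], str]],
    goal: str,
    max_steps: int = 5,
) -> Tuple[str, History]:
    """Alternate between deciding and acting until the task is finished."""
    history: History = []
    for _ in range(max_steps):
        kind, first, second = decide(goal, history)
        if kind == "finish":
            return first, history
        observation = tools[first](second or "")
        history.append((first, second or "", observation))
    # The step limit guards against the drift mentioned above.
    return "stopped: step limit reached", history


def scripted_decide(goal: str, history: History) -> Decision:
    """Stand-in for the LLM: look up memory usage once, then answer."""
    if not history:
        return ("act", "lookup_memory_usage", "order-service")
    return ("finish", f"Memory usage is {history[-1][2]}", None)


tools = {"lookup_memory_usage": lambda service: "95%"}
answer, trace = react_loop(scripted_decide, tools, "How much memory is in use?")
```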

End‑to‑End Agent Run (Real‑World OOM Diagnosis)

1. Task Reception

The agent receives an alert from the monitoring system: "Order service OOM, memory usage 95%" along with service name and time window.

2. Task Planning

The agent decomposes the job into five steps:

Call the monitoring API to fetch JVM stack traces for the last 30 minutes.

Call the Git API to retrieve code commits from the past two hours.

Match exception classes in the stack trace with recent code changes to pinpoint the root cause.

Invoke a test‑environment endpoint to reproduce the issue and verify the hypothesis.

Generate a diagnostic report and remediation plan, then send it to the Ops team.
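A plan like this can be held as data rather than free text, which makes it easy to see which steps are ready to run. The sketch below is one possible representation: `PlanStep`, `ready_steps`, and the step ids are hypothetical, and the dependency edges are an assumption, since the article only lists the steps in order.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class PlanStep:
    id: str
    description: str
    depends_on: Tuple[str, ...] = ()


# The five steps above; the dependency edges are assumed for illustration.
PLAN = [
    PlanStep("stack", "Fetch JVM stack traces for the last 30 minutes"),
    PlanStep("commits", "Retrieve code commits from the past two hours"),
    PlanStep("match", "Match exception classes against recent changes",
             ("stack", "commits")),
    PlanStep("repro", "Reproduce the issue in the test environment", ("match",)),
    PlanStep("report", "Generate the report and send it to Ops", ("repro",)),
]


def ready_steps(plan: List[PlanStep], done: Set[str]) -> List[str]:
    """Return ids of steps not yet done whose dependencies are all done."""
    return [
        step.id
        for step in plan
        if step.id not in done and all(dep in done for dep in step.depends_on)
    ]


first_batch = ready_steps(PLAN, set())
second_batch = ready_steps(PLAN, {"stack", "commits"})
```

Under these assumed dependencies, the first two steps do not depend on each other, so an executor could run them in parallel.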

3. Tool Execution

Step‑by‑step execution results:

Monitoring API reveals a memory leak in the new Excel‑export feature.

Git history shows a recent commit adding the export function.

Code comparison uncovers missing pagination, causing a full-table scan of 100k rows.

Test‑environment run reproduces the OOM.

Report recommends adding pagination and limiting each export to 10k rows.

4. Result Verification

After each step the agent checks whether the outcome meets expectations; failures trigger retries or human escalation, preventing error propagation.
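Per-step verification with retries and escalation can be sketched as a wrapper around each step. This is a minimal illustration under assumed names: `run_with_verification`, `EscalateToHuman`, and the flaky sample step are hypothetical.

```python
from typing import Callable


class EscalateToHuman(Exception):
    """Raised when a step keeps failing and a person should take over."""


def run_with_verification(
    step: Callable[[], str],
    verify: Callable[[str], bool],
    max_attempts: int = 3,
) -> str:
    """Run a step, check its output, retry on failure, then escalate."""
    last: object = None
    for _ in range(max_attempts):
        try:
            last = step()
        except Exception as exc:  # a crashed tool call counts as a failed attempt
            last = exc
            continue
        if verify(last):
            return last
    raise EscalateToHuman(
        f"step failed after {max_attempts} attempts; last result: {last!r}"
    )


# Hypothetical flaky step: returns nothing on the first call, data on the second.
attempts = {"count": 0}


def flaky_fetch() -> str:
    attempts["count"] += 1
    if attempts["count"] < 2:
        return ""
    return "java.lang.OutOfMemoryError: Java heap space"


result = run_with_verification(
    flaky_fetch, verify=lambda out: "OutOfMemoryError" in out
)
```

Raising a dedicated exception at the end keeps a bad result from flowing silently into the next step, which is how errors propagate through a plan.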

5. Task Completion

The entire workflow finishes autonomously in about one minute—far faster than the half‑hour a human engineer would need, and with higher accuracy.

Five Common Pitfalls When Building AI Agents

Over-reliance on Prompt Engineering: Good prompts alone cannot compensate for missing memory or tool support; many teams waste months tweaking prompts only to see poor production performance.

Overly Complex Tool Design: A tool with ten parameters caused frequent mis-calls. Splitting it into three simple tools with 2-3 parameters each dramatically improved accuracy.

Missing Result Validation: Without verification, agents can propagate erroneous conclusions. Adding per-step checks and fallback logic greatly reduces this risk.

Chaotic Context Management: Dumping all intermediate results into the context quickly exceeds the model's window. Keeping only the latest three steps in the active context and storing older results in short-term memory avoids forgetting.

Trying to Build a Universal Agent Too Soon: Starting with a narrow, high-frequency scenario (e.g., OOM diagnosis) and achieving >90% accuracy before expanding yields far better results than attempting an all-purpose agent from day one.
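The "latest three steps" rule for context management can be sketched with a bounded queue plus an archive for evicted results. `StepContext` and its method names are hypothetical; the archive here is a plain list standing in for whatever short-term store the agent uses.

```python
from collections import deque
from typing import Deque, List


class StepContext:
    """Keep the newest step results in the prompt and archive the rest."""

    def __init__(self, goal: str, window: int = 3) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        self.goal = goal
        self.recent: Deque[str] = deque(maxlen=window)
        self.archive: List[str] = []

    def add(self, step_result: str) -> None:
        if len(self.recent) == self.recent.maxlen:
            # The oldest entry is about to be evicted, so save it first.
            self.archive.append(self.recent[0])
        self.recent.append(step_result)

    def prompt_context(self) -> str:
        """Text for the next model call: goal, archive note, recent steps."""
        lines = [f"Goal: {self.goal}"]
        if self.archive:
            lines.append(f"({len(self.archive)} earlier step results archived)")
        lines.extend(self.recent)
        return "\n".join(lines)


context = StepContext(goal="Diagnose OOM in the order service")
for number in range(1, 6):
    context.add(f"result of step {number}")
prompt = context.prompt_context()
```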

Starter Agent Code (Under 100 Lines)

A minimal document question-answering agent that combines long-term memory (a Chroma collection) with tool calling; the model itself decides whether to search the knowledge base before answering:

import json

from openai import OpenAI
import chromadb
from chromadb.utils import embedding_functions

client = OpenAI(api_key="YOUR_API_KEY")
chroma_client = chromadb.PersistentClient(path="./memory_db")
embedding_func = embedding_functions.OpenAIEmbeddingFunction(api_key="YOUR_API_KEY")

# Long-term memory: a persistent collection of troubleshooting notes.
collection = chroma_client.get_or_create_collection(name="knowledge_base", embedding_function=embedding_func)
# upsert (rather than add) keeps reruns from inserting duplicate ids.
collection.upsert(
    documents=[
        "Production OOM troubleshooting steps: 1. Check the JVM stack 2. Check recent commits 3. Locate the memory leak",
        "API timeout troubleshooting steps: 1. Check monitoring for the latency distribution 2. Check whether dependent services are healthy 3. Check the network for packet loss"
    ],
    ids=["doc1", "doc2"]
)

# Tool description sent to the model so it knows when and how to search.
tools = [{
    "type": "function",
    "function": {
        "name": "search_knowledge_base",
        "description": "Search the internal knowledge base for troubleshooting documents",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string", "description": "Search keywords"}},
            "required": ["query"]
        }
    }
}]

def agent_run(query):
    messages = [{"role": "user", "content": query}]
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools,
        tool_choice="auto"
    )
    message = response.choices[0].message
    if not message.tool_calls:
        # The model answered directly, so no second call is needed.
        return message.content
    messages.append(message)
    # Every tool call in the response must receive a matching tool message.
    for tool_call in message.tool_calls:
        if tool_call.function.name == "search_knowledge_base":
            # Arguments arrive as a JSON string; parse with json.loads, never eval.
            args = json.loads(tool_call.function.arguments)
            results = collection.query(query_texts=[args["query"]], n_results=1)
            knowledge = results["documents"][0][0]
        else:
            knowledge = f"Unknown tool: {tool_call.function.name}"
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": knowledge
        })
    final_response = client.chat.completions.create(model="gpt-4o", messages=messages)
    return final_response.choices[0].message.content

print(agent_run("We have an OOM failure in production. How should I troubleshoot it?"))

Replace the API key and load your own documents to obtain a personalized agent.

Final Thoughts

AI Agents are still early‑stage technology; hype about general AI is exaggerated. However, in vertical, repeatable workflows—such as incident triage, customer support, or code review—agents can dramatically boost efficiency. Start with a small, well‑defined use case, build a minimal prototype, and iterate from there.
