Unlocking AI Agents: Theory, Design Patterns, and Hands‑On Experiments

This article combines theoretical analysis with practical case studies to examine the core components, design patterns, and future directions of AI agents. It covers the OpenManus implementation, custom memory and planning modules, experimental evaluations, and lessons for improving agent reliability and scalability.

Alibaba Cloud Developer

Background

The Manus project was promoted as the world’s first truly general AI Agent, whose core capability is autonomous task decomposition and execution based on large language models (LLMs). Although it faced criticism for lacking low‑level innovation, its engineering integration of tools demonstrated high practical value.

Core Elements of AI Agents

A common decomposition divides an AI Agent into four essential modules:

Perception

Memory (short‑term and long‑term)

Planning

Action (tool usage)

Additional optional modules include role definition, learning, and higher‑level cognition.

Memory Module Example

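from typing import List
from pydantic import BaseModel, Field

class Memory(BaseModel):
    messages: List[Message] = Field(default_factory=list)  # history; Message is the project's chat-message model
    max_messages: int = Field(default=100)  # upper bound on stored messages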

A concrete memory instance records each interaction so the LLM can be reminded of prior steps.
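One way to keep such a memory bounded is to drop the oldest messages once `max_messages` is exceeded. The sketch below is a minimal, stdlib-only illustration and not the OpenManus implementation: it uses dataclasses instead of pydantic, and the `Message` fields and the `add_message` and `to_prompt` helpers are assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Message:
    """Hypothetical chat message: a role plus text content."""
    role: str
    content: str


@dataclass
class Memory:
    """Bounded message history that keeps only the most recent entries."""
    messages: List[Message] = field(default_factory=list)
    max_messages: int = 100

    def add_message(self, message: Message) -> None:
        # Append, then trim from the front so the newest messages survive.
        self.messages.append(message)
        overflow = len(self.messages) - self.max_messages
        if overflow > 0:
            del self.messages[:overflow]

    def to_prompt(self) -> str:
        # Render the history as plain text for the next LLM call.
        return "\n".join(f"{m.role}: {m.content}" for m in self.messages)


memory = Memory(max_messages=3)
for i in range(5):
    memory.add_message(Message(role="user", content=f"step {i}"))
print(memory.to_prompt())
```

Simple truncation is cheap but loses early context entirely, which is why the article later turns to summarization.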

Tool Integration

Agents can call external tools such as PythonExecute, RemoteJupyterClient, FileSaver, BrowserUseTool, and GoogleSearch. Each tool is described in a JSON‑like schema that the LLM uses to generate correct function calls.
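To make the schema idea concrete, here is a rough sketch of a tool description in the function-calling style used by several LLM APIs, plus a small dispatcher that checks a model-produced call against it. The schema contents, the `python_execute` stand-in, and the `dispatch` helper are illustrative assumptions, not the article's actual tool definitions.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical schema: name, description, and JSON-Schema-style parameters.
PYTHON_EXECUTE_SCHEMA: Dict[str, Any] = {
    "type": "function",
    "function": {
        "name": "python_execute",
        "description": "Evaluate a short Python expression and return its value as text.",
        "parameters": {
            "type": "object",
            "properties": {
                "code": {"type": "string", "description": "Expression to evaluate."}
            },
            "required": ["code"],
        },
    },
}


def python_execute(code: str) -> str:
    # Toy stand-in: evaluates an expression with builtins removed.
    # This is NOT a real sandbox and must not be used on untrusted input.
    return str(eval(code, {"__builtins__": {}}, {}))


TOOLS: Dict[str, Callable[..., str]] = {"python_execute": python_execute}
SCHEMAS: Dict[str, Dict[str, Any]] = {"python_execute": PYTHON_EXECUTE_SCHEMA}


def dispatch(tool_call_json: str) -> str:
    """Check a model-produced call against the schema, then run the tool."""
    call = json.loads(tool_call_json)
    name, args = call["name"], call.get("arguments", {})
    if name not in TOOLS:
        return f"Error: unknown tool '{name}'"
    required = SCHEMAS[name]["function"]["parameters"]["required"]
    missing = [key for key in required if key not in args]
    if missing:
        return f"Error: missing arguments {missing}"
    return TOOLS[name](**args)


print(dispatch('{"name": "python_execute", "arguments": {"code": "2 + 3"}}'))
```

Returning an error string rather than raising lets the agent feed the failure back to the LLM as an observation.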

Planning Prompt

You are an expert Planning Agent tasked with solving problems efficiently through structured plans.
1. Analyze the request.
2. Create a clear, actionable plan.
3. Execute steps using available tools.
4. Track progress and adapt.
5. Use `finish` when the task is complete.
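The prompt asks the agent to track progress, which implies some record of steps and their states. A minimal sketch of such a record follows; the `Plan` class, its status names, and the checkbox rendering are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class StepStatus(str, Enum):
    NOT_STARTED = "not_started"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"


@dataclass
class Plan:
    """Hypothetical plan record: ordered steps, each with a status."""
    title: str
    steps: List[str]
    statuses: List[StepStatus] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Every step starts as not started unless statuses were supplied.
        if not self.statuses:
            self.statuses = [StepStatus.NOT_STARTED] * len(self.steps)

    def next_step(self) -> Optional[int]:
        # Index of the first step that is not completed, or None when done.
        for index, status in enumerate(self.statuses):
            if status is not StepStatus.COMPLETED:
                return index
        return None

    def mark(self, index: int, status: StepStatus) -> None:
        self.statuses[index] = status

    def is_finished(self) -> bool:
        # The point at which the agent would call `finish`.
        return self.next_step() is None

    def render(self) -> str:
        marks = {
            StepStatus.NOT_STARTED: "[ ]",
            StepStatus.IN_PROGRESS: "[>]",
            StepStatus.COMPLETED: "[x]",
        }
        pairs = zip(self.statuses, self.steps)
        return "\n".join(f"{marks[s]} {i + 1}. {text}" for i, (s, text) in enumerate(pairs))


plan = Plan("Quicksort task", ["Write quicksort", "Test it", "Save the file"])
plan.mark(0, StepStatus.COMPLETED)
plan.mark(1, StepStatus.IN_PROGRESS)
print(plan.render())
```

The rendered text can be appended to the prompt on each turn so the model sees which step it is on.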

ReAct Agent Design

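from abc import ABC, abstractmethod
from typing import Optional

from pydantic import BaseModel, Field

# LLM and Memory are project classes: an LLM client wrapper and the message store shown earlier.
class ReActAgent(BaseModel, ABC):
    name: str = Field(...)
    description: Optional[str] = Field(None)
    system_prompt: Optional[str] = Field(None)
    next_step_prompt: Optional[str] = Field(None)
    llm: LLM = Field(default_factory=LLM)
    memory: Memory = Field(default_factory=Memory)

    @abstractmethod
    async def think(self) -> bool:
        """Decide whether an action is needed; return False to stop."""

    @abstractmethod
    async def act(self) -> str:
        """Execute the chosen tool and return its result."""

    async def step(self) -> str:
        """Run one think-then-act cycle."""
        if not await self.think():
            return "Thinking complete"
        return await self.act()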
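The class above defines a single step; an agent also needs an outer loop that repeats steps until thinking decides no further action is needed or a step budget runs out. The following stdlib-only sketch shows one way to do that. `MiniReActAgent`, `ScriptedAgent`, `run`, and `max_steps` are assumptions made for illustration, and no LLM is called: the scripted subclass simply replays a fixed list of actions.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import List


class MiniReActAgent(ABC):
    """Stdlib-only stand-in for the ReAct base class; no LLM is involved."""

    def __init__(self, max_steps: int = 5) -> None:
        self.max_steps = max_steps

    @abstractmethod
    async def think(self) -> bool:
        """Return True if an action should be executed next."""

    @abstractmethod
    async def act(self) -> str:
        """Execute the chosen action and return its result."""

    async def step(self) -> str:
        if not await self.think():
            return "Thinking complete"
        return await self.act()

    async def run(self) -> List[str]:
        # Repeat think/act until think() declines to act or the budget runs out.
        results: List[str] = []
        for _ in range(self.max_steps):
            result = await self.step()
            results.append(result)
            if result == "Thinking complete":
                break
        return results


class ScriptedAgent(MiniReActAgent):
    """Hypothetical subclass that replays a fixed list of actions."""

    def __init__(self, actions: List[str], max_steps: int = 5) -> None:
        super().__init__(max_steps)
        self.pending = list(actions)
        self.current = ""

    async def think(self) -> bool:
        if not self.pending:
            return False
        self.current = self.pending.pop(0)
        return True

    async def act(self) -> str:
        return f"executed {self.current}"


log = asyncio.run(ScriptedAgent(["search", "python_execute"]).run())
print(log)
```

The step budget matters in practice: without it, a model that never decides it is done would loop indefinitely.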

Experiments

Experiment 1 – QuickSort without planning: The agent generated code quickly and verified it with a Jupyter client, but produced no intermediate explanations.

Experiment 2 – QuickSort with planning: Planning split the task into nine steps, using Google Search, Python execution, and file saving; the process took six minutes and yielded detailed logs and performance reports.

Experiment 3 – Sales query on a database (no plan): The LLM autonomously planned, inspected the schema, and returned the correct answer.

Experiment 4 – Weather forecast with planning: The agent attempted to fetch structured data, failed, invoked a termination tool, then hallucinated data, installed missing packages via pip, and finally produced a plot, illustrating both tool reliance and hallucination risks.

Findings

Pure ReAct agents are fast but heavily depend on LLM correctness.

Plan‑and‑Solve improves explainability but can generate overly long or inaccurate plans.

Tool‑driven tasks may suffer from hallucinations; robust fallback logic is needed.

Long contexts hamper complex multi‑step reasoning, since models tend to use information buried in the middle of a long prompt less reliably; memory summarization becomes essential.
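As one possible shape for the fallback logic mentioned in the findings, the sketch below tries a list of tools in order and, if all of them fail, returns an explicit failure record instead of leaving the model to fill the gap with invented data. `ToolResult`, `call_with_fallback`, and both weather functions are hypothetical, and the cached value is a made-up placeholder.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class ToolResult:
    ok: bool
    output: Optional[str] = None
    error: Optional[str] = None


def call_with_fallback(tools: Sequence[Callable[[str], str]], query: str) -> ToolResult:
    """Try each tool in order; return an explicit failure if all of them raise."""
    errors: List[str] = []
    for tool in tools:
        try:
            return ToolResult(ok=True, output=tool(query))
        except Exception as exc:  # broad on purpose: any tool failure triggers the fallback
            errors.append(f"{tool.__name__}: {exc}")
    # Surface the failure so the agent can report it rather than fabricate a value.
    return ToolResult(ok=False, error="; ".join(errors))


def weather_api(city: str) -> str:
    raise ConnectionError("service unavailable")  # simulated outage


def cached_weather(city: str) -> str:
    cache = {"Hangzhou": "cached forecast for Hangzhou"}  # placeholder, not real data
    return cache[city]


hit = call_with_fallback([weather_api, cached_weather], "Hangzhou")
miss = call_with_fallback([weather_api, cached_weather], "Atlantis")
print(hit)
print(miss)
```

When `ok` is false, the agent's prompt can state plainly that no data was retrieved, which gives the model less room to hallucinate a result.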

Improvements

Self‑criticism: Incorporate reflection steps to detect and correct plan failures.

Code as a tool: Treat generated code as executable actions, enabling agents to create new capabilities on the fly.

Autonomous evolution: Use memory summarization, RAG, and continual learning (e.g., Mem0, MemGPT) to let agents improve from experience.

MCP (Model Context Protocol): A standard for exposing tools to LLMs, offering service reuse, one‑stop platform building, ecosystem integration, and security sandboxing, though its long‑term adoption is still debated.
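The self‑criticism idea can be sketched as a propose, critique, revise loop. In the example below, both the proposer and the critic are plain functions standing in for LLM calls; `solve_with_reflection`, `draft_plan`, and `check_plan` are hypothetical names, and the critic's rule (a plan must contain a test step) is chosen only to keep the example small.

```python
from typing import Callable, List, Optional, Tuple


def solve_with_reflection(
    propose: Callable[[str, List[str]], str],
    critique: Callable[[str], Optional[str]],
    task: str,
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Propose an answer, ask a critic for a problem, and retry with the feedback."""
    feedback: List[str] = []
    answer = propose(task, feedback)
    for _ in range(max_rounds):
        problem = critique(answer)
        if problem is None:
            break  # the critic found nothing to fix
        feedback.append(problem)
        answer = propose(task, feedback)
    return answer, feedback


def draft_plan(task: str, feedback: List[str]) -> str:
    # Stand-in for an LLM: the first draft omits testing, the revision adds it.
    steps = ["write quicksort"]
    if feedback:
        steps.append("test quicksort on sample inputs")
    return "; ".join(steps)


def check_plan(plan: str) -> Optional[str]:
    # Stand-in critic: returns a complaint, or None if the plan looks acceptable.
    return None if "test" in plan else "plan has no verification step"


final_plan, notes = solve_with_reflection(draft_plan, check_plan, "implement quicksort")
print(final_plan)
print(notes)
```

Capping the rounds keeps a stubborn critic from turning reflection into an endless loop.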

Future Directions

Dynamic plan verification (e.g., PlanGEN) to reduce hallucinations.

Scalable multi‑agent collaboration while managing coordination overhead.

Better memory abstraction to keep context windows manageable.
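One simple form of memory abstraction is to replace older messages with a single summary once the history grows past a budget, while keeping the most recent messages verbatim. The sketch below assumes a summarizer callable; `compact_history` and `naive_summarize` are hypothetical, and the naive summarizer only truncates each message where a real system would call an LLM.

```python
from typing import Callable, List


def compact_history(
    messages: List[str],
    summarize: Callable[[List[str]], str],
    keep_recent: int = 4,
    max_messages: int = 8,
) -> List[str]:
    """Replace older messages with one summary once the history exceeds the budget."""
    if len(messages) <= max_messages:
        return list(messages)
    split = len(messages) - keep_recent
    older, recent = messages[:split], messages[split:]
    return [f"[summary] {summarize(older)}"] + recent


def naive_summarize(chunk: List[str]) -> str:
    # Stand-in for an LLM summarizer: keeps only the first three words of each message.
    return " | ".join(" ".join(m.split()[:3]) for m in chunk)


history = [f"step {i}: tool output number {i}" for i in range(10)]
compacted = compact_history(history, naive_summarize)
print(len(compacted))
print(compacted[0])
```

Keeping recent messages verbatim preserves the details the agent is most likely to need for its next step, while the summary retains a trace of earlier work.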

Conclusion

Translating AI‑agent theory into concrete code reveals both the power and the pitfalls of current LLM‑driven systems. Systematic experimentation, prompt engineering, and robust tool protocols are key to advancing reliable, autonomous agents.

References

Tool Learning with Large Language Models: A Survey (arXiv:2405.17935)

ReAct: Synergizing Reasoning and Acting in Language Models (arXiv:2210.03629)

Plan‑and‑Solve Prompting (arXiv:2305.04091)

ChatDev: Communicative Agents for Software Development (arXiv:2307.07924)

Scaling Large‑Language‑Model‑based Multi‑Agent Collaboration (arXiv:2406.07155)

PlanGEN: A Multi‑Agent Framework for Planning and Reasoning (arXiv:2502.16111)

Lost in the Middle: How Language Models Use Long Contexts (arXiv:2307.03172)
