Artificial Intelligence 30 min read

From Chat to Autonomous Agents: Architecture, ReAct, Prompt Engineering

This article chronicles the evolution from simple chat interactions to sophisticated autonomous agents, detailing stages of LLM development, ReAct reasoning, memory management, tool integration, and practical implementation using the browser-use project, while offering prompt design insights and future directions for AI agents.

Alibaba Cloud Developer

Jun 11, 2025

From Chat to Autonomous Agents: Architecture, ReAct, Prompt Engineering

Background

This post is a personal learning summary of agents, aiming to understand how an agent product runs from an engineering perspective.

LLM Understanding Stages

Stage 1: Chat only

Simple text input-output interaction. Prompt engineering (COT, ReAct) improves reasoning; RAG mitigates hallucinations by retrieving external knowledge. Applications include emotional companions, role‑play, copy generation, and Copilot‑style assistance.

Stage 2: Workflow orchestration

Function calls give LLMs stable outputs and tool‑using capabilities. Low‑code platforms like Coze and Dify enable users to compose agents via workflows, reducing development cost.

Stage 3: Agent

Agents perceive and act in environments autonomously. Users describe goals, and the LLM uses tools and context to plan, execute, and generate code in sandboxed environments, greatly boosting productivity.

ReAct Framework

ReAct mirrors human problem solving: Thought → Action → Observation. The typical flow is illustrated below.

Agent Architecture (browser‑use)

Core Components

Agent Core : Coordinates components, manages task flow, and ensures correct communication.

MessageManager : Handles all LLM communication (system prompts, user messages, tool outputs).

Memory : Provides short‑term and long‑term memory, using caching or vector databases.

LLM Interface : Sends/receives messages to the language model.

Controller : Executes browser actions and registers tools.

BrowserContext : Manages browser sessions, DOM operations, and page state.

Execution Flow

The process follows ReAct: generate Thought, call a tool (Action), observe result, and iterate until the goal is reached.

sytem_prompt = {"previousGoal": "...", "memory": "...", "next_goal": "...", "actions": "..."}
tools = Tools()
context = Context(tools)
agent = Agent(sytem_prompt, context)
while not context.finished:
    status, actions = agent.run(context)
    tools.run(actions)

Memory Module

Memory consists of short‑term (recent conversation, tool info) and long‑term (vector store) components. The MessageManager records all messages, and the mem0 framework summarizes history to keep token usage low.

class MessageMetadata:
    token_count: int
    message_type: str

class ManagedMessage:
    content: str
    metadata: MessageMetadata

Prompt Design & Structured Output

The system prompt defines a strict JSON output schema, includes examples, and uses Pydantic for validation. Example schema:

{
  "current_state": {
    "evaluation_previous_goal": "...",
    "memory": "...",
    "next_goal": "..."
  },
  "action": [{"click_element": {"index": 0}}]
}

Three output handling modes are provided:

raw : Parse JSON from raw model output.

functionCall : Use OpenAI‑style function calls.

structured : LangChain’s structured output with Pydantic validation.

Error Handling

If validation fails, the agent captures the error, inserts a message with details, and retries, encouraging the model to correct its format.

if isinstance(error, ValidationError):
    return f"Invalid model output format. Details: {str(error)}"

Tool Registration & Invocation

Tools are registered in a registry with name, description, and parameter schema. During execution, the agent selects a tool based on the model’s output and calls it with assembled arguments.

@tool(name="click_element", description="Click a button", params=ClickParams)
def click_element(params):
    # implementation
    pass

MCP Integration

MCP (Model Context Protocol) standardizes tool exposure. Two integration approaches are discussed:

System‑prompt method: expose MCP tools via a local use_mcp tool.

function‑call method: wrap MCP tools as regular function‑call tools.

Integrating MCP into browser‑use would replace local tool registration with remote MCP services, allowing dynamic tool discovery.

Coze Space

Coze Space (launched April 19) offers three core capabilities: task automation, expert‑agent ecosystem, and MCP integration. It supports two agent modes:

Exploration mode : Interleaved planning and execution (similar to ReAct).

Planning mode : High‑level plan first, then execute sub‑tasks sequentially.

Both modes improve flexibility and user interaction.

Conclusion & Outlook

The article reflects on current agent designs, proposes future features such as self‑planning, hierarchical planning, rethink mechanisms, and better human‑AI interaction. It emphasizes the importance of stable output, distributed memory, multi‑model compatibility, and prompt security for sustainable agent deployment.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

LLM MCP Prompt Engineering React tool integration AI Agent Memory

Written by

Alibaba Cloud Developer

Alibaba's official tech channel, featuring all of its technology innovations.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.