Exploring OpenManus: A Deep Dive into an Open‑Source AI Agent Framework
This article provides a comprehensive overview of OpenManus, an open‑source, general‑purpose AI agent framework, covering its installation, configuration, core architecture—including BaseAgent, ReActAgent, ToolCallAgent, and Manus—its extensive tool collection, execution logs, and detailed code analysis for developers and AI researchers.
Overview
OpenManus is an open‑source, general‑purpose AI agent framework that enables complex task automation through a modular architecture of agents, tools, and the Model Context Protocol (MCP). It supports multiple large language models (LLMs), asynchronous execution, and a rich set of built‑in tools such as browser automation, Python code execution, and file editing.
Installation & Environment
Typical installation steps on macOS are:
$ curl -LsSf https://astral.sh/uv/install.sh | sh
$ source $HOME/.local/bin/env
$ git clone https://github.com/FoundationAgents/OpenManus.git
$ cd OpenManus
$ uv venv --python 3.12
$ source .venv/bin/activate
$ uv pip install -r requirements.txt
$ npm install -g playwright
$ playwright install
The environment used in this walkthrough is macOS, the Kiro IDE, and the DeepSeek-v3.2-exp LLM served through PPIO.
Configuration
The framework uses a config.toml file to define LLM endpoints, model parameters, and API keys. Example snippets:
[llm]
api_type = 'ppio'
model = "deepseek/deepseek-v3.2-exp"
base_url = "https://api.ppinfra.com/v3/openai"
api_key = "YOUR_API_KEY"
max_tokens = 16000
temperature = 0.0
[llm.vision]
model = "qwen/qwen2.5-vl-72b-instruct"
base_url = "https://api.ppinfra.com/v3/openai"
max_tokens = 96000
temperature = 0.0
[daytona]
daytona_api_key = "YOUR_DAYTONA_KEY"
daytona_server_url = "https://app.daytona.io/api"
daytona_target = "us"
Core Architecture
The agent hierarchy is built on four main classes:
BaseAgent : Provides the execution loop, state management, memory handling, and LLM integration.
ReActAgent : Extends BaseAgent with the ReAct reasoning-action cycle (think → act).
ToolCallAgent : Adds OpenAI Function Calling support, parses tool calls, and executes them via a ToolCollection.
Manus : The concrete general‑purpose agent that bundles a rich toolset, custom system prompts, and MCP support.
Key attributes of BaseAgent include system_prompt, next_step_prompt, llm, memory, state, max_steps, and duplicate‑response detection logic.
Example of the BaseAgent.run loop (simplified):
async def run(self, request: Optional[str] = None) -> str:
    # Refuse to start unless the agent is idle.
    if self.state != AgentState.IDLE:
        raise RuntimeError(f"Cannot run agent from state: {self.state}")
    if request:
        self.update_memory("user", request)
    results: list[str] = []
    async with self.state_context(AgentState.RUNNING):
        while self.current_step < self.max_steps and self.state != AgentState.FINISHED:
            self.current_step += 1
            step_result = await self.step()
            # Change strategy if the agent keeps repeating itself.
            if self.is_stuck():
                self.handle_stuck_state()
            results.append(f"Step {self.current_step}: {step_result}")
    return "\n".join(results) or "No steps executed"
ReActAgent
Implements the ReAct pattern, in which each step first calls think (the LLM decides whether an action is needed) and then act (the chosen tool is executed). Both think and act are abstract in ReActAgent; subclasses such as ToolCallAgent supply the implementations.
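The think-then-act cycle can be illustrated with a minimal sketch. This is not the OpenManus source: the method names think, act, and step follow the description above, while CountdownAgent and its toy decision rule are hypothetical stand-ins for an LLM-backed agent.

```python
import asyncio
from abc import ABC, abstractmethod


class ReActAgent(ABC):
    """One step = think (decide) followed by act (do)."""

    @abstractmethod
    async def think(self) -> bool:
        """Return True if an action should follow."""

    @abstractmethod
    async def act(self) -> str:
        """Carry out the action chosen during think()."""

    async def step(self) -> str:
        should_act = await self.think()
        if not should_act:
            return "Thinking complete - no action needed"
        return await self.act()


class CountdownAgent(ReActAgent):
    """Hypothetical subclass: acts until its counter reaches zero."""

    def __init__(self, remaining: int) -> None:
        self.remaining = remaining

    async def think(self) -> bool:
        # A real agent would consult the LLM here; this rule stands in for it.
        return self.remaining > 0

    async def act(self) -> str:
        self.remaining -= 1
        return f"acted, {self.remaining} left"


async def demo() -> list[str]:
    agent = CountdownAgent(remaining=2)
    return [await agent.step() for _ in range(3)]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Keeping the decision (think) separate from the side effect (act) lets a step finish without touching any tool when the model decides nothing needs to be done.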
ToolCallAgent
Uses OpenAI Function Calling to select tools. It maintains a ToolCollection that maps tool names to concrete implementations derived from BaseTool. The think method sends the conversation history plus the list of available tools to the LLM, parses the response, and stores any tool calls. The act method iterates over self.tool_calls, executes each tool via self.available_tools.execute, records the result in memory, and returns a combined output.
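A rough sketch of the act half of this flow follows. It assumes tool calls arrive in the OpenAI function-calling shape (an id plus a function name and a JSON-encoded arguments string). The TOOLS registry, the add tool, and SAMPLE_TOOL_CALLS are hypothetical; in OpenManus the registry role is played by a ToolCollection.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable


async def add(a: int, b: int) -> str:
    """Hypothetical tool used only for this illustration."""
    return str(a + b)


# Hypothetical registry mapping tool names to async callables.
TOOLS: dict[str, Callable[..., Awaitable[str]]] = {"add": add}


async def act(tool_calls: list[dict[str, Any]], memory: list[dict[str, str]]) -> str:
    """Execute each requested tool call and record the result in memory."""
    outputs = []
    for call in tool_calls:
        name = call["function"]["name"]
        if name not in TOOLS:
            result = f"Error: unknown tool '{name}'"
        else:
            try:
                # Function-calling responses carry arguments as a JSON string.
                args = json.loads(call["function"]["arguments"] or "{}")
                result = await TOOLS[name](**args)
            except json.JSONDecodeError:
                result = f"Error: invalid JSON arguments for '{name}'"
        # Store the result so the next LLM call can see what the tool returned.
        memory.append({"role": "tool", "tool_call_id": call["id"], "content": result})
        outputs.append(result)
    return "\n\n".join(outputs)


# Stand-in for what a parsed LLM response might contain.
SAMPLE_TOOL_CALLS = [
    {"id": "call_1", "type": "function",
     "function": {"name": "add", "arguments": '{"a": 2, "b": 3}'}},
    {"id": "call_2", "type": "function",
     "function": {"name": "missing", "arguments": "{}"}},
]

if __name__ == "__main__":
    history: list[dict[str, str]] = []
    print(asyncio.run(act(SAMPLE_TOOL_CALLS, history)))
```

Returning errors as ordinary tool results, rather than raising, gives the model a chance to read the failure on the next think call and pick a different tool.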
Manus
Specializes ToolCallAgent for a general AI assistant. It defines a detailed system prompt, a richer next_step_prompt, and a default tool set that includes:
PythonExecute : Runs arbitrary Python code safely.
BrowserUseTool : Full-featured browser automation (navigation, clicking, scrolling, content extraction).
StrReplaceEditor : Creates, views, and edits files on the host.
AskHuman : Optional human-in-the-loop interaction.
Terminate : Clean shutdown of the agent.
Manus can also connect to MCP servers, adding the tools they expose to its own tool set, and it provides a BrowserContextHelper that dynamically adjusts prompts when a browser session is active.
Tool System
All tools inherit from BaseTool, which defines name, description, parameters, and an abstract execute method. The framework converts each tool into the OpenAI Function Calling schema via BaseTool.to_param(). Example of the browser automation tool definition (truncated):
class BrowserUseTool(BaseTool, Generic[Context]):
    name = "browser_use"
    description = "A powerful browser automation tool ..."
    parameters = {
        "type": "object",
        "properties": {
            "action": {"type": "string", "enum": ["go_to_url", "click_element", "input_text", "scroll_down", "web_search", "extract_content", ...]},
            "url": {"type": "string"},
            "index": {"type": "integer"},
            "text": {"type": "string"},
            "scroll_amount": {"type": "integer"},
            "query": {"type": "string"},
            "goal": {"type": "string"},
            "seconds": {"type": "integer"}
        },
        "required": ["action"],
        "dependencies": {
            "go_to_url": ["url"],
            "click_element": ["index"],
            "input_text": ["index", "text"],
            "scroll_down": ["scroll_amount"],
            "web_search": ["query"],
            "extract_content": ["goal"]
        }
    }

    async def execute(self, **kwargs) -> ToolResult:
        # Implementation handles browser launch, action dispatch, and result collection.
        ...
The ToolCollection class registers tools, allows dynamic addition, and provides a unified execute method that returns a ToolResult object containing output, error, and optional base64_image.
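To make the tool contract concrete, here is a simplified sketch of how BaseTool, ToolResult, and ToolCollection could fit together. The class names and the to_param method come from the description above, and the returned dictionary follows the OpenAI function-calling tools format. The method bodies, add_tool, to_params, the error message wording, and the WordCount tool are illustrative assumptions rather than the framework's code.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ToolResult:
    output: Optional[str] = None
    error: Optional[str] = None
    base64_image: Optional[str] = None


class BaseTool(ABC):
    name: str
    description: str
    parameters: dict[str, Any]

    @abstractmethod
    async def execute(self, **kwargs: Any) -> ToolResult: ...

    def to_param(self) -> dict[str, Any]:
        """Render the tool in the OpenAI function-calling tools format."""
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.parameters,
            },
        }


class WordCount(BaseTool):
    """Hypothetical example tool, not part of OpenManus."""

    name = "word_count"
    description = "Count the words in a piece of text."
    parameters = {
        "type": "object",
        "properties": {"text": {"type": "string"}},
        "required": ["text"],
    }

    async def execute(self, **kwargs: Any) -> ToolResult:
        return ToolResult(output=str(len(kwargs["text"].split())))


class ToolCollection:
    def __init__(self, *tools: BaseTool) -> None:
        self.tool_map = {tool.name: tool for tool in tools}

    def add_tool(self, tool: BaseTool) -> None:
        self.tool_map[tool.name] = tool

    def to_params(self) -> list[dict[str, Any]]:
        return [tool.to_param() for tool in self.tool_map.values()]

    async def execute(self, name: str, tool_input: dict[str, Any]) -> ToolResult:
        tool = self.tool_map.get(name)
        if tool is None:
            return ToolResult(error=f"Tool {name} is invalid")
        try:
            return await tool.execute(**tool_input)
        except Exception as exc:  # report failures as data instead of crashing
            return ToolResult(error=str(exc))


if __name__ == "__main__":
    tools = ToolCollection(WordCount())
    print(asyncio.run(tools.execute("word_count", {"text": "plan a trip"})))
```

Because every tool is reached through the same execute(name, tool_input) entry point, adding a capability only requires a new BaseTool subclass and a registration call.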
Execution Logs
The article includes a full log of a sample run where the agent creates a comprehensive Beijing travel plan. Highlights include:
Initial planning steps, tool selection, and repeated attempts to perform web searches.
Rate‑limit errors from the OpenAI API and subsequent handling.
Successful creation of beijing_travel_plan.md with a detailed 5‑day itinerary, accommodation suggestions, transportation guide, food recommendations, budget estimates, and practical tips.
Final interaction where the agent asks the user if any adjustments are needed before terminating.
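The log summary does not show how the rate-limit errors were handled internally. One common approach is to retry with exponential backoff, sketched below under that assumption. RateLimitError is a locally defined stand-in rather than any SDK's exception class, and with_backoff and make_flaky_call are hypothetical helpers.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Local stand-in for a provider's rate-limit exception."""


async def with_backoff(
    call: Callable[[], Awaitable[T]],
    max_attempts: int = 4,
    base_delay: float = 0.01,
) -> T:
    """Retry `call` on RateLimitError, doubling the base wait each attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await call()
        except RateLimitError:
            if attempt == max_attempts:
                raise
            # Exponential backoff plus jitter so retries do not synchronize.
            delay = base_delay * 2 ** (attempt - 1)
            await asyncio.sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")


def make_flaky_call(failures: int) -> Callable[[], Awaitable[str]]:
    """Hypothetical LLM call that is rate limited `failures` times first."""
    state = {"calls": 0}

    async def call() -> str:
        state["calls"] += 1
        if state["calls"] <= failures:
            raise RateLimitError("429 Too Many Requests")
        return f"ok after {state['calls']} calls"

    return call


if __name__ == "__main__":
    print(asyncio.run(with_backoff(make_flaky_call(failures=2))))
```

Capping the number of attempts matters: once the budget is exhausted, the error propagates so the agent can switch strategy instead of waiting indefinitely.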
Key Takeaways
The ReAct + Function Calling design enables the agent to reason about when to use tools and to recover from failures through retry and alternative strategies.
Modular tool definitions make it straightforward to extend the framework with new capabilities (e.g., additional search engines, data analysis modules).
Memory management via a Message list preserves conversation context, supporting multi‑turn interactions and tool result tracking.
Built‑in safeguards such as duplicate‑response detection and token‑limit handling improve robustness.
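As an illustration of the last two points, duplicate-response detection over a message list can be approximated as follows. The Message fields, the is_stuck signature, and the duplicate_threshold default are assumptions chosen for this sketch; the framework's own check may differ in detail.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    content: str


def is_stuck(memory: list[Message], duplicate_threshold: int = 2) -> bool:
    """True if the latest assistant message repeats earlier ones too often."""
    if len(memory) < 2:
        return False
    last = memory[-1]
    if last.role != "assistant" or not last.content:
        return False
    # Count earlier assistant messages with exactly the same content.
    duplicates = sum(
        1
        for msg in memory[:-1]
        if msg.role == "assistant" and msg.content == last.content
    )
    return duplicates >= duplicate_threshold


if __name__ == "__main__":
    history = [
        Message("user", "Plan a trip to Beijing"),
        Message("assistant", "Searching the web..."),
        Message("assistant", "Searching the web..."),
    ]
    print(is_stuck(history))  # one earlier duplicate, below the threshold
    history.append(Message("assistant", "Searching the web..."))
    print(is_stuck(history))  # two earlier duplicates, threshold reached
```

When the check fires, the agent can adjust its next prompt to ask for a different approach, which is the role handle_stuck_state plays in the run loop.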
Conclusion
OpenManus demonstrates a practical implementation of modern AI agent concepts—ReAct reasoning, OpenAI Function Calling, and a plug‑in tool architecture—within a clean Python codebase. Developers can use it as a foundation for building custom AI assistants, autonomous workflows, or research prototypes that require tight integration between LLMs and external tools.