
From Claude 3.5 Sonnet to Manus: The Evolution and Landscape of Computer‑Use AI Agents

This article surveys the rapid development of computer‑use AI agents—from Anthropic’s Claude 3.5 Sonnet and OpenAI’s Operator to the multi‑agent Manus platform—detailing their capabilities, benchmark results, open‑source alternatives, practical challenges, and future prospects for autonomous digital assistants.

DevOps

In October 2024 Anthropic released an upgraded Claude 3.5 Sonnet with a public‑beta "computer use" capability that lets the model view the screen, move the cursor, click buttons, and type text. The release sparked early interest in AI that can operate a computer directly.

Despite the hype, the feature remained experimental: developers had to deploy their own instances, and the setup was cumbersome for non‑technical users, which limited broader adoption.
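The view, move, click, and type cycle described above can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not Anthropic's API: `FakeDesktop`, `Action`, `scripted_policy`, and `run_agent` are hypothetical names, the "screenshot" is a small dictionary rather than pixels, and a hand-written rule stands in for the model's decision.

```python
from dataclasses import dataclass

# Hypothetical text field on the fake screen: left, top, right, bottom.
FIELD_BOX = (100, 100, 300, 130)


@dataclass
class Action:
    kind: str        # "move", "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


class FakeDesktop:
    """Stand-in for a real desktop; it holds one text field and a cursor."""

    def __init__(self):
        self.cursor = (0, 0)
        self.focused = False
        self.field_text = ""

    def screenshot(self):
        # A real agent would receive an image; this sketch returns state directly.
        return {"cursor": self.cursor, "focused": self.focused,
                "field_text": self.field_text}

    def apply(self, action):
        if action.kind == "move":
            self.cursor = (action.x, action.y)
        elif action.kind == "click":
            left, top, right, bottom = FIELD_BOX
            x, y = self.cursor
            # Clicking inside the field focuses it; clicking elsewhere unfocuses it.
            self.focused = left <= x <= right and top <= y <= bottom
        elif action.kind == "type":
            if self.focused:
                self.field_text += action.text
        else:
            raise ValueError(f"unknown action: {action.kind}")


def scripted_policy(observation, goal_text):
    """Hand-written stand-in for the model: picks the next action from the observation."""
    left, top, right, bottom = FIELD_BOX
    center = ((left + right) // 2, (top + bottom) // 2)
    if observation["field_text"] == goal_text:
        return Action("done")
    if observation["focused"]:
        return Action("type", text=goal_text)
    if observation["cursor"] == center:
        return Action("click")
    return Action("move", x=center[0], y=center[1])


def run_agent(desktop, policy, goal_text, max_steps=10):
    """Observe, decide, act; repeat until the policy says it is done or steps run out."""
    history = []
    for _ in range(max_steps):
        observation = desktop.screenshot()
        action = policy(observation, goal_text)
        history.append(action.kind)
        if action.kind == "done":
            break
        desktop.apply(action)
    return history


desktop = FakeDesktop()
print(run_agent(desktop, scripted_policy, "hello"))  # ['move', 'click', 'type', 'done']
print(desktop.field_text)                            # hello
```

The `max_steps` cap is a deliberate choice in this sketch: because each action is chosen from a fresh observation, a loop of this kind needs an explicit bound so that a policy that never reaches its goal cannot run indefinitely.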

OpenAI followed in January 2025 with Operator, powered by a Computer‑Using Agent (CUA) model that combines GPT‑4o’s vision with advanced reasoning to interact with graphical user interfaces without specialized APIs. Its reported success rates were 38.1% on OSWorld, 58.1% on WebArena, and 87% on WebVoyager, strong results relative to other agents but, on OSWorld in particular, still far from reliable. At launch, access was restricted to subscribers of the $200/month Pro plan.

In March 2025 the Monica.im team launched Manus, billed as the world’s first general‑purpose AI agent. It is described as able to plan, decompose, and execute complex tasks across more than 40 domains, with a reported 86.5% accuracy on the GAIA benchmark at a claimed one‑tenth of the cost of competing solutions.
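The plan, decompose, and execute pattern mentioned above can be sketched generically. This is not Manus's architecture, which the article does not describe; the subtask names, the `plan` function, and the `TOOLS` table are hypothetical, and the plan is hard-coded where a real agent would ask a model to produce it.

```python
from graphlib import TopologicalSorter


def plan(task):
    """Hypothetical planner: maps each subtask to the subtasks it depends on."""
    return {
        "collect prices": [],
        "collect volumes": [],
        "analyze": ["collect prices", "collect volumes"],
        "write report": ["analyze"],
    }


# Hypothetical tools, one per subtask; each receives the results of its dependencies.
TOOLS = {
    "collect prices": lambda inputs: [3, 5, 4],
    "collect volumes": lambda inputs: [10, 20, 30],
    "analyze": lambda inputs: sum(p * v for p, v in zip(*inputs)),
    "write report": lambda inputs: f"Total traded value: {inputs[0]}",
}


def execute(plan_graph, tools):
    """Run subtasks in dependency order, passing earlier results to later steps."""
    results = {}
    for step in TopologicalSorter(plan_graph).static_order():
        inputs = [results[dep] for dep in plan_graph[step]]
        results[step] = tools[step](inputs)
    return results


results = execute(plan("summarize trading activity"), TOOLS)
print(results["write report"])  # Total traded value: 250
```

Representing the plan as a dependency graph, rather than a flat list, lets independent subtasks (here the two collection steps) run in any order while still guaranteeing that each step sees the outputs it needs.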

Numerous other products and open‑source projects have emerged, including Flowith, Google AI Studio, Midscene.js, Zhipu GLM‑PC, OpenInterpreter, OpenAdapt, OmniParser, E2B Desktop Sandbox, and many GitHub repositories that replicate or extend computer‑use functionality.

Academic work accompanying this surge provides systematic surveys and benchmarks (e.g., "AI Agents for Computer Use", "OS Agents", "UFO", "OSWorld") that highlight both progress and remaining gaps, such as performance that still falls short of human level, privacy concerns, and high deployment costs.

The article concludes that while computer‑use agents are moving from experimental demos toward practical digital assistants, challenges in robustness, security, and accessibility must be addressed before they become ubiquitous in everyday workflows.

multimodal AI, Automation, AI agents, benchmark, OpenAI, Anthropic, Computer Use Agent
Written by

DevOps

Share premium content and events on trends, applications, and practices in development efficiency, AI and related technologies. The IDCF International DevOps Coach Federation trains end‑to‑end development‑efficiency talent, linking high‑performance organizations and individuals to achieve excellence.
