Webwright Lets Browser Agents Move Beyond Guessing the Next Click
Microsoft's newly open‑sourced Webwright framework replaces the traditional step‑by‑step LLM decision loop with Playwright scripts written by the model, and it keeps all state in a local workspace. It achieves state‑of‑the‑art results on the Online‑Mind2Web and Odysseys benchmarks, integrates with major agent ecosystems, and produces auditable, reusable automation.
Core design: abandoning single‑step loops
Most browser agents follow a fixed pipeline (observe page state → predict the next click or input → execute) and invoke an LLM at every step. That design made sense when LLMs were weak, but it becomes a bottleneck as their code‑generation ability improves.
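A toy sketch makes the per‑step cost concrete. The `call_llm` policy and the dictionary standing in for the page are hypothetical placeholders, not part of Webwright or any real agent framework. The only point is that the model is consulted once for every action.

```python
from dataclasses import dataclass, field


@dataclass
class StepLoopStats:
    llm_calls: int = 0
    actions: list[str] = field(default_factory=list)


def call_llm(observation: dict, stats: StepLoopStats) -> str:
    """Hypothetical policy: one model call per step, returning the next action."""
    stats.llm_calls += 1
    # A real agent would send the DOM or a screenshot to a model here.
    if not observation["query_typed"]:
        return "type:search_box"
    if not observation["submitted"]:
        return "click:search_button"
    return "done"


def step_loop(max_steps: int = 100) -> StepLoopStats:
    """Observe -> predict -> act, invoking the (fake) model at every step."""
    page = {"query_typed": False, "submitted": False}  # toy stand-in for page state
    stats = StepLoopStats()
    for _ in range(max_steps):
        action = call_llm(dict(page), stats)  # observe + predict
        if action == "done":
            break
        stats.actions.append(action)  # act
        if action.startswith("type:"):
            page["query_typed"] = True
        elif action.startswith("click:"):
            page["submitted"] = True
    return stats
```

Here `step_loop()` spends three model calls to perform two actions. The number of calls grows linearly with the number of actions, which is the cost the single‑step design pays on long tasks.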
Webwright adopts a workflow that mirrors how engineers automate browsers:
Let the LLM directly generate runnable Playwright scripts, turning web actions into reusable Python programs.
Persist all artefacts (scripts, screenshots, logs) in a local workspace. The browser session can be started, inspected, or discarded at any time instead of serving as the sole carrier of state.
Maintain an ultra‑minimal architecture (~1,500 lines in total) built around three modules: Runner (≈150 lines), Model Endpoint (≈550 lines), and Environment (≈300 lines), with httpx, pydantic, playwright, and typer as the only dependencies.
The agent leaves behind a modifiable, shareable automation script instead of a one‑off execution trace.
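A parameterised script of the kind the agent might leave behind could look like the sketch below. The target URL, the `input[name='q']` selector, and the `search_and_capture` / `run` names are hypothetical. The calls themselves (`goto`, `fill`, `press`, `wait_for_load_state`, `screenshot`, `title`) are standard Playwright sync‑API `Page` methods. Passing the `page` in as an argument keeps the task logic reusable and testable without launching a browser.

```python
from pathlib import Path


def search_and_capture(page, start_url: str, query: str, screenshot_path: Path) -> str:
    """Hypothetical reusable task: search a site and save a screenshot.

    `page` is expected to be a Playwright sync-API Page (or any object
    exposing the same methods). Returns the title of the results page.
    """
    page.goto(start_url)
    page.fill("input[name='q']", query)  # hypothetical selector
    page.press("input[name='q']", "Enter")
    page.wait_for_load_state("load")
    page.screenshot(path=str(screenshot_path))  # artefact kept in the workspace
    return page.title()


def run(start_url: str, query: str, workspace: Path) -> str:
    """Launch a throwaway Chromium session, run the task, then discard the browser."""
    from playwright.sync_api import sync_playwright  # imported lazily

    workspace.mkdir(parents=True, exist_ok=True)
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            return search_and_capture(page, start_url, query, workspace / "result.png")
        finally:
            browser.close()
```

Calling `run("https://example.com", "webwright", Path("workspace/demo"))` requires `playwright install chromium` beforehand. Because the browser is closed at the end, only the script and the screenshot persist.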
Performance reaches SOTA level
Webwright achieved the best open‑source scores on two mainstream browser‑agent benchmarks under a 100‑step budget:
Online‑Mind2Web (300 real‑world tasks): with GPT‑5.4, Webwright reached 86.7% accuracy, the highest among open‑source harnesses. With Claude Opus 4.7 it reached 84.7% overall but beat GPT‑5.4 on the hard subset (80.5% vs 76.6%).
Odysseys (200 long‑running tasks averaging 76.1 steps): with GPT‑5.4, Webwright achieved a 60.1% completion rate. That is 15.6 percentage points above the previous SOTA and 26.6 points above the coordinate‑prediction baseline.
Even the small Qwen‑3.5‑9B model, paired with the provided tool scripts, reaches a 66.2% completion rate on the hard subset of Online‑Mind2Web, which makes low‑cost deployment practical.
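The Odysseys margins reported above imply the earlier reference points. The snippet below back‑computes them from the reported figures (60.1%, with gains of 15.6 and 26.6 percentage points). The implied values are derived arithmetic, not numbers quoted by the source.

```python
def implied_baseline(score: float, gain_pp: float) -> float:
    """Back out a baseline from a score and its gain in percentage points."""
    # round() removes floating-point noise such as 44.50000000000001
    return round(score - gain_pp, 1)


webwright_odysseys = 60.1
previous_sota = implied_baseline(webwright_odysseys, 15.6)
coordinate_baseline = implied_baseline(webwright_odysseys, 26.6)

print(f"Implied previous SOTA: {previous_sota}%")  # 44.5%
print(f"Implied coordinate-prediction baseline: {coordinate_baseline}%")  # 33.5%
```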
Ecosystem integration and extra features
Claude Code: install the plugin from the plugin marketplace, then use /webwright:run for one‑off tasks or /webwright:craft to generate reusable, parameterised scripts.
OpenAI Codex: after installing the plugin, invoke the agent with @webwright.
OpenClaw and Hermes Agent: both use the same shared skill directory and load Webwright directly.
Two additional utilities are provided:
Task2UI mode – automatically renders task results as an interactive HTML app, eliminating manual visualisation work.
Full auditability – every run’s trace, screenshots, and logs are stored locally for debugging and replay.
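The hypothetical `results_to_html` helper below is a rough illustration of the Task2UI idea, not its actual implementation. It turns a list of task results into a self‑contained HTML page with a client‑side filter box. Every value passes through `html.escape`, so text scraped from a web page cannot inject markup into the generated app.

```python
import html


def results_to_html(title: str, rows: list[dict]) -> str:
    """Render result rows as a standalone HTML page with a text filter (hypothetical)."""
    columns = list(rows[0].keys()) if rows else []
    head = "".join(f"<th>{html.escape(str(c))}</th>" for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(r.get(c, '')))}</td>" for c in columns) + "</tr>"
        for r in rows
    )
    # Small inline script: hide table rows that do not contain the filter text.
    script = (
        "document.getElementById('q').addEventListener('input', e => {"
        "const q = e.target.value.toLowerCase();"
        "for (const tr of document.querySelectorAll('tbody tr'))"
        "tr.style.display = tr.textContent.toLowerCase().includes(q) ? '' : 'none';});"
    )
    return (
        f"<!doctype html><html><head><meta charset='utf-8'><title>{html.escape(title)}</title></head>"
        f"<body><h1>{html.escape(title)}</h1><input id='q' placeholder='Filter...'>"
        f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"
        f"<script>{script}</script></body></html>"
    )
```

Writing the returned string to a `.html` file in the run's workspace yields a page that opens in any browser with no server required.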
Key differences from similar projects
Paradigm: Webwright treats the browser as a disposable runtime and the agent as a code‑generation engine, unlike Stagehand's mix of code and natural‑language primitives or agent‑browser's CLI‑only approach.
Action space: the agent writes free‑form Python Playwright scripts, whereas other tools rely on predefined command sets or index‑based clicks.
State carrier: Webwright persists code, screenshots, and logs in a local workspace; other tools keep state inside the browser session.
Loop shape: Webwright follows a write‑code → execute → screenshot → fix‑code cycle, in contrast to the observe‑predict‑act loop of traditional agents.
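The write‑code → execute → fix‑code cycle, with a local workspace as the state carrier, can be sketched as follows. This is a hypothetical outline, not Webwright's Runner. `generate_code` stands in for a model call, each attempt runs in a fresh `python` subprocess, and the captured stdout/stderr plays the role of the screenshot and log feedback. Every attempt's script and log are written to the workspace, so a run can be inspected or replayed afterwards.

```python
import subprocess
import sys
from pathlib import Path
from typing import Callable, Optional

# (attempt number, previous error or None) -> Python source for the next attempt
CodeGenerator = Callable[[int, Optional[str]], str]


def code_loop(generate_code: CodeGenerator, workspace: Path, max_attempts: int = 3) -> Optional[Path]:
    """Write code, run it, feed errors back, and keep every artefact on disk.

    Returns the path of the first script that exits cleanly, or None.
    """
    workspace.mkdir(parents=True, exist_ok=True)
    error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        # Write code: the (hypothetical) model sees the previous error, if any.
        script = workspace / f"attempt_{attempt}.py"
        script.write_text(generate_code(attempt, error))
        # Execute in a separate interpreter so a crash cannot take down the loop.
        result = subprocess.run(
            [sys.executable, str(script)], capture_output=True, text=True, timeout=60
        )
        # Observe: persist the output next to the script for later auditing.
        (workspace / f"attempt_{attempt}.log").write_text(result.stdout + result.stderr)
        if result.returncode == 0:
            return script
        # Fix code: the traceback becomes input for the next attempt.
        error = result.stderr
    return None
```

In Webwright the generated code drives Playwright and the feedback includes real screenshots. This stub only shows the control flow: a single model call per attempt instead of one per click.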
Industry consensus: agents must leave the single‑step trap
Developers note that most automation bottlenecks lie in the decision loop rather than in click speed, and that closing this gap creates a fundamentally new category of tool.
Lakshman Turlapati, author of Full Self Browsing (FSB), affirms that agents should expose the full browser session, DOM, screenshots, and recovery mechanisms in a single control layer, exactly what Webwright provides.
Other engineers describe Webwright as the first streamlined, official solution for the "coding agent" browser setups they had previously cobbled together from Copilot CLI and Playwright MCP.
Quick start
Basic run
Requirements: Python 3.10+, Chromium installed via Playwright, and an API key for OpenAI, Anthropic, or OpenRouter.
```bash
# Install (run from the repository root)
pip install -e .
playwright install chromium

# Run example task
python -m webwright.run.cli \
  -c base.yaml -c model_openai.yaml \
  -t "Search for flights from SEA to JFK on 2026-08-15 to 2026-08-20" \
  --start-url https://www.google.com/flights \
  --task-id demo_openai \
  -o outputs/default
```
Claude Code plugin installation
```text
# Add the plugin marketplace
/plugin marketplace add microsoft/Webwright

# Install the plugin
/plugin install webwright@webwright
```
Related links
Webwright GitHub repository: https://github.com/microsoft/webwright
Webwright official blog: https://www.microsoft.com/en-us/research/articles/webwright-a-terminal-is-all-you-need-for-web-agents/
FSB website: https://full-selfbrowsing.com/agents
FSB GitHub repository: https://github.com/lakshmanturlapati/FSB
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
AI Engineering
Focused on cutting‑edge product and technology information and practical experience sharing in the AI field (large models, MLOps/LLMOps, AI application development, AI infrastructure).
