
Behind 95K Stars: browser-use’s LLM Browser Automation vs Playwright

browser-use is an open-source, MIT-licensed LLM agent loop that compresses a page's DOM into an indexed list of interactive elements so that large language models can plan and execute web tasks. This article compares it with Anthropic's Computer Use, OpenAI's Operator and traditional Playwright/Selenium, highlighting its flexibility and lower cost as well as its higher LLM usage and deployment trade-offs.

Code Mala Tang

May 25, 2026

browser-use was open-sourced on GitHub last October and reached 95,449 stars within six months, an average of more than 500 stars per day, along with more than 10k forks.

Positioning: a browser that LLMs can understand

The slogan of browser-use is "Make websites accessible for AI agents". The intent is not merely to let AI access a site, but to make the site understandable to an LLM. Traditional tools such as Playwright or Selenium return raw DOM trees (thousands of HTML nodes, nested <div>s and dynamic class names), which either explode the token budget or leave the model unsure which element to click.

browser-use solves this by compressing the DOM into an "interactive element index": it scans the page, extracts every clickable or input element (buttons, links, forms), assigns each a numeric ID (0, 1, 2 …) and packages the element's text, type and position into a concise list that is fed to the LLM.

For example, instead of the LLM seeing the raw HTML

<button class="btn-primary mt-3 px-4">Submit</button>

it sees the entry [5] Button: Submit. When the model replies "click element 5", browser-use maps index 5 back to the underlying element and issues a Playwright call such as page.click() on it.
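To make the indexing idea concrete, here is a minimal sketch using only Python's standard library. It is not browser-use's actual extraction code: the InteractiveIndexer class, the INTERACTIVE mapping and the build_index helper are hypothetical names for illustration, and the real project also handles visibility, position and dynamic content.

```python
from html.parser import HTMLParser

# Hypothetical tag-to-label mapping; the real extraction logic is more involved.
INTERACTIVE = {"button": "Button", "a": "Link", "input": "Input", "textarea": "Textarea"}


class InteractiveIndexer(HTMLParser):
    """Collects interactive elements and their visible text in document order."""

    def __init__(self):
        super().__init__()
        self.entries = []   # each entry is [label, text]
        self._open = None   # index of the entry currently collecting inner text

    def handle_starttag(self, tag, attrs):
        if tag in INTERACTIVE:
            attrs = dict(attrs)
            text = attrs.get("placeholder") or attrs.get("value") or ""
            self.entries.append([INTERACTIVE[tag], text])
            # <input> is a void element, so it never has inner text to collect.
            self._open = None if tag == "input" else len(self.entries) - 1

    def handle_endtag(self, tag):
        if tag in INTERACTIVE:
            self._open = None

    def handle_data(self, data):
        if self._open is not None and data.strip():
            entry = self.entries[self._open]
            entry[1] = (entry[1] + " " + data.strip()).strip()


def build_index(html):
    """Return lines of the form '[i] Label: text' for every interactive element."""
    parser = InteractiveIndexer()
    parser.feed(html)
    parser.close()
    return [f"[{i}] {label}: {text}" for i, (label, text) in enumerate(parser.entries)]
```

Feeding build_index a page fragment yields one short line per element, and the class names and wrapper <div>s never reach the model. Mapping an index back to an element is then a list lookup on the side that holds the browser handle.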

This design is similar to Anthropic’s Computer Use, which uses screenshots and coordinates, but browser-use relies on DOM + index, making it cheaper, faster and independent of a visual model. A visual mode (vision‑mode) is also supported – screenshots processed by GPT‑4V or Claude 3.5 Sonnet – but the default DOM mode is preferred for cost reasons.

Core difference: not a Playwright wrapper but an Agent loop

Playwright is a deterministic browser‑automation library: you script explicit steps such as “open URL, click button, fill form”. browser-use, by contrast, implements an Agent loop. You give it a high‑level task (e.g., “buy groceries on Instacart”), and the LLM plans steps, observes the page, decides the next action, and handles unexpected situations (pop‑ups, captchas, load failures).

The loop consists of five stages:

Observe: extract the current page's interactive element index.

Reason: the LLM decides the next action (click, type, scroll, wait) based on the task and current state.

Execute: invoke the Playwright API to perform the action.

Validate: check whether the page changed or the task completed.

Loop: return to observation.
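The five stages can be sketched as a plain Python loop. Everything here is a stand-in chosen for illustration: FakePage replaces a real browser page, fake_llm replaces the model call, and run_agent is a hypothetical function, not browser-use's Agent implementation.

```python
from dataclasses import dataclass, field


@dataclass
class FakePage:
    """Stand-in for a browser page; a real run would drive Playwright instead."""
    elements: list = field(default_factory=lambda: ["Link: Home", "Button: Submit"])
    submitted: bool = False

    def observe(self):
        # Same shape as the interactive element index described earlier.
        return [f"[{i}] {e}" for i, e in enumerate(self.elements)]

    def click(self, index):
        if self.elements[index] == "Button: Submit":
            self.submitted = True


def fake_llm(task, observation):
    """Hypothetical policy; a real agent would send task + observation to an LLM."""
    for line in observation:
        if "Submit" in line:
            index = int(line[1:line.index("]")])
            return {"action": "click", "index": index}
    return {"action": "done"}


def run_agent(task, page, decide, max_steps=10):
    history = []
    for _ in range(max_steps):
        observation = page.observe()            # Observe
        action = decide(task, observation)      # Reason
        if action["action"] == "click":
            page.click(action["index"])         # Execute
        history.append(action)
        if page.submitted or action["action"] == "done":   # Validate
            return history
        # Loop: fall through to the next observation.
    return history
```

The max_steps bound matters in practice: without it, a model that keeps choosing unproductive actions would loop, and pay for tokens, indefinitely.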

The loop is not hard‑coded; the LLM can change strategy when a step fails, try alternative elements, or re‑observe after a navigation. Traditional Playwright scripts cannot adapt to page redesigns, element‑position changes, or dynamic loading without manual updates.

The trade‑off is LLM cost: a simple task like “query a GitHub star count” may require 5–10 LLM calls, each consuming several thousand tokens, costing a few cents to tens of cents with GPT‑4. To mitigate cost, browser-use ships its own fine‑tuned model, ChatBrowserUse, priced at roughly one‑tenth of GPT‑4 (input $0.20/M tokens, output $2.00/M tokens).
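As a rough worked example of the arithmetic, the sketch below applies the ChatBrowserUse prices quoted above to a task. The per-call token counts are assumptions picked for illustration, not measurements, and estimate_cost is a hypothetical helper.

```python
# Prices quoted in the article for ChatBrowserUse, in USD per million tokens.
INPUT_PRICE_PER_M = 0.20
OUTPUT_PRICE_PER_M = 2.00


def estimate_cost(calls, input_tokens_per_call, output_tokens_per_call):
    """Estimate the USD cost of one task from call count and tokens per call."""
    input_cost = calls * input_tokens_per_call * INPUT_PRICE_PER_M / 1_000_000
    output_cost = calls * output_tokens_per_call * OUTPUT_PRICE_PER_M / 1_000_000
    return input_cost + output_cost


# Assumed workload: 8 calls, 4,000 input and 200 output tokens each.
cost = estimate_cost(calls=8, input_tokens_per_call=4000, output_tokens_per_call=200)
print(f"${cost:.4f}")
```

Under these assumptions the task costs about $0.0096, just under one cent. Because input tokens dominate the volume, shrinking the observation sent on each step is the main lever on cost.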

Comparison with other solutions

Anthropic Computer Use

Computer Use adds a virtual desktop to Claude 3.5 Sonnet, allowing screen capture, mouse movement and keyboard input. Its strengths are broad applicability (web, desktop apps, CLI, IDE) and high flexibility. Its drawbacks are high cost (5–10× token usage compared with browser-use), slower speed (2–5 s per action due to screenshot upload and visual inference versus ~0.5 s for DOM mode), and limited controllability – the LLM decides the exact click location, which cannot be overridden.

OpenAI Operator

Operator is a newly released browser Agent bundled with ChatGPT Plus and currently unavailable via API. It also uses a visual mode (screenshot + GPT‑4V). Its advantages are seamless integration with ChatGPT and a user‑friendly experience that requires no coding. Its disadvantages are that it is closed‑source, offers no API, and has opaque pricing (a $20/month subscription with unknown per‑task costs).

Playwright / Selenium

Traditional automation tools offer deterministic scripts, low compute cost, and millisecond‑level speed. Their disadvantages are fragility to page changes, high maintenance overhead, and inability to handle unexpected UI elements such as pop‑ups or captchas.

Conclusion of the comparison: browser-use cannot fully replace Playwright for deterministic, high‑frequency, low‑cost tasks, but it excels in exploratory automation where page structure varies and natural‑language task descriptions are used.

Getting started: five lines of code

Installation (Python 3.11+):

uv init && uv add browser-use && uv sync

Minimal example:

import asyncio

from browser_use import Agent, Browser, ChatBrowserUse


async def main():
    # Start a browser session for the agent to control.
    browser = Browser()
    agent = Agent(
        task="Find the number of stars of the browser-use repo",
        llm=ChatBrowserUse(),  # browser-use's own model
        browser=browser,
    )
    # Run the agent loop until the task completes.
    await agent.run()


asyncio.run(main())

This script launches a Chromium instance via Playwright, lets the LLM plan the steps, automatically opens GitHub, searches for the repository, extracts the star count and returns the result.

To switch to Gemini or Claude, replace the llm argument with ChatGoogle(model='gemini-3-flash-preview') or ChatAnthropic(model='claude-sonnet-4-6') and set the corresponding API key (GOOGLE_API_KEY or ANTHROPIC_API_KEY).

Advanced: custom tools

browser-use allows adding custom tools that the Agent can invoke. For example, a tool that queries an internal database:

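from browser_use import Agent, Browser, ChatBrowserUse, Tools

browser = Browser()
tools = Tools()

# Register a function the Agent may call as one of its actions.
@tools.action(description='Query internal database for user info')
def query_db(user_id: str) -> dict:
    # Stubbed record; a real tool would look up user_id in the database.
    return {"name": "Alice", "email": "[email protected]"}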

agent = Agent(
    task="Fill the form with user 12345's info",
    llm=ChatBrowserUse(),
    browser=browser,
    tools=tools,
)

The Agent decides when to call query_db and when to fill the form, enabling deep integration with existing systems.
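For readers curious how a decorator-based registry of this kind can work, here is a minimal sketch in plain Python. MiniTools is a hypothetical stand-in written for this article; it is not browser-use's Tools class and makes no claim about its internals.

```python
import inspect


class MiniTools:
    """Hypothetical tool registry, for illustration only."""

    def __init__(self):
        self.registry = {}

    def action(self, description):
        # Returns a decorator that records the function and its parameter names.
        def decorator(func):
            self.registry[func.__name__] = {
                "description": description,
                "params": list(inspect.signature(func).parameters),
                "func": func,
            }
            return func
        return decorator

    def describe(self):
        # Text an LLM could be shown when deciding which tool to call.
        return [
            f"{name}({', '.join(tool['params'])}): {tool['description']}"
            for name, tool in self.registry.items()
        ]

    def call(self, name, **kwargs):
        # Dispatch a tool call chosen by the model.
        return self.registry[name]["func"](**kwargs)


tools = MiniTools()


@tools.action(description="Query internal database for user info")
def query_db(user_id: str) -> dict:
    # Stubbed lookup; the returned fields are made up for the example.
    return {"user_id": user_id, "name": "Alice"}
```

The description string and parameter names are what let the model choose a tool and fill in its arguments; the dispatch itself is an ordinary dictionary lookup.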

Cloud vs. Self‑host

Two deployment modes exist:

Self‑host (open‑source): runs on your own machine, free under the MIT license, but you must handle anti‑scraping, proxies and captchas yourself.

Cloud (managed): runs on browser‑use's cloud, includes built‑in anti‑scraping fingerprints, proxy rotation and captcha handling, billed per invocation (price not publicly disclosed, API key required).

The official recommendation is Cloud for most users because anti‑scraping is the biggest hurdle. However, for internal‑network systems, sensitive data, or cost‑sensitive workloads, Self‑host is preferable.

In the author’s own tests (Self‑host + ChatBrowserUse), a simple “GitHub star” query required 5–8 LLM calls and cost less than $0.01.

Pitfalls and limitations

1. Anti‑scraping blocks: Playwright's Chromium sets navigator.webdriver to true, so many sites detect and block the bot. Solutions: use Cloud, add the playwright‑stealth plugin, or load a real Chrome user‑data directory.

2. Captchas: browser-use does not solve captchas itself. Work‑arounds include Cloud's built‑in solver, third‑party services (2Captcha, Anti‑Captcha), or using a logged‑in browser profile.

3. Unpredictable cost: The number of LLM calls varies with task complexity; a simple task may finish in 5 calls, while a complex one may need 50, potentially costing several dollars with GPT‑4. Mitigations are using the cheaper ChatBrowserUse model, setting a max_steps limit, or running deterministic Playwright scripts first and invoking the agent only when they fail.

4. Not suited for high‑frequency workloads: Each task launches a browser, loads pages, performs reasoning and execution, taking 10–30 seconds. For dozens of requests per second, a traditional Playwright headless pool is required.
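One way to combine the cost mitigations from item 3 is a script-first wrapper with a step budget. The sketch below assumes two hypothetical callables: scripted, a deterministic routine that raises on failure, and agent, which accepts a task and a step limit. Neither is a browser-use API.

```python
def run_with_fallback(task, scripted, agent, max_steps=20):
    """Try the deterministic script first; use the LLM agent only if it raises."""
    try:
        return {"path": "script", "result": scripted()}
    except Exception as exc:
        # Any script failure (e.g. a selector that no longer matches) triggers the agent.
        return {
            "path": "agent",
            "result": agent(task, max_steps),
            "script_error": str(exc),
        }


# Stubs standing in for a Playwright script and an LLM agent.
def scripted_ok():
    return "95,449"


def scripted_broken():
    raise RuntimeError("selector #stars not found")


def fake_agent(task, max_steps):
    return f"agent handled {task!r} within {max_steps} steps"
```

With this shape the cheap deterministic path handles the common case, and LLM spend is incurred only when the page has changed enough to break the script, capped by max_steps.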

Author’s judgment

browser-use is not a silver bullet. It addresses “exploratory browser automation” where tasks are described in natural language, page structures change, and unexpected UI events must be handled. For deterministic, high‑frequency, cost‑sensitive automation, Playwright remains the better choice.

Conversely, for internal SaaS tools that let employees operate software via natural language, heterogeneous form processing, competitor‑monitoring on frequently changing pages, or rapid prototyping of automation ideas without writing Playwright scripts, browser-use is currently the best open‑source solution.

The project's 95k stars are not hype; the high commit frequency (over 10 commits per day) shows active development. Anyone working on AI agents, RPA, or any scenario that requires "LLM‑driven browser interaction" should keep an eye on browser-use.
