
How to Empower AI with Agent‑Browser: Full Command Guide & Real‑World Use Cases

This article introduces agent-browser, a CLI tool that lets large language model (LLM) agents control a browser. It outlines the tool's 15+ command categories and demonstrates navigation, data extraction, smart waiting, annotated screenshots, authentication, multi-tab sessions, network interception, and batch execution. It closes with three practical scenarios: testing, scraping, and front-end debugging.

AI Software Product Manager

Overview

agent-browser (GitHub: https://github.com/vercel-labs/agent-browser) is a CLI that lets AI agents control a Chromium browser. It parses each page into an accessibility tree, assigns short element identifiers (@eN), and exposes commands for navigation, interaction, data extraction, waiting, screenshots, session management, multi-tab browsing, network interception, and batch execution.

Key Concepts

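Element identifiers – After a snapshot, each interactive node receives an identifier such as @e1 or @e2. Subsequent commands can refer to these IDs directly instead of using CSS selectors. The IDs describe the page as it was when the snapshot was taken, so take a new snapshot after navigation or other significant page changes.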
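The ref workflow can be scripted by looking up a ref in the snapshot text by its accessible name. This is a minimal sketch. It assumes snapshot prints lines such as - button "Sign in" [ref=e2], so check the format your installed version actually prints. The helper name ref_for and the "Sign in" label are hypothetical.

```shell
#!/bin/sh
# ref_for NAME: read snapshot text on stdin and print the first ref (as @eN)
# found on a line that contains the quoted accessible name "NAME".
# Assumed snapshot line format (hypothetical, verify locally):
#   - button "Sign in" [ref=e2]
ref_for() {
  grep -F "\"$1\"" | sed -n 's/.*\[ref=\(e[0-9][0-9]*\)].*/@\1/p' | head -n 1
}

# Usage against a live session (the "Sign in" label is hypothetical):
#   ref=$(agent-browser snapshot | ref_for "Sign in")
#   agent-browser click "$ref"
```

Matching on the accessible name instead of a fixed @eN keeps the script working when a new snapshot renumbers the refs.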

Command Reference

Basic navigation & interaction

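agent-browser open https://example.com       # navigate to the page
agent-browser snapshot                       # print the accessibility tree with @eN refs
agent-browser click @e2                      # click the element with ref @e2
agent-browser fill @e3 "test@example.com"    # fill input @e3 with text
agent-browser press Enter                    # press a key
agent-browser screenshot page.png            # save a screenshot to page.png
agent-browser close                          # close the browser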
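Because each command above is a separate invocation, a failing step can leave the browser open. The following minimal sketch always closes the browser and returns the failing step's exit status. It assumes each command exits non-zero on failure. The function basic_flow and the AB variable are hypothetical. Setting AB=echo gives a dry run that only prints the commands.

```shell
#!/bin/sh
# basic_flow URL: run the basic commands in order, always close the browser,
# and return the status of the first failing step (0 if all succeed).
# basic_flow and the AB override are hypothetical; @e2/@e3 must match the
# refs that snapshot printed for the real page.
basic_flow() {
  ab="${AB:-agent-browser}"
  status=0
  { "$ab" open "$1" &&
    "$ab" snapshot &&
    "$ab" click @e2 &&
    "$ab" fill @e3 "test@example.com" &&
    "$ab" press Enter &&
    "$ab" screenshot page.png; } || status=$?
  "$ab" close
  return "$status"
}
```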

Information extraction

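agent-browser get text @e1          # text content of @e1
agent-browser get html @e1          # HTML of @e1
agent-browser get value @e3         # current value of input @e3
agent-browser get title             # page title
agent-browser get url               # current URL
agent-browser get attr @e1 href     # href attribute of @e1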
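A script can feed the values printed by the get commands back into other commands. The following sketch reads a link's href and opens it. It assumes get attr prints only the bare attribute value on stdout. follow_link and AB are hypothetical names. A relative href would need to be resolved against get url first.

```shell
#!/bin/sh
# follow_link REF: read the href attribute of REF and open it.
# Assumes "get attr" prints only the attribute value on stdout;
# follow_link and the AB override are hypothetical names.
follow_link() {
  ab="${AB:-agent-browser}"
  href=$("$ab" get attr "$1" href) || return
  if [ -z "$href" ]; then
    echo "follow_link: $1 has no href" >&2
    return 1
  fi
  "$ab" open "$href"
}
```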

Smart waiting

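agent-browser wait "#loading"                 # until the element matching #loading is visible
agent-browser wait 2000                       # fixed delay of 2000 ms
agent-browser wait --text "Loading complete"  # until this text appears on the page
agent-browser wait --url "**/dashboard"       # until the URL matches the glob pattern
agent-browser wait --load networkidle         # until network activity settles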
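When a wait times out, a screenshot of the page at that moment usually shows why. The following sketch wraps wait and captures that evidence on failure. It assumes wait exits non-zero on timeout. wait_or_capture and AB are hypothetical names.

```shell
#!/bin/sh
# wait_or_capture NAME WAIT-ARGS...: run "agent-browser wait WAIT-ARGS".
# On failure (assumed: non-zero exit on timeout), save NAME.png and
# return the wait's status. wait_or_capture and AB are hypothetical.
wait_or_capture() {
  ab="${AB:-agent-browser}"
  name="$1"
  shift
  "$ab" wait "$@" && return 0
  status=$?
  "$ab" screenshot "$name.png"
  return "$status"
}

# Usage: wait_or_capture dashboard-timeout --url "**/dashboard"
```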

Screenshot & annotation

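agent-browser screenshot             # capture the current viewport
agent-browser screenshot --full      # capture the full scrollable page
agent-browser screenshot --annotate  # label interactive elements on the image
agent-browser pdf report.pdf         # save the page as a PDF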
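Screenshots saved under a fixed name overwrite each other across runs. The following sketch generates timestamped names. It assumes screenshot --annotate accepts an output path the same way screenshot page.png does. shot_path, annotated_shot and AB are hypothetical names.

```shell
#!/bin/sh
# shot_path PREFIX: print a timestamped file name such as
# PREFIX_20250101-120000.png so repeated runs do not overwrite each other.
shot_path() {
  printf '%s_%s.png\n' "$1" "$(date +%Y%m%d-%H%M%S)"
}

# annotated_shot PREFIX: save an annotated screenshot under a shot_path name.
# Assumes --annotate accepts an output path; annotated_shot and the AB
# override are hypothetical.
annotated_shot() {
  "${AB:-agent-browser}" screenshot --annotate "$(shot_path "$1")"
}
```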

Cookie & authentication management

# List Chrome profiles
agent-browser profiles
# Reuse a profile (preserves login state)
agent-browser --profile Default open https://gmail.com
# Persistent session across restarts
agent-browser --session-name myapp open https://myapp.com
# Secure credential vault (encrypted, invisible to the agent)
agent-browser auth save mysite
agent-browser auth login mysite
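With a persistent session, the credential replay is only needed when the stored login has expired. The following sketch performs that check and rests on three assumptions:

- The app redirects to a URL containing /login when signed out.
- auth login SITE signs in using what auth save SITE stored.
- Commands after open reach the same session without repeating --session-name.

ensure_login and AB are hypothetical names.

```shell
#!/bin/sh
# ensure_login SESSION SITE URL: open URL in a persistent named session and
# replay saved credentials only if the app landed on a login page.
# Assumptions: signed-out URLs contain "/login"; "auth login SITE" signs in
# with what "auth save SITE" stored. ensure_login and AB are hypothetical.
ensure_login() {
  ab="${AB:-agent-browser}"
  "$ab" --session-name "$1" open "$3" || return
  case "$("$ab" get url)" in
    */login*) "$ab" auth login "$2" ;;
    *) echo "already signed in" ;;
  esac
}

# Usage: ensure_login myapp mysite https://myapp.com
```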

Multi‑tab & multi‑session

# Open new tabs
agent-browser tab new https://docs.example.com
agent-browser tab new --label api https://api.example.com
agent-browser tab api   # switch by label
# Isolated sessions
agent-browser --session agent1 open https://site-a.com
agent-browser --session agent2 open https://site-b.com
agent-browser session list
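Because --session isolates browsers, one shell can handle independent sites in parallel. The following sketch assumes each command must repeat --session to target that session. shoot_all, AB and the siteN.png file names are hypothetical. The shell's own wait builtin (not agent-browser wait) blocks until all background jobs finish, but it does not report their individual failures.

```shell
#!/bin/sh
# shoot_all URL...: open each URL in its own isolated session (agent1,
# agent2, ...) in parallel, save siteN.png, then close that session.
# Assumes every command must repeat --session; shoot_all and the AB
# override are hypothetical names.
shoot_all() {
  ab="${AB:-agent-browser}"
  i=0
  for url in "$@"; do
    i=$((i + 1))
    (
      s="agent$i"
      "$ab" --session "$s" open "$url" &&
        "$ab" --session "$s" screenshot "site$i.png"
      "$ab" --session "$s" close
    ) &
  done
  wait  # shell builtin: block until every background job has finished
}

# Usage: shoot_all https://site-a.com https://site-b.com
```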

Network interception & mocking

# Mock API response
agent-browser network route "*/api/user" --body '{"name":"test"}'
# Abort ad requests
agent-browser network route "*/ads/*" --abort
# List failing requests
agent-browser network requests --status 4xx
# Record HAR
agent-browser network har start
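Route mocking is most useful as a check that the front end actually renders the mocked data. The following sketch chains the commands above and rests on three assumptions:

- A route registered before the first open applies to that page load.
- The page at the given URL requests */api/user and displays its name field.
- wait exits non-zero when the text never appears.

check_mocked_user and AB are hypothetical names.

```shell
#!/bin/sh
# check_mocked_user URL: stub */api/user, load URL, confirm the mocked name
# is rendered, then list any 4xx requests for review.
# Assumes the route applies to the following page load and that wait exits
# non-zero on timeout; check_mocked_user and AB are hypothetical.
check_mocked_user() {
  ab="${AB:-agent-browser}"
  "$ab" network route "*/api/user" --body '{"name":"test"}' &&
    "$ab" open "$1" &&
    "$ab" wait --text "test" &&
    "$ab" network requests --status 4xx
}

# Usage: check_mocked_user http://localhost:3000
```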

Batch execution

agent-browser batch \
"open https://example.com" \
"wait --load networkidle" \
"snapshot -i" \
"screenshot result.png"

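Because batch takes each command as one quoted argument, a script can expand a list of URLs into a single call. The following is a minimal sketch. batch_shots, AB and the shotN.png file names are hypothetical.

```shell
#!/bin/sh
# batch_shots URL...: build one "batch" call that, for each URL, opens it,
# waits for network idle, and saves shot1.png, shot2.png, ...
# batch_shots and the AB override are hypothetical names.
batch_shots() {
  ab="${AB:-agent-browser}"
  count=$#
  n=0
  # The for-loop word list is expanded once, so appending to "$@" is safe.
  for url in "$@"; do
    n=$((n + 1))
    set -- "$@" "open $url" "wait --load networkidle" "screenshot shot$n.png"
  done
  shift "$count"  # drop the original URLs, keep only the built commands
  "$ab" batch "$@"
}

# Usage: batch_shots https://example.com https://example.org
```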
Browser modes

Headless Chromium (default) – runs without a visible UI, suitable for CI/CD and background automation.

Headed mode (--headed) – opens a visible browser window for debugging or demos.

Remote cloud browsers – connects to services such as Browserless, Browserbase, or AWS AgentCore via the -p flag, so many sessions can run concurrently without a local browser.

agent-browser -p browserless open https://example.com
agent-browser -p browserbase open https://example.com
agent-browser -p agentcore open https://example.com
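Because a flag selects the mode, one script can run headless in CI, headed on a laptop, or on a cloud provider. The following sketch picks the flag from environment variables. BROWSER_PROVIDER, HEADED, open_with_mode and AB are hypothetical names. Cloud providers also need their own credentials, configured as their documentation describes.

```shell
#!/bin/sh
# open_with_mode URL: choose the browser mode from the environment.
# BROWSER_PROVIDER (e.g. browserless) and HEADED=1 are hypothetical
# variables; -p and --headed are the flags described above.
open_with_mode() {
  ab="${AB:-agent-browser}"
  if [ -n "${BROWSER_PROVIDER:-}" ]; then
    "$ab" -p "$BROWSER_PROVIDER" open "$1"
  elif [ "${HEADED:-0}" = 1 ]; then
    "$ab" --headed open "$1"
  else
    "$ab" open "$1"  # default: local headless Chromium
  fi
}
```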

Installation

npm i -g agent-browser                                        # install the CLI globally
agent-browser install                                         # download Chromium
npx skills add vercel-labs/agent-browser@agent-browser -g -y  # add the agent-browser skill for AI coding agents
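In setup scripts that may run more than once, a PATH check can guard the install steps. The following is a minimal sketch. have and ensure_agent_browser are hypothetical names. The sketch leaves out the optional npx skills step. It also does not re-run agent-browser install when the CLI exists but its browser is missing.

```shell
#!/bin/sh
# have CMD: succeed if CMD is found on PATH.
have() {
  command -v "$1" >/dev/null 2>&1
}

# ensure_agent_browser: run the install steps only when the CLI is missing,
# so a setup script can be re-run safely. Requires Node.js and npm.
# have and ensure_agent_browser are hypothetical names.
ensure_agent_browser() {
  have agent-browser && return 0
  have npm || { echo "ensure_agent_browser: npm not found" >&2; return 1; }
  npm i -g agent-browser && agent-browser install
}
```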

Practical scenarios

Scenario 1 – Automated web‑app testing

Goal: verify the login flow of https://ruoyi.eleadmin.com/ with a single prompt.

1. Open the login page.
2. Take a snapshot and locate the username and password fields.
3. Fill in test credentials.
4. Click the login button.
5. Wait for navigation to the dashboard.
6. Capture a screenshot to confirm a successful login.
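The same steps can also run as a plain script, which is handy for regression runs without an LLM in the loop. The following sketch makes three assumptions that you should adjust for the real page:

- snapshot -i lists interactive elements as lines like - textbox "Username" [ref=e3].
- The fields and the button are labelled Username, Password and Login.
- After a successful login, the URL no longer contains "login".

login_check, ref_for and AB are hypothetical names. Pass the test credentials as arguments rather than hardcoding them.

```shell
#!/bin/sh
# Scenario 1 as a script. Hypothetical assumptions: snapshot -i prints lines
# like  - textbox "Username" [ref=e3] ; the labels are "Username",
# "Password" and "Login"; the URL drops "login" after success.
# login_check, ref_for and the AB override are hypothetical names.

# ref_for NAME: first @eN ref on a snapshot line containing "NAME" (stdin).
ref_for() {
  grep -F "\"$1\"" | sed -n 's/.*\[ref=\(e[0-9][0-9]*\)].*/@\1/p' | head -n 1
}

# login_check URL USER PASS
login_check() {
  ab="${AB:-agent-browser}"
  "$ab" open "$1" && "$ab" wait --load networkidle || return
  snap=$("$ab" snapshot -i) || return  # -i: interactive elements only
  user_ref=$(printf '%s\n' "$snap" | ref_for "Username")
  pass_ref=$(printf '%s\n' "$snap" | ref_for "Password")
  login_ref=$(printf '%s\n' "$snap" | ref_for "Login")
  if [ -z "$user_ref" ] || [ -z "$pass_ref" ] || [ -z "$login_ref" ]; then
    echo "login_check: login form not found in snapshot" >&2
    return 1
  fi
  "$ab" fill "$user_ref" "$2" &&
    "$ab" fill "$pass_ref" "$3" &&
    "$ab" click "$login_ref" &&
    "$ab" wait --load networkidle &&
    "$ab" screenshot login-result.png || return
  case "$("$ab" get url)" in
    *login*)
      echo "login_check: still on the login page" >&2
      return 1
      ;;
  esac
}

# Usage (TEST_USER and TEST_PASS are hypothetical variables):
#   login_check https://ruoyi.eleadmin.com/ "$TEST_USER" "$TEST_PASS"
```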

Scenario 2 – Bulk data extraction

Task: scrape pricing tables from three Chinese LLM providers.

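Please use agent-browser to complete the following tasks:
1. Open https://platform.minimaxi.com/docs/pricing/overview
2. Open https://open.bigmodel.cn/pricing
3. Open https://dashscope.console.aliyun.com/billing
For each page:
- Wait for the page to finish loading
- Take a snapshot and locate the pricing table
- Extract fields such as model name, input price, and output price
- If there are multiple model versions, extract all of them
- Save a screenshot as pricing_<vendor>_<date>.png
Finally, combine the results into a Markdown table and save it as llm_pricing_comparison.md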
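The mechanical part of this prompt (open, wait, snapshot, screenshot) can also run as a shell loop, leaving only table extraction and the Markdown summary to the agent. The following is a sketch. capture_pricing, AB and the vendor short names are hypothetical. The DashScope console page will probably require signing in first, using the profile and session options described earlier.

```shell
#!/bin/sh
# capture_pricing OUTDIR: for each pricing page, open it, wait, save the
# snapshot text as pricing_<vendor>_<date>.txt and a full-page screenshot
# as pricing_<vendor>_<date>.png. capture_pricing, AB and the vendor short
# names below are hypothetical.
capture_pricing() {
  ab="${AB:-agent-browser}"
  out="$1"
  day=$(date +%Y%m%d)
  mkdir -p "$out" || return
  # Read the list from fd 3 so the CLI cannot consume it from stdin.
  while read -r vendor url <&3; do
    base="$out/pricing_${vendor}_${day}"
    { "$ab" open "$url" &&
      "$ab" wait --load networkidle &&
      "$ab" snapshot > "$base.txt" &&
      "$ab" screenshot --full "$base.png"; } ||
      echo "capture_pricing: failed on $vendor" >&2
  done 3<<EOF
minimax https://platform.minimaxi.com/docs/pricing/overview
zhipu https://open.bigmodel.cn/pricing
dashscope https://dashscope.console.aliyun.com/billing
EOF
}

# Usage: capture_pricing ./pricing
```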

Scenario 3 – Front‑end development debugging

After building a feature, let the agent launch the local dev server, verify functionality, and collect performance metrics.

# Measure core web vitals
agent-browser vitals http://localhost:3000
# Enable React DevTools integration
agent-browser open --enable react-devtools http://localhost:3000
agent-browser react tree
agent-browser react inspect <fiberId>
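Measuring before the dev server is listening would fail, so the start-up step needs a readiness check. The following sketch shows a polling helper. until_ok is a hypothetical name. The usage comment assumes your project starts its server with npm run dev on port 3000.

```shell
#!/bin/sh
# until_ok SECONDS CMD...: rerun CMD about once per second until it
# succeeds (return 0) or SECONDS attempts have failed (return 1).
# until_ok is a hypothetical helper name.
until_ok() {
  limit=$1
  shift
  i=0
  while [ "$i" -lt "$limit" ]; do
    "$@" && return 0
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage sketch ("npm run dev" and port 3000 are assumptions):
#   npm run dev & server=$!
#   trap 'kill "$server"' EXIT
#   until_ok 60 curl -fsS -o /dev/null http://localhost:3000 &&
#     agent-browser vitals http://localhost:3000
```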