How to Empower AI with Agent‑Browser: Full Command Guide & Real‑World Use Cases
This article introduces agent‑browser, a CLI tool that lets large‑language‑model agents control browsers, explains its 15+ command categories, demonstrates navigation, data extraction, smart waiting, screenshot annotation, authentication, multi‑tab sessions, network interception, batch execution, and shows three practical scenarios for testing, scraping, and front‑end debugging.
Overview
agent-browser (GitHub: https://github.com/vercel-labs/agent-browser) provides a CLI that lets AI agents control a Chromium browser. It parses pages into an Accessibility Tree, assigns stable element identifiers (@eN), and exposes commands for navigation, interaction, data extraction, waiting, screenshots, session management, multi‑tab control, network interception, and batch execution.
Key Concepts
Element identifiers – After a snapshot, each interactive node receives an identifier such as @e1 or @e2. Subsequent commands can refer to these IDs directly instead of relying on CSS selectors.
Command Reference
Basic navigation & interaction
agent-browser open https://example.com
agent-browser snapshot
agent-browser click @e2
agent-browser fill @e3 "test@example.com"
agent-browser press Enter
agent-browser screenshot page.png
agent-browser close
Information extraction
agent-browser get text @e1
agent-browser get html @e1
agent-browser get value @e3
agent-browser get title
agent-browser get url
agent-browser get attr @e1 href
Smart waiting
agent-browser wait "#loading"
agent-browser wait 2000
agent-browser wait --text "Loading complete"
agent-browser wait --url "**/dashboard"
agent-browser wait --load networkidle
Screenshot & annotation
agent-browser screenshot
agent-browser screenshot --full
agent-browser screenshot --annotate
agent-browser pdf report.pdf
Cookie & authentication management
# List Chrome profiles
agent-browser profiles
# Reuse a profile (preserves login state)
agent-browser --profile Default open https://gmail.com
# Persistent session across restarts
agent-browser --session-name myapp open https://myapp.com
# Secure credential vault (encrypted, invisible to the agent)
agent-browser auth save mysite
agent-browser auth login mysite
Multi‑tab & multi‑session
# Open new tabs
agent-browser tab new https://docs.example.com
agent-browser tab new --label api https://api.example.com
agent-browser tab api # switch by label
# Isolated sessions
agent-browser --session agent1 open https://site-a.com
agent-browser --session agent2 open https://site-b.com
agent-browser session list
Network interception & mocking
# Mock API response
agent-browser network route "*/api/user" --body '{"name":"test"}'
# Abort ad requests
agent-browser network route "*/ads/*" --abort
# List failing requests
agent-browser network requests --status 4xx
# Record HAR
agent-browser network har start
Batch execution
agent-browser batch \
"open https://example.com" \
"wait --load networkidle" \
"snapshot -i" \
"screenshot result.png"
Browser modes
Headless Chromium (default) – runs without UI, suitable for CI/CD and background automation.
Headed mode (--headed) – opens a visible window for debugging or demos.
Remote cloud browsers – connect to services such as Browserless, Browserbase, or AWS AgentCore via the -p flag, so many sessions can run concurrently without a local browser.
agent-browser -p browserless open https://example.com
agent-browser -p browserbase open https://example.com
agent-browser -p agentcore open https://example.com
Installation
npm i -g agent-browser
agent-browser install
npx skills add vercel-labs/agent-browser@agent-browser -g -y
Practical scenarios
Scenario 1 – Automated web‑app testing
Goal: verify the login flow of https://ruoyi.eleadmin.com/ with a single prompt.
Open the login page.
Take a snapshot and locate the username and password fields.
Fill in test credentials.
Click the login button.
Wait for navigation to the dashboard.
Capture a screenshot to confirm successful login.
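The six steps above can be sketched as a small shell function. This is only an illustration built from the commands listed earlier in this guide: the element refs (@e1, @e2, @e3), the "**/dashboard" URL pattern, the output file name, and the AGENT_BROWSER override variable are assumptions, not values taken from the target site or the tool's documentation. In practice the agent would read the real refs from the snapshot output before filling anything.

```shell
#!/usr/bin/env bash
# Sketch of the login-test steps as agent-browser calls.
# Hypothetical placeholders: @e1/@e2/@e3, the dashboard URL pattern,
# login-result.png, and the AGENT_BROWSER override used for dry runs.

login_smoke_test() {
  # Resolve the binary at call time; AGENT_BROWSER=echo gives a dry run.
  local ab="${AGENT_BROWSER:-agent-browser}"
  local url="$1" user="$2" pass="$3"

  "$ab" open "$url" || return 1                # 1. open the login page
  "$ab" snapshot || return 1                   # 2. list elements; read the real @eN refs here
  "$ab" fill @e1 "$user" || return 1           # 3. @e1: assumed username field
  "$ab" fill @e2 "$pass" || return 1           #    @e2: assumed password field
  "$ab" click @e3 || return 1                  # 4. @e3: assumed login button
  "$ab" wait --url "**/dashboard" || return 1  # 5. assumes the post-login URL ends in /dashboard
  "$ab" screenshot login-result.png            # 6. capture evidence of the result
}
```

Running it as AGENT_BROWSER=echo login_smoke_test https://ruoyi.eleadmin.com/ demo secret prints the seven commands without touching a browser, which is a cheap way to review the sequence before a real run. Stopping at the first failing step keeps a failed login from producing a misleading screenshot.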
Scenario 2 – Bulk data extraction
Task: scrape pricing tables from three Chinese LLM providers.
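Please use agent-browser to complete the following tasks:
1. Open https://platform.minimaxi.com/docs/pricing/overview
2. Open https://open.bigmodel.cn/pricing
3. Open https://dashscope.console.aliyun.com/billing
For each page:
- Wait for loading to complete
- Take a snapshot and locate the pricing table
- Extract fields such as model name, input price, and output price
- If there are multiple model versions, extract all of them
- Save a screenshot as pricing_<vendor>_<date>.png
Finally, compile the results into a Markdown table and save it as llm_pricing_comparison.md
Scenario 3 – Front‑end development debugging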
After building a feature, let the agent launch the local dev server, verify functionality, and collect performance metrics.
# Measure core web vitals
agent-browser vitals http://localhost:3000
# Enable React DevTools integration
agent-browser open --enable react-devtools http://localhost:3000
agent-browser react tree
agent-browser react inspect <fiberId>
