How AI Agents Seamlessly Control Real Browsers with Proxies and Parallel Sessions
BrowserAct is an open-source platform that lets AI agents operate real, stealth browsers. It preserves login state, handles dynamic content, runs multiple isolated sessions, supports proxy configuration, and hands blocked steps to a human through remote-assist. The aim is to turn fragile web automation into a robust workflow.
Limitations of existing browser automation
Tools that only fetch static HTML, such as curl or web_fetch, cannot handle JavaScript-rendered pages, login states, or anti-bot protections. Browser automation frameworks like Playwright, Puppeteer, and WebDriver launch real browsers but leave detectable automation footprints that can trigger site-level risk controls. The community project agent-browser provides basic navigation but lacks anti-detection, stealth browsing, and human-assist capabilities.
BrowserAct design
BrowserAct introduces a real‑browser execution layer for AI agents, separating the workflow into three layers to ensure reliable progress on dynamic sites.
Three‑layer architecture
Environment layer: Supplies each task with an isolated browser instance that has its own cookie store, profile, and network exit, preventing cross-task contamination.
Automation layer: Automatically handles common obstacles such as dynamic loading, pop-ups, and form validation, allowing the agent to continue without constant supervision.
Human-assist layer: When a step requires manual input (e.g., SMS codes, SSO, hardware keys), the agent pauses and emits a remote-assist command that generates a URL. The user completes the action on any device, closes the page, and the agent resumes from the exact browser state.
Example command:
browser-act --session my-task remote-assist --objective "Complete the SMS verification code"
The command returns a URL. Opening it on a phone shows the live browser view, where the user enters the verification code; the agent then continues without losing context.
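An agent written in Python could assemble the same hand-off command programmatically. The following is a minimal sketch that uses only the flags shown in the example command. The helper name remote_assist_argv is hypothetical, and the code only builds and prints the argument list rather than running it, since the exact form in which the CLI returns the URL is not described here.

```python
import shlex


def remote_assist_argv(session: str, objective: str) -> list[str]:
    """Build the argv for the remote-assist command from the example.

    Hypothetical helper: it mirrors the flags of the example command
    and does not execute anything.
    """
    return [
        "browser-act",
        "--session", session,
        "remote-assist",
        "--objective", objective,
    ]


argv = remote_assist_argv("my-task", "Complete the SMS verification code")
# shlex.join quotes arguments that contain spaces, so the line is shell-safe.
print(shlex.join(argv))
```

Passing a list like this to subprocess.run avoids shell quoting problems when the objective text contains spaces or non-ASCII characters.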
Parallel multi‑session support
BrowserAct decouples browser identity from task execution. An agent can run several independent sessions concurrently, each with its own stealth browser, proxy configuration, and login state.
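For example, one agent runs three stealth browsers. Browser A is bound to the Shop A account and a static proxy, and its session exports an order report. Browser B is bound to the Shop B account and a static proxy, and its session checks customer-service messages. Browser C uses a dynamic proxy for an ad hoc check of competitor prices. Three browsers, three identities, three tasks, all managed by a single agent.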
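The three-browser scenario can be sketched as independent units of work dispatched concurrently. This is an illustrative model only: SessionPlan, run_session, and the browser names are hypothetical, and run_session is a stub standing in for whatever calls a real agent would make to BrowserAct.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionPlan:
    browser: str  # hypothetical name of the stealth browser holding the identity
    proxy: str    # "static" or "dynamic"
    task: str     # what this session is supposed to do


def run_session(plan: SessionPlan) -> str:
    # Stub: a real agent would drive its browser session here.
    return f"{plan.browser} [{plan.proxy}]: {plan.task} done"


plans = [
    SessionPlan("shop-a", "static", "export order report"),
    SessionPlan("shop-b", "static", "check customer-service messages"),
    SessionPlan("price-watch", "dynamic", "check competitor prices"),
]

# One worker per plan; pool.map returns results in the order of the inputs.
with ThreadPoolExecutor(max_workers=len(plans)) as pool:
    results = list(pool.map(run_session, plans))

for line in results:
    print(line)
```

Because each plan carries its own browser and proxy setting and shares no mutable state with the others, the workers cannot leak identity into one another, which is the property the isolation model is meant to provide.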
Proxy and stealth modes
Dynamic proxy + stealth (privacy) mode: IP rotates per request; suitable for large-scale, short-lived data extraction where no long-term identity is retained.
Static proxy + stealth (standard) mode: Fixed IP and persistent browser profile; suitable for long-running tasks that require stable login sessions.
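The choice between the two modes reduces to one question: does the task need a stable identity? A minimal sketch of that rule follows; the ProxyMode enum and choose_mode function are hypothetical names used for illustration, not part of BrowserAct.

```python
from enum import Enum


class ProxyMode(Enum):
    PRIVACY = "dynamic proxy + stealth"   # rotating IP, no long-term identity
    STANDARD = "static proxy + stealth"   # fixed IP, persistent profile


def choose_mode(needs_persistent_login: bool) -> ProxyMode:
    """Pick the mode described above for a given kind of task."""
    if needs_persistent_login:
        return ProxyMode.STANDARD
    return ProxyMode.PRIVACY


print(choose_mode(needs_persistent_login=True).value)
print(choose_mode(needs_persistent_login=False).value)
```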
Skill Forge – reusable workflow packaging
Skill Forge analyzes a completed browser flow, distinguishes API‑callable steps from DOM interactions, and packages the process into a portable Skill file.
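For example, have Skill Forge analyze the extraction flow for an Amazon product page and generate amazon-product-api-skill. From then on, the agent calls that Skill directly and follows a verified, stable path, with no need to re-adapt to the page layout.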
The repository also includes a Solutions Catalog with over 30 pre‑built Skills for platforms such as Amazon, Taobao, Google Maps, YouTube, Instagram, TikTok, Reddit, WeChat, Zhihu, and Xiaohongshu.
Capability comparison
Stealth browser (local): BrowserAct supports; agent-browser does not.
Automatic anti-detection handling: BrowserAct supports; agent-browser does not.
General-purpose stealth-extract API: BrowserAct supports; agent-browser does not.
Dynamic proxy (local): BrowserAct supports; agent-browser does not.
Installation and quick start
Installation does not require manual Playwright setup or Chrome path configuration. The simplest method is to let an agent install BrowserAct:
Install browser-act. Skill source: https://github.com/browser-act/skills/tree/main/browser-act
After installation, retrieve the list of available skills:
browser-act get-skills core --skill-version 2.0.2
Verify that an agent can open a real page:
browser-act --session my-first-task browse --url "https://www.google.com"
For a persistent workflow, create a named stealth browser and run tasks under it:
browser-act browser create --type stealth --name "zhihu-research-demo" --desc "BrowserAct demo browser for recurring Zhihu research workflow"
browser-act --session zhihu-research browse --url "https://www.zhihu.com"
The model separates responsibilities:
browser handles identity (cookies, profile, proxy, fingerprint).
session handles the specific task (current page, steps, extracted data).
remote‑assist handles human hand‑off for verification steps.
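The split between browser and session can be pictured as two small data structures. This is a conceptual sketch, not BrowserAct's internal model: the class and field names are hypothetical, and it assumes that more than one session may run under the same named browser.

```python
from dataclasses import dataclass, field


@dataclass
class Browser:
    """Identity: survives across tasks."""
    name: str
    proxy_mode: str
    cookies: dict[str, str] = field(default_factory=dict)


@dataclass
class Session:
    """Task state: current page and the steps taken so far."""
    name: str
    browser: Browser
    current_url: str = ""
    steps: list[str] = field(default_factory=list)

    def browse(self, url: str) -> None:
        self.current_url = url
        self.steps.append(f"browse {url}")


browser = Browser("zhihu-research-demo", "static")
first = Session("zhihu-research", browser)
first.browse("https://www.zhihu.com")
browser.cookies["login"] = "token-placeholder"  # stands in for a real login

# A later session starts with empty task state but reuses the same identity.
second = Session("zhihu-followup", browser)
print(second.steps, second.browser.cookies)
```

Keeping login state on the browser rather than on the session is what lets a new task start clean without asking the user to log in again.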
Typical use case
When gathering trending topics across multiple platforms, a plain Codex agent stalls on login‑protected sites, consuming tokens without progress. Using BrowserAct, the agent first attempts unauthenticated access, then triggers a remote‑assist link for the user to solve the captcha. After verification, the agent quickly retrieves the desired data, avoiding token waste and ensuring continuity.
Core components
BrowserAct's value derives from five components that together enable an agent to operate on the live internet:
Stable entry into real webpages.
Human‑assist handoff for blocked steps.
Parallel session isolation.
Identity management via stealth browsers and proxy modes.
Reusable workflow packaging through Skill Forge.
GitHub repository:
https://github.com/browser-act/skills/tree/main
