Can One Human Click Enable Permanent Agent Reuse? BrowserBC’s One‑Shot Skill Extraction

BrowserBC records a human’s complete web task, rewrites the noisy trace into a natural‑language skill card, and lets a smaller model repeatedly execute the same class of tasks, achieving large success‑rate gains on WebArena‑Hard and ClawBench benchmarks.

Machine Heart

Modern web agents such as Claude or Codex can see pages, click buttons and fill forms, but each new task or website forces the most capable (and expensive) model to explore the workflow from scratch, often leading to loops, drift from the original intent, or premature termination.

The core question is whether a single human demonstration can be distilled into a reusable artifact that a cheaper agent can follow many times. BrowserBC, an open‑source project from Einsia AI’s Navers Lab, answers with a three‑step paradigm: Record → Transcribe → Execute.

Record: While a human completes a task in the browser, the system captures the full trajectory: task instructions, page observations (screenshots and DOM/accessibility snapshots), every user action (click, input, navigation) with element locators, page feedback, and the final task state.

Transcribe: Instead of storing a raw replay script, a model converts the cleaned trajectory into a natural‑language Skill card that explicitly states the intent, key steps, completion criteria, and pitfalls, while stripping non‑transferable details such as exact coordinates, volatile DOM selectors, login tokens, or private text.

Execute: Any downstream model reads the Skill and performs the task on the live page, grounding the high‑level instructions in the current page context rather than blindly reproducing recorded clicks.
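The stripping of non‑transferable details in the Transcribe step can be illustrated with a minimal sketch. In BrowserBC this conversion is performed by a model; the rule‑based function below is only a stand‑in for the filtering idea, and `RecordedAction`, `to_transferable`, and the `SECRET` pattern are hypothetical names that do not come from the project.

```python
import re
from dataclasses import dataclass

@dataclass
class RecordedAction:
    """One recorded user action (hypothetical schema, for illustration only)."""
    kind: str            # "click", "input", or "navigation"
    target_label: str    # human-readable label of the element
    selector: str        # DOM selector captured at record time (volatile)
    x: int               # click coordinates (tied to one layout)
    y: int
    value: str = ""      # typed text, which may contain private data

# Assumed pattern for session-specific secrets embedded in typed text.
SECRET = re.compile(r"(token|password|session)=\S+", re.IGNORECASE)

def to_transferable(action: RecordedAction) -> dict:
    """Keep what a later agent can re-ground; drop selector and coordinates."""
    step = {"kind": action.kind, "target": action.target_label}
    if action.value:
        step["value"] = SECRET.sub(r"\1=<redacted>", action.value)
    return step
```

The selector and coordinates are deliberately absent from the output, so an executing agent has to locate the element by its label on the live page instead of replaying a recorded position.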

In a concrete case, the team recorded a common travel‑booking workflow (enter dates, location, guests, apply filters, sort results, and select the best listing). The Skill card captured the intent (find the optimal accommodation), the critical steps (fill basic info, apply filters, verify success), and the failure modes (filter mismatch, missing fields). When handed to a smaller model, the agent immediately knew what to input and how to verify success, avoiding the trial‑and‑error that plagued the baseline.
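A Skill card for this case might be represented as follows. This is a sketch under assumptions: the field names, the `render` method, and the completion criterion are hypothetical, while the intent, steps, and pitfalls paraphrase the travel‑booking example described above.

```python
from dataclasses import dataclass, field

@dataclass
class SkillCard:
    """Natural-language skill summary (hypothetical structure)."""
    intent: str
    steps: list[str]
    done_when: list[str]
    pitfalls: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Format the card as plain text to place in an agent's context."""
        lines = [f"Intent: {self.intent}", "Key steps:"]
        lines += [f"  {i}. {s}" for i, s in enumerate(self.steps, 1)]
        lines.append("Done when: " + "; ".join(self.done_when))
        if self.pitfalls:
            lines.append("Pitfalls: " + "; ".join(self.pitfalls))
        return "\n".join(lines)

booking = SkillCard(
    intent="Find the optimal accommodation",
    steps=["Fill in dates, location, and guests", "Apply filters",
           "Sort results", "Select the best listing"],
    # The completion criterion below is an assumption, not taken from the source.
    done_when=["The selected listing satisfies every applied filter"],
    pitfalls=["Filter mismatch", "Missing fields"],
)
print(booking.render())
```

The card names elements by role rather than by selector, which is what allows a different model to ground it on a page whose layout has changed.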

To keep the growing Skill library manageable, BrowserBC organizes Skills into a Skill Graph. Each new candidate is added as a fresh node, merged with a compatible existing Skill (matching intent, preconditions, steps, and evidence), or specialized under a more general node. This graph enables scalable reuse, localized updates, and incremental refinement when new trajectories arrive.
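The add, merge, or specialize decision can be sketched as below. The matching rules here are assumptions chosen for illustration (exact equality for a merge, a strict superset of preconditions for a specialization); the source does not specify how BrowserBC decides compatibility, and `SkillNode` and `place` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class SkillNode:
    intent: str
    preconditions: frozenset
    steps: tuple
    evidence: int = 1                     # number of supporting trajectories
    children: list = field(default_factory=list)

def place(graph: list, cand: SkillNode) -> str:
    """Insert a candidate Skill and report which of the three paths it took."""
    for node in graph:
        if node.intent != cand.intent:
            continue
        # Same intent, preconditions, and steps: fold the evidence together.
        if node.preconditions == cand.preconditions and node.steps == cand.steps:
            node.evidence += cand.evidence
            return "merged"
        # Candidate needs strictly more preconditions: treat it as a special case.
        if node.preconditions < cand.preconditions:
            node.children.append(cand)
            return "specialized"
    graph.append(cand)
    return "added"
```

Keeping specializations as children means an update to one variant stays local to that node, which matches the localized‑update property the graph is meant to provide.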

Experiments on two benchmarks demonstrate the impact:

WebArena‑Hard: 258 human‑verified tasks across six site categories. The base agent succeeds on 60.5% (156/258). With retrieved Skills, success rises to 81.4% (210/258), a 20.9‑point gain, and average tool calls drop from 31.2 to 22.7 (a 27.3% reduction).

ClawBench: 152 real‑world tasks with layout changes. The baseline solves 32.9% (50/152); the Skill‑augmented agent solves 68.4% (104/152), a 35.5‑point improvement that roughly doubles the number of solved tasks (50 to 104) across all eight categories.
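The reported percentages follow directly from the task counts, as a short check shows. The counts are taken from the two benchmark summaries above; the `rate` helper and the `results` layout are only illustrative.

```python
def rate(solved: int, total: int) -> float:
    """Success rate in percent, rounded to one decimal place."""
    return round(100 * solved / total, 1)

# (baseline solved, Skill-augmented solved, total tasks)
results = {
    "WebArena-Hard": (156, 210, 258),
    "ClawBench": (50, 104, 152),
}

for name, (base, skill, total) in results.items():
    b, s = rate(base, total), rate(skill, total)
    gain = round(s - b, 1)
    print(f"{name}: {b}% -> {s}% (+{gain} points, x{skill / base:.2f} solved)")
```

Separating the point gain from the ratio of solved tasks makes it clear that the two benchmarks improve by different factors even though both gains are large.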

Further analysis shows that forcing the agent to copy the Skill verbatim lowers success (77.5% vs. 81.4%) and can even hurt performance on about 4% of tasks, highlighting that Skills act as a confidence‑weighted prior rather than a strict command.

Cross‑model transfer experiments reveal that high‑quality Skills distilled by a strong model (e.g., Sonnet‑4.6) benefit both large and smaller execution models, confirming the “distill once, reuse cheaply” premise. Conversely, lower‑quality Skills from a weaker model (Qwen‑3.7) provide minimal gains.

Failure case audits indicate that remaining errors stem from execution precision (missing form fields, ambiguous targets, budget overruns) rather than missing knowledge, underscoring that the bottleneck lies in the execution model’s grounding ability.

Extending beyond browsers, a study on 30 OSWorld‑style Ubuntu desktop tasks shows that Skills improve 17 of them, suggesting that the method transfers when the missing piece is procedural knowledge rather than low‑level GUI grounding.

Overall, BrowserBC demonstrates that the true value for web agents is not merely reproducing clicks but accumulating, structuring, and reusing human‑derived procedural knowledge, turning noisy browsing traces into durable, model‑agnostic priors that push agents from “usable” toward “efficient.”
