Tag

Midscene.js


ByteDance Web Infra
Apr 7, 2025 · Artificial Intelligence

Midscene.js: AI‑Powered UI Automation Framework with Instant Actions and Deep Think

Midscene.js, an AI‑driven UI automation framework from the Web Infra team, introduces Instant Actions for stable interactions and Deep Think for precise element locating, providing developers with direct UI operation APIs and enhanced reliability across vision‑capable language models.

AI · Deep Think · Instant Actions
5 min read
ByteDance Web Infra
Mar 21, 2025 · Artificial Intelligence

Midscene.js: An AI‑Driven UI Automation Framework from ByteDance

Midscene.js is an open‑source UI automation framework that leverages multimodal AI to simplify web UI testing and interaction. It offers three core interfaces (Action, Query, and Assert), a JavaScript SDK, support for multiple AI models, and YAML scripting, with planned features aimed at stable, scalable automation.

AI · JavaScript · Midscene.js
21 min read
ByteDance Web Infra
Feb 25, 2025 · Artificial Intelligence

Midscene.js Integrates Qwen‑2.5‑VL Model: Cost‑Effective, High‑Resolution UI Automation

Midscene.js v0.12 adds support for the Qwen‑2.5‑VL model, delivering GPT‑4o‑level accuracy while cutting token usage and cost by up to 80%. It also enables interaction with canvas and iframe elements, accepts high‑resolution input, and offers easy configuration through environment variables and a browser plugin.

Artificial Intelligence · Cost Reduction · Frontend Testing
10 min read
ByteDance Web Infra
Jan 22, 2025 · Artificial Intelligence

Introducing UI‑TARS: A Native GUI Agent Model Integrated with Midscene.js for Multimodal UI Automation

The article presents UI‑TARS, a native GUI‑agent model that combines multimodal large language models with the open‑source Midscene.js framework to enable more accurate, token‑efficient, and privacy‑preserving UI automation, and discusses its architecture, advantages, limitations, and integration steps.

GUI Agent · Large Language Models · Midscene.js
11 min read
ByteDance Web Infra
Dec 17, 2024 · Frontend Development

Midscene.js: Multimodal AI‑Powered UI Automation for Web Frontend Testing

Midscene.js, an open‑source UI automation framework from ByteDance Web Infra, leverages multimodal AI to simplify writing, maintaining, and debugging web UI tests through JavaScript or YAML integrations. The article also covers its origins, usage patterns, limitations, cost, and security considerations.

JavaScript · Midscene.js · Playwright
11 min read