Tagged articles

GUI Agent

11 articles · Page 1 of 1

Jun 18, 2026 · Artificial Intelligence

How Trip.com Cut Multilingual UI QA Costs by 90% with GUI Agent and Multi‑Agent AI

Trip.com built the "慧鉴天工" system that combines a GUI Agent, multi‑agent LQA algorithms, OODA‑loop architecture, and a knowledge‑graph‑enhanced pipeline to automate page collection, multilingual text extraction, and quality inspection across 31 languages, achieving over 90% cost reduction and 70%+ detection accuracy.

GUI Agent · Knowledge Graph · Large Language Model

0 likes · 21 min read

Machine Learning Algorithms & Natural Language Processing

May 28, 2026 · Artificial Intelligence

A New Paradigm for GUI Agent Trajectory Generation: FSM‑Synthesized Data at $0.04 per Trajectory

AutoWebWorld introduces a finite‑state‑machine‑driven pipeline that synthesizes verified web‑GUI trajectories at an average cost of only $0.04 each, producing longer interaction sequences, scaling efficiently, and demonstrably improving large‑language‑model agents on WebVoyager and grounding benchmarks.

AutoWebWorld · Data Generation · Finite State Machine

0 likes · 13 min read

Xiaomi Tech

May 14, 2026 · Artificial Intelligence

500 M Videos Yield the Largest Open‑Source GUI Dataset; 3B Model Cuts Inference Tokens 71% and Beats Larger Models (Xiaomi AI at ICML 2026)

Xiaomi’s AI team extracted 5 billion video frames to create the world’s largest open‑source GUI dataset, demonstrated that a 3 B‑parameter model can reduce inference tokens by 71% while surpassing larger models, and presented a suite of ICML 2026 papers covering data scaling, benchmarking, reasoning, multimodal perception, and training stability for GUI agents and other AI tasks.

Benchmarking · GUI Agent · Large Language Model

0 likes · 21 min read

Xiaohongshu Tech REDtech

May 12, 2026 · Artificial Intelligence

Treating Automated Testing as AI Coding: Xiaohongshu GUI Agent Real‑World Review

During the 2026 Spring Festival promotion, Xiaohongshu replaced manual UI testing with a three‑layer AI‑driven GUI Agent that executed over 43,000 runs across 106 devices and 128 scenarios, achieving 58% automation, 82% AI‑generated case adoption, 68% bug recall, 98% stability and roughly $1 per test case while drastically cutting token costs.

AI coding · Code-as-Action · GUI Agent

0 likes · 23 min read

Machine Heart

Apr 13, 2026 · Artificial Intelligence

Mano‑P 1.0: The First GUI Agent to Top 13 Benchmarks and Move from Claw to Hand

Mano‑P 1.0 is a pure‑vision GUI agent that runs locally on Apple M4 devices, achieves SOTA on 13 multimodal benchmarks, handles data without sending it to the cloud, and introduces a three‑stage open‑source roadmap that reshapes personalized AI and end‑to‑end GUI automation.

GUI Agent · Mano-P · Personalized AI

0 likes · 17 min read

AI Engineering

Apr 1, 2026 · Artificial Intelligence

Holo3 AI Model Beats GPT‑5.4 at One‑Tenth the Cost for Computer Use

H Company’s new Holo3 series is a family of vision‑language models that outperforms GPT‑5.4 on the OSWorld‑Verified benchmark with a 78.85% score at about one‑tenth the cost, and comes in both a flagship API‑only version and an open‑source lightweight variant optimized for GUI agents.

AI benchmark · GUI Agent · Holo3

0 likes · 4 min read

DataFunSummit

Dec 23, 2025 · Artificial Intelligence

What Core Capabilities Do Mature GUI Agents Need? Expert Insights from the Agentic AI Summit

In a live discussion hosted by Prof. Yang Jian with experts Zhang Xi and Cui Chen, the panel explores the essential abilities of mature GUI agents, the role of multimodal models in visual understanding, the transfer of code‑agent techniques to GUI tasks, edge‑device performance trade‑offs, complex planning, tool ecosystems, deployment challenges, and future breakthrough scenarios.

Agentic AI · GUI Agent · Multimodal AI

0 likes · 22 min read

AntTech

Aug 19, 2025 · Artificial Intelligence

How UI‑Venus Achieves SOTA in Multimodal GUI Agent Benchmarks

Ant Group's open‑source native GUI agent UI‑Venus leverages multimodal large‑model and reinforcement‑learning techniques to outperform prior models on grounding and navigation benchmarks, while using a high‑quality data pipeline and a self‑evolving alignment mechanism to push the limits of GUI automation.

GUI Agent · Multimodal AI · SOTA

0 likes · 7 min read

Volcano Engine Developer Services

Jul 8, 2025 · Artificial Intelligence

Unlocking Autonomous GUI Agents: Inside UI‑TARS Multimodal Vision Model

This article introduces UI‑TARS, a multimodal vision model, and shows how combining it with the Model Context Protocol (MCP) supports building next‑generation cross‑platform autonomous GUI agents, detailing its architecture, workflow, code examples, incremental inference, applications, challenges, and future research directions.

AI · Automation · GUI Agent

0 likes · 20 min read

ByteDance Web Infra

Jan 22, 2025 · Artificial Intelligence

Introducing UI‑TARS: A Native GUI Agent Model Integrated with Midscene.js for Multimodal UI Automation

The article presents UI‑TARS, a native GUI‑agent model that combines multimodal large language models with the open‑source Midscene.js framework to enable more accurate, token‑efficient, and privacy‑preserving UI automation, and discusses its architecture, advantages, limitations, and integration steps.

GUI Agent · Midscene.js · Multimodal AI

0 likes · 11 min read

AI Large Model Application Practice

Dec 9, 2024 · Artificial Intelligence

How GUI Agents Use Large Models to Automate Any Desktop Task

This article explains why GUI agents are needed, defines their multimodal capabilities, walks through a high‑level automation scenario, details the architecture of large‑model‑driven GUI agents, highlights recent open‑source projects, and compares them with traditional RPA solutions.

AI Automation · GUI Agent · Human-Computer Interaction

0 likes · 10 min read
