How Multimodal Large Models Revolutionize UI Automation Testing
This article details how Ant Group leverages multimodal large models and a multi‑agent architecture to create a low‑code, AI‑driven UI automation testing framework that improves test coverage, reduces manual effort, and scales across diverse mobile mini‑program scenarios.
Introduction
Zhu Jiali from Alipay Technology presented at QECon 2025 on "UI Automation Testing Based on Multimodal Large Models," introducing a novel AI‑driven testing approach.
Problem Background
Mini‑program quality inspection is highly complex: manual evaluation is subjective, cannot fully cover the entire business flow, and consumes significant resources.
AI Automation Solution
An intelligent solution was developed that uses deep learning and multimodal large models to automatically detect UI pages and interaction flows, generate AI test cases, and keep maintenance overhead low while preserving functional stability and user experience.
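The article stays high‑level, but the core loop is screenshot‑in, action‑out. Below is a minimal sketch of that idea, not the team's actual implementation: `call_multimodal_model` is a hypothetical stand‑in for whatever inference endpoint is used, and `TestStep` is an illustrative structure of our own.

```python
# Minimal sketch: one pass of screenshot-in, test-step-out with a
# multimodal model. `call_multimodal_model` is hypothetical.
from dataclasses import dataclass


@dataclass
class TestStep:
    description: str  # human-readable intent, e.g. "tap the Pay button"
    action: str       # primitive action: tap / swipe / input
    target: str       # UI element the action applies to


def call_multimodal_model(screenshot_png: bytes, prompt: str) -> str:
    """Hypothetical inference call; replace with a real endpoint."""
    raise NotImplementedError


def propose_next_step(screenshot_png: bytes, goal: str) -> TestStep:
    # Ask the model to look at the current screen and emit the next
    # UI action in a simple pipe-delimited format we can parse.
    prompt = (
        f"Goal: {goal}\n"
        "Given the attached mini-program screenshot, return the next "
        "UI action as 'action|target|description'."
    )
    raw = call_multimodal_model(screenshot_png, prompt)
    action, target, description = raw.split("|", 2)
    return TestStep(description=description, action=action, target=target)
```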
TestFun Platform Features
Cross‑terminal support: Seamlessly integrates simulators, virtual machines, and real devices for consistent testing.
Multidimensional testing: Covers compatibility, performance, and other quality dimensions.
Out‑of‑the‑box usage: Provides account pool management and multi‑environment switching to simplify test preparation.
Closed‑loop management: Automates regression testing and full‑process quality management for continuous improvement.
Challenges in Existing Testing
Test case freshness is hard to maintain due to rapid iteration and platform fragmentation.
Business scenarios are complex with intricate interactions across multiple tech stacks.
High resource consumption: real‑device costs and large task volumes.
Low stability caused by device and network anomalies.
Methodology
1. Data‑Driven Approach
Large‑model training relies on massive, accurate UI data collected from the platform. The pipeline includes data filtering, model refinement, human verification, preprocessing, training, and business evaluation, ultimately producing new model weights that reduce manual effort.
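As a rough illustration of this pipeline shape (an assumption, not the team's code), each stage can be modeled as a dataset‑to‑dataset function chained in order; the stage names below mirror the article, and the bodies are placeholders.

```python
# Illustrative data-pipeline sketch: stages run in sequence, each
# taking a dataset and returning a (smaller or cleaner) dataset.
from typing import Callable, List

Dataset = List[dict]  # e.g. {"screenshot": ..., "label": ...}
Stage = Callable[[Dataset], Dataset]


def filter_samples(data: Dataset) -> Dataset:
    # Drop malformed or incomplete UI records before human review.
    return [s for s in data if s.get("screenshot") and s.get("label")]


def human_verify(data: Dataset) -> Dataset:
    # Placeholder for the manual verification step; in practice this
    # would queue samples to annotators and keep only confirmed ones.
    return data


def preprocess(data: Dataset) -> Dataset:
    # Placeholder for normalization/augmentation before training.
    return data


PIPELINE: List[Stage] = [filter_samples, human_verify, preprocess]


def run_pipeline(raw: Dataset) -> Dataset:
    # The cleaned output would feed model training and business
    # evaluation, ultimately producing new model weights.
    for stage in PIPELINE:
        raw = stage(raw)
    return raw
```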
2. Multi‑Agent Construction
Complex task flows are decomposed using a suite of agents:
Planning Agent: Breaks down complex intents into simple, single‑step intents.
Action Agent: Maps each simple intent to concrete actions and parameters.
Reflection Agent: Reviews and corrects erroneous actions.
Additional visual and textual tools enrich input and assist decision‑making.
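A minimal sketch of how such a planning/action/reflection loop might be wired together is shown below; the agent classes and method names are illustrative assumptions, with the underlying model calls stubbed out.

```python
# Sketch of the planning -> action -> reflection loop. Each agent
# would be backed by an LLM call in practice (stubbed here).
from typing import List, Optional


class PlanningAgent:
    def decompose(self, intent: str) -> List[str]:
        """Break a complex intent into simple, single-step intents."""
        raise NotImplementedError


class ActionAgent:
    def execute(self, step: str) -> dict:
        """Map a simple intent to a concrete action and run it."""
        raise NotImplementedError


class ReflectionAgent:
    def review(self, step: str, result: dict) -> Optional[str]:
        """Return a corrected step if the result looks wrong, else None."""
        raise NotImplementedError


def run_task(intent: str, planner: PlanningAgent,
             actor: ActionAgent, reflector: ReflectionAgent) -> None:
    # Decompose the complex intent, execute each step, and let the
    # reflection agent retry any step whose result looks erroneous.
    for step in planner.decompose(intent):
        result = actor.execute(step)
        corrected = reflector.review(step, result)
        if corrected is not None:
            actor.execute(corrected)
```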
3. Route RAG (Retrieval‑Augmented Generation)
Known paths are stored in a knowledge base; agents retrieve relevant routes and domain knowledge before making decisions, improving success rates for long interaction sequences.
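One plausible shape for this retrieval step (an assumption, not the platform's implementation) is an embedding‑based knowledge base ranked by cosine similarity; `embed` below is a hypothetical stand‑in for a real embedding model.

```python
# Route-RAG sketch: store known interaction routes keyed by an
# embedding of their description, retrieve the top-k for an intent.
import math
from typing import List, Tuple


def embed(text: str) -> List[float]:
    """Hypothetical text-embedding call; replace with a real model."""
    raise NotImplementedError


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


class RouteKnowledgeBase:
    def __init__(self) -> None:
        self.routes: List[Tuple[List[float], str]] = []

    def add(self, description: str, route: str) -> None:
        self.routes.append((embed(description), route))

    def retrieve(self, intent: str, k: int = 3) -> List[str]:
        # Rank stored routes by similarity to the current intent.
        query = embed(intent)
        ranked = sorted(self.routes,
                        key=lambda r: cosine(query, r[0]),
                        reverse=True)
        return [route for _, route in ranked[:k]]
```

In this framing, the retrieved routes and domain knowledge would be injected into the Planning Agent's prompt as context before it decomposes the intent, which is what improves success rates on long interaction sequences.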
Business Impact
More than 12,000 AI‑generated test cases were produced, raising automation coverage from 50% to 70%.
Deployed for Alipay mini‑program review and daily quality inspection, reducing user complaints and labor costs.
Earned multiple awards and patents, including the 2024 AI Pioneer Case by the China AI Industry Alliance.
Published research such as "MobileFlow: A Multimodal LLM For Mobile GUI Agent" (NeurIPS 2024 workshop).
Future Outlook
Building on the current solution and advancing large‑model capabilities, the team aims to further enhance technical performance and user experience, extending the framework to more complex tasks and broader application domains.