Leveraging Multimodal Large Language Models for Frontend Automated Testing (NL2Test)
This article explores how multimodal large language models (MM‑LLMs), combined with structured prompt engineering, can transform frontend regression testing. By enabling natural‑language‑driven test case generation, visual verification, and script self‑healing, the approach reduces maintenance costs and improves coverage across dynamic UI scenarios.
Regression testing is a core quality assurance step for Snowball's mobile apps, but traditional script‑based automation suffers from fragile selectors, high maintenance, and difficulty handling dynamic UI changes. The QA team proposes a new approach called NL2Test, which uses natural language prompts and multimodal large language models (MM‑LLMs) to generate executable test scripts and perform visual validation.
MM‑LLMs, such as Qwen2.5‑VL and Doubao‑1.5‑vision‑pro, combine text understanding with image analysis, enabling two key capabilities for frontend automation: (1) semantic navigation of user flows without hard‑coded element locators, and (2) visual comparison of UI screens against design baselines. By feeding a simple prompt like "Open the fund detail page for 'Guangfa Shanghai Gold ETF Connect A'", the model autonomously determines the required actions, interacts with the app, and validates the resulting page.
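One common way to make such a model drive an app is to have it return each step as structured JSON that a harness can execute. The sketch below is a minimal, hypothetical illustration of that pattern; the action schema, field names, and `parse_action` helper are assumptions for this article, not the NL2Test implementation.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical action schema: the MM-LLM is asked to describe its next
# step as JSON, e.g. {"action": "tap", "target": "search box"}.
@dataclass
class Action:
    action: str                    # e.g. "tap", "type", "swipe", "assert"
    target: Optional[str] = None   # natural-language element description
    text: Optional[str] = None     # text to type, if any

def parse_action(model_reply: str) -> Action:
    """Extract the structured action from a model reply.

    Tolerates replies that wrap the JSON in prose or code fences,
    which vision-language models frequently do.
    """
    start = model_reply.find("{")
    end = model_reply.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in model reply")
    payload = json.loads(model_reply[start:end + 1])
    return Action(
        action=payload["action"],
        target=payload.get("target"),
        text=payload.get("text"),
    )

# Example: a chatty reply still yields a machine-executable step.
reply = 'Next step:\n```json\n{"action": "tap", "target": "search box"}\n```'
step = parse_action(reply)  # Action(action='tap', target='search box', text=None)
```

Because the model only ever describes *what* to do ("tap the search box") rather than *how* (a brittle XPath), the harness can resolve the target against the current screen, which is what removes the hard‑coded locators.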
The solution architecture consists of three layers: a mobile agent model for navigation, a visual analysis model for verification, and a prompt‑engineering layer that structures user intent into hierarchical prompts. Prompt engineering involves extracting scenario information, defining roles and commands, and assembling them into reusable templates, which guide the models and prevent hallucinations.
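The assembly step of that prompt‑engineering layer can be pictured as keeping role, command constraints, and scenario facts as separate reusable parts and composing them per test case. The section names, wording, and `build_prompt` helper below are illustrative assumptions, not the team's actual templates.

```python
# Reusable template parts: a role definition and command constraints
# shared across test cases (wording here is an assumption for illustration).
ROLE = (
    "You are a mobile UI test agent. You see one app screenshot at a time "
    "and must decide the single next action."
)

COMMANDS = (
    "Respond with exactly one action per turn: tap, type, swipe, or assert. "
    "Never invent UI elements that are not visible in the screenshot."
)

def build_prompt(scenario: dict) -> str:
    """Assemble role, commands, and per-case scenario facts into one prompt."""
    facts = "\n".join(f"- {key}: {value}" for key, value in scenario.items())
    return f"# Role\n{ROLE}\n\n# Commands\n{COMMANDS}\n\n# Scenario\n{facts}"

prompt = build_prompt({
    "goal": "Open the fund detail page for 'Guangfa Shanghai Gold ETF Connect A'",
    "entry": "app home screen",
})
```

Pinning the role and the allowed command set in fixed template sections is what constrains the model's output space, which is the mechanism the article credits for preventing hallucinated actions.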
Practical experiments across three business lines demonstrated successful script generation, dynamic UI adaptation, and visual diff analysis, with comparable performance among several vendors. The team also built a backend prompt‑management system to support large‑scale AI‑driven testing and plans to further integrate private model deployments and vector‑based knowledge bases.
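To make the visual‑diff idea concrete, here is a toy sketch that compares two grayscale screenshots (represented as nested lists of 0-255 pixel values) and reports the fraction of pixels differing beyond a tolerance. A real pipeline would operate on rendered screenshots against design baselines, and the model would interpret the diff semantically; this only illustrates the thresholded‑comparison step, and all names are assumptions.

```python
def diff_ratio(baseline, candidate, tolerance=10):
    """Fraction of pixels whose grayscale difference exceeds `tolerance`."""
    if len(baseline) != len(candidate) or len(baseline[0]) != len(candidate[0]):
        raise ValueError("screenshots must have identical dimensions")
    total = changed = 0
    for row_a, row_b in zip(baseline, candidate):
        for a, b in zip(row_a, row_b):
            total += 1
            if abs(a - b) > tolerance:
                changed += 1
    return changed / total

# One of four pixels changed between baseline and candidate.
base = [[0, 0], [255, 255]]
cand = [[0, 0], [255, 0]]
ratio = diff_ratio(base, cand)  # 0.25
```

The tolerance matters in practice: dynamic UI regions (timestamps, quotes, ads) produce small legitimate diffs, so a pixel‑exact comparison would flag every run, while a thresholded ratio lets the harness escalate only meaningful layout changes to the visual model.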
In conclusion, integrating multimodal LLMs into frontend automation reduces the reliance on brittle code, improves test coverage, and opens a path toward fully AI‑assisted testing pipelines, while highlighting the need for multi‑model collaboration, localized engineering, and continuous prompt refinement.
Snowball Engineer Team
Proactivity, efficiency, professionalism, and empathy are the core values of the Snowball Engineer Team; curiosity, passion, and sharing of technology drive their continuous progress.