
Leveraging Multimodal Large Language Models for Frontend Automated Testing (NL2Test)

This article explores how multimodal large language models (MM‑LLMs), combined with structured prompt engineering, can transform frontend regression testing. Natural‑language‑driven test case generation, visual verification, and script self‑healing reduce maintenance costs and improve coverage across dynamic UI scenarios.

Snowball Engineer Team

Regression testing is a core quality assurance step for Snowball's mobile apps, but traditional script‑based automation suffers from fragile selectors, high maintenance, and difficulty handling dynamic UI changes. The QA team proposes a new approach called NL2Test, which uses natural language prompts and multimodal large language models (MM‑LLMs) to generate executable test scripts and perform visual validation.

MM‑LLMs, such as Qwen2.5‑VL and Doubao‑1.5‑vision‑pro, combine text understanding with image analysis, enabling two key capabilities for frontend automation: (1) semantic navigation of user flows without hard‑coded element locators, and (2) visual comparison of UI screens against design baselines. By feeding a simple prompt like "Open the fund detail page for 'Guangfa Shanghai Gold ETF Connect A'", the model autonomously determines the required actions, interacts with the app, and validates the resulting page.
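The navigation capability described above can be sketched as an observe–decide–act loop: each cycle shows the model the current screen plus the instruction and parses its next action. The `call_mm_llm` stub below stands in for a real model endpoint (Qwen2.5‑VL, Doubao‑1.5‑vision‑pro, etc.), and the JSON action schema is a hypothetical illustration, not a documented vendor API.

```python
import json

# Sketch of one observe-decide-act cycle for semantic navigation.
# `call_mm_llm` is a stand-in for a real MM-LLM endpoint; the action
# schema ({"action": ..., "target": ..., "done": ...}) is an assumption.

def call_mm_llm(screenshot: bytes, instruction: str) -> str:
    # In production this would send the screenshot and instruction to a
    # vision-language model; here we return a canned decision.
    return json.dumps({"action": "tap", "target": "search result 1", "done": False})

def run_step(screenshot: bytes, instruction: str) -> dict:
    """One cycle: show the model the current screen, parse its next action."""
    raw = call_mm_llm(screenshot, instruction)
    step = json.loads(raw)
    # Reject actions outside the allowed vocabulary to limit hallucinated moves.
    assert step["action"] in {"tap", "type", "swipe", "wait"}, "unexpected action"
    return step

step = run_step(b"<png bytes>", "Open the fund detail page for 'Guangfa Shanghai Gold ETF Connect A'")
print(step["action"])  # -> tap
```

Because the model decides the next action from the rendered screen rather than from hard‑coded selectors, the loop survives layout changes that would break a conventional locator‑based script.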

The solution architecture consists of three layers: a mobile agent model for navigation, a visual analysis model for verification, and a prompt‑engineering layer that structures user intent into hierarchical prompts. Prompt engineering involves extracting scenario information, defining roles and commands, and assembling them into reusable templates that guide the models and reduce the risk of hallucinated actions.
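The hierarchical prompt assembly described above can be sketched as a small template builder. The section names, guardrail clause, and example values here are hypothetical illustrations of the role/scenario/command structure, not the team's actual templates.

```python
# Minimal sketch of a hierarchical prompt template, assuming the
# role/scenario/command layering described in the article.
# All section names and example values are hypothetical.

def build_test_prompt(role: str, scenario: str, commands: list[str]) -> str:
    """Assemble a reusable, layered prompt from structured test intent."""
    lines = [
        f"## Role\nYou are {role}.",
        f"## Scenario\n{scenario}",
        "## Commands\n" + "\n".join(f"{i}. {cmd}" for i, cmd in enumerate(commands, start=1)),
        # Guardrail clause to curb hallucinated actions.
        "## Constraints\nOnly act on elements visible in the screenshot; "
        "if the target cannot be found, report FAILURE instead of guessing.",
    ]
    return "\n\n".join(lines)

prompt = build_test_prompt(
    role="a mobile UI test agent",
    scenario="Regression test for the fund detail page",
    commands=[
        "Open the search page",
        "Search for 'Guangfa Shanghai Gold ETF Connect A'",
        "Tap the first result and wait for the detail page",
    ],
)
print(prompt.splitlines()[0])  # -> ## Role
```

Keeping the template as code rather than free text makes scenarios reusable across business lines: only the scenario and command fields change, while the role and constraint sections stay fixed.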

Practical experiments across three business lines demonstrated successful script generation, dynamic UI adaptation, and visual diff analysis, with comparable performance among several vendors. The team also built a backend prompt‑management system to support large‑scale AI‑driven testing and plans to further integrate private model deployments and vector‑based knowledge bases.
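The visual diff analysis mentioned above can be sketched as a pixel‑level comparison against a baseline with a tolerance threshold. This grid representation is a deliberate simplification: a real pipeline would use perceptual metrics or feed both screenshots to the visual analysis model directly.

```python
# Simplified visual diff: compare two screenshots represented as
# grayscale pixel grids and report the fraction of pixels that drift
# beyond a tolerance. A production pipeline would hand both images to
# the visual model instead of diffing raw pixels.

Grid = list[list[int]]  # rows of 0-255 grayscale values

def visual_diff_ratio(baseline: Grid, actual: Grid, tolerance: int = 10) -> float:
    """Fraction of pixels whose absolute difference exceeds `tolerance`."""
    total = changed = 0
    for row_b, row_a in zip(baseline, actual):
        for pb, pa in zip(row_b, row_a):
            total += 1
            if abs(pb - pa) > tolerance:
                changed += 1
    return changed / total if total else 0.0

baseline = [[200, 200], [200, 200]]
actual   = [[200, 205], [90, 200]]  # one pixel drifted far outside tolerance
print(f"{visual_diff_ratio(baseline, actual):.2f}")  # -> 0.25
```

Thresholding the diff ratio lets a test distinguish tolerable rendering noise (anti‑aliasing, font hinting) from genuine layout regressions that should fail the run.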

In conclusion, integrating multimodal LLMs into frontend automation reduces the reliance on brittle code, improves test coverage, and opens a path toward fully AI‑assisted testing pipelines, while highlighting the need for multi‑model collaboration, localized engineering, and continuous prompt refinement.

Tags: prompt engineering, multimodal LLM, Frontend Testing, AI Automation, NL2Test
Written by

Snowball Engineer Team

Proactivity, efficiency, professionalism, and empathy are the core values of the Snowball Engineer Team; curiosity, passion, and sharing of technology drive their continuous progress.
