AutoConsis: Automated UI Consistency Detection for Mobile Apps Using Multimodal AI

AutoConsis is a research-driven, AI-powered workflow that automatically detects inconsistent UI content across mobile app pages. It combines target region recognition, OCR-based extraction, and large language model reasoning, and the team reports low cost, high generalization, and high confidence in Meituan's large-scale marketing scenarios.

Meituan Technology Team

Nov 21, 2024

Background

Mobile app business pages have become increasingly complex, with UI elements for the same business scattered across many pages maintained by different teams. This leads to hard-to-detect UI inconsistencies, such as the same product showing different prices on different pages. Traditionally, such issues have been caught only through manual testing.

Implementation Principle and Project Practice

Overall Workflow

AutoConsis transforms UI consistency checking into a three‑stage pipeline: target region recognition, target information extraction, and consistency judgment. The system leverages large language models (LLMs) to achieve high generalization across diverse UI layouts and technology stacks.
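The three stages can be pictured as pluggable functions chained together. The following is a minimal sketch, not the authors' implementation: the class name `ConsistencyPipeline`, the stage signatures, and the demo stubs are all hypothetical, and each stub stands in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stage signatures; these names are illustrative, not from AutoConsis.
RegionFinder = Callable[[str], List[str]]             # screenshot id -> region ids
FieldExtractor = Callable[[str], Dict[str, str]]      # region id -> extracted fields
Judge = Callable[[List[Dict[str, str]]], List[str]]   # all records -> list of issues


@dataclass
class ConsistencyPipeline:
    find_regions: RegionFinder
    extract_fields: FieldExtractor
    judge: Judge

    def run(self, screenshots: List[str]) -> List[str]:
        records = []
        for shot in screenshots:
            for region in self.find_regions(shot):
                # Tag each record with the page it came from.
                records.append(dict(self.extract_fields(region), page=shot))
        return self.judge(records)


# Demo stubs standing in for CLIP-based recognition, OCR + LLM extraction,
# and rule-based judgment.
def demo_find_regions(shot: str) -> List[str]:
    return [f"{shot}#card0"]


def demo_extract(region: str) -> Dict[str, str]:
    canned = {"list_page#card0": "19.90", "detail_page#card0": "21.90"}
    return {"final_price": canned[region]}


def demo_judge(records: List[Dict[str, str]]) -> List[str]:
    prices = sorted({r["final_price"] for r in records})
    return [] if len(prices) <= 1 else [f"final_price differs: {prices}"]


pipeline = ConsistencyPipeline(demo_find_regions, demo_extract, demo_judge)
issues = pipeline.run(["list_page", "detail_page"])
print(issues)
```

Keeping the stages behind plain function signatures means any one of them can be swapped (for example, a different extractor) without touching the others.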

Target Region Recognition

To reduce noise from irrelevant text, AutoConsis first isolates UI regions relevant to the consistency check. Instead of training a custom object detector, the authors adopt a multimodal CLIP model (OpenAI) and enhance it with weighted image‑and‑text queries to locate product cards on marketing pages.
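One plausible reading of "weighted image-and-text queries" is a weighted sum of two similarity scores per candidate region. The article does not give the exact formula, so the sketch below is an assumption: the function names, the weights, and the toy three-dimensional vectors are hypothetical, and in practice the embeddings would come from CLIP's image and text encoders.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_regions(
    region_embeddings: Dict[str, Sequence[float]],
    image_query: Sequence[float],
    text_query: Sequence[float],
    image_weight: float = 0.5,   # assumed weighting, not from the article
    text_weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Score each candidate region against both queries, best match first."""
    scored = [
        (
            name,
            image_weight * cosine(emb, image_query)
            + text_weight * cosine(emb, text_query),
        )
        for name, emb in region_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings; real ones would be produced by a CLIP model.
regions = {
    "product_card": [0.9, 0.1, 0.2],
    "nav_bar": [0.1, 0.9, 0.1],
    "banner": [0.3, 0.3, 0.9],
}
ranking = rank_regions(
    regions,
    image_query=[1.0, 0.0, 0.1],   # embedding of an example product-card image
    text_query=[0.8, 0.2, 0.3],    # embedding of a text description of a product card
)
print(ranking[0][0])
```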

[Figure: multimodal UI region recognition results]

In an experiment on 100 product-list pages, the multimodal CLIP approach outperformed single-modality baselines, which the authors take as evidence of its effectiveness for UI region detection.

Target Information Extraction

After region isolation, OCR extracts all visible characters. The extracted text is combined with a Chain-of-Thought (CoT) prompt and fed to an LLM (GPT-3.5-Turbo) to obtain key fields such as original price, discount, and final price. Two ablation prompts, Standard In-Context Learning (ICL) and Zero-Shot, were evaluated to measure the impact of reasoning steps and examples.
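The extraction step can be sketched as building a prompt from OCR lines and parsing a structured answer. The prompt wording, the JSON answer format, and the helper names below are hypothetical, since the article does not publish the actual prompt. The LLM call itself is omitted and replaced by a canned reply.

```python
import json
from typing import Dict, List

# Hypothetical CoT prompt; the wording is an assumption, not the authors' prompt.
COT_TEMPLATE = """You are checking a product card from a mobile app.
OCR text lines:
{ocr_lines}

Think step by step:
1. Identify which numbers are prices and which are discounts.
2. Decide which price is the original price and which is the final price.
3. Answer on the last line as JSON with keys original_price, discount, final_price.
"""


def build_cot_prompt(ocr_lines: List[str]) -> str:
    """Insert the OCR output into the reasoning template as a bullet list."""
    bullets = "\n".join(f"- {line}" for line in ocr_lines)
    return COT_TEMPLATE.format(ocr_lines=bullets)


def parse_fields(llm_reply: str) -> Dict[str, str]:
    """Read the JSON object expected on the last line of the model's reply."""
    last_line = llm_reply.strip().splitlines()[-1]
    return json.loads(last_line)


prompt = build_cot_prompt(["Fresh Strawberries 500g", "29.90", "30% off", "20.90"])

# Canned reply standing in for a real model response.
canned_reply = (
    "Step 1: 29.90 and 20.90 are prices; 30% off is a discount.\n"
    "Step 2: The larger price is the original, the smaller is the final price.\n"
    '{"original_price": "29.90", "discount": "30% off", "final_price": "20.90"}'
)
fields = parse_fields(canned_reply)
print(fields["final_price"])
```

Under this framing, a Zero-Shot variant would drop the numbered reasoning steps, and a Standard ICL variant would add worked examples without them.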

Consistency Judgment

The extracted fields are checked against predefined rules. Numerical rules directly compare values (e.g., price equality across pages), while semantic rules use CoT prompts and LLM reasoning to verify relationships such as product‑category alignment.
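A numerical rule such as price equality across pages needs no LLM at all. The sketch below assumes each extracted record carries hypothetical `product_id`, `page`, and `final_price` keys. It compares values as `Decimal` so that "20.90" and "20.9" count as equal.

```python
from decimal import Decimal
from typing import Dict, List, Tuple


def check_price_equality(
    records: List[Dict[str, str]], field: str = "final_price"
) -> List[str]:
    """Report every product whose `field` value differs between pages."""
    by_product: Dict[str, List[Tuple[str, Decimal]]] = {}
    for record in records:
        observation = (record["page"], Decimal(record[field]))
        by_product.setdefault(record["product_id"], []).append(observation)

    issues = []
    for product_id, observations in by_product.items():
        distinct_values = {value for _, value in observations}
        if len(distinct_values) > 1:
            detail = ", ".join(f"{page}={value}" for page, value in observations)
            issues.append(f"{product_id}: {field} mismatch ({detail})")
    return issues


# Illustrative records; the field names are assumptions.
records = [
    {"product_id": "sku-1", "page": "list", "final_price": "20.90"},
    {"product_id": "sku-1", "page": "detail", "final_price": "20.9"},
    {"product_id": "sku-2", "page": "list", "final_price": "15.00"},
    {"product_id": "sku-2", "page": "detail", "final_price": "16.00"},
]
price_issues = check_price_equality(records)
print(price_issues)
```

Semantic rules, such as product-category alignment, do not reduce to a comparison like this, which is where the CoT prompt and LLM reasoning come in.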

MLLM‑Based Exploration

The authors also experimented with GPT‑4V as a single‑step multimodal LLM that processes the whole UI screenshot. Compared with the full AutoConsis pipeline, GPT‑4V showed higher computational cost and lower reliability, leading the team to retain the modular approach for large‑scale batch inspections.

Application Effect

Deployed in Meituan’s “flash‑sale” marketing scenario, AutoConsis now covers 700 cities and over 4,000 pages, uncovering dozens of real business issues. The workflow has been reused for ticket‑operation inspections (200 pages in 12 cities, 8 defects found) and category‑consistency checks in the “selected items” business.

Insights and Future Work

The authors argue that decomposing a problem into focused steps (region detection, extraction, and judgment) yields better results than a monolithic model. They plan to integrate agent-based self-learning to incorporate human feedback, and to extend the approach to other front-end testing scenarios.

Tags: large language model, CLIP, UI testing, mobile apps, consistency detection, ICSE 2024, Vision-UI

