AutoConsis: Automated UI Consistency Detection for Mobile Apps Using Multimodal AI
AutoConsis is a research‑driven, AI‑powered workflow that automatically detects UI content inconsistencies across mobile app pages. It combines target region recognition, OCR‑based text extraction, and large language model (LLM) reasoning, and its deployment in Meituan's large‑scale marketing scenarios shows that it achieves low cost, strong generalization, and high‑confidence results.
Background
Mobile app business pages have become increasingly complex: UI elements belonging to the same business are scattered across many pages maintained by different teams. The result is inconsistencies that are hard to spot, such as the same product showing different prices on different pages, and detecting them has traditionally relied on manual testing.
Implementation Principle and Project Practice
Overall Workflow
AutoConsis transforms UI consistency checking into a three‑stage pipeline: target region recognition, target information extraction, and consistency judgment. The system leverages large language models (LLMs) to achieve high generalization across diverse UI layouts and technology stacks.
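The three stages can be wired together as in the minimal Python sketch below. The class and function names (`ConsistencyPipeline`, `fake_recognize`, `fake_extract`, `price_judge`) are hypothetical, and the stub stages merely stand in for the CLIP, OCR/LLM, and rule components described in the following sections; it illustrates the decomposition, not AutoConsis's actual code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration: a screenshot is identified by a string,
# a region is a bounding box (x, y, width, height), and extracted fields are
# a flat name -> value mapping.
Region = Tuple[int, int, int, int]
Fields = Dict[str, str]


@dataclass
class ConsistencyPipeline:
    """Three pluggable stages: recognize regions, extract fields, judge consistency."""
    recognize: Callable[[str], List[Region]]      # stage 1: target region recognition
    extract: Callable[[str, Region], Fields]      # stage 2: OCR + LLM field extraction
    judge: Callable[[List[Fields]], List[str]]    # stage 3: rule-based judgment

    def run(self, screenshots: List[str]) -> List[str]:
        records: List[Fields] = []
        for shot in screenshots:
            for region in self.recognize(shot):
                records.append(self.extract(shot, region))
        # Judgment runs over records from all pages at once, because an
        # inconsistency only shows up when values from different pages meet.
        return self.judge(records)


# Stub stages standing in for CLIP, OCR/LLM, and the rule engine.
def fake_recognize(shot: str) -> List[Region]:
    return [(0, 0, 100, 50)]


def fake_extract(shot: str, region: Region) -> Fields:
    return {"page": shot, "final_price": "9.9" if shot == "home" else "12.9"}


def price_judge(records: List[Fields]) -> List[str]:
    prices = {r["final_price"] for r in records}
    return [] if len(prices) <= 1 else [f"price mismatch: {sorted(prices)}"]


pipeline = ConsistencyPipeline(fake_recognize, fake_extract, price_judge)
print(pipeline.run(["home", "detail"]))
```

Keeping each stage behind a plain callable means one stage (say, a different region detector) can be swapped without touching the other two, which fits the step‑wise decomposition the authors argue for.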
Target Region Recognition
To reduce noise from irrelevant text, AutoConsis first isolates the UI regions relevant to the consistency check. Instead of training a custom object detector, the authors adopt OpenAI's multimodal CLIP model and enhance it with weighted image‑and‑text queries to locate product cards on marketing pages.
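The article does not spell out how the image and text queries are weighted. A minimal sketch, assuming a linear combination of L2‑normalized CLIP embeddings followed by cosine‑similarity matching against candidate region crops, might look like this. The embeddings are assumed to come from a CLIP image/text encoder (not shown), how candidate regions are proposed is left open, and `image_weight` and `threshold` are hypothetical values that would need tuning.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def fuse_query(image_emb: Sequence[float], text_emb: Sequence[float],
               image_weight: float = 0.5) -> List[float]:
    """Weighted sum of an image query and a text query in the shared embedding space.

    Both inputs are normalized first so the weight, not the raw vector
    magnitude, controls how much each modality contributes.
    """
    img, txt = normalize(image_emb), normalize(text_emb)
    fused = [image_weight * a + (1 - image_weight) * b for a, b in zip(img, txt)]
    return normalize(fused)


def locate_cards(candidates: Sequence[Sequence[float]], query: Sequence[float],
                 threshold: float = 0.8) -> List[int]:
    """Indices of candidate regions whose cosine similarity to the query meets the threshold."""
    hits = []
    for i, emb in enumerate(candidates):
        # Both vectors are unit length, so the dot product is the cosine similarity.
        score = sum(a * b for a, b in zip(normalize(emb), query))
        if score >= threshold:
            hits.append(i)
    return hits


# Toy 2-D "embeddings"; real CLIP vectors have hundreds of dimensions.
query = fuse_query(image_emb=[1.0, 0.0], text_emb=[0.0, 1.0], image_weight=0.5)
print(locate_cards([[1.0, 1.0], [1.0, 0.0], [0.0, -1.0]], query))
```

Setting `image_weight` to 1.0 or 0.0 collapses the query to a single modality, which is one way to reproduce the image‑only and text‑only baselines mentioned next.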
In an experiment on 100 product‑list pages, the multimodal CLIP approach outperformed single‑modality baselines, confirming its effectiveness for UI region detection.
Target Information Extraction
After region isolation, OCR extracts all visible text from each region. The extracted text is combined with a Chain‑of‑Thought (CoT) prompt and fed to an LLM (GPT‑3.5‑Turbo), which returns key fields such as original price, discount, and final price. Two ablation prompts were also evaluated to measure the contribution of the reasoning steps and the examples: Standard In‑Context Learning (ICL), which keeps the examples but drops the reasoning steps, and Zero‑Shot, which provides neither.
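The three prompt variants differ only in what precedes the query text. The sketch below is hypothetical throughout: the field names, the worked example, and the JSON reply format are assumptions, since the article does not reproduce the actual prompts, and the call to GPT‑3.5‑Turbo itself is omitted.

```python
import json
import re

FIELDS = ["original_price", "discount", "final_price"]

# Hypothetical demonstration; the article does not reproduce AutoConsis's actual prompts.
EXAMPLE_TEXT = "Braised Pork Rice ¥32 -¥10 ¥22 Buy now"
EXAMPLE_REASONING = ("¥32 appears first and is the list price. '-¥10' is a reduction, "
                     "so it is the discount. 32 - 10 = 22, so ¥22 is the final price.")
EXAMPLE_ANSWER = {"original_price": "32", "discount": "10", "final_price": "22"}


def build_prompt(ocr_text: str, mode: str = "cot") -> str:
    """Build one of three prompt variants: 'cot', 'icl' (example only), or 'zero_shot'."""
    instruction = (f"Extract {', '.join(FIELDS)} from the OCR text of one product card. "
                   "End your reply with a JSON object on its own line.")
    answer = json.dumps(EXAMPLE_ANSWER)
    if mode == "zero_shot":  # no example, no reasoning
        return f"{instruction}\n\nText: {ocr_text}\nAnswer:"
    if mode == "icl":  # worked example without reasoning steps
        return (f"{instruction}\n\nText: {EXAMPLE_TEXT}\nAnswer: {answer}\n\n"
                f"Text: {ocr_text}\nAnswer:")
    if mode == "cot":  # worked example with explicit reasoning steps
        return (f"{instruction}\n\nText: {EXAMPLE_TEXT}\nReasoning: {EXAMPLE_REASONING}\n"
                f"Answer: {answer}\n\nText: {ocr_text}\nReasoning:")
    raise ValueError(f"unknown prompt mode: {mode!r}")


def parse_answer(reply: str) -> dict:
    """Take the last flat JSON object in the model's reply and keep the expected fields."""
    objects = re.findall(r"\{[^{}]*\}", reply)
    if not objects:
        raise ValueError("no JSON object found in model reply")
    data = json.loads(objects[-1])
    return {k: str(data[k]) for k in FIELDS if k in data}


prompt = build_prompt("Iced Lemon Tea ¥18 -¥3 ¥15", mode="cot")
# `prompt` would be sent to the LLM here; a canned reply stands in for its output.
reply = 'Reasoning: ...\nAnswer: {"original_price": "18", "discount": "3", "final_price": "15"}'
print(parse_answer(reply))
```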
Consistency Judgment
The extracted fields are checked against predefined rules. Numerical rules directly compare values (e.g., price equality across pages), while semantic rules use CoT prompts and LLM reasoning to verify relationships such as product‑category alignment.
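A numerical rule such as cross‑page price equality reduces to grouping and comparing. A minimal sketch, assuming each extracted record carries hypothetical `product`, `page`, and `final_price` keys:

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, List


def check_price_consistency(records: List[Dict[str, str]]) -> List[str]:
    """Numerical rule: the same product must show the same final price on every page.

    Field names 'product', 'page', and 'final_price' are hypothetical.
    """
    prices_by_product: Dict[str, Dict[str, Decimal]] = defaultdict(dict)
    for r in records:
        # Decimal compares values exactly, so "9.90" and "9.9" count as the same price.
        prices_by_product[r["product"]][r["page"]] = Decimal(r["final_price"])
    issues = []
    for product, by_page in sorted(prices_by_product.items()):
        if len(set(by_page.values())) > 1:
            detail = ", ".join(f"{page}={price}" for page, price in sorted(by_page.items()))
            issues.append(f"{product}: inconsistent final price ({detail})")
    return issues


records = [
    {"product": "milk tea", "page": "home", "final_price": "9.9"},
    {"product": "milk tea", "page": "detail", "final_price": "9.90"},
    {"product": "burger", "page": "home", "final_price": "15"},
    {"product": "burger", "page": "checkout", "final_price": "18"},
]
print(check_price_consistency(records))
```

Semantic rules such as product‑category alignment do not fit this pattern; as described above, they are handed back to the LLM with a CoT prompt rather than compared directly.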
MLLM‑Based Exploration
The authors also experimented with GPT‑4V as a single‑step multimodal LLM that processes the whole UI screenshot. Compared with the full AutoConsis pipeline, GPT‑4V showed higher computational cost and lower reliability, leading the team to retain the modular approach for large‑scale batch inspections.
Application Effect
Deployed in Meituan’s “flash‑sale” marketing scenario, AutoConsis now covers 700 cities and over 4,000 pages, uncovering dozens of real business issues. The workflow has been reused for ticket‑operation inspections (200 pages in 12 cities, 8 defects found) and category‑consistency checks in the “selected items” business.
Insights and Future Work
The authors argue that decomposing a problem into focused steps (region detection, extraction, and judgment) yields better results than relying on a single monolithic model. Next, they plan to add agent‑based self‑learning that incorporates human feedback, and to extend the approach to other front‑end testing scenarios.
Meituan Technology Team
Over 10,000 engineers power China’s leading lifestyle‑services e‑commerce platform, which supports hundreds of millions of consumers and millions of merchants across 2,000+ industries. This is the public channel for the tech teams behind Meituan, Dianping, Meituan Waimai, Meituan Select, and related services.