How We Built an AI‑Powered Smart Inspection System for Mobile Apps
This article details the design and implementation of an AI-driven smart inspection platform for a mobile app: the background challenges, the system architecture, the core detection features (layout, visual, consistency, and AI-operation checks), platform configuration, result feedback, and the measurable improvements achieved.
Background
As the DeWu app added more business functions and content, user session time grew and experience issues became increasingly critical. Traditional UI regression testing could not fully cover the diverse scenarios, especially subjective interaction and visual problems, leading to low testing efficiency.
Architecture Overview
The smart inspection workflow integrates several internal services:
Inspection Platform: The central management console where users define detection tasks, rules, and target scenes; it aggregates results and issues alerts.
Automation Service: Executes tasks on real devices, handling device scheduling, environment setup, page navigation, AI analysis, custom actions, and exception analysis, then reports results.
Frontend/Client SDK: Captures system-level errors (JS errors, white screens, network failures) and binds them to inspection steps for easier root-cause identification.
Model Service: Applies AI models to analyze screenshots against user-defined and generic visual rules, detecting UI, interaction, and rule-violation issues.
Real-Device Service: Provides cloud-based devices for multi-device inspection and enables remote login for issue reproduction and verification.
Main Feature Design
1. Page Layout Issue Detection
Common UI problems such as misaligned elements, overlapping components, or layout chaos are detected by feeding full‑page screenshots to an AI model that evaluates them against specific layout rules. Two detection modes are offered:
Partial frame matching – checks whether page elements (images, text, price, etc.) align with expectations.
Full page matching – validates complete visual consistency, including text content.
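As a sketch of how such a layout-check request might be assembled (the message layout follows the OpenAI vision-chat convention; the function name and rule wording here are illustrative, not the platform's actual API):

```python
import base64


def build_layout_prompt(screenshot_path, rules):
    """Build a vision-model message payload for a layout check.

    `rules` is a list of plain-language layout rules, e.g.
    "price text must not overlap the product image".
    """
    with open(screenshot_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    rule_text = "\n".join(f"- {r}" for r in rules)
    return [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": (
                        "Check this app screenshot against the layout rules "
                        "below. Report each violation as JSON with fields "
                        "`rule`, `bbox`, and `description`.\n" + rule_text
                    ),
                },
                {
                    # The screenshot travels inline as a base64 data URL
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ]
```

Asking the model to answer in JSON keeps the verdict machine-parsable, which matters because the response is consumed programmatically by the inspection pipeline.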
2. General Visual Experience Detection
This feature handles a broader set of visual problems. Users can define custom rules or use default ones. The detection pipeline distinguishes between text‑focused checks (using OCR + AI) and image‑focused checks (AI‑driven element analysis).
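To make the text-focused branch concrete, here is a minimal sketch of a rule pass over OCR output; the rule names and patterns are assumptions for illustration, not the platform's built-in set:

```python
import re

# Illustrative default text rules: patterns that usually indicate a broken page.
DEFAULT_TEXT_RULES = {
    "placeholder_leak": re.compile(r"\b(null|undefined|NaN|None)\b"),
    "template_leak": re.compile(r"\{\{.*?\}\}|%\w+%"),
    "encoding_garbage": re.compile("\uFFFD"),
}


def check_ocr_text(lines):
    """Run each OCR'd text line through the default text rules.

    Returns a list of (rule_name, offending_line) pairs. In a full
    pipeline the flagged lines would then be passed to the model,
    together with the screenshot, for confirmation.
    """
    issues = []
    for line in lines:
        for name, pattern in DEFAULT_TEXT_RULES.items():
            if pattern.search(line):
                issues.append((name, line))
    return issues
```

Cheap pattern checks like these filter the bulk of text defects before any model call, reserving the slower AI analysis for ambiguous cases.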
3. Page Display Consistency Detection
For multi‑page flows, the system maintains relationships between pages (e.g., product detail ↔︎ listing) and compares screenshots across levels. It addresses three challenges: determining the correct comparison layer, locating target elements via image matching, and performing cross‑page analysis to produce consistency reports.
4. AI Operation & Unresponsive Detection
When a target element is not directly reachable, users describe the prerequisite steps in natural language. An AI model parses the description, generates UI actions, and executes them before visual checks. Additionally, an operation‑validity module compares before/after screenshots using histogram correlation, chi‑square distance, and intersection metrics to flag ineffective clicks.
The model client below wraps the vision-model call. Because the model occasionally returns malformed JSON, parsing falls back first to `json_repair` and then to a regex fix for `bbox` coordinates separated by spaces instead of commas:

````python
import json

from openai import OpenAI

from ..config.llm_config import LLMConfig
from ..utils import get_logger


class ChatClient:
    def __init__(self, config_path=None, model_log_path=None):
        # Model initialization
        self.logger = get_logger(log_file=model_log_path)
        self.config = LLMConfig(config_path)
        self.openai = OpenAI(
            api_key=self.config.openai_api_key,
            base_url=self.config.openai_api_base,
        )

    def chat(self, prompt_data):
        # Submit the screenshot and task description to the model
        chat_response = self.openai.chat.completions.create(
            model=self.config.model,
            messages=prompt_data,
            max_tokens=self.config.max_tokens,
            temperature=self.config.temperature,
            extra_body={"vl_high_resolution_images": True},
        )
        result = chat_response.choices[0].message.content
        # Strip markdown fences the model may wrap around its JSON answer
        json_str_result = result.replace("```json", "").replace("```", "")
        try:
            return json.loads(json_str_result)
        except Exception as err:
            self.logger.info(f"LLM response err: {err}")
        # First fallback: attempt automatic JSON repair
        try:
            import json_repair
            return json_repair.repair_json(json_str_result, return_objects=True)
        except Exception as err:
            self.logger.info(f"LLM response json_repair err: {err}")
        # Second fallback: insert missing commas between bbox coordinates
        try:
            import re
            if "bbox" in json_str_result:
                while re.search(r"\d+\s+\d+", json_str_result):
                    json_str_result = re.sub(r"(\d+)\s+(\d+)", r"\1,\2", json_str_result)
                return json.loads(json_str_result)
        except Exception as err:
            self.logger.info(f"LLM response re.search err: {err}")
````

The `ai_tap` step belongs to the automation driver (`Tap` builds the prompt and `self._click` performs the tap): it asks the model for the target element's bounding box and clicks its center.

```python
    def ai_tap(self, description):
        # Ask the model to locate the element described in natural language
        screenshot_base64 = self.get_resized_screenshot_as_base64()
        ret = {"screenshot": screenshot_base64}
        prompt = Tap(description).get_prompt(screenshot_base64)
        res_obj = self.chat_client.chat(prompt)
        if "errors" in res_obj and res_obj["errors"]:
            ret["result"] = False
            ret["message"] = res_obj["errors"]
        else:
            # Click the center of the bounding box returned by the model
            x, y = self.get_center_point(res_obj["bbox"])
            self._click(x, y)
            ret["location"] = {"x": x, "y": y}
            ret["result"] = True
            ret["message"] = ""
        return ret
```

After each action, `check_operation_valid` compares the before/after screenshots; an operation is flagged as ineffective when all three histogram metrics agree that the screen did not change (`Thres` and `Thres_chi` are tuned thresholds defined elsewhere):

```python
def check_operation_valid(screen_path_before, screen_path_after, cur_ops,
                          screen_oss_path_before, screen_oss_path_after):
    try:
        import cv2
        img_before = cv2.imread(screen_path_before)
        img_after = cv2.imread(screen_path_after)
        if img_before is None or img_after is None:
            raise RuntimeError("Operation validation image read error")
        if img_before.shape != img_after.shape:
            img_after = cv2.resize(img_after, (img_before.shape[1], img_before.shape[0]))
        gray_before = cv2.cvtColor(img_before, cv2.COLOR_BGR2GRAY)
        gray_after = cv2.cvtColor(img_after, cv2.COLOR_BGR2GRAY)
        # Normalized grayscale histograms of both screenshots
        hist_before = cv2.calcHist([gray_before], [0], None, [256], [0, 256])
        hist_after = cv2.calcHist([gray_after], [0], None, [256], [0, 256])
        hist_before = cv2.normalize(hist_before, hist_before).flatten()
        hist_after = cv2.normalize(hist_after, hist_after).flatten()
        # High correlation and intersection plus low chi-square distance
        # mean the two screenshots are effectively identical
        correlation = cv2.compareHist(hist_before, hist_after, cv2.HISTCMP_CORREL)
        chi_square = cv2.compareHist(hist_before, hist_after, cv2.HISTCMP_CHISQR)
        intersection = cv2.compareHist(hist_before, hist_after, cv2.HISTCMP_INTERSECT)
        if correlation > Thres and chi_square < Thres_chi and intersection > Thres:
            raise RuntimeError(f"Current operation: {cur_ops} appears ineffective")
    except Exception as e:
        raise e
```

Platform Construction and Usage
Users configure detection rules on the inspection platform, which are shared across tasks. Basic rule items describe the detection scope without excessive detail. Example: a generic rule checks common layout and error patterns. After execution, results—including comparison screenshots, model analysis, and conclusions—are displayed for review and feedback.
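An illustrative rule definition as the platform might store it; the field names here are assumptions rather than the platform's actual schema:

```python
# A shared generic rule covering common layout and error patterns.
GENERIC_LAYOUT_RULE = {
    "rule_name": "generic_layout_check",
    "scope": "full_page",  # or "partial_frame" for element-level checks
    "description": (
        "Detect misaligned, overlapping, or truncated elements and "
        "common error states such as white screens or missing images."
    ),
    "shared": True,  # reusable across inspection tasks
}
```

Keeping rules coarse-grained like this lets one definition serve many tasks, while scene-specific detail stays in each task's configuration.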
Result Feedback
Standard tasks provide detailed logs: rule comparison images, real‑time screenshots, model reasoning, and final verdicts. Testers can pinpoint issues, report false positives, and help improve model accuracy.
Conclusion
By integrating AI vision-language models into the mobile testing pipeline, the smart inspection system raised detection accuracy from roughly 50% to over 80%, with image-similarity matching also exceeding 80%. In a pilot AI walkthrough, 17 configuration problems were found with a 95% AI detection rate. Future work will continue to expand AI-driven scenarios to further improve testing efficiency and user experience.
DeWu Technology
A platform for sharing and discussing tech knowledge, guiding you toward the cloud of technology.