
How We Built an AI‑Powered Smart Inspection System for Mobile Apps

This article details the design and implementation of an AI‑driven smart inspection platform for a mobile app, covering background challenges, system architecture, core detection features—including layout, visual, consistency, and AI‑operation checks—platform configuration, result feedback, and the measurable improvements achieved.

DeWu Technology

Background

As the DeWu app added more business functions and content, user session time grew and experience issues became increasingly critical. Traditional UI regression testing could not fully cover the diverse scenarios, especially subjective interaction and visual problems, leading to low testing efficiency.

Architecture Overview

The smart inspection workflow integrates several internal services:

Inspection Platform: Central management console where users define detection tasks, rules, and target scenes; it aggregates results and issues alerts.

Automation Service: Executes tasks on real devices, handling device scheduling, environment setup, page navigation, AI analysis, custom actions, and exception analysis, then reports results.

Frontend/Client SDK: Captures system‑level errors (JS errors, white‑screen, network failures) and binds them to inspection steps for easier root‑cause identification.

Model Service: Applies AI models to analyze screenshots against user‑defined and generic visual rules, detecting UI, interaction, and rule‑violation issues.

Real‑Device Service: Provides cloud‑based devices for multi‑device inspection and enables remote login for issue reproduction and verification.
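The interaction between these services can be sketched roughly as follows. All class and function names here are illustrative stand-ins for the real services, not the platform's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    page: str
    rules: list

@dataclass
class Report:
    records: list = field(default_factory=list)

    def record(self, step, verdict):
        self.records.append((step.page, verdict))

def run_inspection(steps, capture, analyze, report):
    """Drive one inspection task: navigate, screenshot, analyze, report.

    capture(page) stands in for the automation + real-device services,
    analyze(shot, rules) for the model service, and report for the
    inspection platform's result store.
    """
    for step in steps:
        shot = capture(step.page)            # real-device screenshot
        verdict = analyze(shot, step.rules)  # model-based rule check
        report.record(step, verdict)
    return report

# Toy run with stubbed services
report = run_inspection(
    [Step("home", ["no_overlap"]), Step("detail", ["price_visible"])],
    capture=lambda page: f"<shot:{page}>",
    analyze=lambda shot, rules: "pass",
    report=Report(),
)
```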

Main Feature Design

1. Page Layout Issue Detection

Common UI problems such as misaligned elements, overlapping components, or disordered layouts are detected by feeding full‑page screenshots to an AI model that evaluates them against specific layout rules. Two detection modes are offered:

Partial frame matching – checks whether page elements (images, text, price, etc.) align with expectations.

Full page matching – validates complete visual consistency, including text content.
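As a rough illustration of how such a check can be assembled, a full‑page screenshot and the layout rules can be packaged into a vision‑language‑model request. The message schema follows the standard OpenAI chat format (also used in the code later in this article); the rule text and function name are made up:

```python
import base64

def build_layout_prompt(screenshot_png: bytes, rules: list) -> list:
    """Package a full-page screenshot plus layout rules as chat messages."""
    img_b64 = base64.b64encode(screenshot_png).decode("ascii")
    rule_text = "\n".join(f"- {r}" for r in rules)
    return [
        {"role": "system",
         "content": "You are a UI inspector. Report layout violations as JSON."},
        {"role": "user", "content": [
            {"type": "text",
             "text": f"Check this page against the rules:\n{rule_text}"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{img_b64}"}},
        ]},
    ]

messages = build_layout_prompt(b"fake-png-bytes",
                               ["No overlapping components",
                                "Price text is not truncated"])
```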

2. General Visual Experience Detection

This feature handles a broader set of visual problems. Users can define custom rules or use default ones. The detection pipeline distinguishes between text‑focused checks (using OCR + AI) and image‑focused checks (AI‑driven element analysis).
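A minimal sketch of that routing decision, with made‑up rule categories, might look like:

```python
def route_rule(rule: dict) -> str:
    """Route a detection rule to the text or image pipeline.

    Rule kinds here are illustrative; real rules carry richer metadata.
    """
    text_kinds = {"copywriting", "price_format", "label_text"}
    return "ocr_ai" if rule["kind"] in text_kinds else "visual_ai"

pipelines = [route_rule({"kind": "price_format"}),
             route_rule({"kind": "banner_image"})]
```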

3. Page Display Consistency Detection

For multi‑page flows, the system maintains relationships between pages (e.g., product detail ↔︎ listing) and compares screenshots across levels. It addresses three challenges: determining the correct comparison layer, locating target elements via image matching, and performing cross‑page analysis to produce consistency reports.

4. AI Operation & Unresponsive Detection

When a target element is not directly reachable, users describe the prerequisite steps in natural language. An AI model parses the description, generates UI actions, and executes them before visual checks. Additionally, an operation‑validity module compares before/after screenshots using histogram correlation, chi‑square distance, and intersection metrics to flag ineffective clicks.

import json
from openai import OpenAI
from ..config.llm_config import LLMConfig
from ..utils import get_logger

class ChatClient:
    # Model initialization
    def __init__(self, config_path=None, model_log_path=None):
        self.logger = get_logger(log_file=model_log_path)
        self.config = LLMConfig(config_path)
        self.openai = OpenAI(
            api_key=self.config.openai_api_key,
            base_url=self.config.openai_api_base,
        )

    def chat(self, prompt_data):
        # Submit screenshot and task description to model
        chat_response = self.openai.chat.completions.create(
            model=self.config.model,
            messages=prompt_data,
            max_tokens=self.config.max_tokens,
            temperature=self.config.temperature,
            extra_body={"vl_high_resolution_images": True},
        )
        result = chat_response.choices[0].message.content
        json_str_result = result.replace("```json", "").replace("```", "")
        try:
            return json.loads(json_str_result)
        except Exception as err:
            self.logger.info(f"LLM response err: {err}")
        # Attempt JSON repair
        try:
            import json_repair
            return json_repair.repair_json(json_str_result, return_objects=True)
        except Exception as err:
            self.logger.info(f"LLM response json_repair err: {err}")
        # Fallback bbox handling
        try:
            import re
            if "bbox" in json_str_result:
                while re.search(r"\d+\s+\d+", json_str_result):
                    json_str_result = re.sub(r"(\d+)\s+(\d+)", r"\1,\2", json_str_result)
            return json.loads(json_str_result)
        except Exception as err:
            self.logger.info(f"LLM response re.search err: {err}")
        # All three parsing attempts failed; the caller must handle None
        return None
# Method on the automation driver class (class definition elided in the
# article): parse a natural-language description, ask the model for the
# target bbox, then tap its center point.
def ai_tap(self, description):
    screenshot_base64 = self.get_resized_screenshot_as_base64()
    ret = {"screenshot": screenshot_base64}
    prompt = Tap(description).get_prompt(screenshot_base64)
    res_obj = self.chat_client.chat(prompt)
    if "errors" in res_obj and res_obj["errors"]:
        ret["result"] = False
        ret["message"] = res_obj["errors"]
    else:
        x, y = self.get_center_point(res_obj["bbox"])
        self._click(x, y)
        ret["location"] = {"x": x, "y": y}
        ret["result"] = True
        ret["message"] = ""
    return ret
# Similarity thresholds (values are illustrative; in practice they are tuned
# per device resolution and page type)
HIST_CORREL_THRES = 0.99
HIST_CHISQR_THRES = 0.1
HIST_INTERSECT_THRES = 0.95

def check_operation_valid(screen_path_before, screen_path_after, cur_ops,
                          screen_oss_path_before, screen_oss_path_after):
    import cv2
    img_before = cv2.imread(screen_path_before)
    img_after = cv2.imread(screen_path_after)
    if img_before is None or img_after is None:
        raise RuntimeError("Operation validation image read error")
    if img_before.shape != img_after.shape:
        img_after = cv2.resize(img_after, (img_before.shape[1], img_before.shape[0]))
    gray_before = cv2.cvtColor(img_before, cv2.COLOR_BGR2GRAY)
    gray_after = cv2.cvtColor(img_after, cv2.COLOR_BGR2GRAY)
    hist_before = cv2.calcHist([gray_before], [0], None, [256], [0, 256])
    hist_after = cv2.calcHist([gray_after], [0], None, [256], [0, 256])
    hist_before = cv2.normalize(hist_before, hist_before).flatten()
    hist_after = cv2.normalize(hist_after, hist_after).flatten()
    # Three complementary histogram metrics: if the before/after screenshots
    # are near-identical, the click likely had no visible effect.
    correlation = cv2.compareHist(hist_before, hist_after, cv2.HISTCMP_CORREL)
    chi_square = cv2.compareHist(hist_before, hist_after, cv2.HISTCMP_CHISQR)
    intersection = cv2.compareHist(hist_before, hist_after, cv2.HISTCMP_INTERSECT)
    if (correlation > HIST_CORREL_THRES
            and chi_square < HIST_CHISQR_THRES
            and intersection > HIST_INTERSECT_THRES):
        raise RuntimeError(f"Current operation: {cur_ops} appears ineffective")

Platform Construction and Usage

Users configure detection rules on the inspection platform, and rules are shared across tasks. A basic rule item only needs to describe the detection scope, without excessive detail; for example, a generic rule checks common layout and error patterns. After execution, results, including comparison screenshots, model analysis, and conclusions, are displayed for review and feedback.
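As a hypothetical example of what such a shared rule item might look like (the field names are illustrative, not the platform's real schema):

```json
{
  "rule_id": "layout_generic_01",
  "scope": "full_page",
  "mode": "partial_frame",
  "description": "Flag overlapping components, truncated text, and blank modules",
  "severity": "P1",
  "shared": true
}
```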

Result Feedback

Standard tasks provide detailed logs: rule comparison images, real‑time screenshots, model reasoning, and final verdicts. Testers can pinpoint issues, report false positives, and help improve model accuracy.

Conclusion

By integrating AI vision‑language models into the mobile testing pipeline, the smart inspection system raised detection accuracy from roughly 50% to over 80% and pushed image‑similarity matching above 80%. In a pilot AI walkthrough, it surfaced 17 configuration problems with a 95% AI detection rate. Future work will continue to expand AI‑driven scenarios to further improve testing efficiency and user experience.

Tags: UI Automation, mobile testing, Large Language Model, app quality, smart inspection, AI inspection, visual detection
Written by DeWu Technology

A platform for sharing and discussing tech knowledge.