AI‑Powered UI Element Recognition for Mobile App Automation Testing
iQIYI’s AI‑driven UI element recognizer uses a YOLOv3 model trained on thousands of mobile and PC screenshots to locate obfuscated controls across diverse devices, integrating predictions into its IDE for reliable automation of React Native, Flutter, H5 and mini‑program interfaces.
In UI automation testing, elements must be identified before user actions can be simulated. Native mobile apps expose attributes such as ID, class name, and text, but cross‑platform frameworks (React Native, Flutter, mini‑programs) often require the Chrome DevTools Protocol or image‑similarity matching to locate elements.
The conventional approach relies on the automation framework to recognize elements, which in turn depends on developers giving meaningful names to UI controls. Once controls are obfuscated, locating them becomes extremely difficult.
This article shares a general solution derived from iQIYI’s automation practice. It aims to improve element identification by recognizing controls visually, from what is actually rendered on the device screen, rather than from control attributes.
Challenges
1. Insufficient element‑recognition ability of existing automation frameworks (e.g., python‑uiautomator2) for RN, H5, and mini‑program pages.
2. Compatibility across multiple device models and resolutions; the same UI may appear differently on various screens.
3. Diverse element styles, semi‑transparent or attribute‑less elements, and skin‑dependent icons that hinder pure image‑matching methods.
Technical Exploration
Initial attempts used image‑hash, template matching, and SIFT feature detection, which worked for single‑device, single‑background scenarios but failed for complex backgrounds and multi‑device requirements.
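To make the limitation concrete, here is a toy illustration of the image-hash idea, not iQIYI's implementation. It uses an average hash over tiny hand-made grayscale grids; the pixel values and the helper names `average_hash` and `hamming` are assumptions for the example, and the downscaling step a real average hash performs is skipped.

```python
def average_hash(pixels):
    """Return one bit per pixel: 1 where the pixel is brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if p > mean else 0 for p in flat)


def hamming(a, b):
    """Count the positions where two equal-length hashes differ."""
    return sum(x != y for x, y in zip(a, b))


# A 4x4 grayscale "icon": bright shape with dark corners.
icon = [
    [10, 200, 200, 10],
    [200, 200, 200, 200],
    [200, 200, 200, 200],
    [10, 200, 200, 10],
]

# Same icon, uniformly brighter: the hash is unchanged.
brighter = [[min(255, p + 40) for p in row] for row in icon]

# Same icon drawn over a bright, busy background: the corners are no longer dark.
busy = [row[:] for row in icon]
busy[0][0] = busy[0][3] = busy[3][0] = busy[3][3] = 220

same_background = hamming(average_hash(icon), average_hash(brighter))
complex_background = hamming(average_hash(icon), average_hash(busy))
print(same_background, complex_background)
```

A uniform brightness shift leaves the hash untouched, while changing only the background pixels flips every bit in this toy case, which is the kind of fragility that makes hash matching hard to use across backgrounds.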
AI‑based object detection was investigated next. After evaluating several models, the team chose YOLOv3 because it pairs the speed of a single‑stage detector such as SSD with FPN‑style multi‑scale prediction, so it can recognize both large and small objects, at the cost of some positional bias on densely packed targets.
Training data comprised over 200 UI elements (≈60 mobile, ≈140 PC). The model achieved ~94% recall for close buttons and >80% for other controls. Deployment on GPU (Intel/NVIDIA) required converting YOLOv3 weights to a Keras model to avoid excessive latency.
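The article does not say how recall was measured. One common way, sketched here under that caveat, is to count a labeled element as found when some prediction overlaps it with an intersection over union (IoU) of at least 0.5. The threshold, the one-to-one matching rule, and the sample boxes are all assumptions; boxes use the (left, top, width, height) layout that the prediction interface returns.

```python
def iou(a, b):
    """Intersection over union of two (left, top, width, height) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = min(ax + aw, bx + bw) - max(ax, bx)
    ih = min(ay + ah, by + bh) - max(ay, by)
    if iw <= 0 or ih <= 0:
        return 0.0  # boxes do not overlap
    inter = iw * ih
    return inter / (aw * ah + bw * bh - inter)


def recall(ground_truth, predictions, threshold=0.5):
    """Fraction of labeled boxes matched by a not-yet-used prediction."""
    unused = list(predictions)
    hits = 0
    for gt in ground_truth:
        best = max(unused, key=lambda p: iou(gt, p), default=None)
        if best is not None and iou(gt, best) >= threshold:
            hits += 1
            unused.remove(best)  # each prediction may match one label only
    return hits / len(ground_truth) if ground_truth else 0.0


# Two labeled close buttons; the model finds one and misfires elsewhere.
truth = [(10, 10, 40, 40), (100, 100, 40, 40)]
preds = [(12, 12, 40, 40), (300, 300, 40, 40)]
score = recall(truth, preds)
print(score)
```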
Material Collection & Training Pipeline
Three sources feed training images: automatic screenshots during regression runs, stability‑test captures of new app versions, and user‑uploaded assets via the AI testing platform. Images are deduplicated by MD5, resized, and stored in a central repository before model training.
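The MD5 deduplication step can be sketched as follows. This is a minimal illustration using Python's standard `hashlib`; the function name `dedupe_by_md5`, the file names, and the byte payloads are made up for the example, and the real pipeline's resize and storage steps are omitted.

```python
import hashlib


def dedupe_by_md5(images):
    """Keep the first occurrence of each distinct byte payload.

    images: iterable of (name, raw_bytes) pairs.
    Returns a list of (name, md5_hex) for the images that were kept.
    """
    seen = set()
    unique = []
    for name, data in images:
        digest = hashlib.md5(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append((name, digest))
    return unique


# Hypothetical captures from the three sources described above.
batch = [
    ("regression_001.png", b"\x89PNG-home-screen"),
    ("stability_017.png", b"\x89PNG-home-screen"),  # byte-identical duplicate
    ("upload_003.png", b"\x89PNG-player-screen"),
]
kept = dedupe_by_md5(batch)
print([name for name, _ in kept])
```

Note that an MD5 digest only catches byte-identical files. Two screenshots of the same page that differ by a single pixel hash differently, so near-duplicates would need a separate check.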
Prediction Interface
The AI service accepts an image (or URL) and returns element class codes and bounding‑box coordinates (left, top, width, height). Example usage in the recording‑playback IDE:
# Recording script example
# Check whether the search button exists
if iqy.AI(btn="btnsearch").exists():
    # Click the button
    iqy.AI(btn="btnsearch").click()
# Alternative shortcut: click only if the element exists
iqy.AI(btn="btnsearch").click_exists()
# Advanced usage: inspect the first prediction
els = iqy.AI(btn="btnsearch")
data = els[0]
print(data["ratio"], "AI prediction confidence")
print(data["w"], "width")
print(data["h"], "height")
print(data["x"], "x coordinate")
print(data["x0"], "alternative x coordinate")
This interface is integrated into the iQIYI IDE: detected elements are highlighted on the device screen and listed in the right‑hand panel. Clicking an element generates a script that operates on its center point.
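Since generated scripts act on an element's center point, the conversion from a returned bounding box to a tap position can be sketched as below. The response shape (`class` and `box` keys) and the coordinate values are assumptions for illustration; the article states only that the service returns a class code plus left, top, width, and height.

```python
def center_of(box):
    """Return the integer center (x, y) of a (left, top, width, height) box."""
    left, top, width, height = box
    return (left + width // 2, top + height // 2)


# Hypothetical prediction for a search button near the top-right of the screen.
prediction = {"class": "btnsearch", "box": (900, 60, 80, 80)}
tap_x, tap_y = center_of(prediction["box"])
print(tap_x, tap_y)
```

Deriving the tap position from the box at prediction time means the script stores no fixed pixel coordinates, so it can follow the element when the layout shifts between devices.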
Typical scenarios include handling unexpected pop‑ups (e.g., close buttons) and controlling media playback buttons that change frequently.
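The pop-up scenario could be wrapped in a small helper like the sketch below. It relies only on the `exists()` and `click()` calls shown in the recording example; the helper name `dismiss_popups`, the class code `"btnclose"`, and the `max_rounds` limit are hypothetical. `FakeScreen` and `FakeElement` are stand-ins so the sketch runs outside the IDE; in a real script you would pass `iqy.AI` instead.

```python
def dismiss_popups(ai, class_codes=("btnclose",), max_rounds=3):
    """Click every visible element with a known close-button class code.

    ai: a callable like iqy.AI that accepts btn=<class code>.
    Returns the number of clicks performed.
    """
    clicks = 0
    for _ in range(max_rounds):
        clicked = False
        for code in class_codes:
            element = ai(btn=code)
            if element.exists():
                element.click()
                clicks += 1
                clicked = True
        if not clicked:
            break  # nothing left to close
    return clicks


class FakeElement:
    """Stand-in for the object returned by iqy.AI(...)."""

    def __init__(self, visible, code):
        self.visible, self.code = visible, code

    def exists(self):
        return self.code in self.visible

    def click(self):
        self.visible.remove(self.code)  # clicking closes one pop-up


class FakeScreen:
    """Stand-in for the device; tracks which class codes are on screen."""

    def __init__(self, visible):
        self.visible = list(visible)

    def AI(self, btn):
        return FakeElement(self.visible, btn)


screen = FakeScreen(["btnclose", "btnclose", "btnplay"])
closed = dismiss_popups(screen.AI)
print(closed, screen.visible)
```

Capping the number of rounds keeps a test from looping forever if a pop-up cannot be dismissed.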
Deployment & Platform Integration
The AI testing platform provides end‑to‑end capabilities: material upload, annotation, review, model training, and API exposure. Users can submit an image to obtain predictions, with sample Python API calls shown in the UI.
The platform is also integrated with a mobile cloud‑testing service, allowing one‑click AI predictions during cloud device sessions.
Challenges in Production
The gap between CPU and GPU in cost and performance led the team to adopt GPU‑accelerated inference. A model compatibility issue (a YOLOv3 model trained on an Intel GPU would not run on NVIDIA hardware) was resolved by converting the weights to Keras format.
Future Plans
- Increase recognition of icons that blend with complex backgrounds.
- Enhance page‑splitting for attribute‑less pages (mini‑programs, RN, H5).
- Support more complex card layouts, banners, menus, and tabs.
- Expand training data diversity through synthetic image generation.
- Enable advanced element operations such as distance calculations and feature‑based verification.
