How AI Detects Screenshot Bugs: From CNN Models to Image Clustering
Using a TensorFlow CNN and an OCR + LSTM model, this article describes how AI can automatically spot blank pages, UI anomalies, and garbled text in app screenshots. It also covers a Jenkins-driven retraining pipeline and the use of hierarchical clustering to deduplicate images and make manual review more efficient.
1. Model Selection
Three bug types can be detected without business knowledge: whole-page blanks, abnormally rendered widgets, and text anomalies. For blank pages, a simple CNN built with TensorFlow classifies screenshots as normal or abnormal. For text anomalies such as garbled characters, an OCR + LSTM model recognizes the Chinese characters on screen so that corrupted text can be detected. Training samples come from historical bug screenshots and from mock positive data.
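The article does not give the network architecture, so the following is only a minimal sketch of what a "simple CNN built with TensorFlow" for normal-versus-abnormal screenshots might look like. The input size, the layer widths, and the function name `build_blank_page_classifier` are assumptions made for illustration, not the authors' actual model.

```python
import tensorflow as tf


def build_blank_page_classifier(height=128, width=72):
    """Small binary CNN: outputs the probability that a screenshot is abnormal.

    The input size and layer widths are illustrative assumptions.
    """
    return tf.keras.Sequential([
        tf.keras.Input(shape=(height, width, 3)),
        tf.keras.layers.Rescaling(1.0 / 255),      # map pixel values to [0, 1]
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])


model = build_blank_page_classifier()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

Training would then call `model.fit` on resized screenshots labeled 0 (normal) or 1 (abnormal), drawn from the historical bug screenshots and mock data mentioned above.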
2. Model Retraining – Improving Accuracy
The initial model was trained on limited data, and as the app evolves, new pages cause misclassifications. To address false positives, a checkbox in the front end lets reviewers submit misclassified images for deduplication and retraining. Scheduled Jenkins jobs collect all retraining images, run the retraining script, and replace the old model with the new one. After several automatic iterations, accuracy improved significantly.
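The script that the Jenkins job runs is not shown in the article. As a rough sketch under stated assumptions, the collect, retrain, and replace steps could look like the code below. Deduplication here only removes byte-identical files, the `*.png` pattern is a guess, and `train_fn` is a hypothetical callback that stands in for the real retraining code and returns the serialized model.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def collect_unique_images(image_dir):
    """Return image paths with byte-identical duplicates removed."""
    seen, unique = set(), []
    for path in sorted(Path(image_dir).glob("*.png")):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(path)
    return unique


def publish_model(new_model_bytes, live_path):
    """Write the new model next to the live one, then swap it in."""
    live_path = Path(live_path)
    fd, tmp_name = tempfile.mkstemp(dir=live_path.parent, suffix=".tmp")
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(new_model_bytes)
    # os.replace is atomic when both paths are on the same filesystem,
    # so readers never see a half-written model file.
    os.replace(tmp_name, live_path)


def retrain_job(image_dir, live_path, train_fn):
    """Hypothetical job body: returns the number of images trained on."""
    images = collect_unique_images(image_dir)
    if not images:
        return 0  # nothing new to learn from; keep the current model
    publish_model(train_fn(images), live_path)
    return len(images)
```

Writing to a temporary file first means a failed training run leaves the old model in place.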
3. Image Processing – Enhancing Manual Review Efficiency
3.1 Special Screenshots
Some screenshots contain large blank areas but are correct from a business perspective (for example, an intermediate search page). Left unhandled, they are flagged as abnormal on every run, which wastes review time. To solve this, a gallery of known-good images is maintained: if a flagged image matches any gallery image above a similarity threshold, it is ignored.
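The article does not say which similarity measure or threshold the gallery check uses. The sketch below assumes a simple pixel-level score (one minus the normalized mean absolute difference) on equally sized images and a hypothetical threshold of 0.98, purely to illustrate the filtering step.

```python
import numpy as np


def similarity(a, b):
    """Return a score in [0, 1]; 1.0 means pixel-identical images."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    if a.shape != b.shape:
        return 0.0  # assumption: differently sized images never match
    return 1.0 - float(np.mean(np.abs(a - b))) / 255.0


def is_known_good(flagged, gallery, threshold=0.98):
    """True if the flagged image matches any known-good gallery image."""
    return any(similarity(flagged, ref) >= threshold for ref in gallery)
```

A flagged screenshot for which `is_known_good` returns `True` would be dropped before it reaches the review queue.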
3.2 Image Deduplication
During a traversal task, each page is visited at least twice, and clicked elements are highlighted with red boxes, so the run produces many near-duplicate screenshots. Showing reviewers only the deduplicated results greatly reduces visual fatigue.
3.2.1 Solution
When the number of images is large and the total number of distinct pages is unknown, hierarchical clustering can be used. The approach starts with each screenshot as its own cluster, then repeatedly merges the two closest clusters until a stopping condition is met.
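To make the merge loop concrete, here is a small pure-Python sketch of bottom-up clustering. It uses average linkage and a maximum merge distance as the stopping condition. Both are choices made for this illustration, and the function names are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, points):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(points, max_distance):
    """Merge the two closest clusters until none are within max_distance."""
    clusters = [[i] for i in range(len(points))]  # one cluster per item
    while len(clusters) > 1:
        (i, j), dist = min(
            (((i, j), average_linkage(clusters[i], clusters[j], points))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if dist > max_distance:
            break  # stopping condition: the closest pair is too far apart
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters
```

This naive loop recomputes every pairwise distance on each pass, so it is only meant to show the idea. A library routine is the practical choice for large screenshot sets.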
3.2.2 Implementation
1) Compute distances between images: flatten each image into a vector of length w × h × 3 and use the Euclidean distance; the more similar two images are, the smaller the distance.
2) Compute cluster-to-cluster distances. The methods considered were single, complete, average, and ward. In experiments the ward method performed best, so it is used:
    Z = linkage(X, 'ward')
3) Choose a critical distance threshold. If the threshold is too small, similar images fail to cluster; if it is too large, unrelated pages are merged. Experiments indicate that abnormal images (those flagged by the page-exception model) are more similar to one another, so clustering abnormal and normal images separately, with a smaller threshold for the abnormal set, yields better results.
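The three steps can be sketched with SciPy as follows. Only `linkage(X, 'ward')` comes from the article. Cutting the tree with `fcluster` and `criterion="distance"`, keeping the first screenshot of each cluster, and the helper names are assumptions for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_screenshots(images, threshold):
    """Flatten equally sized images and cut a ward tree at `threshold`."""
    if len(images) < 2:
        return np.ones(len(images), dtype=int)  # linkage needs >= 2 rows
    # Step 1: each row of X is one image as a vector of length w * h * 3.
    X = np.stack([np.asarray(img, dtype=np.float64).ravel() for img in images])
    # Step 2: ward linkage on Euclidean distances.
    Z = linkage(X, "ward")
    # Step 3: merges above the critical distance are not applied.
    return fcluster(Z, t=threshold, criterion="distance")


def dedupe(images, labels):
    """Keep the first screenshot of every cluster."""
    first = {}
    for idx, label in enumerate(labels):
        first.setdefault(int(label), idx)
    return [images[i] for i in sorted(first.values())]
```

To follow the article's observation, the abnormal and normal sets would each be passed through `cluster_screenshots` separately, with a smaller `threshold` for the abnormal set. Suitable values depend on image size and have to be found experimentally.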
4. Summary and Outlook
The current tool effectively detects whole‑page anomalies, and the accuracy of text‑anomaly detection improves as more samples are added.
Future work includes integrating the LabelImg annotation tool and building an SSD model with TensorFlow to identify widget-level image defects, as well as detecting distorted layouts and cases where the result of an operation does not match expectations.
References
[1] Image clustering computation: https://haojunsui.github.io/2016/07/16/scipy-hac/
Alibaba Cloud Developer
