
Image Feature Extraction and Clustering for Key Frame Selection in Mobile App Installation Screenshots

This article presents a technical solution for extracting representative key frames from time‑series screenshots of a mobile app installation process, covering pixel sampling, dimensionality reduction, classic feature extractors (SIFT, HOG, ORB), auto‑encoder based deep learning, and clustering methods such as KMeans and DBSCAN, along with practical results and performance analysis.


Background: In mobile app testing, it is necessary to select representative key frames from a sequence of installation screenshots. Challenges include multiple frames per action, post‑installation screens, and the need for concise visual summaries.

Technical Solution Overview: The approach combines staged screenshot capture with clustering algorithms. It employs pixel‑point sampling and dimensionality reduction (PCA, LDA) to lower data volume, followed by feature extraction using traditional algorithms (SIFT, HOG, ORB) and deep‑learning auto‑encoders.

Pixel Sampling & Dimensionality Reduction: Images are down‑sampled by selecting pixels at regular intervals to reduce size. PCA (unsupervised) and LDA (supervised) extract principal components, providing compact image descriptors, while noting that low‑variance components may still carry information important for classification.
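The sampling-plus-PCA stage can be sketched as follows. This is a minimal illustration on random stand-in data (the array shapes, stride, and component count are assumptions, not the article's actual parameters):

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy stand-ins for installation screenshots: 89 grayscale frames of 120x200
rng = np.random.default_rng(0)
frames = rng.integers(0, 256, size=(89, 120, 200)).astype(np.float32)

# Pixel-point sampling: keep every 4th pixel along each axis
sampled = frames[:, ::4, ::4]             # shape (89, 30, 50)
flat = sampled.reshape(len(sampled), -1)  # shape (89, 1500)

# PCA (unsupervised) compresses the sampled pixels into compact descriptors
pca = PCA(n_components=16, random_state=0)
descriptors = pca.fit_transform(flat)     # shape (89, 16)
```

Stride sampling alone cuts the data volume sixteen‑fold here; PCA then reduces each frame to a short vector suitable for clustering.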

SIFT: Historically the most common pre‑deep‑learning feature extractor. It builds a Gaussian scale space, detects extrema, discards low‑quality points, assigns orientations, and generates descriptor vectors.

HOG: Computes histograms of oriented gradients, emphasizing edge information and improving illumination robustness. Compared with SIFT, HOG better preserves edge features, and the computation times of the two are comparable.

ORB: Combines FAST key‑point detection with BRIEF descriptors, offering at least an order of magnitude speed improvement over SIFT at the cost of reduced descriptor accuracy.

Auto‑Encoder Feature Extraction: A neural network encoder‑decoder is trained to reconstruct input images, yielding an encoder that serves as a compact, end‑to‑end feature extractor. Training on 89 images (GPU, 2 s/epoch, 10 epochs) achieves a loss of ~0.0047, demonstrating effective representation learning.
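The reconstruction-training idea can be shown with a toy linear auto‑encoder in plain NumPy. This is a didactic sketch on random vectors, not the article's network: the 89 samples mirror the image count, but the dimensions, learning rate, and epoch count are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(89, 64))   # 89 flattened, down-sampled frames (toy data)
d, k = 64, 8                    # input dim, bottleneck dim

W_enc = rng.normal(scale=0.1, size=(d, k))   # encoder weights
W_dec = rng.normal(scale=0.1, size=(k, d))   # decoder weights
lr, losses = 0.01, []

for epoch in range(200):
    Z = X @ W_enc                # encode to the bottleneck
    X_hat = Z @ W_dec            # decode back to input space
    err = X_hat - X
    losses.append(float((err ** 2).mean()))  # reconstruction MSE
    # Gradient descent on both weight matrices
    W_dec -= lr * (Z.T @ err) / len(X)
    W_enc -= lr * (X.T @ (err @ W_dec.T)) / len(X)

features = X @ W_enc             # the trained encoder is the feature extractor
```

After training, only the encoder half is kept: `features` plays the role of the compact descriptors fed to clustering.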

Clustering Algorithms: After feature extraction, KMeans (distance‑based) and DBSCAN (density‑based) are applied. KMeans requires a predefined number of clusters and is sensitive to initialization, while DBSCAN determines the cluster count automatically and handles non‑convex cluster shapes and noise.
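The contrast between the two can be seen on toy 2‑D features (the separated blobs stand in for frame descriptors; `eps` and `min_samples` are illustrative values, not the article's tuning):

```python
import numpy as np
from sklearn.cluster import KMeans, DBSCAN

# Three well-separated groups standing in for per-frame feature vectors
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(c, 0.3, size=(30, 2)) for c in (0.0, 5.0, 10.0)])

# KMeans needs the cluster count up front...
km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(feats)

# ...while DBSCAN discovers it from density (label -1 marks noise points)
db_labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(feats)
```

One key frame per cluster (e.g. the frame closest to each cluster's mean feature vector) then forms the visual summary of the installation sequence.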

Application Results: The combined pipeline reduces processing time (89 images processed quickly), but challenges remain, such as indistinguishable pre‑ and post‑installation screenshots and occasional mis‑classification of dynamic windows. Visual results show the effectiveness of SIFT/HOG/ORB and auto‑encoder + DBSCAN in distinguishing key frames.

Tags: computer vision, clustering, image processing, feature extraction, ORB, SIFT, autoencoder, HOG
Written by

360 Quality & Efficiency

360 Quality & Efficiency focuses on seamlessly integrating quality and efficiency in R&D, sharing 360’s internal best practices with industry peers to foster collaboration among Chinese enterprises and drive greater efficiency value.
