
Background Complexity Detection for Sneaker Images Using MobileNet, FPN, and Modified SAM

The project presents a lightweight MobileNet‑FPN architecture, enhanced with a modified spatial‑attention module, that evaluates corner‑based self‑similarity to classify sneaker photo backgrounds. It achieves 96% test accuracy, above the 93% of a baseline CNN, and meets the business targets of over 80% accuracy for user hints and over 90% for mandatory enforcement.

DeWu Technology

This article describes a practical computer‑vision project aimed at detecting whether user‑uploaded sneaker photos have a clean, uniform background, a task referred to as "background complexity detection".

Business requirement: The system must automatically identify images with clean backgrounds to improve downstream valuation and sales algorithms. Accuracy must exceed 80% when the result is shown to the user as a hint and 90% when it is enforced as a mandatory check.

Model design: A lightweight backbone (MobileNet) is combined with a Feature Pyramid Network (FPN) and a modified Spatial Attention Module (SAM). The architecture balances accuracy against on‑device resource constraints.
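The MobileNet plus FPN combination can be pictured through FPN's top-down pathway: coarse, semantically strong maps are upsampled and added to laterally projected finer maps. The sketch below is a minimal illustration under simplifying assumptions of our own: single-channel feature maps as nested Python lists, a scalar `weight` standing in for each learned 1x1 lateral convolution, and no 3x3 smoothing convolution after each merge. The function names are hypothetical, not the authors' code; a real implementation would use a deep-learning framework and MobileNet's multi-channel stage outputs.

```python
from typing import List

Grid = List[List[float]]  # one channel of a feature map: rows of floats


def upsample2x(x: Grid) -> Grid:
    """Nearest-neighbour 2x upsampling, as in FPN's top-down pathway."""
    out: Grid = []
    for row in x:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))  # repeat each row twice
    return out


def lateral(x: Grid, weight: float, bias: float = 0.0) -> Grid:
    """Stand-in for a 1x1 lateral conv: with a single channel it is an affine map."""
    return [[weight * v + bias for v in row] for row in x]


def fpn_top_down(pyramid: List[Grid], weight: float = 1.0) -> List[Grid]:
    """Fuse backbone maps ordered finest -> coarsest; return outputs in the same order.

    Each coarser map must be exactly half the height and width of the next finer one.
    """
    merged = lateral(pyramid[-1], weight)  # the coarsest level only gets the lateral projection
    outputs = [merged]
    for fine in reversed(pyramid[:-1]):
        up = upsample2x(merged)
        lat = lateral(fine, weight)
        if len(up) != len(lat) or len(up[0]) != len(lat[0]):
            raise ValueError("each finer map must be exactly 2x the next coarser map")
        merged = [[a + b for a, b in zip(ru, rl)] for ru, rl in zip(up, lat)]
        outputs.append(merged)
    return list(reversed(outputs))  # finest first, matching the input order


if __name__ == "__main__":
    c4 = [[0.0, 1.0], [2.0, 3.0]]  # hypothetical finer backbone map
    c5 = [[1.0]]                    # hypothetical coarser backbone map
    p4, p5 = fpn_top_down([c4, c5])
    print(p4)  # [[1.0, 2.0], [3.0, 4.0]]
```

With `weight=1.0`, fusing a 1x1 coarse map into a 2x2 finer map simply adds the coarse value to every fine pixel, which is what nearest-neighbour upsampling followed by addition should produce.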

Design rationale: The problem is a spatial‑type recognition task. By analyzing the business scenario, the authors concluded that background complexity can be judged from the self‑similarity of the image corners. All four corners are examined; the similarity between the top two corners, together with each of those corners' match count against the whole image, forms three scoring metrics.
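The three corner metrics are not specified in detail, so the following is one possible reading, offered as a sketch only. We assume grayscale images as nested lists, take a k x k patch at each corner, read "top two corners" as the most similar pair among the four (another reading would be the top-left and top-right corners), use mean absolute pixel difference as the dissimilarity measure, and count a tile as a "match" when its difference from the corner patch is within a tolerance `tol`. All names, the patch size `k`, and `tol` are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Image = List[List[int]]  # grayscale image, values 0..255


def patch(img: Image, top: int, left: int, k: int) -> List[int]:
    """Flatten the k x k patch whose top-left pixel is (top, left)."""
    return [img[r][c] for r in range(top, top + k) for c in range(left, left + k)]


def mad(a: List[int], b: List[int]) -> float:
    """Mean absolute pixel difference between two equally sized patches."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def corner_patches(img: Image, k: int) -> Dict[str, List[int]]:
    h, w = len(img), len(img[0])
    return {
        "top_left": patch(img, 0, 0, k),
        "top_right": patch(img, 0, w - k, k),
        "bottom_left": patch(img, h - k, 0, k),
        "bottom_right": patch(img, h - k, w - k, k),
    }


def match_fraction(img: Image, ref: List[int], k: int, tol: float) -> float:
    """Fraction of non-overlapping k x k tiles whose difference from `ref` is <= tol."""
    h, w = len(img), len(img[0])
    tiles = [patch(img, r, c, k)
             for r in range(0, h - k + 1, k)
             for c in range(0, w - k + 1, k)]
    return sum(mad(t, ref) <= tol for t in tiles) / len(tiles)


def corner_metrics(img: Image, k: int = 4, tol: float = 10.0) -> Tuple[float, float, float]:
    """Return (pair similarity, match fraction of corner A, match fraction of corner B)."""
    corners = corner_patches(img, k)
    # Assumption: "top two corners" = the most similar pair among the four corners.
    a, b = min(combinations(corners, 2),
               key=lambda pair: mad(corners[pair[0]], corners[pair[1]]))
    pair_similarity = 1.0 - mad(corners[a], corners[b]) / 255.0
    return (pair_similarity,
            match_fraction(img, corners[a], k, tol),
            match_fraction(img, corners[b], k, tol))
```

A perfectly flat image returns `(1.0, 1.0, 1.0)`; as the background becomes less uniform, the pair similarity and both match fractions fall.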

Traditional approaches that failed: Simple Gaussian filtering, edge/gradient analysis, Fourier frequency analysis, and template averaging were all tried, but proved unreliable because high‑frequency texture (e.g., carpet) does not necessarily indicate a complex background.
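A toy example, constructed here rather than taken from the article, shows how an edge or gradient measure can mislead. It scores a regularly textured "carpet" as busier than a flat background containing a dark object, even though the carpet repeats exactly and is therefore a uniform background in the sense the project cares about. The mean absolute gradient used here is a simple stand-in for the edge/gradient analysis the authors tried; their exact formulation is not given.

```python
from typing import List

Image = List[List[int]]  # grayscale, values 0..255


def mean_abs_gradient(img: Image) -> float:
    """Average absolute difference between horizontally and vertically adjacent pixels."""
    h, w = len(img), len(img[0])
    total, count = 0, 0
    for r in range(h):
        for c in range(w):
            if c + 1 < w:
                total += abs(img[r][c + 1] - img[r][c])
                count += 1
            if r + 1 < h:
                total += abs(img[r + 1][c] - img[r][c])
                count += 1
    return total / count


# A uniform but finely textured "carpet": a repeating checkerboard of two close grey levels.
carpet = [[100 if (r + c) % 2 == 0 else 140 for c in range(8)] for r in range(8)]
# A flat background with one dark object in the middle.
scene = [[0 if 2 <= r <= 5 and 2 <= c <= 5 else 200 for c in range(8)] for r in range(8)]

print(mean_abs_gradient(carpet))  # 40.0
print(mean_abs_gradient(scene))   # about 28.57
```

The gradient score ranks the carpet above the scene, which is the wrong order for background complexity: high-frequency energy alone cannot separate a busy scene from a regular texture.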

Final approach: The corner‑based self‑similarity scores are weighted to produce a final complexity score. The model is trained on resized images and reaches 96% accuracy on the test set, compared with 93% for a baseline CNN.
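The weighting step can be sketched as a normalized weighted sum. The article does not publish the weights, the direction of the score, or a decision threshold, so `WEIGHTS`, the inversion to "0 = clean, 1 = complex", and `threshold=0.5` below are all assumptions for illustration. The metrics are taken to be similarity-style values in [0, 1], such as a corner-pair similarity and two match fractions.

```python
from typing import Sequence

# Hypothetical weights: the article says the scores are weighted but gives no values.
WEIGHTS = (0.4, 0.3, 0.3)


def complexity_score(metrics: Sequence[float], weights: Sequence[float] = WEIGHTS) -> float:
    """Map similarity-style metrics in [0, 1] to a complexity score in [0, 1].

    High corner similarity and high match fractions mean a uniform background,
    so the weighted similarity is inverted: 0 = clean, 1 = complex.
    """
    if len(metrics) != len(weights):
        raise ValueError("need one weight per metric")
    weighted = sum(w * m for w, m in zip(weights, metrics)) / sum(weights)
    return 1.0 - weighted


def is_clean_background(metrics: Sequence[float], threshold: float = 0.5) -> bool:
    """Hypothetical decision rule; a real threshold would be tuned on validation data."""
    return complexity_score(metrics) < threshold


print(is_clean_background((0.95, 0.9, 0.85)))  # True  (score 0.095)
print(is_clean_background((0.6, 0.25, 0.25)))  # False (score 0.61)
```

Normalizing by the weight sum keeps the score in [0, 1] whatever weights are chosen, which makes a single threshold easier to tune.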

Advanced techniques: The authors discuss hidden object detection, implicit segmentation, and combined spatial and channel attention (the modified SAM and CBAM). They note that a full‑resolution mask is unnecessary: a mask at a mid‑level feature resolution suffices and reduces the parameter count.
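Combined channel and spatial attention in the CBAM style applies a per-channel gate followed by a per-pixel mask. The sketch below keeps that structure but simplifies heavily, and the simplifications are our own: CBAM's shared MLP becomes a scalar `gain`, its 7x7 convolution becomes per-pixel weights `w_avg` and `w_max`, and all parameters are fixed rather than learned. It is not the authors' modified SAM. The mask is produced at the resolution of whichever feature map is passed in, so feeding a mid-level map yields a correspondingly coarse mask rather than a full-resolution one.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]
Feature = List[Grid]  # indexed [channel][row][col]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def channel_attention(f: Feature, gain: float = 1.0) -> Feature:
    """CBAM-style channel gate: sigmoid(g(avg) + g(max)) per channel.

    Simplification: CBAM's shared two-layer MLP is replaced by a scalar gain.
    """
    out: Feature = []
    for ch in f:
        vals = [v for row in ch for v in row]
        gate = sigmoid(gain * (sum(vals) / len(vals)) + gain * max(vals))
        out.append([[v * gate for v in row] for row in ch])
    return out


def spatial_attention_mask(f: Feature, w_avg: float = 1.0, w_max: float = 1.0,
                           bias: float = 0.0) -> Grid:
    """CBAM-style spatial mask from the channel-wise mean and max at each pixel.

    Simplification: per-pixel weights replace CBAM's 7x7 convolution.
    """
    h, w = len(f[0]), len(f[0][0])
    mask: Grid = []
    for r in range(h):
        row = []
        for c in range(w):
            vals = [ch[r][c] for ch in f]
            row.append(sigmoid(w_avg * sum(vals) / len(vals) + w_max * max(vals) + bias))
        mask.append(row)
    return mask


def cbam_like(f: Feature) -> Tuple[Feature, Grid]:
    """Channel attention first, then spatial attention (CBAM's sequential order)."""
    f = channel_attention(f)
    mask = spatial_attention_mask(f)
    refined = [[[v * m for v, m in zip(row, mrow)] for row, mrow in zip(ch, mask)]
               for ch in f]
    return refined, mask
```

Pixels where the channels respond strongly receive mask values above 0.5, while an all-zero map receives a neutral 0.5 everywhere.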

Result comparison: Adding attention modules and FPN improves both accuracy and interpretability, providing clearer optimization directions for future iterations.

Tags: CNN, computer vision, image processing, Attention, background detection, MobileNet