Research on Content-Based Image Retrieval Techniques
This article reviews the fundamentals, feature extraction methods, evaluation metrics, and common datasets of content‑based image retrieval (CBIR), discussing traditional low‑level features, local descriptors, unsupervised and supervised learning approaches, and recent deep‑learning models for improving retrieval performance.
Image retrieval technology aims to find similar images by comparing visual content, with applications in security monitoring, e‑commerce, medical assistance, education, social media, and search engines. It has become increasingly important as AI and deep learning advance.
Definition: Content‑Based Image Retrieval (CBIR) retrieves images visually similar to a query image from a database, in contrast to Text‑Based Image Retrieval (TBIR), which relies on keyword annotations.
CBIR typically follows two main steps: feature extraction (including traditional low‑level features and machine‑learning‑based high‑level semantic features) and similarity search.
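The two-step pipeline above can be sketched in a few lines. This is a minimal illustration, not the paper's method: the "feature" here is just a gray-level histogram standing in for any real extractor, and cosine similarity stands in for the similarity search step. The function names (`extract_feature`, `retrieve`) are invented for this sketch.

```python
import numpy as np

def extract_feature(image: np.ndarray) -> np.ndarray:
    """Toy global feature: an L1-normalized 16-bin gray-level histogram
    (a stand-in for any real low-level or learned feature extractor)."""
    hist, _ = np.histogram(image, bins=16, range=(0, 256))
    hist = hist.astype(np.float64)
    return hist / (hist.sum() + 1e-12)

def retrieve(query: np.ndarray, database: list[np.ndarray], top_k: int = 3) -> list[int]:
    """Step 2, similarity search: rank database images by cosine
    similarity between the query feature and each database feature."""
    q = extract_feature(query)
    feats = np.stack([extract_feature(img) for img in database])
    sims = feats @ q / (np.linalg.norm(feats, axis=1) * np.linalg.norm(q) + 1e-12)
    return list(np.argsort(-sims)[:top_k])
```

In a real system the database features would be extracted once, offline, and only the query feature computed at search time.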
Feature Extraction
Traditional low‑level features are divided into global and local features. Global features describe the whole image (color, texture, fingerprint, shape, spatial information) and are fast to compute but sensitive to illumination and rotation. Local features (keypoints, corners, binary descriptors) capture region‑level details, offering robustness to scale and rotation.
Examples of global features include color histograms in RGB, HSV, LAB spaces; texture descriptors such as GLCM, Gabor filters, DWT; fingerprint descriptors using Local Binary Patterns (LBP); shape descriptors like Fourier descriptors; and spatial information via spatial pyramids.
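As a concrete example of the simplest global feature above, a joint RGB color histogram can be computed with NumPy alone. The quantization scheme (4 levels per channel) is an arbitrary choice for this sketch:

```python
import numpy as np

def color_histogram(rgb: np.ndarray, bins: int = 4) -> np.ndarray:
    """Joint RGB color histogram: quantize each 8-bit channel into `bins`
    levels, count each (r, g, b) bin combination, and L1-normalize."""
    q = (rgb.astype(np.int64) * bins) // 256            # per-channel bin index, 0..bins-1
    idx = (q[..., 0] * bins + q[..., 1]) * bins + q[..., 2]  # flatten to a single bin id
    hist = np.bincount(idx.ravel(), minlength=bins ** 3).astype(np.float64)
    return hist / hist.sum()
```

The resulting 64-dimensional vector is invariant to rotation and translation but, as noted above, sensitive to illumination changes; histograms in HSV or LAB space are often preferred for that reason.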
Local feature methods include SIFT, SURF, Harris, FAST, and binary descriptors (BRIEF, BRISK, ORB, FREAK).
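Binary descriptors such as BRIEF, BRISK, ORB, and FREAK are compared with Hamming distance (the number of differing bits), which is what makes them fast to match. A minimal brute-force matcher, assuming descriptors are stored as `uint8` byte arrays as OpenCV does:

```python
import numpy as np

def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Number of differing bits between two binary descriptors
    stored as uint8 byte arrays."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

def match(desc1: np.ndarray, desc2: np.ndarray) -> list[tuple[int, int]]:
    """Brute-force nearest-neighbor matching from desc1 to desc2
    under Hamming distance; returns (query_index, best_match_index) pairs."""
    matches = []
    for i, d in enumerate(desc1):
        dists = [hamming_distance(d, e) for e in desc2]
        matches.append((i, int(np.argmin(dists))))
    return matches
```

Production matchers add a ratio test or cross-check to reject ambiguous matches; this sketch keeps only the core distance computation.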
Evaluation Metrics
Common metrics are Precision (P), Recall (R), F‑score, and mean Average Precision (mAP). With TP true positives, FP false positives, and FN false negatives: P = TP / (TP + FP), R = TP / (TP + FN), and F = 2PR / (P + R); mAP averages the per‑query Average Precision over all queries. The original article illustrates these formulas in figures.
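These metrics are straightforward to compute directly from a ranked result list; the snippet below implements the standard definitions (function names are our own):

```python
import numpy as np

def precision_recall(retrieved: list[int], relevant: set[int]) -> tuple[float, float]:
    """P = |retrieved ∩ relevant| / |retrieved|,
       R = |retrieved ∩ relevant| / |relevant|."""
    hits = len(set(retrieved) & relevant)
    return hits / len(retrieved), hits / len(relevant)

def average_precision(ranked: list[int], relevant: set[int]) -> float:
    """AP: mean of precision@k over each rank k where a relevant item appears.
    mAP is this value averaged over all queries."""
    hits, precisions = 0, []
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precisions.append(hits / k)
    return float(np.mean(precisions)) if precisions else 0.0
```

The F‑score follows as `2 * p * r / (p + r)` from the precision/recall pair.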
Typical Datasets
Popular image retrieval datasets include CIFAR‑10, NUS‑WIDE, MS‑COCO, Flickr30k, Caltech256, Google Landmarks v2, XMarke, CUB200‑2011, Aircraft, Paris‑6k, Oxford5k, UKBench, Holidays, Sketchy, and Fashion‑IQ, each with varying numbers of classes and images for different retrieval scenarios.
Methods
Unsupervised approaches such as K‑means clustering and PCA dimensionality reduction are widely used, though they have limitations like sensitivity to initialization and inability to handle outliers.
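Of the two unsupervised tools mentioned, PCA is the one most often applied to retrieval features, compressing high-dimensional descriptors before indexing. A minimal SVD-based sketch (not the paper's implementation):

```python
import numpy as np

def pca_reduce(X: np.ndarray, k: int) -> np.ndarray:
    """Project row-vector features X (n_samples x dim) onto their
    top-k principal components via SVD of the centered data."""
    Xc = X - X.mean(axis=0)                  # center each feature dimension
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                     # coordinates in the top-k PC basis
```

Because SVD orders components by explained variance, the first output dimension always carries at least as much variance as the second, which is why truncating to the top k loses the least information among linear projections.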
Supervised methods leverage labeled data, employing algorithms like Support Vector Machines (SVM) and Artificial Neural Networks (ANN).
Deep learning has become dominant, with architectures such as AlexNet, VGG, GoogLeNet, ResNet, MobileNet, and EfficientNet providing powerful feature extractors. Features can be taken from fully connected layers (global) or convolutional layers (local), and combined via pooling methods (average, max, R‑MAC, SPoC, CroW, SCDA, GeM) or feature‑level fusion across layers and models.
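Of the pooling methods listed, GeM (generalized-mean pooling) is notable because it interpolates between average and max pooling with a single parameter p. A sketch over a C x H x W convolutional feature map, assuming non-negative (post-ReLU) activations:

```python
import numpy as np

def gem_pool(fmap: np.ndarray, p: float = 3.0, eps: float = 1e-6) -> np.ndarray:
    """Generalized-mean (GeM) pooling over a C x H x W feature map.
    p = 1 reduces to average pooling; as p grows it approaches max pooling."""
    clipped = np.clip(fmap, eps, None)       # GeM assumes non-negative activations
    return (clipped ** p).mean(axis=(1, 2)) ** (1.0 / p)
```

In trained retrieval networks p is often learned jointly with the backbone; p around 3 is a common fixed default.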
Summary
Effective CBIR requires appropriate feature selection, dimensionality reduction, and similarity measurement tailored to specific datasets and application domains. While current methods focus on static datasets, future research should address incremental learning to adapt to new data without retraining from scratch.
Reference: Yang Hui, Shi Shuicai. "Research on Content‑Based Image Retrieval Technology". Software Guide, 2023, 22(04):229‑244.
TAL Education Technology
TAL Education is a technology-driven education company committed to the mission of 'making education better through love and technology'. The TAL technology team has always been dedicated to educational technology research and innovation. This is the external platform of the TAL technology team, sharing weekly curated technical articles and recruitment information.