Deep Learning for Image Classification: Classic Networks, Attention Mechanisms, and Their Application to Fine‑Grained Classification and Automotive Series Recognition
This article reviews the evolution of deep‑learning image‑classification networks, surveys attention mechanisms for fine‑grained tasks, describes the CVPR 2022 FGVC9 competition solution using RegNetY and random attention cropping, and discusses its deployment in automotive series recognition along with future challenges.
The article introduces classic deep‑learning image‑classification architectures—starting from AlexNet, progressing through VGG, ResNet, Inception, DenseNet, EfficientNet, and Vision Transformer (ViT)—highlighting their impact on ImageNet performance and the shift toward automated architecture design such as Neural Architecture Search (NAS) and RegNet.
It then surveys attention mechanisms widely adopted for fine‑grained classification, including SE (Squeeze‑Excitation), GE (Gather‑Excite), CBAM (Convolutional Block Attention Module), and SK (Selective Kernel), explaining how each module refines channel and spatial feature representations.
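Of the modules above, Squeeze‑Excitation is the simplest to illustrate: global‑average‑pool each channel, pass the result through a two‑layer bottleneck, and rescale the channels by the resulting sigmoid gates. Below is a minimal NumPy sketch of that computation; the weight matrices `w1`/`w2` and the reduction ratio are illustrative stand‑ins for learned parameters, not the paper's actual values.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(feature_map, w1, w2):
    """Squeeze-and-Excitation on a (C, H, W) feature map.

    w1: (C//r, C) reduction weights; w2: (C, C//r) expansion weights.
    These are random here for illustration; a real SE module learns them.
    """
    squeeze = feature_map.mean(axis=(1, 2))      # global average pool -> (C,)
    excite = np.maximum(w1 @ squeeze, 0.0)       # FC + ReLU -> (C//r,)
    scale = sigmoid(w2 @ excite)                 # FC + sigmoid -> (C,), gates in (0, 1)
    return feature_map * scale[:, None, None]    # channel-wise reweighting

rng = np.random.default_rng(0)
fmap = rng.standard_normal((16, 8, 8))
w1 = rng.standard_normal((4, 16)) * 0.1          # reduction ratio r = 4
w2 = rng.standard_normal((16, 4)) * 0.1
out = se_block(fmap, w1, w2)
```

Because every gate lies in (0, 1), the block can only attenuate channels, never amplify them; CBAM extends the same idea with an additional spatial attention map, and SK adds kernel selection across branches.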
The paper details the Sorghum‑100 cultivar identification challenge from the FGVC9 workshop at CVPR 2022, where the authors' team achieved second place using a RegNetY‑16.0GF backbone, high‑resolution inputs, and a suite of training tricks: AutoAugment, pseudo‑labeling, test‑time augmentation, dropout, and model ensembling.
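Of the training tricks listed, test‑time augmentation is easy to show in isolation: run the model on several deterministic transforms of the input (here, horizontal and vertical flips) and average the softmax probabilities. The sketch below uses a hypothetical stand‑in `predict` function in place of the trained RegNetY model, which is not reproduced here.

```python
import numpy as np

def predict(image):
    """Hypothetical stand-in classifier returning pseudo-logits.

    In the described pipeline this would be the trained RegNetY model.
    """
    return np.array([image.mean(), image.std()])

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def tta_predict(image):
    """Average class probabilities over four flip variants of the image."""
    views = [image, image[:, ::-1], image[::-1, :], image[::-1, ::-1]]
    probs = np.stack([softmax(predict(v)) for v in views])
    return probs.mean(axis=0)
```

Averaging in probability space (after softmax) rather than logit space is the more common choice, since it keeps each view's contribution on a comparable scale.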
A novel random attention‑region cropping strategy is presented: after each epoch, the current model predicts attention maps, which are binarized and used to randomly crop the most salient region for the next epoch, thereby preserving important details while avoiding information loss from static cropping.
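One plausible reading of that strategy can be sketched as follows: binarize the previous epoch's attention map, find the bounding box of the salient pixels, and randomly jitter the crop edges outward so the crop always contains the salient region but varies between epochs. The thresholding and jitter scheme here are assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np

def random_attention_crop(image, attention, threshold=0.5, rng=None):
    """Randomly crop an (H, W) image around its high-attention region.

    attention: (H, W) map in [0, 1] predicted after the previous epoch.
    The crop always covers the binarized salient box, with random margins,
    so important detail is preserved while the crop changes every epoch.
    """
    rng = rng or np.random.default_rng()
    mask = attention >= threshold                    # binarize the attention map
    ys, xs = np.nonzero(mask)
    if ys.size == 0:                                 # no salient pixels: keep full image
        return image
    top, bottom = ys.min(), ys.max() + 1             # tight salient bounding box
    left, right = xs.min(), xs.max() + 1
    # jitter the box edges randomly so crops differ across epochs
    top = rng.integers(0, top + 1)
    left = rng.integers(0, left + 1)
    bottom = rng.integers(bottom, image.shape[0] + 1)
    right = rng.integers(right, image.shape[1] + 1)
    return image[top:bottom, left:right]
```

Compared with a static center crop, this keeps the salient region in view in every epoch while still exposing the model to varying amounts of surrounding context.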
Following the competition, the same RegNetY model was deployed in the company's automotive series recognition system, improving accuracy by 3.25%; attention visualizations show the model focusing on the vehicle's front area, which is critical for distinguishing closely related car models.
The article concludes with an outlook on fine‑grained classification challenges—data annotation difficulty, robustness to image quality, out‑of‑distribution detection, and long‑tail class imbalance—and suggests self‑supervised learning (e.g., MAE) as a promising direction.
References to seminal works (e.g., AlexNet, ResNet, Inception, DenseNet, NASNet, RegNet, ViT, SE, CBAM, SK, AutoAugment) are provided for further reading.
HomeTech tech sharing