Deep Learning for Image Classification: Classic Networks, Attention Mechanisms, and Their Application to Fine‑Grained Classification and Automotive Series Recognition
This article reviews the evolution of deep‑learning image‑classification networks, surveys attention mechanisms for fine‑grained tasks, describes the CVPR 2022 FGVC9 competition solution using RegNetY and random attention cropping, and discusses its deployment in automotive series recognition along with future challenges.
The article introduces classic deep‑learning image‑classification architectures—starting from AlexNet, progressing through VGG, ResNet, Inception, DenseNet, EfficientNet, and Vision Transformer (ViT)—highlighting their impact on ImageNet performance and the shift toward automated architecture design such as Neural Architecture Search (NAS) and RegNet.
It then surveys attention mechanisms widely adopted for fine‑grained classification, including SE (Squeeze‑Excitation), GE (Gather‑Excite), CBAM (Convolutional Block Attention Module), and SK (Selective Kernel), explaining how each module refines channel and spatial feature representations.
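Of the modules above, Squeeze‑Excitation is the simplest to illustrate: global‑average‑pool each channel, pass the result through a two‑layer bottleneck, and rescale the channels by the resulting sigmoid gates. Below is a minimal NumPy sketch of that computation; the weight matrices `w1`/`w2` and the reduction ratio are illustrative stand‑ins for learned parameters, not the paper's actual values.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(feature_map, w1, w2):
    """Squeeze-and-Excitation on a (C, H, W) feature map.

    w1: (C//r, C) reduction weights; w2: (C, C//r) expansion weights.
    These are random here for illustration; a real SE module learns them.
    """
    squeeze = feature_map.mean(axis=(1, 2))      # global average pool -> (C,)
    excite = np.maximum(w1 @ squeeze, 0.0)       # FC + ReLU -> (C//r,)
    scale = sigmoid(w2 @ excite)                 # FC + sigmoid -> (C,), gates in (0, 1)
    return feature_map * scale[:, None, None]    # channel-wise reweighting

rng = np.random.default_rng(0)
fmap = rng.standard_normal((16, 8, 8))
w1 = rng.standard_normal((4, 16)) * 0.1          # reduction ratio r = 4
w2 = rng.standard_normal((16, 4)) * 0.1
out = se_block(fmap, w1, w2)
```

Because every gate lies in (0, 1), the block can only attenuate channels, never amplify them; CBAM extends the same idea with an additional spatial attention map, and SK adds kernel selection across branches.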
The paper details the Sorghum‑100 cultivar identification challenge from the FGVC9 workshop at CVPR 2022, where the authors' team achieved second place using a RegNetY‑16.0GF backbone, high‑resolution inputs, and a suite of training tricks: AutoAugment, pseudo‑labeling, test‑time augmentation, dropout, and model ensembling.
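Of the training tricks listed, test‑time augmentation is easy to show in isolation: run the model on several deterministic transforms of the input (here, horizontal and vertical flips) and average the softmax probabilities. The sketch below uses a hypothetical stand‑in `predict` function in place of the trained RegNetY model, which is not reproduced here.

```python
import numpy as np

def predict(image):
    """Hypothetical stand-in classifier returning pseudo-logits.

    In the described pipeline this would be the trained RegNetY model.
    """
    return np.array([image.mean(), image.std()])

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def tta_predict(image):
    """Average class probabilities over four flip variants of the image."""
    views = [image, image[:, ::-1], image[::-1, :], image[::-1, ::-1]]
    probs = np.stack([softmax(predict(v)) for v in views])
    return probs.mean(axis=0)
```

Averaging in probability space (after softmax) rather than logit space is the more common choice, since it keeps each view's contribution on a comparable scale.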
A novel random attention‑region cropping strategy is presented: after each epoch, the current model predicts attention maps, which are binarized and used to randomly crop the most salient region for the next epoch, thereby preserving important details while avoiding information loss from static cropping.
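One plausible reading of that strategy can be sketched as follows: binarize the previous epoch's attention map, find the bounding box of the salient pixels, and randomly jitter the crop edges outward so the crop always contains the salient region but varies between epochs. The thresholding and jitter scheme here are assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np

def random_attention_crop(image, attention, threshold=0.5, rng=None):
    """Randomly crop an (H, W) image around its high-attention region.

    attention: (H, W) map in [0, 1] predicted after the previous epoch.
    The crop always covers the binarized salient box, with random margins,
    so important detail is preserved while the crop changes every epoch.
    """
    rng = rng or np.random.default_rng()
    mask = attention >= threshold                    # binarize the attention map
    ys, xs = np.nonzero(mask)
    if ys.size == 0:                                 # no salient pixels: keep full image
        return image
    top, bottom = ys.min(), ys.max() + 1             # tight salient bounding box
    left, right = xs.min(), xs.max() + 1
    # jitter the box edges randomly so crops differ across epochs
    top = rng.integers(0, top + 1)
    left = rng.integers(0, left + 1)
    bottom = rng.integers(bottom, image.shape[0] + 1)
    right = rng.integers(right, image.shape[1] + 1)
    return image[top:bottom, left:right]
```

Compared with a static center crop, this keeps the salient region in view in every epoch while still exposing the model to varying amounts of surrounding context.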
Following the competition, the same RegNetY model was deployed in the company's automotive series recognition system, improving accuracy by 3.25%; attention visualizations show the model focusing on the vehicle's front area, which is critical for distinguishing closely related car models.
The article concludes with an outlook on fine‑grained classification challenges—data annotation difficulty, robustness to image quality, out‑of‑distribution detection, and long‑tail class imbalance—and suggests self‑supervised learning (e.g., MAE) as a promising direction.
References to seminal works (e.g., AlexNet, ResNet, Inception, DenseNet, NASNet, RegNet, ViT, SE, CBAM, SK, AutoAugment) are provided for further reading.
HomeTech tech sharing