Overview of Prominent Deep Learning Architectures for Computer Vision

This article surveys recent progress in deep learning through key computer‑vision architectures: AlexNet, VGG, GoogLeNet, ResNet, ResNeXt, RCNN, YOLO, SqueezeNet, SegNet and GANs. Each entry gives a brief description, the architecture's main advantages, and links to the original paper and a Keras implementation.

Architecture Digest

In recent years, deep learning has progressed rapidly, making it increasingly difficult to keep up with its innovations, most of which appear in research papers on arXiv, Springer and other venues.

This article introduces several recent deep‑learning advances and provides Keras code examples together with links to the original papers.

For brevity, only successful computer‑vision architectures are covered.

The article assumes the reader already understands neural networks and is comfortable with Keras; beginners are strongly encouraged to read the following introductory articles first:

Fundamentals of Deep Learning – Starting with Artificial Neural Network

Tutorial: Optimizing Neural Networks using Keras (with Image recognition case study)

What is a deep‑learning “high‑level architecture”?

Compared with simple machine‑learning algorithms, deep‑learning models are far more flexible because neural networks can be assembled like LEGO blocks to build simple or complex structures. A “high‑level architecture” is defined as a deep‑learning model that has a proven record of success on challenges such as ImageNet, where the task is to recognize objects in images.

Different Types of Computer‑Vision Tasks

Computer‑vision tasks aim to build models that replicate human visual perception. The main categories are:

Object classification: assign an input image to a predefined category.

Classification and localization: classify the single main object in an image and locate it, usually with a bounding box.

Object detection: identify the positions and categories of multiple objects in an image.

Image segmentation: map each pixel to a class, producing a detailed mask of the scene.
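As a small illustration of the last task, the sketch below turns per-pixel class scores into a segmentation mask by taking the highest-scoring class at each pixel. The 2×2 "image", the three class names and the score values are made up for the example; a real model would produce the scores.

```python
def scores_to_mask(scores):
    """Map each pixel's list of class scores to the index of the best class."""
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]

# Hypothetical scores for 3 classes (0 = background, 1 = cat, 2 = dog).
scores = [
    [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1]],
    [[0.1, 0.8, 0.1], [0.1, 0.2, 0.7]],
]

mask = scores_to_mask(scores)
print(mask)  # [[0, 1], [1, 2]]
```

Classification uses the same arg-max idea, but over a single score vector for the whole image rather than one vector per pixel.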

Key Deep‑Learning Architectures

1. AlexNet

AlexNet was the first breakthrough deep‑learning architecture for image recognition, introduced by Alex Krizhevsky, Ilya Sutskever and Geoffrey Hinton; it won the 2012 ImageNet challenge. It consists of stacked convolutional and pooling layers followed by fully‑connected layers, and it was among the first to demonstrate at scale the advantage of training on GPUs.
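To see how stacked convolution and pooling layers shrink an image, the sketch below applies the standard output-size formula. The layer settings (a 227×227 input, an 11×11 convolution with stride 4, then 3×3 pooling with stride 2) are assumptions taken from common descriptions of AlexNet's first stage, not from this article.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer along one axis."""
    return (size + 2 * padding - kernel) // stride + 1

# Assumed first-stage settings (see the note above).
after_conv1 = conv_output_size(227, kernel=11, stride=4)
after_pool1 = conv_output_size(after_conv1, kernel=3, stride=2)

print(after_conv1, after_pool1)  # 55 27
```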

Original Paper: https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

Code implementation (Keras): https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

2. VGG Net

VGG, proposed by the Visual Geometry Group at Oxford, is characterized by a deep, pyramid‑shaped architecture with many small (3×3) convolutional filters followed by pooling layers.

VGG is widely used for benchmarking and benefits from abundant pretrained models, though training from scratch is computationally expensive.
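The appeal of small filters can be checked with a little arithmetic: three stacked 3×3 convolutions cover the same 7×7 receptive field as one 7×7 convolution, but with fewer weights. This is a minimal sketch that ignores biases and assumes 64 input and output channels purely for illustration.

```python
def conv_weights(kernel, in_ch, out_ch):
    """Weights in a kernel x kernel convolution, ignoring biases."""
    return kernel * kernel * in_ch * out_ch

def receptive_field(num_layers, kernel=3):
    """Receptive field of stacked stride-1 convolutions with the same kernel."""
    return 1 + num_layers * (kernel - 1)

channels = 64  # assumed channel count
stacked_3x3 = 3 * conv_weights(3, channels, channels)
single_7x7 = conv_weights(7, channels, channels)

print(receptive_field(3))       # 7
print(stacked_3x3, single_7x7)  # 110592 200704
```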

Original Paper: https://arxiv.org/abs/1409.1556

Code implementation (Keras): https://github.com/fchollet/keras/blob/master/keras/applications/vgg16.py

3. GoogLeNet (Inception)

GoogLeNet, also known as Inception, won the 2014 ImageNet competition. It introduced the Inception module, which applies convolutional filters of several sizes in parallel and concatenates their outputs, allowing a deeper network with fewer parameters.
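One ingredient behind the parameter savings is the 1×1 convolution placed in front of the larger filters inside an Inception module, which reduces the number of channels before the expensive convolution runs. The sketch below compares the weight count of a direct 5×5 convolution with the reduced version; the channel counts (192 in, 16 after reduction, 32 out) are illustrative assumptions and biases are ignored.

```python
def conv_weights(kernel, in_ch, out_ch):
    """Weights in a kernel x kernel convolution, ignoring biases."""
    return kernel * kernel * in_ch * out_ch

in_ch, reduce_ch, out_ch = 192, 16, 32  # assumed channel counts

# 5x5 convolution applied directly to all input channels.
direct = conv_weights(5, in_ch, out_ch)

# 1x1 reduction first, then the 5x5 convolution on the reduced channels.
bottleneck = conv_weights(1, in_ch, reduce_ch) + conv_weights(5, reduce_ch, out_ch)

print(direct, bottleneck)  # 153600 15872
```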

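Original Paper: https://arxiv.org/abs/1409.4842 (GoogLeNet, Inception v1); the Keras implementation linked below follows the later Inception v3 paper, https://arxiv.org/abs/1512.00567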

Code implementation (Keras): https://github.com/fchollet/keras/blob/master/keras/applications/inception_v3.py

4. ResNet

ResNet introduced residual blocks that allow very deep networks (e.g., 152 layers) to be trained effectively by providing shortcut connections that bypass one or more layers.
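The shortcut idea fits in a few lines: a residual block outputs F(x) + x, so when the learned transform F contributes nothing the block simply passes its input through. This is a framework-free sketch on plain lists; `transform` stands in for the block's convolutional layers, and the activation that normally follows the addition is left out.

```python
def residual_block(x, transform):
    """Return transform(x) + x, element by element (the shortcut connection)."""
    fx = transform(x)
    return [xi + fi for xi, fi in zip(x, fx)]

def zero_transform(x):
    """A transform that has learned nothing: it outputs zeros."""
    return [0.0 for _ in x]

def halve(x):
    """A toy transform used only for illustration."""
    return [0.5 * xi for xi in x]

x = [1.0, -2.0, 3.0]
print(residual_block(x, zero_transform))  # [1.0, -2.0, 3.0]
print(residual_block(x, halve))           # [1.5, -3.0, 4.5]
```

Because the identity mapping is this easy to represent, adding more blocks should not make the network harder to fit, which is the intuition behind training networks with over a hundred layers.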

Original Paper: https://arxiv.org/abs/1512.03385

Code implementation (Keras): https://github.com/fchollet/keras/blob/master/keras/applications/resnet50.py

5. ResNeXt

ResNeXt builds on the ideas of Inception and ResNet, using a split‑transform‑merge strategy to create a highly modular and scalable architecture.
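A rough way to see what split‑transform‑merge buys is to compare weight counts. The sketch below sets a ResNet‑style bottleneck block (256 → 64 → 64 → 256 channels) against a ResNeXt‑style block made of 32 parallel paths that are each 4 channels wide. These sizes are assumptions based on the configuration commonly quoted for ResNeXt, and biases and normalization parameters are ignored.

```python
def bottleneck_weights(in_ch, width, out_ch):
    """Weights of a 1x1 -> 3x3 -> 1x1 bottleneck path, ignoring biases."""
    return in_ch * width + 3 * 3 * width * width + width * out_ch

# One wide path (ResNet-style bottleneck).
resnet_block = bottleneck_weights(256, 64, 256)

# Many narrow paths whose outputs are summed (ResNeXt-style).
cardinality, path_width = 32, 4  # assumed values
resnext_block = cardinality * bottleneck_weights(256, path_width, 256)

print(resnet_block, resnext_block)  # 69632 70144
```

Both blocks land at roughly 70 thousand weights, so the number of paths can be raised without a large change in model size.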

Original Paper: https://arxiv.org/pdf/1611.05431.pdf

Code implementation: https://github.com/titu1994/Keras-ResNeXt

6. RCNN (Region‑Based CNN)

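RCNN tackles object detection by first generating region proposals and then classifying each region with a CNN. The paper and code linked below are for Faster R‑CNN, a later variant in which a region proposal network shares convolutional features with the detector.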
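Because many proposals cover the same object, region‑based detectors usually finish with a pruning step. The sketch below shows one common choice, greedy non‑maximum suppression based on intersection over union (IoU). The boxes, scores and the 0.5 threshold are made‑up values; a real pipeline would take them from the classifier.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(detections, threshold=0.5):
    """Keep the best-scoring boxes, dropping any that overlap a kept box too much."""
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, kept_box) <= threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept

# Hypothetical scored proposals: two overlap heavily, one is separate.
detections = [
    ((10, 10, 50, 50), 0.9),
    ((12, 12, 52, 52), 0.8),
    ((100, 100, 140, 140), 0.7),
]
kept = non_max_suppression(detections)
print([score for _, score in kept])  # [0.9, 0.7]
```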

Original Paper (Faster R‑CNN): https://arxiv.org/abs/1506.01497

Code implementation: https://github.com/yhenon/keras-frcnn

7. YOLO (You Only Look Once)

YOLO is a real‑time object‑detection system that divides an image into a grid and predicts bounding boxes and class probabilities for every cell in a single forward pass. Reported speeds are on the order of 40 frames per second, depending on the model variant and hardware.
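The grid formulation determines the shape of the network's output. The sketch below computes the output size for a grid of S×S cells with B boxes per cell and C classes, and finds which cell is responsible for an object from its center point. The values S=7, B=2, C=20 and the 448×448 image size are assumptions taken from the setup usually quoted for the first YOLO version.

```python
def yolo_output_size(grid, boxes_per_cell, num_classes):
    """Each cell predicts B boxes (x, y, w, h, confidence) plus C class scores."""
    return grid * grid * (boxes_per_cell * 5 + num_classes)

def responsible_cell(cx, cy, img_w, img_h, grid):
    """Row and column of the grid cell that contains the object's center."""
    col = min(int(cx / img_w * grid), grid - 1)
    row = min(int(cy / img_h * grid), grid - 1)
    return row, col

print(yolo_output_size(7, 2, 20))               # 1470
print(responsible_cell(224, 100, 448, 448, 7))  # (1, 3)
```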

Original Paper: https://pjreddie.com/media/files/papers/yolo.pdf

Code implementation: https://github.com/allanzelener/YAD2K

8. SqueezeNet

SqueezeNet is designed for low‑bandwidth environments; its fire modules reduce model size to about 4.9 MB while retaining accuracy comparable to AlexNet.
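A fire module first "squeezes" the input with a few 1×1 filters and then "expands" it with a mix of 1×1 and 3×3 filters. The sketch below counts its weights and compares them with a plain 3×3 convolution producing the same number of output channels. The channel counts (96 in, 16 squeeze, 64 + 64 expand) are illustrative assumptions and biases are ignored.

```python
def fire_module_weights(in_ch, squeeze, expand_1x1, expand_3x3):
    """Weights in a fire module: 1x1 squeeze, then parallel 1x1 and 3x3 expand."""
    squeeze_w = in_ch * squeeze
    expand_w = squeeze * expand_1x1 + 3 * 3 * squeeze * expand_3x3
    return squeeze_w + expand_w

def plain_conv_weights(in_ch, out_ch, kernel=3):
    """Weights in an ordinary kernel x kernel convolution."""
    return kernel * kernel * in_ch * out_ch

fire = fire_module_weights(96, squeeze=16, expand_1x1=64, expand_3x3=64)
plain = plain_conv_weights(96, 64 + 64)

print(fire, plain)  # 11776 110592
```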

Original Paper: https://arxiv.org/abs/1602.07360

Code implementation: https://github.com/rcmalli/keras-squeezenet

9. SegNet

SegNet addresses image segmentation with an encoder‑decoder architecture: the decoder upsamples using the max‑pooling indices saved by the encoder, which helps preserve high‑frequency details such as object boundaries.
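The pooling‑index trick can be shown on a tiny feature map. In this sketch the encoder's max pooling remembers where each maximum came from, and the decoder places each pooled value back at that position, leaving the other positions at zero. It uses plain lists, assumes the map's height and width are divisible by the pooling size, and omits the convolutions that SegNet applies after unpooling.

```python
def max_pool_with_indices(grid, size=2):
    """Max-pool a 2D list and record the (row, col) of each maximum."""
    pooled, indices = [], []
    for r in range(0, len(grid), size):
        pooled_row, index_row = [], []
        for c in range(0, len(grid[0]), size):
            window = [(grid[r + dr][c + dc], (r + dr, c + dc))
                      for dr in range(size) for dc in range(size)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices

def unpool(pooled, indices, height, width):
    """Place each pooled value back at its recorded position; zeros elsewhere."""
    out = [[0] * width for _ in range(height)]
    for r, row in enumerate(pooled):
        for c, value in enumerate(row):
            rr, cc = indices[r][c]
            out[rr][cc] = value
    return out

feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 1],
]
pooled, indices = max_pool_with_indices(feature_map)
restored = unpool(pooled, indices, 4, 4)
print(pooled)    # [[4, 5], [6, 7]]
print(restored)  # [[0, 0, 0, 0], [4, 0, 0, 5], [0, 0, 7, 0], [6, 0, 0, 0]]
```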

Original Paper: https://arxiv.org/abs/1511.00561

Code implementation: https://github.com/imlab-uiip/keras-segnet

10. GAN (Generative Adversarial Network)

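GANs consist of a generator and a discriminator trained against each other: the generator produces synthetic images and the discriminator tries to tell them apart from real ones. This competition enables the creation of realistic synthetic images that were never present in the training set.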
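The competition can be written as two loss functions. In the sketch below, `d_real` and `d_fake` are hypothetical discriminator outputs: the probability it assigns to "real" for a real image and for a generated one. The generator loss shown is the commonly used non‑saturating form, chosen here for illustration.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Low when real images score near 1 and generated images score near 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    """Low when the discriminator is fooled into scoring a fake near 1."""
    return -math.log(d_fake)

# A discriminator that cannot tell real from fake outputs 0.5 for both.
print(round(discriminator_loss(0.5, 0.5), 4))  # 1.3863
print(round(generator_loss(0.5), 4))           # 0.6931

# The generator's loss falls as its samples fool the discriminator more.
print(generator_loss(0.9) < generator_loss(0.1))  # True
```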

Original Paper: https://arxiv.org/abs/1406.2661

Code implementation: https://github.com/bstriner/keras-adversarial

Source: http://dataunion.org/31240.html

© The content is sourced from the web; copyright belongs to the original authors. If any infringement is identified, please notify us for removal.

Written by

Architecture Digest
