Overview of Prominent Deep Learning Architectures for Computer Vision
This article surveys recent progress in deep learning by presenting key computer‑vision architectures such as AlexNet, VGG, GoogleNet, ResNet, ResNeXt, RCNN, YOLO, SqueezeNet, SegNet and GANs, providing brief descriptions, their advantages, and links to original papers and Keras implementations.
In recent years, deep learning has progressed rapidly, making it increasingly difficult to keep up with its innovations, most of which appear in research papers on arXiv, Springer and other venues.
This article introduces several recent deep‑learning advances and links each one to its original paper and to a Keras implementation.
For brevity, only successful computer‑vision architectures are covered.
The article assumes the reader already understands neural networks and is comfortable with Keras; beginners are strongly encouraged to read the following introductory articles first:
Fundamentals of Deep Learning – Starting with Artificial Neural Network
Tutorial: Optimizing Neural Networks using Keras (with Image recognition case study)
What is a deep‑learning “high‑level architecture”?
Compared with simple machine‑learning algorithms, deep‑learning models are far more flexible because neural networks can be assembled like LEGO blocks to build simple or complex structures. A “high‑level architecture” is defined as a deep‑learning model that has a proven record of success on challenges such as ImageNet, where the task is to recognize objects in images.
Different Types of Computer‑Vision Tasks
Computer‑vision tasks aim to build models that replicate human visual perception. The main categories are:
Object classification: assign an input image to a predefined category.
Classification and localization: assign the image to a category and locate its single main object with a bounding box.
Object detection: identify the positions and categories of multiple objects in an image.
Image segmentation: map each pixel to a class, producing a detailed mask of the scene.
Key Deep‑Learning Architectures
1. AlexNet
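AlexNet, introduced by Alex Krizhevsky, Ilya Sutskever and Geoffrey Hinton, won the ImageNet (ILSVRC) 2012 challenge by a wide margin and is widely regarded as the first breakthrough deep‑learning architecture for computer vision. It consists of stacked convolutional and pooling layers followed by fully‑connected layers, and it was trained on two GPUs, which helped establish GPU training as standard practice.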
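To make "stacked convolutional and pooling layers" concrete, the sketch below works through the size and parameter arithmetic for AlexNet's first convolution (96 filters of 11×11, stride 4) and the overlapping 3×3, stride‑2 max pooling that follows it. It is plain Python rather than Keras; the helper names are hypothetical, and the 227×227 input is an assumption (a size many reimplementations use because it makes the stride‑4 arithmetic come out even).

```python
def conv_output_size(in_size, kernel, stride, padding=0):
    """Spatial output size of a convolution or pooling layer (square inputs)."""
    return (in_size - kernel + 2 * padding) // stride + 1


def conv_params(kernel, in_channels, out_channels):
    """Weights plus one bias per output channel for a 2-D convolution."""
    return kernel * kernel * in_channels * out_channels + out_channels


# First AlexNet layer: 96 filters of size 11x11 with stride 4 on an RGB image.
# Assumption: a 227x227 input, as used by many reimplementations.
conv1_size = conv_output_size(227, kernel=11, stride=4)
conv1_params = conv_params(11, in_channels=3, out_channels=96)
# Overlapping max pooling: 3x3 windows with stride 2.
pool1_size = conv_output_size(conv1_size, kernel=3, stride=2)

print(conv1_size, conv1_params, pool1_size)  # 55 34944 27
```

The same two helpers can be chained layer by layer to check any convolutional stack before writing it in Keras.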
Original Paper: https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
Code implementation (Keras): not available in keras.applications; no separate implementation link is provided.
2. VGG Net
VGG, proposed by the Visual Geometry Group at Oxford, is characterized by a deep, pyramid‑shaped architecture with many small (3×3) convolutional filters followed by pooling layers.
VGG is widely used for benchmarking and benefits from abundant pretrained models, though training from scratch is computationally expensive.
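The reason for VGG's small filters can be checked with a little arithmetic: three stacked 3×3 convolutions (stride 1) see the same 7×7 region as a single 7×7 convolution, but need 27C² instead of 49C² weights for C input and output channels, and add extra non‑linearities in between. The plain‑Python sketch below uses hypothetical helper names and an illustrative width of C = 64; biases are ignored.

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Receptive field of num_layers stride-1 convolutions stacked on top of each other."""
    return 1 + num_layers * (kernel - 1)


def conv_weights(kernel, channels, num_layers=1):
    """Weight count (biases ignored) for num_layers convs mapping C channels to C channels."""
    return num_layers * kernel * kernel * channels * channels


channels = 64  # illustrative channel width
# Three stacked 3x3 layers see a 7x7 region, like one 7x7 layer...
print(stacked_receptive_field(3), stacked_receptive_field(1, kernel=7))  # 7 7
# ...but use far fewer weights (27*C*C versus 49*C*C).
print(conv_weights(3, channels, num_layers=3), conv_weights(7, channels))  # 110592 200704
```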
Original Paper: https://arxiv.org/abs/1409.1556
Code implementation (Keras): https://github.com/fchollet/keras/blob/master/keras/applications/vgg16.py
3. GoogleNet (Inception)
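GoogleNet, also known as Inception, won the classification task of the 2014 ImageNet (ILSVRC) competition. It introduced the Inception module, which applies 1×1, 3×3 and 5×5 convolutions and max pooling in parallel and concatenates their outputs. Cheap 1×1 convolutions first reduce the number of channels entering the larger filters, which lets the network grow deeper with fewer parameters.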
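The saving from the 1×1 "bottleneck" in an Inception branch is easy to quantify. The plain‑Python sketch below compares a 5×5 convolution applied directly to the module input with the same branch preceded by a 1×1 reduction. The channel widths (192 in, 16 after reduction, 32 out) are illustrative assumptions, the helper name is hypothetical, and biases are ignored.

```python
def conv_weights(kernel, in_channels, out_channels):
    """Weight count of a 2-D convolution, biases ignored."""
    return kernel * kernel * in_channels * out_channels


in_channels = 192      # channels entering the module (illustrative)
reduce_channels = 16   # width of the 1x1 reduction before the 5x5 branch (illustrative)
out_channels = 32      # output channels of the 5x5 branch (illustrative)

# 5x5 branch applied directly to the full input
direct = conv_weights(5, in_channels, out_channels)
# Same branch with a 1x1 channel reduction first, as in the Inception module
reduced = (conv_weights(1, in_channels, reduce_channels)
           + conv_weights(5, reduce_channels, out_channels))

print(direct, reduced)  # 153600 15872
```

Here the reduction cuts the branch's weights by roughly an order of magnitude, which is what makes it affordable to run several filter sizes side by side.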
Original Paper (Inception v3, the version implemented in the Keras code below): https://arxiv.org/abs/1512.00567
Code implementation (Keras): https://github.com/fchollet/keras/blob/master/keras/applications/inception_v3.py
4. ResNet
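ResNet introduced residual blocks, in which a small stack of layers learns a residual function F(x) that is added to the block's input through a shortcut connection, so the block outputs F(x) + x. Because a block can fall back to the identity simply by driving F(x) toward zero, very deep networks (e.g., 152 layers) can be trained effectively.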
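A toy residual block shows the F(x) + x idea without any framework. In this plain‑Python sketch the convolutions are replaced by small fully connected layers (matrix‑vector products) for brevity, which is a simplification; the function names are hypothetical. It keeps the original ResNet ordering of adding the shortcut before the final ReLU.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def linear(weights, v):
    """Matrix-vector product: each row of weights produces one output feature."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """Toy fully connected residual block: output = relu(F(x) + x)."""
    f_x = linear(w2, relu(linear(w1, x)))         # the residual branch F(x)
    return relu([f + s for f, s in zip(f_x, x)])  # add the identity shortcut


zeros = [[0.0, 0.0], [0.0, 0.0]]
# With all-zero weights F(x) = 0, so the block passes its (non-negative) input through.
print(residual_block([1.0, 2.0], zeros, zeros))  # [1.0, 2.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(residual_block([1.0, 2.0], identity, identity))  # [2.0, 4.0]
```

The first call illustrates why depth does not hurt: an unneeded block only has to learn weights near zero to behave like the identity.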
Original Paper: https://arxiv.org/abs/1512.03385
Code implementation (Keras): https://github.com/fchollet/keras/blob/master/keras/applications/resnet50.py
5. ResNeXt
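ResNeXt builds on the ideas of Inception and ResNet. Each block follows a split‑transform‑merge strategy: the input is passed through many parallel paths with the same topology, and their outputs are summed and added to a residual shortcut. The number of paths, called cardinality, becomes a design dimension alongside depth and width, which makes the architecture highly modular and easy to scale.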
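The cost of adding cardinality can be estimated by counting weights. The plain‑Python sketch below compares a standard ResNet bottleneck (256 channels in, one 64‑wide path) with a ResNeXt block of 32 paths that are each 4 channels wide; biases are ignored and the function name is hypothetical.

```python
def bottleneck_weights(in_channels, path_width, cardinality=1):
    """Weights (biases ignored) of a 1x1 -> 3x3 -> 1x1 bottleneck block.

    cardinality is the number of parallel paths; each path reduces the input to
    path_width channels, applies a 3x3 conv, and projects back to in_channels.
    The paths' outputs are summed (split-transform-merge).
    """
    per_path = (in_channels * path_width             # 1x1 reduce
                + 3 * 3 * path_width * path_width    # 3x3 transform
                + path_width * in_channels)          # 1x1 expand
    return cardinality * per_path


resnet_block = bottleneck_weights(256, path_width=64)                   # one wide path
resnext_block = bottleneck_weights(256, path_width=4, cardinality=32)   # 32 narrow paths
print(resnet_block, resnext_block)  # 69632 70144
```

Both blocks come to roughly 70k weights, so cardinality can be raised at about the same cost as a conventional bottleneck; in practice the 32 paths are usually implemented as a single grouped convolution.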
Original Paper: https://arxiv.org/pdf/1611.05431.pdf
Code implementation: https://github.com/titu1994/Keras-ResNeXt
6. RCNN (Region‑Based CNN)
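RCNN tackles object detection by first generating candidate region proposals and then classifying each region with a CNN. The linked paper and code implement Faster R‑CNN, a later version in which a Region Proposal Network shares convolutional features with the detector, replacing the slower external proposal step.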
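Two small routines recur throughout the R‑CNN family: intersection over union (IoU), used to match proposals against ground‑truth boxes during training, and greedy non‑maximum suppression (NMS), used to remove duplicate detections of the same object. The plain‑Python sketch below assumes boxes given as (x1, y1, x2, y2) and an illustrative overlap threshold of 0.5; the function names are hypothetical, not taken from the linked code.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def non_max_suppression(boxes, scores, threshold=0.5):
    """Visit boxes from highest to lowest score; keep one only if it does not
    overlap an already-kept box by more than threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 2, 2), (0.1, 0.1, 2, 2), (3, 3, 4, 4)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2]
```

The second box overlaps the first almost entirely, so only the higher‑scoring one survives; the distant third box is kept.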
Original Paper: https://arxiv.org/abs/1506.01497
Code implementation: https://github.com/yhenon/keras-frcnn
7. YOLO (You Only Look Once)
YOLO is a real‑time object‑detection system that divides an image into a grid and, in a single pass of the network, predicts bounding boxes, confidence scores and class probabilities for each cell. The base model in the linked paper runs at about 45 frames per second.
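The grid formulation fixes the size of the network's output. In the original YOLO design each of the S×S cells predicts B boxes (x, y, w, h and a confidence, so 5 numbers each) plus C class probabilities, giving an S×S×(B·5 + C) tensor, and the cell containing an object's center is responsible for detecting it. The plain‑Python sketch below uses hypothetical helper names and assumes box centers normalized to [0, 1). Note that later versions such as YOLOv2 (which the linked YAD2K code implements) predict class scores per anchor box, so their output layout differs.

```python
def yolo_output_shape(grid_size, boxes_per_cell, num_classes):
    """Shape of a YOLOv1-style prediction tensor: S x S x (B*5 + C)."""
    return (grid_size, grid_size, boxes_per_cell * 5 + num_classes)


def responsible_cell(center_x, center_y, grid_size):
    """(row, col) of the grid cell containing an object's center; coordinates in [0, 1)."""
    return int(center_y * grid_size), int(center_x * grid_size)


# A 7x7 grid with 2 boxes per cell and the 20 PASCAL VOC classes
print(yolo_output_shape(7, 2, 20))     # (7, 7, 30)
print(responsible_cell(0.5, 0.2, 7))   # (1, 3)
```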
Original Paper: https://pjreddie.com/media/files/papers/yolo.pdf
Code implementation: https://github.com/allanzelener/YAD2K
8. SqueezeNet
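SqueezeNet targets settings where model size matters, such as limited bandwidth for distributing models or memory‑constrained hardware. Its fire modules (a 1×1 "squeeze" layer feeding parallel 1×1 and 3×3 "expand" layers) cut the parameter count about 50× relative to AlexNet, giving a model of about 4.9 MB with AlexNet‑level accuracy on ImageNet.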
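A weight count shows where the fire module's savings come from: the squeeze layer shrinks the number of channels that reach the expensive 3×3 filters, and half of the expand filters are cheap 1×1 convolutions. The plain‑Python sketch below compares a fire module (96 input channels, squeeze to 16, expand to 64 + 64) with a plain 3×3 convolution producing the same 128 output channels. The widths are illustrative, the function names are hypothetical, and biases are ignored.

```python
def fire_module_weights(in_channels, squeeze, expand_1x1, expand_3x3):
    """Weights of a fire module: 1x1 squeeze, then parallel 1x1 and 3x3 expand layers."""
    squeeze_w = in_channels * squeeze                              # 1x1 squeeze
    expand_w = squeeze * expand_1x1 + 3 * 3 * squeeze * expand_3x3  # both expand branches
    return squeeze_w + expand_w


def plain_conv3x3_weights(in_channels, out_channels):
    return 3 * 3 * in_channels * out_channels


fire = fire_module_weights(96, squeeze=16, expand_1x1=64, expand_3x3=64)
plain = plain_conv3x3_weights(96, 64 + 64)  # expand outputs are concatenated: 128 channels
print(fire, plain)  # 11776 110592
```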
Original Paper: https://arxiv.org/abs/1602.07360
Code implementation: https://github.com/rcmalli/keras-squeezenet
9. SegNet
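SegNet addresses image segmentation with an encoder‑decoder architecture. The encoder stores the indices of each max‑pooling operation, and the decoder reuses those indices to upsample its feature maps, which preserves boundary detail without having to learn the upsampling.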
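The pooling‑index trick can be shown on a single 4×4 feature map. The plain‑Python sketch below performs 2×2 max pooling while recording where each maximum came from, then "unpools" by writing each value back to its recorded position and leaving zeros elsewhere; in SegNet, trainable convolutions in the decoder then densify this sparse map. Function names are hypothetical and the input values are illustrative.

```python
def max_pool_with_indices(x, k=2):
    """k x k max pooling (stride k) that also records where each maximum came from."""
    pooled, indices = [], []
    for i in range(0, len(x), k):
        pooled_row, index_row = [], []
        for j in range(0, len(x[0]), k):
            window = [(x[r][c], (r, c))
                      for r in range(i, min(i + k, len(x)))
                      for c in range(j, min(j + k, len(x[0])))]
            value, position = max(window, key=lambda t: t[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool(pooled, indices, rows, cols):
    """Place each pooled value back at its recorded position; everything else is zero."""
    out = [[0] * cols for _ in range(rows)]
    for value_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(value_row, index_row):
            out[r][c] = value
    return out


x = [[1, 2, 5, 6],
     [3, 4, 7, 8],
     [9, 10, 13, 14],
     [11, 12, 15, 16]]
pooled, indices = max_pool_with_indices(x)
print(pooled)                               # [[4, 8], [12, 16]]
print(max_unpool(pooled, indices, 4, 4))    # [[0, 0, 0, 0], [0, 4, 0, 8], [0, 0, 0, 0], [0, 12, 0, 16]]
```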
Original Paper: https://arxiv.org/abs/1511.00561
Code implementation: https://github.com/imlab-uiip/keras-segnet
10. GAN (Generative Adversarial Network)
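GANs consist of two networks trained against each other: a generator that turns random noise into synthetic samples, and a discriminator that tries to tell real training images from generated ones. As the generator learns to fool the discriminator, it produces realistic images that never appeared in the training set.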
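The competition is formalized in the original paper as a minimax game over the value V(D, G) = E[log D(x)] + E[log(1 − D(G(z)))], which the discriminator maximizes and the generator minimizes. The plain‑Python sketch below estimates V from batches of discriminator outputs (probabilities that an image is real) and also shows the "non‑saturating" generator loss, −log D(G(z)), that the paper suggests using in practice. Function names are hypothetical and the inputs are illustrative probabilities, not outputs of a trained model.

```python
import math


def gan_value(d_real, d_fake):
    """Sample estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: discriminator outputs on real images; d_fake: outputs on generated ones.
    The discriminator tries to maximize this value; the generator tries to minimize it.
    """
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss_non_saturating(d_fake):
    """Generator loss -E[log D(G(z))]: train G to maximize log D(G(z)) instead."""
    return -sum(math.log(d) for d in d_fake) / len(d_fake)


# If the discriminator can only guess (outputs 0.5 everywhere), V equals -log 4.
print(round(gan_value([0.5, 0.5], [0.5, 0.5]), 4))  # -1.3863
```

The value −log 4 is the one the paper derives for the global optimum, where the generator's distribution matches the data and the best discriminator outputs 0.5 for every input.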
Original Paper: https://arxiv.org/abs/1406.2661
Code implementation: https://github.com/bstriner/keras-adversarial
Source: http://dataunion.org/31240.html
