Overview of Deep Neural Network Architectures
This article surveys twelve major families of deep neural networks (Feedforward, CNN, RNN, LSTM, DBN, GAN, Autoencoder, Residual, Capsule, Transformer, Attention, and Deep Reinforcement Learning), explaining their principles, structures, and training methods, with Python code examples in TensorFlow and PyTorch.
1. Introduction to Deep Neural Networks
A Deep Neural Network (DNN) is a computational model composed of multiple layers of neurons. It learns mappings from inputs to outputs by repeating three steps: forward propagation, loss calculation, and back-propagation, which adjusts the weights and biases.
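The cycle just described (forward pass, loss calculation, back-propagation, weight update) can be sketched with PyTorch autograd. The toy data, single linear layer, and learning rate below are arbitrary illustrations, not part of the original article:

```python
import torch

# toy regression data: 32 samples, 10 features (arbitrary illustrative sizes)
x = torch.randn(32, 10)
y = torch.randn(32, 1)

# a single linear layer standing in for the "network"
w = torch.randn(10, 1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

lr = 0.05
losses = []
for step in range(100):
    y_hat = x @ w + b                  # forward propagation
    loss = ((y_hat - y) ** 2).mean()   # loss calculation (mean squared error)
    losses.append(loss.item())
    loss.backward()                    # back-propagation fills w.grad and b.grad
    with torch.no_grad():              # gradient-descent update of weights and biases
        w -= lr * w.grad
        b -= lr * b.grad
        w.grad.zero_()
        b.grad.zero_()
```

The recorded losses shrink over the iterations, which is exactly what the update rule is designed to achieve.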
2. Twelve Deep Neural Network Models
2.1 Feedforward Neural Network (FNN)
Also called a Multi-Layer Perceptron (MLP), it consists of an input layer, one or more hidden layers, and an output layer; data flows only forward. A typical training loop covers weight initialization, the forward pass, loss computation, back-propagation, and weight updates.
```python
import tensorflow as tf

input_size = 100
output_size = 10

# define a feedforward (fully connected) network
def ffn_model():
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Dense(64, activation='relu', input_shape=(input_size,)))
    model.add(tf.keras.layers.Dense(output_size, activation='softmax'))
    return model

model = ffn_model()
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# train_data, train_labels, and test_data must be supplied by the caller
model.fit(train_data, train_labels, epochs=10, batch_size=32)
predictions = model.predict(test_data)
```

2.2 Convolutional Neural Network (CNN)
Covered only briefly here; the original article links to introductory CNN tutorials.
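For completeness, a minimal Keras CNN for 28x28 grayscale inputs might look like the sketch below. The layer sizes are illustrative assumptions, not taken from the original article:

```python
import tensorflow as tf

def cnn_model(num_classes=10):
    # convolution + pooling layers extract local spatial features; dense layer classifies
    model = tf.keras.models.Sequential([
        tf.keras.layers.Input(shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(32, kernel_size=3, activation='relu'),
        tf.keras.layers.MaxPooling2D(pool_size=2),
        tf.keras.layers.Conv2D(64, kernel_size=3, activation='relu'),
        tf.keras.layers.MaxPooling2D(pool_size=2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(num_classes, activation='softmax'),
    ])
    return model

model = cnn_model()
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```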
2.3 Recurrent Neural Network (RNN)
RNN processes sequential data by maintaining a hidden state that carries information from previous time steps. Training uses back‑propagation through time.
```python
import tensorflow as tf

input_size = 100
output_size = 10
time_steps = 10

# define a simple recurrent network
def rnn_model():
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.SimpleRNN(64, activation='relu', input_shape=(time_steps, input_size)))
    model.add(tf.keras.layers.Dense(output_size, activation='softmax'))
    return model

model = rnn_model()
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# train_data, train_labels, and test_data must be supplied by the caller
model.fit(train_data, train_labels, epochs=10, batch_size=32)
predictions = model.predict(test_data)
```

2.4 Long Short-Term Memory (LSTM)
LSTM adds memory cells and three gates (input, forget, output) to overcome vanishing gradients in long sequences.
```python
import tensorflow as tf

# define an LSTM model; time_steps, input_size, output_size as in the RNN example
def lstm_model():
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.LSTM(64, activation='tanh', input_shape=(time_steps, input_size)))
    model.add(tf.keras.layers.Dense(output_size, activation='softmax'))
    return model
```

2.5 Deep Belief Network (DBN)
DBN stacks Restricted Boltzmann Machines (RBM) and is trained layer‑wise (unsupervised pre‑training) followed by supervised fine‑tuning.
```python
import torch
import torch.nn as nn

# Restricted Boltzmann Machine: one visible and one hidden layer
class RBM(nn.Module):
    def __init__(self, visible_size, hidden_size):
        super(RBM, self).__init__()
        self.W = nn.Parameter(torch.randn(visible_size, hidden_size))
        self.v_bias = nn.Parameter(torch.randn(visible_size))
        self.h_bias = nn.Parameter(torch.randn(hidden_size))

    def forward(self, x):
        # visible -> hidden; the hidden activation is what feeds the next RBM
        h_prob = torch.sigmoid(torch.matmul(x, self.W) + self.h_bias)
        return h_prob

    def reconstruct(self, h):
        # hidden -> visible reconstruction, used during contrastive-divergence pre-training
        return torch.sigmoid(torch.matmul(h, self.W.t()) + self.v_bias)

# DBN: a stack of RBMs applied in sequence
class DBN(nn.Module):
    def __init__(self, input_size, hidden_sizes):
        super(DBN, self).__init__()
        self.rbm_layers = nn.ModuleList()
        for i in range(len(hidden_sizes)):
            visible_size = input_size if i == 0 else hidden_sizes[i - 1]
            self.rbm_layers.append(RBM(visible_size, hidden_sizes[i]))

    def forward(self, x):
        for rbm in self.rbm_layers:
            x = rbm(x)
        return x
```

2.6 Generative Adversarial Network (GAN)
GAN consists of a generator that creates synthetic data from random noise and a discriminator that distinguishes real from fake samples; they are trained adversarially.
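The adversarial setup can be sketched with two small PyTorch networks. The network sizes, toy "real" data, and loop length below are illustrative assumptions, not a tuned recipe:

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 2  # arbitrary toy sizes

# generator: random noise -> synthetic sample
G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
# discriminator: sample -> probability of being real
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

real = torch.randn(64, data_dim) + 3.0  # stand-in for a real dataset

for _ in range(10):
    # train discriminator: push real samples toward 1, generated ones toward 0
    z = torch.randn(64, latent_dim)
    fake = G(z).detach()
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # train generator: try to make D output 1 for generated samples
    z = torch.randn(64, latent_dim)
    g_loss = bce(D(G(z)), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```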
2.7 Autoencoder (AE)
AE learns to compress input data into a low‑dimensional latent space and reconstruct it, using an encoder‑decoder architecture.
```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def build_autoencoder(input_dim, latent_dim):
    # encoder: compress the input down to the latent code
    encoder_inputs = keras.Input(shape=(input_dim,))
    encoded = layers.Dense(64, activation='relu')(encoder_inputs)
    encoded = layers.Dense(latent_dim, activation='relu')(encoded)
    # decoder: reconstruct the input from the latent code
    decoded = layers.Dense(64, activation='relu')(encoded)
    decoded = layers.Dense(input_dim, activation='sigmoid')(decoded)
    return keras.Model(encoder_inputs, decoded)

# train on MNIST pixels, using the input itself as the reconstruction target
(x_train, _), (x_test, _) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255.0
x_test = x_test.reshape(-1, 784).astype('float32') / 255.0

autoencoder = build_autoencoder(784, 64)
autoencoder.compile(optimizer='adam', loss='mse')
autoencoder.fit(x_train, x_train, epochs=10, batch_size=128,
                validation_data=(x_test, x_test))
```

2.8 Deep Residual Network (DRN)
DRN introduces residual connections that add the input of a block to its output, enabling training of very deep networks.
```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super(ResidualBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # 1x1 convolution matches shapes when the block changes resolution or width
        self.skip_conv = (nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride)
                          if stride != 1 or in_channels != out_channels else None)

    def forward(self, x):
        identity = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        if self.skip_conv is not None:
            identity = self.skip_conv(identity)
        out += identity  # residual connection: add the block's input to its output
        out = self.relu(out)
        return out

class DRN(nn.Module):
    def __init__(self, num_classes):
        super(DRN, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(64, 64, 2)
        self.layer2 = self._make_layer(64, 128, 2, stride=2)
        self.layer3 = self._make_layer(128, 256, 2, stride=2)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(256, num_classes)

    def _make_layer(self, in_channels, out_channels, num_blocks, stride=1):
        layers = [ResidualBlock(in_channels, out_channels, stride)]
        for _ in range(1, num_blocks):
            layers.append(ResidualBlock(out_channels, out_channels))
        return nn.Sequential(*layers)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.maxpool(out)
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.avgpool(out)
        out = torch.flatten(out, 1)
        out = self.fc(out)
        return out

model = DRN(num_classes=10)
x = torch.randn(1, 3, 224, 224)  # one 224x224 RGB image
output = model(x)
```

2.9 Capsule Network (CapsNet)
CapsNet replaces traditional convolutional layers with capsule layers that encode pose information and uses dynamic routing to aggregate features.
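The two CapsNet ingredients mentioned above, the squash non-linearity and dynamic routing, can be sketched as follows. The capsule counts and dimensions are toy assumptions for illustration:

```python
import torch

def squash(s, dim=-1, eps=1e-8):
    # shrink vector length into [0, 1) while preserving direction
    norm_sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * s / torch.sqrt(norm_sq + eps)

def dynamic_routing(u_hat, num_iters=3):
    # u_hat: (batch, in_capsules, out_capsules, out_dim) prediction vectors
    b = torch.zeros(u_hat.shape[:3])                 # routing logits
    for _ in range(num_iters):
        c = torch.softmax(b, dim=2).unsqueeze(-1)    # coupling coefficients
        s = (c * u_hat).sum(dim=1)                   # weighted sum per output capsule
        v = squash(s)                                # (batch, out_capsules, out_dim)
        # raise logits where a prediction agrees with the output capsule
        b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)
    return v

u_hat = torch.randn(4, 8, 3, 16)  # toy sizes: 8 input capsules routed to 3 output capsules
v = dynamic_routing(u_hat)        # output capsule vectors, each with length below 1
```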
2.10 Transformer
Transformer relies solely on self‑attention to model dependencies in sequences, consisting of stacked encoder and decoder blocks with multi‑head attention and feed‑forward networks.
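A single encoder block of the kind described (multi-head self-attention plus a feed-forward network, each with a residual connection and layer normalization) can be sketched with Keras built-ins. The head count and layer widths are illustrative assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

def encoder_block(x, num_heads=4, key_dim=32, ff_dim=128):
    # multi-head self-attention with residual connection and layer norm
    attn = layers.MultiHeadAttention(num_heads=num_heads, key_dim=key_dim)(x, x)
    x = layers.LayerNormalization()(x + attn)
    # position-wise feed-forward network, again with residual + norm
    ff = layers.Dense(ff_dim, activation='relu')(x)
    ff = layers.Dense(x.shape[-1])(ff)
    return layers.LayerNormalization()(x + ff)

seq_len, d_model = 10, 64  # toy sequence length and model width
inputs = tf.keras.Input(shape=(seq_len, d_model))
outputs = encoder_block(inputs)
model = tf.keras.Model(inputs, outputs)
```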
2.11 Attention Network (AN)
AN computes attention weights for each element of an input sequence, allowing the model to focus on task‑relevant parts.
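The attention-weight computation itself is short enough to write from scratch; a minimal scaled dot-product sketch (toy tensor sizes assumed):

```python
import torch
import torch.nn.functional as F

def attention(q, k, v):
    # attention weights: softmax over scaled dot-product similarity scores
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    weights = F.softmax(scores, dim=-1)  # each query's weights sum to 1
    return weights @ v, weights

q = torch.randn(1, 5, 8)  # (batch, query positions, dim) -- toy sizes
k = torch.randn(1, 7, 8)  # 7 key/value positions the model can attend to
v = torch.randn(1, 7, 8)
out, w = attention(q, k, v)
```

Each row of `w` says how strongly one query position focuses on each input element, which is the "task-relevant parts" mechanism described above.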
2.12 Deep Reinforcement Learning Network (DRLN)
DRLN combines deep neural networks with reinforcement learning, exemplified by Deep Q‑Network (DQN) that approximates Q‑values and uses experience replay.
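A single DQN update with experience replay can be sketched in PyTorch. The state/action sizes and the randomly filled buffer are stand-in assumptions (a real agent would collect transitions from an environment):

```python
import random
from collections import deque
import torch
import torch.nn as nn

state_dim, num_actions = 4, 2  # toy sizes, e.g. a CartPole-like task

# the Q-network approximates Q(s, a) for every action at once
q_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, num_actions))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.99

replay = deque(maxlen=10000)  # experience replay buffer

# fill the buffer with random transitions (placeholder for environment interaction)
for _ in range(200):
    s, s2 = torch.randn(state_dim), torch.randn(state_dim)
    a, r, done = random.randrange(num_actions), random.random(), random.random() < 0.1
    replay.append((s, a, r, s2, done))

# one DQN update on a sampled minibatch
batch = random.sample(replay, 32)
s = torch.stack([t[0] for t in batch])
a = torch.tensor([t[1] for t in batch])
r = torch.tensor([t[2] for t in batch])
s2 = torch.stack([t[3] for t in batch])
done = torch.tensor([float(t[4]) for t in batch])

q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)  # Q-value of the action taken
with torch.no_grad():
    # TD target: reward plus discounted best next-state value (zero if terminal)
    target = r + gamma * (1 - done) * q_net(s2).max(1).values
loss = nn.functional.mse_loss(q, target)
optimizer.zero_grad(); loss.backward(); optimizer.step()
```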
3. Model Diagrams
The original article also includes visual diagrams of the twelve models and mentions additional architectures such as the Hopfield Network and the Deep Convolutional Inverse Graphics Network (DCIGN).
Rare Earth Juejin Tech Community