Overview of Deep Neural Network Architectures
This article surveys twelve major families of deep neural networks (Feedforward, CNN, RNN, LSTM, DBN, GAN, Autoencoder, Residual, Capsule, Transformer, Attention, and Deep Reinforcement Learning), explaining their principles, structures, and training methods, with Python code examples in TensorFlow and PyTorch.
1. Introduction to Deep Neural Networks
A Deep Neural Network (DNN) is a computational model composed of multiple layers of neurons. It learns mappings from inputs to outputs by repeating three steps: forward propagation, loss calculation, and back-propagation, which adjusts the weights and biases.
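The cycle just described (forward pass, loss calculation, back-propagation, weight update) can be sketched with PyTorch autograd. The toy data, single linear layer, and learning rate below are arbitrary illustrations, not part of the original article:

```python
import torch

# toy regression data: 32 samples, 10 features (arbitrary illustrative sizes)
x = torch.randn(32, 10)
y = torch.randn(32, 1)

# a single linear layer standing in for the "network"
w = torch.randn(10, 1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

lr = 0.05
losses = []
for step in range(100):
    y_hat = x @ w + b                  # forward propagation
    loss = ((y_hat - y) ** 2).mean()   # loss calculation (mean squared error)
    losses.append(loss.item())
    loss.backward()                    # back-propagation fills w.grad and b.grad
    with torch.no_grad():              # gradient-descent update of weights and biases
        w -= lr * w.grad
        b -= lr * b.grad
        w.grad.zero_()
        b.grad.zero_()
```

The recorded losses shrink over the iterations, which is exactly what the update rule is designed to achieve.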
2. Twelve Deep Neural Network Models
2.1 Feedforward Neural Network (FNN)
Also called a Multi-Layer Perceptron (MLP), it consists of an input layer, one or more hidden layers, and an output layer; data flows only forward. A typical training loop covers weight initialization, the forward pass, loss computation, back-propagation, and weight updates.
```python
import tensorflow as tf

input_size = 100
output_size = 10

# define a feedforward (fully connected) network
def ffn_model():
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Dense(64, activation='relu', input_shape=(input_size,)))
    model.add(tf.keras.layers.Dense(output_size, activation='softmax'))
    return model

model = ffn_model()
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# train_data, train_labels, and test_data must be supplied by the caller
model.fit(train_data, train_labels, epochs=10, batch_size=32)
predictions = model.predict(test_data)
```

2.2 Convolutional Neural Network (CNN)
Covered only briefly here; the original article links to introductory CNN tutorials.
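For completeness, a minimal Keras CNN for 28x28 grayscale inputs might look like the sketch below. The layer sizes are illustrative assumptions, not taken from the original article:

```python
import tensorflow as tf

def cnn_model(num_classes=10):
    # convolution + pooling layers extract local spatial features; dense layer classifies
    model = tf.keras.models.Sequential([
        tf.keras.layers.Input(shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(32, kernel_size=3, activation='relu'),
        tf.keras.layers.MaxPooling2D(pool_size=2),
        tf.keras.layers.Conv2D(64, kernel_size=3, activation='relu'),
        tf.keras.layers.MaxPooling2D(pool_size=2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(num_classes, activation='softmax'),
    ])
    return model

model = cnn_model()
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```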
2.3 Recurrent Neural Network (RNN)
RNN processes sequential data by maintaining a hidden state that carries information from previous time steps. Training uses back‑propagation through time.
```python
import tensorflow as tf

input_size = 100
output_size = 10
time_steps = 10

# define a simple recurrent network
def rnn_model():
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.SimpleRNN(64, activation='relu', input_shape=(time_steps, input_size)))
    model.add(tf.keras.layers.Dense(output_size, activation='softmax'))
    return model

model = rnn_model()
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# train_data, train_labels, and test_data must be supplied by the caller
model.fit(train_data, train_labels, epochs=10, batch_size=32)
predictions = model.predict(test_data)
```

2.4 Long Short-Term Memory (LSTM)
LSTM adds memory cells and three gates (input, forget, output) to overcome vanishing gradients in long sequences.
```python
import tensorflow as tf

# define an LSTM model; time_steps, input_size, output_size as in the RNN example
def lstm_model():
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.LSTM(64, activation='tanh', input_shape=(time_steps, input_size)))
    model.add(tf.keras.layers.Dense(output_size, activation='softmax'))
    return model
```

2.5 Deep Belief Network (DBN)
DBN stacks Restricted Boltzmann Machines (RBM) and is trained layer‑wise (unsupervised pre‑training) followed by supervised fine‑tuning.
```python
import torch
import torch.nn as nn

# Restricted Boltzmann Machine: one visible and one hidden layer
class RBM(nn.Module):
    def __init__(self, visible_size, hidden_size):
        super(RBM, self).__init__()
        self.W = nn.Parameter(torch.randn(visible_size, hidden_size))
        self.v_bias = nn.Parameter(torch.randn(visible_size))
        self.h_bias = nn.Parameter(torch.randn(hidden_size))

    def forward(self, x):
        # visible -> hidden; the hidden activation is what feeds the next RBM
        h_prob = torch.sigmoid(torch.matmul(x, self.W) + self.h_bias)
        return h_prob

    def reconstruct(self, h):
        # hidden -> visible reconstruction, used during contrastive-divergence pre-training
        return torch.sigmoid(torch.matmul(h, self.W.t()) + self.v_bias)

# DBN: a stack of RBMs applied in sequence
class DBN(nn.Module):
    def __init__(self, input_size, hidden_sizes):
        super(DBN, self).__init__()
        self.rbm_layers = nn.ModuleList()
        for i in range(len(hidden_sizes)):
            visible_size = input_size if i == 0 else hidden_sizes[i - 1]
            self.rbm_layers.append(RBM(visible_size, hidden_sizes[i]))

    def forward(self, x):
        for rbm in self.rbm_layers:
            x = rbm(x)
        return x
```

2.6 Generative Adversarial Network (GAN)
GAN consists of a generator that creates synthetic data from random noise and a discriminator that distinguishes real from fake samples; they are trained adversarially.
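The adversarial setup can be sketched with two small PyTorch networks. The network sizes, toy "real" data, and loop length below are illustrative assumptions, not a tuned recipe:

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 2  # arbitrary toy sizes

# generator: random noise -> synthetic sample
G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
# discriminator: sample -> probability of being real
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

real = torch.randn(64, data_dim) + 3.0  # stand-in for a real dataset

for _ in range(10):
    # train discriminator: push real samples toward 1, generated ones toward 0
    z = torch.randn(64, latent_dim)
    fake = G(z).detach()
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # train generator: try to make D output 1 for generated samples
    z = torch.randn(64, latent_dim)
    g_loss = bce(D(G(z)), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```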
2.7 Autoencoder (AE)
AE learns to compress input data into a low‑dimensional latent space and reconstruct it, using an encoder‑decoder architecture.
```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def build_autoencoder(input_dim, latent_dim):
    # encoder: compress the input down to the latent code
    encoder_inputs = keras.Input(shape=(input_dim,))
    encoded = layers.Dense(64, activation='relu')(encoder_inputs)
    encoded = layers.Dense(latent_dim, activation='relu')(encoded)
    # decoder: reconstruct the input from the latent code
    decoded = layers.Dense(64, activation='relu')(encoded)
    decoded = layers.Dense(input_dim, activation='sigmoid')(decoded)
    return keras.Model(encoder_inputs, decoded)

# train on MNIST pixels, using the input itself as the reconstruction target
(x_train, _), (x_test, _) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255.0
x_test = x_test.reshape(-1, 784).astype('float32') / 255.0

autoencoder = build_autoencoder(784, 64)
autoencoder.compile(optimizer='adam', loss='mse')
autoencoder.fit(x_train, x_train, epochs=10, batch_size=128,
                validation_data=(x_test, x_test))
```

2.8 Deep Residual Network (DRN)
DRN introduces residual connections that add the input of a block to its output, enabling training of very deep networks.
```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super(ResidualBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # 1x1 convolution matches shapes when the block changes resolution or width
        self.skip_conv = (nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride)
                          if stride != 1 or in_channels != out_channels else None)

    def forward(self, x):
        identity = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        if self.skip_conv is not None:
            identity = self.skip_conv(identity)
        out += identity  # residual connection: add the block's input to its output
        out = self.relu(out)
        return out

class DRN(nn.Module):
    def __init__(self, num_classes):
        super(DRN, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(64, 64, 2)
        self.layer2 = self._make_layer(64, 128, 2, stride=2)
        self.layer3 = self._make_layer(128, 256, 2, stride=2)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(256, num_classes)

    def _make_layer(self, in_channels, out_channels, num_blocks, stride=1):
        layers = [ResidualBlock(in_channels, out_channels, stride)]
        for _ in range(1, num_blocks):
            layers.append(ResidualBlock(out_channels, out_channels))
        return nn.Sequential(*layers)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.maxpool(out)
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.avgpool(out)
        out = torch.flatten(out, 1)
        out = self.fc(out)
        return out

model = DRN(num_classes=10)
x = torch.randn(1, 3, 224, 224)  # one 224x224 RGB image
output = model(x)
```

2.9 Capsule Network (CapsNet)
CapsNet replaces traditional convolutional layers with capsule layers that encode pose information and uses dynamic routing to aggregate features.
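The two CapsNet ingredients mentioned above, the squash non-linearity and dynamic routing, can be sketched as follows. The capsule counts and dimensions are toy assumptions for illustration:

```python
import torch

def squash(s, dim=-1, eps=1e-8):
    # shrink vector length into [0, 1) while preserving direction
    norm_sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * s / torch.sqrt(norm_sq + eps)

def dynamic_routing(u_hat, num_iters=3):
    # u_hat: (batch, in_capsules, out_capsules, out_dim) prediction vectors
    b = torch.zeros(u_hat.shape[:3])                 # routing logits
    for _ in range(num_iters):
        c = torch.softmax(b, dim=2).unsqueeze(-1)    # coupling coefficients
        s = (c * u_hat).sum(dim=1)                   # weighted sum per output capsule
        v = squash(s)                                # (batch, out_capsules, out_dim)
        # raise logits where a prediction agrees with the output capsule
        b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)
    return v

u_hat = torch.randn(4, 8, 3, 16)  # toy sizes: 8 input capsules routed to 3 output capsules
v = dynamic_routing(u_hat)        # output capsule vectors, each with length below 1
```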
2.10 Transformer
Transformer relies solely on self‑attention to model dependencies in sequences, consisting of stacked encoder and decoder blocks with multi‑head attention and feed‑forward networks.
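A single encoder block of the kind described (multi-head self-attention plus a feed-forward network, each with a residual connection and layer normalization) can be sketched with Keras built-ins. The head count and layer widths are illustrative assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

def encoder_block(x, num_heads=4, key_dim=32, ff_dim=128):
    # multi-head self-attention with residual connection and layer norm
    attn = layers.MultiHeadAttention(num_heads=num_heads, key_dim=key_dim)(x, x)
    x = layers.LayerNormalization()(x + attn)
    # position-wise feed-forward network, again with residual + norm
    ff = layers.Dense(ff_dim, activation='relu')(x)
    ff = layers.Dense(x.shape[-1])(ff)
    return layers.LayerNormalization()(x + ff)

seq_len, d_model = 10, 64  # toy sequence length and model width
inputs = tf.keras.Input(shape=(seq_len, d_model))
outputs = encoder_block(inputs)
model = tf.keras.Model(inputs, outputs)
```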
2.11 Attention Network (AN)
AN computes attention weights for each element of an input sequence, allowing the model to focus on task‑relevant parts.
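The attention-weight computation itself is short enough to write from scratch; a minimal scaled dot-product sketch (toy tensor sizes assumed):

```python
import torch
import torch.nn.functional as F

def attention(q, k, v):
    # attention weights: softmax over scaled dot-product similarity scores
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    weights = F.softmax(scores, dim=-1)  # each query's weights sum to 1
    return weights @ v, weights

q = torch.randn(1, 5, 8)  # (batch, query positions, dim) -- toy sizes
k = torch.randn(1, 7, 8)  # 7 key/value positions the model can attend to
v = torch.randn(1, 7, 8)
out, w = attention(q, k, v)
```

Each row of `w` says how strongly one query position focuses on each input element, which is the "task-relevant parts" mechanism described above.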
2.12 Deep Reinforcement Learning Network (DRLN)
DRLN combines deep neural networks with reinforcement learning, exemplified by Deep Q‑Network (DQN) that approximates Q‑values and uses experience replay.
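A single DQN update with experience replay can be sketched in PyTorch. The state/action sizes and the randomly filled buffer are stand-in assumptions (a real agent would collect transitions from an environment):

```python
import random
from collections import deque
import torch
import torch.nn as nn

state_dim, num_actions = 4, 2  # toy sizes, e.g. a CartPole-like task

# the Q-network approximates Q(s, a) for every action at once
q_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, num_actions))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.99

replay = deque(maxlen=10000)  # experience replay buffer

# fill the buffer with random transitions (placeholder for environment interaction)
for _ in range(200):
    s, s2 = torch.randn(state_dim), torch.randn(state_dim)
    a, r, done = random.randrange(num_actions), random.random(), random.random() < 0.1
    replay.append((s, a, r, s2, done))

# one DQN update on a sampled minibatch
batch = random.sample(replay, 32)
s = torch.stack([t[0] for t in batch])
a = torch.tensor([t[1] for t in batch])
r = torch.tensor([t[2] for t in batch])
s2 = torch.stack([t[3] for t in batch])
done = torch.tensor([float(t[4]) for t in batch])

q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)  # Q-value of the action taken
with torch.no_grad():
    # TD target: reward plus discounted best next-state value (zero if terminal)
    target = r + gamma * (1 - done) * q_net(s2).max(1).values
loss = nn.functional.mse_loss(q, target)
optimizer.zero_grad(); loss.backward(); optimizer.step()
```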
3. Model Diagrams
The original article also includes visual diagrams of the twelve models and mentions additional architectures such as the Hopfield Network and the Deep Convolutional Inverse Graphics Network (DCIGN).
Rare Earth Juejin Tech Community