Understanding Recurrent Neural Networks: From Vanilla RNN to LSTM with Keras

This article introduces recurrent neural networks (RNNs) and their ability to handle sequential data, explains the limitations of vanilla RNNs, presents the LSTM architecture with its gates, and provides complete Keras code for data loading, model building, and training both vanilla RNN and LSTM models.

Model Perspective

Recurrent Neural Networks

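Feed‑forward networks such as MLPs and CNNs are powerful, but they treat each input independently and keep no memory of previous inputs, which makes them a poor fit for sequential data. In tasks like language translation, predicting the next word requires the context of the words that came before. Recurrent neural networks address this by carrying a hidden state from one time step to the next.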

Vanilla RNN

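Vanilla RNNs have a simple recurrent structure: at each time step, the hidden state is computed from the current input and the previous hidden state. They suffer from the long‑term dependency problem: gradients shrink as they are propagated back through many time steps, so in practice the network struggles to retain information over long sequences.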
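
To make the recurrence concrete, here is a minimal NumPy sketch of a vanilla RNN forward pass. It is an illustration under stated assumptions rather than the Keras implementation: the weight names (`W_x`, `W_h`, `b`), the random initialization, and the scalar weight `0.5` in the gradient bound are all hypothetical choices.

```python
import numpy as np

def rnn_forward(x_seq, W_x, W_h, b):
    """Run a vanilla RNN over one sequence and return the final hidden state.

    x_seq: (timesteps, input_dim), W_x: (input_dim, units),
    W_h: (units, units), b: (units,)
    """
    h = np.zeros(W_h.shape[0])
    for x_t in x_seq:
        # The same weights are reused at every step; h is the only memory.
        h = np.tanh(x_t @ W_x + h @ W_h + b)
    return h

rng = np.random.default_rng(0)
units, input_dim, timesteps = 4, 1, 49  # illustrative sizes
W_x = rng.normal(scale=0.5, size=(input_dim, units))
W_h = rng.normal(scale=0.5, size=(units, units))
b = np.zeros(units)
x_seq = rng.normal(size=(timesteps, input_dim))

h_last = rnn_forward(x_seq, W_x, W_h, b)
print(h_last.shape)

# For a scalar RNN with recurrent weight w, the gradient that reaches step 0
# from step T is at most |w|**T, because the derivative of tanh is at most 1.
gradient_bound = 0.5 ** timesteps
print(gradient_bound)
```

With a recurrent weight below 1 in magnitude, the bound shrinks geometrically with sequence length, which is the intuition behind the long‑term dependency problem.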

LSTM

LSTM (Long Short‑Term Memory) is an improved recurrent architecture designed to mitigate the long‑term dependency problem. It replaces the standard recurrent layer with LSTM cells, each of which maintains a cell state regulated by an input gate, a forget gate, and an output gate.
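
The role of the three gates may be easier to see in code. The following NumPy sketch implements one step of a standard LSTM cell. The function name `lstm_step`, the stacked weight layout (input gate, forget gate, candidate, output gate), and the toy sizes are assumptions made for illustration, not code from the original article.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step.

    W: (input_dim, 4 * units), U: (units, 4 * units), b: (4 * units,)
    """
    units = h_prev.shape[0]
    z = x_t @ W + h_prev @ U + b
    i = sigmoid(z[0 * units:1 * units])  # input gate: how much new content to write
    f = sigmoid(z[1 * units:2 * units])  # forget gate: how much old cell state to keep
    g = np.tanh(z[2 * units:3 * units])  # candidate cell values
    o = sigmoid(z[3 * units:4 * units])  # output gate: how much cell state to expose
    c = f * c_prev + i * g               # additive update of the cell state
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
units, input_dim = 3, 1  # illustrative sizes
W = rng.normal(scale=0.1, size=(input_dim, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)

h, c = lstm_step(rng.normal(size=input_dim), np.zeros(units), np.zeros(units), W, U, b)
print(h.shape, c.shape)
```

Because the cell state is updated additively (`f * c_prev + i * g`) rather than being squashed through a nonlinearity at every step, information and gradients can survive across more time steps than in a vanilla RNN.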

Loading Libraries

import numpy as np
from sklearn.metrics import accuracy_score
from tensorflow.keras.datasets import reuters
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical

Loading Data and Splitting

# parameters for data load
num_words = 30000
maxlen = 50
test_split = 0.3
(X_train, y_train), (X_test, y_test) = reuters.load_data(num_words=num_words, maxlen=maxlen, test_split=test_split)
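# pad the sequences with zeros to a fixed length
# load_data(maxlen=50) keeps only sequences shorter than 50 tokens, so 49 is the longest possible
# padding='post' => zeros are appended to the end of each sequence
X_train = pad_sequences(X_train, maxlen=maxlen - 1, padding='post')
X_test = pad_sequences(X_test, maxlen=maxlen - 1, padding='post')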

X_train = np.array(X_train).reshape((X_train.shape[0], X_train.shape[1], 1))
X_test = np.array(X_test).reshape((X_test.shape[0], X_test.shape[1], 1))

# one-hot encode the labels; the Reuters dataset has 46 topic classes
num_classes = 46
y_train = to_categorical(y_train, num_classes=num_classes)
y_test = to_categorical(y_test, num_classes=num_classes)
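
To show what the preprocessing above produces, here is a small NumPy-only sketch that mimics post-padding, the reshape to `(samples, timesteps, 1)`, and one-hot encoding on toy data. The helpers `pad_post` and `one_hot` and the toy sequences are hypothetical stand-ins, not the Keras functions themselves.

```python
import numpy as np

def pad_post(sequences, maxlen):
    """Append zeros to each sequence until it has length maxlen.

    Simplification: longer sequences are cut to their first maxlen tokens.
    """
    out = np.zeros((len(sequences), maxlen), dtype=np.int64)
    for row, seq in enumerate(sequences):
        trimmed = seq[:maxlen]
        out[row, :len(trimmed)] = trimmed
    return out

def one_hot(labels, num_classes):
    """Turn integer class ids into one-hot rows."""
    return np.eye(num_classes, dtype=np.float32)[np.asarray(labels)]

toy_sequences = [[1, 5, 9], [1, 7], [1, 2, 3, 4]]  # word indices, made up
toy_labels = [3, 0, 45]                            # class ids, made up

X = pad_post(toy_sequences, maxlen=4)
X = X.reshape((X.shape[0], X.shape[1], 1))  # (samples, timesteps, features)
y = one_hot(toy_labels, num_classes=46)

print(X.shape, y.shape)  # (3, 4, 1) (3, 46)
print(X[1, :, 0])        # [1 7 0 0]
```

The trailing axis of size 1 is there because recurrent layers expect one feature vector per time step, and here each time step carries a single word index.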

Importing Model Components

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, SimpleRNN, Activation
from tensorflow.keras import optimizers
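# Note: this wrapper module is deprecated and absent from recent TensorFlow releases;
# the SciKeras package (scikeras.wrappers.KerasClassifier) is its maintained successor.
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier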

Vanilla RNN

def vanilla_rnn():
    model = Sequential()
    model.add(SimpleRNN(50, input_shape=(49,1), return_sequences=False))
    model.add(Dense(46))
    model.add(Activation('softmax'))

    adam = optimizers.Adam(learning_rate=0.001)  # 'lr' is a deprecated alias
    model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])

    return model

Model Training

model = KerasClassifier(build_fn=vanilla_rnn, epochs=200, batch_size=50, verbose=1)
model.fit(X_train, y_train)

LSTM

from tensorflow.keras.layers import LSTM

def lstm():
    model = Sequential()
    model.add(LSTM(50, input_shape=(49,1), return_sequences=False))
    model.add(Dense(46))
    model.add(Activation('softmax'))

    adam = optimizers.Adam(learning_rate=0.001)  # 'lr' is a deprecated alias
    model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])

    return model
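
As a rough size comparison of the two networks defined in this article, the sketch below counts trainable parameters with the standard formulas for a simple recurrent layer, an LSTM layer, and a dense layer, all with biases. The helper names are hypothetical. The sizes (50 units, 1 input feature, 46 classes) are taken from the model definitions.

```python
def simple_rnn_params(units, input_dim):
    # input weights + recurrent weights + bias
    return units * (units + input_dim + 1)

def lstm_params(units, input_dim):
    # four weight sets: input gate, forget gate, candidate, output gate
    return 4 * units * (units + input_dim + 1)

def dense_params(inputs, outputs):
    return inputs * outputs + outputs

units, input_dim, classes = 50, 1, 46
rnn_total = simple_rnn_params(units, input_dim) + dense_params(units, classes)
lstm_total = lstm_params(units, input_dim) + dense_params(units, classes)

print(rnn_total, lstm_total)  # 4946 12746
```

The LSTM layer has four times the recurrent parameters of the simple RNN layer at the same width, which is part of why it usually trains more slowly per epoch.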

Model Training

model = KerasClassifier(build_fn=lstm, epochs=200, batch_size=50, verbose=1)
model.fit(X_train, y_train)
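
The article stops after training, so a natural next step is to score the model on the held-out test set. The sketch below is a NumPy-only accuracy helper, equivalent in spirit to the `accuracy_score` function imported earlier. It assumes that `model.predict(X_test)` returns integer class labels, as scikit-learn-style classifiers typically do. The helper name and the toy arrays are hypothetical.

```python
import numpy as np

def accuracy_from_one_hot(y_true_one_hot, y_pred_labels):
    """Fraction of samples whose predicted class id matches the one-hot label."""
    y_true_labels = np.argmax(y_true_one_hot, axis=1)
    return float(np.mean(y_true_labels == np.asarray(y_pred_labels)))

# Toy stand-ins for y_test and model.predict(X_test)
y_true = np.eye(46)[[3, 4, 3, 19]]
y_pred = np.array([3, 4, 1, 19])

print(accuracy_from_one_hot(y_true, y_pred))  # 0.75
```

With a trained model, the same helper could be called as `accuracy_from_one_hot(y_test, model.predict(X_test))`, provided the prediction format matches the assumption above.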
Written by

Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
