Understanding Recurrent Neural Networks: From Vanilla RNN to LSTM with Keras
This article introduces recurrent neural networks (RNNs) and their ability to handle sequential data, explains the limitations of vanilla RNNs, presents the LSTM architecture with its gates, and provides complete Keras code for data loading, model building, and training both vanilla RNN and LSTM models.
Recurrent Neural Networks
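Feed‑forward networks such as MLPs and CNNs are powerful, but they treat each input independently and keep no memory of previous inputs, which makes them a poor fit for sequential data. In tasks like language translation, predicting the next word requires the context of the words that came before it. Recurrent neural networks address this by carrying a hidden state from one time step to the next.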
Vanilla RNN
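Vanilla RNNs have a simple recurrent structure: at each time step, the hidden state is computed from the current input and the previous hidden state. They suffer from the long‑term dependency problem, however. Gradients tend to vanish as they are propagated back through many time steps, so the network struggles to retain information over long sequences.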
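The recurrence described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than what Keras's SimpleRNN does internally: the function name rnn_forward, the weight names W_x, W_h and b, and the random initialization are all assumptions made for the example. The shapes are chosen to mirror the 49-step, 1-feature input and 50 units used in the Keras model later in the article.

```python
import numpy as np

def rnn_forward(x_seq, W_x, W_h, b):
    """Run a vanilla RNN over one sequence and return every hidden state."""
    hidden_size = W_h.shape[0]
    h = np.zeros(hidden_size)              # initial hidden state
    states = []
    for x_t in x_seq:                      # process one time step at a time
        # the new state mixes the current input with the previous state
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.stack(states)

# hypothetical sizes and random weights, for illustration only
rng = np.random.default_rng(0)
seq_len, input_size, hidden_size = 49, 1, 50
x_seq = rng.normal(size=(seq_len, input_size))
W_x = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_h = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b = np.zeros(hidden_size)

states = rnn_forward(x_seq, W_x, W_h, b)
print(states.shape)  # (49, 50)
```

Because the same W_h is applied at every step, the influence of an early input on a late state passes through many repeated multiplications, which is one way to see why long-range information tends to fade.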
LSTM
LSTM (Long Short‑Term Memory) is an improved recurrent architecture designed to mitigate the long‑term dependency problem. It replaces the standard recurrent layer with LSTM cells, each of which maintains a cell state regulated by three gates: an input gate, a forget gate, and an output gate.
[Figure: diagram of an LSTM cell]
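To make the role of the three gates concrete, here is a minimal NumPy sketch of a single LSTM time step using the standard LSTM equations. The function name lstm_step, the stacked parameter layout (input, forget, output, candidate) and the random weights are assumptions for this example. Keras stores its LSTM weights in its own order, so this is not a drop-in reimplementation of the Keras layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step; W, U and b stack the four internal transforms."""
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b           # shape (4 * n,)
    i = sigmoid(z[0:n])                    # input gate: how much new content to write
    f = sigmoid(z[n:2 * n])                # forget gate: how much old cell state to keep
    o = sigmoid(z[2 * n:3 * n])            # output gate: how much of the cell to expose
    g = np.tanh(z[3 * n:4 * n])            # candidate values for the cell state
    c = f * c_prev + i * g                 # updated cell state
    h = o * np.tanh(c)                     # updated hidden state
    return h, c

# hypothetical sizes and random weights, for illustration only
rng = np.random.default_rng(0)
input_size, hidden_size = 1, 50
W = rng.normal(scale=0.1, size=(4 * hidden_size, input_size))
U = rng.normal(scale=0.1, size=(4 * hidden_size, hidden_size))
b = np.zeros(4 * hidden_size)

h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
for x_t in rng.normal(size=(49, input_size)):
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)  # (50,) (50,)
```

The cell state c is updated additively (f * c_prev + i * g) rather than being squashed through a nonlinearity at every step, which is what lets information survive across many time steps when the forget gate stays close to 1.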
Loading Libraries
import numpy as np
from sklearn.metrics import accuracy_score
from tensorflow.keras.datasets import reuters
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical
Loading Data and Splitting
# parameters for data load
num_words = 30000
maxlen = 50
test_split = 0.3
(X_train, y_train), (X_test, y_test) = reuters.load_data(num_words=num_words, maxlen=maxlen, test_split=test_split)
# pad the sequences with zeros
# padding parameter is set to 'post' => 0's are appended to end of sequences
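# load_data(maxlen=50) keeps only sequences shorter than 50 tokens, so 49 is the longest;
# fixing maxlen here guarantees both arrays match the input_shape=(49, 1) used by the models
X_train = pad_sequences(X_train, padding='post', maxlen=maxlen - 1)
X_test = pad_sequences(X_test, padding='post', maxlen=maxlen - 1)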
X_train = np.array(X_train).reshape((X_train.shape[0], X_train.shape[1], 1))
X_test = np.array(X_test).reshape((X_test.shape[0], X_test.shape[1], 1))
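# one-hot encode train and test labels together so both share the same number of classes,
# then split back at the original train size (1395 in the source run) instead of a hard-coded index
n_train = len(y_train)
y_data = np.concatenate((y_train, y_test))
y_data = to_categorical(y_data)
y_train = y_data[:n_train]
y_test = y_data[n_train:]
Loading Model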
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, SimpleRNN, Activation
from tensorflow.keras import optimizers
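# note: this wrapper module was removed from recent TensorFlow releases;
# the separate SciKeras package provides a KerasClassifier replacement
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
Vanilla RNN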
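def vanilla_rnn():
    model = Sequential()
    # 50 recurrent units; each input is a sequence of 49 steps with 1 feature per step
    model.add(SimpleRNN(50, input_shape=(49, 1), return_sequences=False))
    # one output per Reuters topic class
    model.add(Dense(46))
    model.add(Activation('softmax'))
    # `learning_rate` replaces the deprecated `lr` argument
    adam = optimizers.Adam(learning_rate=0.001)
    model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
    return model
Model Training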
model = KerasClassifier(build_fn=vanilla_rnn, epochs=200, batch_size=50, verbose=1)
model.fit(X_train, y_train)
LSTM
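from tensorflow.keras.layers import LSTM

def lstm():
    model = Sequential()
    # same shape as the vanilla model, with LSTM cells in place of SimpleRNN units
    model.add(LSTM(50, input_shape=(49, 1), return_sequences=False))
    # one output per Reuters topic class
    model.add(Dense(46))
    model.add(Activation('softmax'))
    # `learning_rate` replaces the deprecated `lr` argument
    adam = optimizers.Adam(learning_rate=0.001)
    model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
    return model
Model Training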
model = KerasClassifier(build_fn=lstm, epochs=200, batch_size=50, verbose=1)
model.fit(X_train, y_train)
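The listing imports accuracy_score but never uses it, and the test set is never evaluated. One plausible way to close that gap is sketched below with NumPy only. The arrays are made-up stand-ins (5 samples, 3 classes) rather than real Reuters results, and y_pred is a hypothetical substitute for what model.predict(X_test) would return, assuming the classifier outputs integer class indices. Because the labels were one-hot encoded, they need to be converted back to indices before the comparison.

```python
import numpy as np

# made-up stand-in for the one-hot encoded y_test (5 samples, 3 classes)
y_test_onehot = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
])
# hypothetical class indices, standing in for model.predict(X_test)
y_pred = np.array([0, 1, 2, 2, 0])

# convert one-hot rows back to class indices before comparing
y_true = np.argmax(y_test_onehot, axis=1)
accuracy = np.mean(y_pred == y_true)
print(accuracy)  # 0.8
```

With scikit-learn available, accuracy_score(y_true, y_pred) computes the same fraction of matching labels.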
Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
