Understanding Recurrent Neural Networks: From Vanilla RNN to LSTM with Keras
This article introduces recurrent neural networks (RNNs) and their ability to handle sequential data, explains the limitations of vanilla RNNs, presents the LSTM architecture with its gates, and provides complete Keras code for data loading, model building, and training both vanilla RNN and LSTM models.
Recurrent Neural Networks
Feed‑forward networks such as MLPs and CNNs are powerful, but they process each input independently and keep no memory of previous inputs, which makes them a poor fit for sequential data; in tasks like language translation, the surrounding context is needed to predict the next word.
Vanilla RNN
Vanilla RNNs have a simple recurrent structure, but they suffer from the long‑term dependency problem: as sequences grow longer, repeated multiplication by the recurrent weights makes gradients vanish (or explode), so the network cannot retain information from early time steps.
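At each time step a vanilla RNN mixes the new input with its previous hidden state. The recurrence can be sketched in a few lines of NumPy (an illustrative sketch, not from the original article; the weights here are random rather than learned, and all names and sizes are assumptions):

<code>import numpy as np

# Sketch of the vanilla RNN update: h_t = tanh(W_x x_t + W_h h_{t-1} + b)
rng = np.random.default_rng(0)
hidden, features, steps = 4, 3, 20

W_x = rng.normal(scale=0.1, size=(hidden, features))  # input-to-hidden weights
W_h = rng.normal(scale=0.1, size=(hidden, hidden))    # hidden-to-hidden weights
b = np.zeros(hidden)

h = np.zeros(hidden)  # the hidden state is the network's only "memory"
for _ in range(steps):
    x_t = rng.normal(size=features)       # one input per time step
    h = np.tanh(W_x @ x_t + W_h @ h + b)  # old state is squashed into the new one

print(h.shape)  # (4,)</code>

Because the state is repeatedly multiplied by W_h and squashed through tanh, the influence of early inputs (and their gradients during training) shrinks rapidly with sequence length, which is exactly the long‑term dependency problem.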
LSTM
LSTM (Long Short‑Term Memory) is an improved recurrent architecture that addresses the long‑term dependency problem. It replaces the standard recurrent layer with LSTM cells, each combining an input gate, a forget gate, and an output gate with an explicit cell state that carries information across many time steps.
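For intuition, a single LSTM time step can be sketched in plain NumPy. This is a hypothetical reference implementation, not Keras's internal code; all parameter names and shapes below are assumptions:

<code>import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step; W, U, b stack the parameters for the
    input gate, forget gate, cell candidate, and output gate."""
    z = W @ x_t + U @ h_prev + b
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gates squashed into (0, 1)
    g = np.tanh(g)                                # candidate cell update
    c = f * c_prev + i * g                        # forget gate keeps or discards old memory
    h = o * np.tanh(c)                            # output gate controls what memory is exposed
    return h, c

# toy usage with random (untrained) parameters
rng = np.random.default_rng(1)
hidden, features = 4, 3
W = rng.normal(size=(4 * hidden, features))
U = rng.normal(size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)
h, c = lstm_step(rng.normal(size=features), np.zeros(hidden), np.zeros(hidden), W, U, b)
print(h.shape, c.shape)  # (4,) (4,)</code>

The key design choice is the additive cell‑state update f * c_prev + i * g, which lets information (and gradients) flow across many time steps instead of being repeatedly squashed through a tanh recurrence.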
Loading Libraries
<code>import numpy as np
from sklearn.metrics import accuracy_score
from tensorflow.keras.datasets import reuters
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical</code>
Loading Data and Splitting
<code># parameters for data load
num_words = 30000
maxlen = 50
test_split = 0.3
(X_train, y_train), (X_test, y_test) = reuters.load_data(num_words=num_words, maxlen=maxlen, test_split=test_split)</code>
<code># pad the sequences with zeros
# padding parameter is set to 'post' => 0's are appended to end of sequences
X_train = pad_sequences(X_train, padding='post')
X_test = pad_sequences(X_test, padding='post')
X_train = np.array(X_train).reshape((X_train.shape[0], X_train.shape[1], 1))
X_test = np.array(X_test).reshape((X_test.shape[0], X_test.shape[1], 1))
y_data = np.concatenate((y_train, y_test))
y_data = to_categorical(y_data)
# recover the original train/test split after one-hot encoding all labels together
y_train = y_data[:X_train.shape[0]]
y_test = y_data[X_train.shape[0]:]
</code>
Loading Model
<code>from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, SimpleRNN, Activation
from tensorflow.keras import optimizers
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier</code>
Vanilla RNN
<code>def vanilla_rnn():
    model = Sequential()
    # input_shape = (timesteps, features) for the padded sequences
    model.add(SimpleRNN(50, input_shape=(49, 1), return_sequences=False))
    model.add(Dense(46))               # 46 Reuters topic classes
    model.add(Activation('softmax'))
    adam = optimizers.Adam(learning_rate=0.001)
    model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
    return model
</code>
Model Training
<code>model = KerasClassifier(build_fn=vanilla_rnn, epochs=200, batch_size=50, verbose=1)
model.fit(X_train, y_train)
</code>
LSTM
<code>from tensorflow.keras.layers import LSTM

def lstm():
    model = Sequential()
    # identical to vanilla_rnn() except the recurrent layer is an LSTM
    model.add(LSTM(50, input_shape=(49, 1), return_sequences=False))
    model.add(Dense(46))
    model.add(Activation('softmax'))
    adam = optimizers.Adam(learning_rate=0.001)
    model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
    return model
</code>
Model Training
<code>model = KerasClassifier(build_fn=lstm, epochs=200, batch_size=50, verbose=1)
model.fit(X_train, y_train)
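# A hypothetical evaluation step (not in the original article): accuracy_score
# was imported earlier but never used. KerasClassifier.predict returns class
# indices, while y_test is one-hot encoded, so decode y_test with np.argmax
# before scoring, e.g. accuracy_score(np.argmax(y_test, axis=1), model.predict(X_test)).
# The decoding logic, shown on small stand-in arrays:
y_true_onehot = np.eye(46)[[3, 4, 3, 16, 19]]  # mock one-hot labels over 46 classes
y_pred_mock = np.array([3, 4, 4, 16, 19])      # stands in for model.predict(X_test)
print(accuracy_score(np.argmax(y_true_onehot, axis=1), y_pred_mock))  # 0.8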
</code>
Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".