End‑to‑End Time Series Forecasting with LSTM in Python
This tutorial walks through loading Google stock data, preprocessing it with scaling, constructing past‑window features, building and tuning an LSTM model using GridSearchCV, evaluating predictions, and finally forecasting future values, all illustrated with complete Python code.
In many practical scenarios we need to forecast a target series, such as brand sales or product demand. This article demonstrates a complete end‑to‑end workflow for time‑series prediction using an LSTM network in Python.
Data loading and inspection
We read a CSV file containing Google stock data from 2001‑01‑25 to 2021‑09‑29, parse the Date column, and set the first column as the index.
<code>import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.wrappers.scikit_learn import KerasRegressor  # newer TF releases: use scikeras.wrappers.KerasRegressor
from sklearn.model_selection import GridSearchCV
df = pd.read_csv("train.csv", parse_dates=["Date"], index_col=[0])
print(df.shape) # (5203, 5)
</code>We aim to predict the Open column, so it serves as the target variable; the remaining columns, together with past Open values, are used as input features.
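The dataset itself isn't bundled here, but the loading step can be sanity-checked on a toy in-memory stand-in for train.csv (column names assumed from the usual Google OHLC export):

```python
import io
import pandas as pd

# Toy stand-in for train.csv with the same five feature columns
csv = io.StringIO(
    "Date,Open,High,Low,Close,Adj Close\n"
    "2021-09-28,100.0,101.0,99.0,100.5,100.5\n"
    "2021-09-29,100.5,102.0,100.0,101.5,101.5\n"
)
toy_df = pd.read_csv(csv, parse_dates=["Date"], index_col=[0])
print(toy_df.shape)        # (2, 5)
print(toy_df.index.dtype)  # datetime64[ns]
```

If the Date column fails to parse, the index dtype falls back to object, which silently breaks date-based slicing later, so the dtype check is worth the two lines.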
Train‑test split
Because time series must remain ordered, we split the data without shuffling: 80% for training and the last 20% for testing.
<code>test_split = round(len(df) * 0.20)  # 1041 rows
df_for_training = df[:-test_split]
df_for_testing = df[-test_split:]
print(df_for_training.shape)  # (4162, 5)
print(df_for_testing.shape)   # (1041, 5)
</code>Scaling
We apply a MinMaxScaler to bring all features into the range [0, 1]. Note that the scaler is fit only on the training set and then applied to the test set, which avoids leaking test-set statistics into training.
<code>scaler = MinMaxScaler(feature_range=(0, 1))
df_for_training_scaled = scaler.fit_transform(df_for_training)
df_for_testing_scaled = scaler.transform(df_for_testing)
</code>Creating X and Y sequences
Using a sliding window of n_past = 30 time steps, we build input arrays trainX and testX that contain the previous 30 rows of all five features, and target arrays trainY and testY that contain the corresponding Open value at the next time step.
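Before the real implementation, the same windowing idea can be checked on a toy two-feature array (illustrative helper and values, not the tutorial's data):

```python
import numpy as np

def make_windows(data, n_past):
    # Each X sample holds the previous n_past rows of all features;
    # each y is the first column at the next time step
    X, y = [], []
    for i in range(n_past, len(data)):
        X.append(data[i - n_past:i])
        y.append(data[i, 0])
    return np.array(X), np.array(y)

toy = np.arange(20.0).reshape(10, 2)  # 10 time steps, 2 features
X, y = make_windows(toy, 3)
print(X.shape, y.shape)  # (7, 3, 2) (7,)
print(y[0])              # 6.0 -> row 3, column 0
```

A series of length 10 with a window of 3 yields 10 − 3 = 7 samples, which is exactly why the tutorial's trainX has 4132 rows from 4162 scaled training rows.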
<code>def createXY(dataset, n_past):
    dataX, dataY = [], []
    for i in range(n_past, len(dataset)):
        dataX.append(dataset[i - n_past:i, 0:dataset.shape[1]])
        dataY.append(dataset[i, 0])
    return np.array(dataX), np.array(dataY)
trainX, trainY = createXY(df_for_training_scaled, 30)
testX, testY = createXY(df_for_testing_scaled, 30)
print("trainX Shape--", trainX.shape) # (4132, 30, 5)
print("trainY Shape--", trainY.shape) # (4132,)
</code>Model definition and hyper‑parameter search
We define a function that builds a Sequential LSTM model with two LSTM layers, a dropout layer, and a dense output. GridSearchCV searches over batch size, epochs, and optimizer.
<code>def build_model(optimizer):
    model = Sequential()
    model.add(LSTM(50, return_sequences=True, input_shape=(30, 5)))
    model.add(LSTM(50))
    model.add(Dropout(0.2))
    model.add(Dense(1))
    model.compile(loss='mse', optimizer=optimizer)
    return model

grid_model = KerasRegressor(build_fn=build_model, verbose=1, validation_data=(testX, testY))
parameters = {
    'batch_size': [16, 20],
    'epochs': [8, 10],
    'optimizer': ['adam', 'Adadelta']
}
grid_search = GridSearchCV(estimator=grid_model, param_grid=parameters, cv=2)
grid_search = grid_search.fit(trainX, trainY)
print(grid_search.best_params_) # {'batch_size': 20, 'epochs': 10, 'optimizer': 'adam'}
</code>Training the best model
The underlying Keras model of the best estimator is extracted and stored as my_model.
<code>my_model = grid_search.best_estimator_.model
</code>Evaluation on the test set
We predict on testX, then inverse-transform the scaled predictions back to the original price scale. Because the scaler expects five columns, we repeat the single-column predictions five times before applying inverse_transform, then keep only the first column.
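The column-repeating trick is easy to verify in isolation on toy data (shapes and values here are illustrative):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Fit a scaler on toy 5-column data so it expects 5 features
data = np.arange(50.0).reshape(10, 5)
sc = MinMaxScaler(feature_range=(0, 1)).fit(data)

# A single-column "prediction" in scaled space
scaled_pred = np.array([[0.0], [1.0]])
tiled = np.repeat(scaled_pred, 5, axis=-1)    # shape (2, 5)
unscaled = sc.inverse_transform(tiled)[:, 0]  # keep the target column only
print(np.round(unscaled, 6))  # [ 0. 45.] -> min and max of column 0
```

Since MinMaxScaler stores a separate min and scale per column, only the first column of the inverse-transformed array is meaningful here; the other four are discarded.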
<code>prediction = my_model.predict(testX)
prediction_copies = np.repeat(prediction, 5, axis=-1)
pred = scaler.inverse_transform(prediction_copies)[:, 0]
original_copies = np.repeat(testY, 5, axis=-1)
original = scaler.inverse_transform(original_copies.reshape(len(testY), 5))[:, 0]
</code>We plot the real and predicted stock prices.
<code>plt.plot(original, color='red', label='Real Stock Price')
plt.plot(pred, color='blue', label='Predicted Stock Price')
plt.title('Stock Price Prediction')
plt.xlabel('Time')
plt.ylabel('Google Stock Price')
plt.legend()
plt.show()
</code>Forecasting future values
To predict the next 30 days, we take the last 30 rows of the historical data and load the future feature values, which lack an Open column. We add a placeholder Open column so the scaler sees five columns, scale both parts, mark the unknown future Open values as NaN, and then iteratively feed a 30-step sliding window into the trained model, writing each prediction back into the frame so it becomes part of the next window.
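The feed-back-the-prediction loop can be sketched on a toy series with a stand-in "model" (a plain function here) before running the real one:

```python
import numpy as np

def fake_model(window):
    # Stand-in for my_model.predict: the mean of the window's first column
    return window[:, 0].mean()

history = np.arange(10.0).reshape(10, 1)  # known past values
buf = list(history[:, 0])
preds = []
for _ in range(3):  # 3-step horizon
    window = np.array(buf[-5:]).reshape(5, 1)
    p = fake_model(window)
    preds.append(p)
    buf.append(p)  # the prediction becomes part of the next window
print(preds)
```

The key point the toy makes visible: from the second step onward, every window contains earlier predictions, so errors can compound over the horizon.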
<code># Load the last 30 days
past_30 = df.iloc[-30:,:]
# Load future feature values (without Open)
future_features = pd.read_csv("test.csv", parse_dates=["Date"], index_col=[0])
future_features["Open"] = 0
future_features = future_features[["Open","High","Low","Close","Adj Close"]]
# Scale both parts
old_scaled = scaler.transform(past_30)
new_scaled = scaler.transform(future_features)
new_scaled_df = pd.DataFrame(new_scaled)
new_scaled_df.iloc[:,0] = np.nan
full_df = pd.concat([pd.DataFrame(old_scaled), new_scaled_df]).reset_index(drop=True)
# Iterative prediction
all_preds = []
for i in range(30, len(full_df)):
    x_input = full_df.iloc[i - 30:i, :].values.reshape(1, 30, 5)
    pred = my_model.predict(x_input)[0, 0]  # extract the scalar prediction
    all_preds.append(pred)
    full_df.iloc[i, 0] = pred  # feed the prediction into the next window
# Inverse transform the future predictions
future_pred = np.array(all_preds).reshape(-1,1)
future_pred_copies = np.repeat(future_pred,5,axis=-1)
y_pred_future = scaler.inverse_transform(future_pred_copies)[:,0]
print(y_pred_future)
</code>The script outputs a list of predicted Open prices for the next 30 days, completing the end‑to‑end forecasting pipeline.
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.