Predicting Sunspot Activity with CnosDB and a TensorFlow 1DConv‑LSTM Model
This article demonstrates how to store monthly sunspot numbers in the CnosDB time‑series database and use TensorFlow to build a 1DConv‑LSTM neural network for forecasting sunspot activity, covering data import, database insertion, train‑test splitting, model definition, training, and result visualization.
Sunspot numbers, a key indicator of solar activity, exhibit a roughly 11‑year cycle and have shown a recent declining trend, making accurate forecasting important for space weather research.
The monthly mean sunspot number (MSSN) dataset from the SILSO website (1749‑2023) is downloaded as SN_m_tot_V2.0.csv and loaded with pandas:
import pandas as pd
df = pd.read_csv("SN_m_tot_V2.0.csv", sep=";", header=None)
df.columns = ["year", "month", "date_fraction", "mssn", "standard_deviation", "observations", "marker"]
df["year"] = df["year"].astype(str)
df["month"] = df["month"].astype(str)
df["date"] = df["year"] + "-" + df["month"]
print(df.head())
The data are stored in CnosDB, an open-source distributed time-series database. After launching CnosDB with Docker, a table named sunspot is created via the CLI:
public ❯ CREATE TABLE sunspot (
    date STRING,
    mssn DOUBLE
);
Python interaction uses the CnosDB connector:
# install the connector (shell command, run outside Python)
pip install -U cnos-connector
from cnosdb_connector import connect
conn = connect(url="http://127.0.0.1:31001/", user="root", password="")
cursor = conn.cursor()
# create database and table
conn.create_database("tf_demo")
conn.switch_database("tf_demo")
cursor.execute("CREATE TABLE sunspot (date STRING, mssn DOUBLE);")
# write dataframe to CnosDB
conn.write_dataframe(df, "sunspot", ["date", "mssn"])
Data are read back for modeling:
df = pd.read_sql("select * from sunspot;", conn)
print(df.head())
The dataset is split into training and testing sets (80/20) and transformed into a sliding-window format for time-series learning:
import numpy as np
time_index = np.array(df['date'])
data = np.array(df['mssn'])
SPLIT_RATIO = 0.8
split_index = int(SPLIT_RATIO * data.shape[0])
train_data = data[:split_index]
train_time = time_index[:split_index]
test_data = data[split_index:]
test_time = time_index[split_index:]
WINDOW_SIZE = 60
BATCH_SIZE = 32
SHUFFLE_BUFFER = 1000
import tensorflow as tf
def ts_data_generator(data, window_size, batch_size, shuffle_buffer):
    # Slice the series into overlapping windows of window_size inputs plus one label.
    ds = tf.data.Dataset.from_tensor_slices(data)
    ds = ds.window(window_size + 1, shift=1, drop_remainder=True)
    ds = ds.flat_map(lambda w: w.batch(window_size + 1))
    # Shuffle the windows, then split each one into (inputs, label).
    ds = ds.shuffle(shuffle_buffer).map(lambda w: (w[:-1], w[-1]))
    ds = ds.batch(batch_size).prefetch(1)
    return ds
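To make the windowing concrete, here is a minimal pure-Python sketch of the (inputs, label) pairs that ts_data_generator produces before shuffling and batching. It assumes a toy series of ten values and a window of three; the helper name sliding_windows is hypothetical and not part of the original pipeline.

```python
def sliding_windows(series, window_size):
    """Return (inputs, label) pairs: window_size values, then the next value."""
    pairs = []
    # Each sample needs window_size inputs plus one label, so the last
    # valid start index is len(series) - window_size - 1.
    for start in range(len(series) - window_size):
        window = series[start:start + window_size]
        label = series[start + window_size]
        pairs.append((window, label))
    return pairs


toy_series = list(range(10))  # stand-in for the MSSN values (assumption)
pairs = sliding_windows(toy_series, window_size=3)

for window, label in pairs[:2]:
    print(window, "->", label)
print("number of samples:", len(pairs))
```

A series of N points yields N - window_size samples, so the first window_size points of each split serve only as inputs and never as labels.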
tensor_train_data = tf.expand_dims(train_data, axis=-1)
tensor_train_dataset = ts_data_generator(tensor_train_data, WINDOW_SIZE, BATCH_SIZE, SHUFFLE_BUFFER)
tensor_test_data = tf.expand_dims(test_data, axis=-1)
tensor_test_dataset = ts_data_generator(tensor_test_data, WINDOW_SIZE, BATCH_SIZE, SHUFFLE_BUFFER)
A 1D convolution followed by two LSTM layers and dense output layers constitutes the forecasting model:
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv1D(filters=128, kernel_size=3, strides=1, input_shape=[None, 1]),
    tf.keras.layers.MaxPool1D(pool_size=2, strides=1),
    tf.keras.layers.LSTM(128, return_sequences=True),
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.Dense(132, activation="relu"),
    tf.keras.layers.Dense(1)
])
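As a rough check on the layer stack above, the sketch below traces how the time dimension of one 60-step window shrinks through the convolution and pooling layers. It assumes the Keras defaults that the model relies on (padding="valid", no dilation); the helper valid_output_length is hypothetical.

```python
def valid_output_length(input_length, kernel_size, stride):
    """Output length of a 1D conv or pooling layer with 'valid' padding."""
    return (input_length - kernel_size) // stride + 1


WINDOW_SIZE = 60  # same window length as in the article

# Conv1D(kernel_size=3, strides=1) trims two steps from the window.
after_conv = valid_output_length(WINDOW_SIZE, kernel_size=3, stride=1)

# MaxPool1D(pool_size=2, strides=1) trims one more step.
after_pool = valid_output_length(after_conv, kernel_size=2, stride=1)

print("steps after Conv1D:", after_conv)
print("steps after MaxPool1D:", after_pool)
```

Because both LSTM layers use return_sequences=True, the model emits one value per remaining time step rather than a single value per window, which is why the forecasting code keeps only the last step (index -1) of each prediction.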
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
model.compile(loss="mse", optimizer=optimizer, metrics=["mae"])
history = model.fit(tensor_train_dataset, epochs=20, validation_data=tensor_test_dataset)
Training loss and validation loss are plotted to assess convergence:
import matplotlib.pyplot as plt
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
After training, the model forecasts MSSN values; the mean absolute error on the test set is reported (≈24.68):
def model_forecast(model, data, window_size):
    # Build one input window per start position (no label) and predict in batches.
    ds = tf.data.Dataset.from_tensor_slices(data)
    ds = ds.window(window_size, shift=1, drop_remainder=True)
    ds = ds.flat_map(lambda w: w.batch(window_size))
    ds = ds.batch(32).prefetch(1)
    forecast = model.predict(ds)
    return forecast
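The slice applied to the forecast in the next step can be checked with plain index arithmetic. The sketch below uses hypothetical toy sizes (n_points and a small window_size) to show that window i covers points i through i + window_size - 1 and therefore predicts point i + window_size. Dropping the final window and starting at split_index - window_size then leaves exactly one prediction per test point.

```python
n_points = 20     # toy length of the full series (assumption)
window_size = 5   # toy window; the article uses 60
split_index = int(0.8 * n_points)

# model_forecast builds one window per start index, with no label reserved.
window_starts = list(range(n_points - window_size + 1))

# Window i ends at i + window_size - 1 and predicts the following point.
predicted_index = [start + window_size for start in window_starts]

# Same slice as the article: drop the last window (its target lies past the
# end of the series) and start where the target is the first test point.
aligned = predicted_index[split_index - window_size:-1]

test_indices = list(range(split_index, n_points))
print("aligned with test set:", aligned == test_indices)
```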
rnn_forecast = model_forecast(model, data[..., np.newaxis], WINDOW_SIZE)
rnn_forecast = rnn_forecast[split_index - WINDOW_SIZE:-1, -1, 0]
error = tf.keras.metrics.mean_absolute_error(test_data, rnn_forecast).numpy()
print(error)  # 24.676455
Finally, the predicted series is plotted against the ground-truth series to visualize performance:
plt.plot(test_data)
plt.plot(rnn_forecast)
plt.title('MSSN Forecast')
plt.ylabel('MSSN')
plt.xlabel('Month')
plt.legend(['Ground Truth', 'Predictions'], loc='upper right')
plt.show()
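For reference, the mean absolute error reported earlier is the average of |y_true - y_pred| over the test months. A minimal pure-Python sketch follows, using made-up numbers rather than the article's data; the toy_truth and toy_pred values are illustrative assumptions.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


toy_truth = [100.0, 80.0, 60.0]  # made-up ground-truth values
toy_pred = [90.0, 85.0, 75.0]    # made-up predictions

toy_mae = mean_absolute_error(toy_truth, toy_pred)
print(toy_mae)
```

MAE is expressed in the same units as the target, so a value of about 24.68 means the forecasts are off by roughly 25 sunspot-number units per month on average.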
DataFunTalk