Predicting Sunspot Activity with CnosDB and a TensorFlow 1DConv‑LSTM Model
This article demonstrates how to store monthly sunspot numbers in the CnosDB time‑series database and use TensorFlow to build a 1DConv‑LSTM neural network for forecasting sunspot activity, covering data import, database insertion, train‑test splitting, model definition, training, and result visualization.
Sunspot numbers, a key indicator of solar activity, exhibit a roughly 11‑year cycle and have shown a recent declining trend, making accurate forecasting important for space weather research.
The monthly mean sunspot number (MSSN) dataset from the SILSO website (1749‑2023) is downloaded as SN_m_tot_V2.0.csv and loaded with pandas:
```python
import pandas as pd

df = pd.read_csv("SN_m_tot_V2.0.csv", sep=";", header=None)
df.columns = ["year", "month", "date_fraction", "mssn",
              "standard_deviation", "observations", "marker"]
df["year"] = df["year"].astype(str)
df["month"] = df["month"].astype(str)
df["date"] = df["year"] + "-" + df["month"]
print(df.head())
```
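The SILSO monthly series marks months with no valid observation using a sentinel value of -1 in the mean-sunspot-number column. Assuming that convention holds for the downloaded file, a minimal sketch of filtering those rows out before writing to the database:

```python
import pandas as pd

# Toy frame standing in for the SILSO data: -1.0 marks a missing month
# (an assumption about the file's convention; verify against the data).
df = pd.DataFrame({"mssn": [96.7, -1.0, 104.3]})

# Keep only rows with a valid (non-negative) monthly mean
df = df[df["mssn"] >= 0].reset_index(drop=True)
print(len(df))  # 2
```

Dropping (rather than interpolating) missing months is the simplest choice; for a series this long it has little effect on the model.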
The data are stored in CnosDB, an open‑source distributed time‑series database. After launching CnosDB with Docker, a table sunspot is created via the CLI:
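The article assumes a CnosDB instance is already running. A typical Docker launch might look like the following sketch; the image tag, CLI binary name, and port mapping are assumptions to adapt to your environment (CnosDB serves HTTP on 8902 inside the container, mapped here to the 31001 that the connector URL uses):

```shell
# Start CnosDB in the background (image tag is an assumption)
docker run -d --name cnosdb -p 31001:8902 cnosdb/cnosdb:community-latest

# Open the interactive CLI inside the container to run SQL statements
docker exec -it cnosdb cnosdb-cli
```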
```
public ❯ CREATE TABLE sunspot (
    date STRING,
    mssn DOUBLE
);
```
Python interaction uses the CnosDB connector:
```shell
# install the connector
pip install -U cnos-connector
```

```python
from cnosdb_connector import connect

conn = connect(url="http://127.0.0.1:31001/", user="root", password="")
cursor = conn.cursor()

# create the database and switch to it
conn.create_database("tf_demo")
conn.switch_database("tf_demo")
cursor.execute("CREATE TABLE sunspot (date STRING, mssn DOUBLE);")

# write the dataframe to CnosDB
conn.write_dataframe(df, "sunspot", ["date", "mssn"])
```
Data are read back for modeling:
```python
df = pd.read_sql("select * from sunspot;", conn)
print(df.head())
```
The dataset is split into training and testing sets (80/20) and transformed into a sliding‑window format for time‑series learning:
```python
import numpy as np
import tensorflow as tf

time_index = np.array(df['date'])
data = np.array(df['mssn'])

# 80/20 train-test split
SPLIT_RATIO = 0.8
split_index = int(SPLIT_RATIO * data.shape[0])
train_data = data[:split_index]
train_time = time_index[:split_index]
test_data = data[split_index:]
test_time = time_index[split_index:]

WINDOW_SIZE = 60
BATCH_SIZE = 32
SHUFFLE_BUFFER = 1000

def ts_data_generator(data, window_size, batch_size, shuffle_buffer):
    # Slide a window of (window_size + 1) over the series: the first
    # window_size values are the input, the final value is the target.
    ds = tf.data.Dataset.from_tensor_slices(data)
    ds = ds.window(window_size + 1, shift=1, drop_remainder=True)
    ds = ds.flat_map(lambda w: w.batch(window_size + 1))
    ds = ds.shuffle(shuffle_buffer).map(lambda w: (w[:-1], w[-1]))
    ds = ds.batch(batch_size).prefetch(1)
    return ds

# Add a trailing channel dimension expected by Conv1D
tensor_train_data = tf.expand_dims(train_data, axis=-1)
tensor_train_dataset = ts_data_generator(tensor_train_data, WINDOW_SIZE,
                                         BATCH_SIZE, SHUFFLE_BUFFER)
tensor_test_data = tf.expand_dims(test_data, axis=-1)
tensor_test_dataset = ts_data_generator(tensor_test_data, WINDOW_SIZE,
                                        BATCH_SIZE, SHUFFLE_BUFFER)
```
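To make the windowing concrete, here is a toy run of the same tf.data pipeline on the series 0..9 with a window of 3: each sample pairs three consecutive values with the value that follows them.

```python
import tensorflow as tf

# Toy series: windows of length 3 predict the next value
data = tf.range(10, dtype=tf.float32)
ds = tf.data.Dataset.from_tensor_slices(data)
ds = ds.window(4, shift=1, drop_remainder=True)   # window_size + 1 = 4
ds = ds.flat_map(lambda w: w.batch(4))
ds = ds.map(lambda w: (w[:-1], w[-1]))            # split into (input, target)

x, y = next(iter(ds))
print(x.numpy(), y.numpy())  # [0. 1. 2.] 3.0
```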
A 1D convolution followed by two LSTM layers and dense output layers constitutes the forecasting model:
```python
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv1D(filters=128, kernel_size=3, strides=1,
                           input_shape=[None, 1]),
    tf.keras.layers.MaxPool1D(pool_size=2, strides=1),
    tf.keras.layers.LSTM(128, return_sequences=True),
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.Dense(132, activation="relu"),
    tf.keras.layers.Dense(1)
])
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
model.compile(loss="mse", optimizer=optimizer, metrics=["mae"])
history = model.fit(tensor_train_dataset, epochs=20,
                    validation_data=tensor_test_dataset)
```
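Because both LSTM layers use return_sequences=True, the model emits one prediction per remaining timestep rather than a single value; the forecasting code later keeps only the last step. A quick shape check on a 60-step window illustrates this (the valid-padded Conv1D drops 2 steps and the stride-1 MaxPool1D drops 1 more, leaving 57):

```python
import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv1D(128, 3, input_shape=[None, 1]),
    tf.keras.layers.MaxPool1D(2, strides=1),
    tf.keras.layers.LSTM(128, return_sequences=True),
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.Dense(132, activation="relu"),
    tf.keras.layers.Dense(1),
])

x = tf.zeros([1, 60, 1])   # a batch of one 60-step window
out = model(x)
print(out.shape)           # (1, 57, 1): one prediction per surviving timestep
```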
Training loss and validation loss are plotted to assess convergence:
```python
import matplotlib.pyplot as plt

plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
```
After training, the model forecasts MSSN values; mean absolute error on the test set is reported (≈24.68):
```python
def model_forecast(model, data, window_size):
    # Slide a window over the full series and predict one step per window
    ds = tf.data.Dataset.from_tensor_slices(data)
    ds = ds.window(window_size, shift=1, drop_remainder=True)
    ds = ds.flat_map(lambda w: w.batch(window_size))
    ds = ds.batch(32).prefetch(1)
    forecast = model.predict(ds)
    return forecast

rnn_forecast = model_forecast(model, data[..., np.newaxis], WINDOW_SIZE)
# Keep only the last-step prediction of each window over the test span
rnn_forecast = rnn_forecast[split_index - WINDOW_SIZE:-1, -1, 0]
error = tf.keras.metrics.mean_absolute_error(test_data, rnn_forecast).numpy()
print(error)  # 24.676455
```
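For readers unfamiliar with the metric: mean absolute error is simply the average of the absolute differences between truth and prediction, so an MAE of about 24.7 means the forecast is off by roughly 25 sunspots per month on average. A hand computation on toy values (hypothetical numbers, not from the dataset) matches what tf.keras.metrics.mean_absolute_error returns:

```python
import numpy as np

# MAE by hand on toy values: mean of |y_true - y_pred|
y_true = np.array([100.0, 120.0, 90.0])
y_pred = np.array([110.0, 100.0, 95.0])
mae = np.mean(np.abs(y_true - y_pred))
print(mae)  # (10 + 20 + 5) / 3 = 11.666...
```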
Finally, the predicted series is plotted against the ground‑truth series to visualize performance:
```python
plt.plot(test_data)
plt.plot(rnn_forecast)
plt.title('MSSN Forecast')
plt.ylabel('MSSN')
plt.xlabel('Month')
plt.legend(['Ground Truth', 'Predictions'], loc='upper right')
plt.show()
```
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.