Predicting Sunspot Activity with CnosDB and a TensorFlow 1DConv‑LSTM Model

This article demonstrates how to store monthly sunspot numbers in the CnosDB time‑series database and use TensorFlow to build a 1DConv‑LSTM neural network for forecasting sunspot activity, covering data import, database insertion, train‑test splitting, model definition, training, and result visualization.

DataFunTalk

Sunspot numbers, a key indicator of solar activity, exhibit a roughly 11‑year cycle and have shown a recent declining trend, making accurate forecasting important for space weather research.

The monthly mean sunspot number (MSSN) dataset from the SILSO website (1749‑2023) is downloaded as SN_m_tot_V2.0.csv and loaded with pandas:

import pandas as pd

# The SILSO file has no header row and uses semicolons as separators.
df = pd.read_csv("SN_m_tot_V2.0.csv", sep=";", header=None)
df.columns = ["year", "month", "date_fraction", "mssn", "standard_deviation", "observations", "marker"]

# Build a "YYYY-MM" label; zero-padding the month keeps string order chronological.
df["date"] = df["year"].astype(str) + "-" + df["month"].astype(str).str.zfill(2)

print(df.head())
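Each row of the SILSO file carries seven semicolon-separated fields, matching the column names above. The stdlib-only sketch below parses one such row and builds a zero-padded date label. The helper name parse_silso_row is hypothetical, and the sample row is made up in the same layout for illustration, not copied from the real file. It also shows why padding matters when the date is stored as a STRING column: unpadded, "1749-10" sorts before "1749-2".

```python
# Column names as used in the pandas code above.
COLUMNS = ["year", "month", "date_fraction", "mssn",
           "standard_deviation", "observations", "marker"]


def parse_silso_row(line: str) -> dict:
    """Parse one semicolon-separated SILSO monthly row (hypothetical helper)."""
    parts = [p.strip() for p in line.split(";")]
    if len(parts) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(parts)}")
    row = dict(zip(COLUMNS, parts))
    year, month = int(row["year"]), int(row["month"])
    return {
        "date": f"{year}-{month:02d}",  # zero-padded so string order == time order
        "mssn": float(row["mssn"]),
    }


# Made-up row in the same layout, for illustration only.
print(parse_silso_row("1750;3; 1750.204; 42.0; -1.0; -1;1"))
# {'date': '1750-03', 'mssn': 42.0}

# Unpadded labels sort incorrectly as strings; padded labels do not.
print(sorted(["1749-10", "1749-2"]))    # ['1749-10', '1749-2']
print(sorted(["1749-10", "1749-02"]))   # ['1749-02', '1749-10']
```

The same reasoning applies after reading the table back from the database: sorting by a zero-padded label (or by the parsed year and month) restores chronological order before windowing.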

The data are stored in CnosDB, an open‑source distributed time‑series database. After launching CnosDB with Docker, a table sunspot is created via the CLI:

public ❯ CREATE TABLE sunspot (
    date STRING,
    mssn DOUBLE
);

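From Python, the CnosDB connector can create a dedicated tf_demo database, create the sunspot table in it, and write the DataFrame. Install the connector from a shell first:

pip install -U cnos-connector

Then connect and write the data:

from cnosdb_connector import connect

# Connect to the CnosDB HTTP endpoint started with Docker (user root, empty password in this setup).
conn = connect(url="http://127.0.0.1:31001/", user="root", password="")
cursor = conn.cursor()

# Create a dedicated database, switch to it, and create the table there.
conn.create_database("tf_demo")
conn.switch_database("tf_demo")
cursor.execute("CREATE TABLE sunspot (date STRING, mssn DOUBLE);")

# Write only the date and mssn columns of the DataFrame to the sunspot table.
conn.write_dataframe(df, "sunspot", ["date", "mssn"])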

Data are read back for modeling:

df = pd.read_sql("select * from sunspot;", conn)
print(df.head())

The dataset is split into training and testing sets (80/20) and transformed into a sliding‑window format for time‑series learning:

import numpy as np
import tensorflow as tf

# Time labels and MSSN values as NumPy arrays.
time_index = np.array(df['date'])
data = np.array(df['mssn'])

# Chronological 80/20 split: the most recent 20% of months form the test set.
SPLIT_RATIO = 0.8
split_index = int(SPLIT_RATIO * data.shape[0])
train_data = data[:split_index]
train_time = time_index[:split_index]
test_data = data[split_index:]
test_time = time_index[split_index:]

WINDOW_SIZE = 60      # months of history in each input window
BATCH_SIZE = 32
SHUFFLE_BUFFER = 1000

def ts_data_generator(data, window_size, batch_size, shuffle_buffer):
    """Turn a series into shuffled batches of (window, next value) pairs."""
    ds = tf.data.Dataset.from_tensor_slices(data)
    # Overlapping windows of window_size + 1 consecutive values, advancing one step at a time.
    ds = ds.window(window_size + 1, shift=1, drop_remainder=True)
    ds = ds.flat_map(lambda w: w.batch(window_size + 1))
    # Inputs are the first window_size values; the label is the value that follows them.
    ds = ds.shuffle(shuffle_buffer).map(lambda w: (w[:-1], w[-1]))
    ds = ds.batch(batch_size).prefetch(1)
    return ds

# Add a trailing feature axis: shape (n,) -> (n, 1).
tensor_train_data = tf.expand_dims(train_data, axis=-1)
tensor_train_dataset = ts_data_generator(tensor_train_data, WINDOW_SIZE, BATCH_SIZE, SHUFFLE_BUFFER)

tensor_test_data = tf.expand_dims(test_data, axis=-1)
tensor_test_dataset = ts_data_generator(tensor_test_data, WINDOW_SIZE, BATCH_SIZE, SHUFFLE_BUFFER)
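To make the pipeline concrete, here is a pure-Python sketch of what the split and ts_data_generator produce, ignoring shuffling, batching, and the extra feature axis. The helper names split_series and sliding_windows are hypothetical. A segment of length n yields n - window_size (input, label) pairs, and because windowing is applied to each segment separately, the first window_size test values serve only as inputs, never as labels, in the validation set.

```python
def split_series(values, ratio=0.8):
    """Chronological split: the first `ratio` of the points train, the rest test."""
    split_index = int(ratio * len(values))
    return values[:split_index], values[split_index:]


def sliding_windows(values, window_size):
    """Yield (inputs, label) pairs like ts_data_generator, before shuffling/batching.

    Each pair uses `window_size` consecutive values as input and the value right
    after them as the label (shift=1, drop_remainder=True).
    """
    for start in range(len(values) - window_size):
        window = values[start:start + window_size + 1]
        yield window[:-1], window[-1]


series = list(range(10))            # toy series 0..9, not sunspot data
train, test = split_series(series)  # train = 0..7, test = 8..9
pairs = list(sliding_windows(train, window_size=3))
print(len(pairs))   # 5
print(pairs[0])     # ([0, 1, 2], 3)
print(pairs[-1])    # ([4, 5, 6], 7)
```

With WINDOW_SIZE = 60, each training example therefore sees five years of monthly values and is asked to predict the sixty-first month.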

The forecasting model stacks a 1D convolution and a max-pooling layer, two LSTM layers, and two dense layers:

# Conv1D + max pooling pick up short local patterns, the stacked LSTMs model
# longer-range dependencies, and the dense layers map each time step to one value.
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv1D(filters=128, kernel_size=3, strides=1, input_shape=[None, 1]),
    tf.keras.layers.MaxPool1D(pool_size=2, strides=1),
    tf.keras.layers.LSTM(128, return_sequences=True),
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.Dense(132, activation="relu"),
    tf.keras.layers.Dense(1)
])

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
model.compile(loss="mse", optimizer=optimizer, metrics=["mae"])

# The test windows double as validation data, so val_loss tracks test-set error.
history = model.fit(tensor_train_dataset, epochs=20, validation_data=tensor_test_dataset)
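Because neither Conv1D nor MaxPool1D is given a padding argument, both use Keras's default "valid" padding and shorten the sequence, while the two LSTMs (with return_sequences=True) and the Dense layers keep the time axis. The arithmetic below, a sketch using the standard valid-padding length formula with a hypothetical helper name, shows that a 60-step window produces 57 outputs per example. That is why the forecasting code later keeps only the model's output at the last time step.

```python
def conv1d_output_length(length, kernel_size, stride=1):
    """Output length of a 'valid' (unpadded) 1-D convolution or pooling layer."""
    return (length - kernel_size) // stride + 1


WINDOW_SIZE = 60
after_conv = conv1d_output_length(WINDOW_SIZE, kernel_size=3)  # Conv1D, padding="valid"
after_pool = conv1d_output_length(after_conv, kernel_size=2)   # MaxPool1D(pool_size=2, strides=1)
# Both LSTMs return full sequences and Dense acts on the last axis,
# so the time dimension is unchanged through the rest of the model.
print(after_conv, after_pool)  # 58 57
print((None, after_pool, 1))   # model output shape: (None, 57, 1)
```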

Training loss and validation loss are plotted to assess convergence:

import matplotlib.pyplot as plt
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
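Reading convergence off the plot can be complemented by locating the epoch with the lowest validation loss. The sketch below assumes only that history.history is a dict of per-epoch lists, as returned by Keras Model.fit; the helper best_epoch and the numbers in the example are hypothetical, made up for illustration.

```python
def best_epoch(history_dict, monitor="val_loss"):
    """Return (1-based epoch, value) of the lowest `monitor` entry in a History dict."""
    values = history_dict[monitor]
    index = min(range(len(values)), key=values.__getitem__)
    return index + 1, values[index]


# Made-up numbers shaped like history.history, for illustration only.
example = {"loss": [900.0, 600.0, 500.0, 450.0],
           "val_loss": [800.0, 650.0, 700.0, 720.0]}
print(best_epoch(example))  # (2, 650.0)
```

If validation loss bottoms out well before the last epoch while training loss keeps falling, the model is likely overfitting; Keras's tf.keras.callbacks.EarlyStopping(monitor="val_loss", restore_best_weights=True) can stop training at that point automatically.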

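After training, the model makes one-step-ahead forecasts over the whole series, each using the true previous 60 months as input. On the test period these forecasts give a mean absolute error of about 24.68: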

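def model_forecast(model, data, window_size):
    """Predict with every length-window_size window of the series, in order (no shuffling)."""
    ds = tf.data.Dataset.from_tensor_slices(data)
    ds = ds.window(window_size, shift=1, drop_remainder=True)
    ds = ds.flat_map(lambda w: w.batch(window_size))
    ds = ds.batch(32).prefetch(1)
    forecast = model.predict(ds)
    return forecast

# Forecast over the full series so the first test month has 60 months of history.
rnn_forecast = model_forecast(model, data[..., np.newaxis], WINDOW_SIZE)
# Window i predicts data[i + WINDOW_SIZE]: keep the windows whose targets fall in the
# test period, drop the last window (its target lies past the end of the data),
# and take the model's output at the final time step.
rnn_forecast = rnn_forecast[split_index - WINDOW_SIZE:-1, -1, 0]
error = tf.keras.metrics.mean_absolute_error(test_data, rnn_forecast).numpy()
print(error)  # 24.676455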
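The slice split_index - WINDOW_SIZE:-1 is the subtle part: window i predicts the value right after it, so the first test month is predicted by window split_index - WINDOW_SIZE, and the final window is dropped because its target would lie past the end of the data. The pure-Python sketch below checks this alignment on a toy series (helper names are hypothetical). As an addition not in the original article, it also computes the MAE of a naive persistence forecast (next month equals this month); running that baseline on the real data would give a reference point for judging the reported 24.68, but no result for it is claimed here.

```python
def align_forecasts(series, split_index, window_size):
    """Return the window indices kept by the slice and the values they predict.

    Window i covers series[i:i + window_size] and predicts series[i + window_size],
    mirroring model_forecast applied to the full series.
    """
    n_windows = len(series) - window_size + 1
    window_ids = list(range(n_windows))[split_index - window_size:-1]
    targets = [series[i + window_size] for i in window_ids]
    return window_ids, targets


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


series = [float(v) for v in range(10)]   # toy series, not sunspot data
split_index, window_size = 8, 3
ids, targets = align_forecasts(series, split_index, window_size)
print(ids, targets)                      # [5, 6] [8.0, 9.0]
print(targets == series[split_index:])   # True: exactly the test segment

# Persistence baseline: predict each month with the previous month's value.
test = series[split_index:]
persistence = series[split_index - 1:-1]
print(mean_absolute_error(test, persistence))  # 1.0
```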

Finally, the predicted series is plotted against the ground‑truth series to visualize performance:

plt.plot(test_data)
plt.plot(rnn_forecast)
plt.title('MSSN Forecast')
plt.ylabel('MSSN')
plt.xlabel('Month')
plt.legend(['Ground Truth', 'Predictions'], loc='upper right')
plt.show()

PythonTensorFlow1DConv LSTMCnosDBsunspot prediction
DataFunTalk
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.