SSPT: Custom Pre‑training Tasks for Stock Data Boost Stock Selection Performance

This article reviews the SSPT paper, which introduces three stock‑specific pre‑training tasks—stock code classification, sector classification, and moving‑average prediction—built on a two‑layer Transformer, and demonstrates through extensive experiments across five market datasets that these tasks consistently improve cumulative return and Sharpe ratio over baselines.

Bighead's Algorithm Notes

Aug 26, 2025

Background

Stock selection aims to predict prices and identify the most profitable equities, a core challenge in finance. Existing work focuses on model architecture and graph construction, while pre‑training strategies remain under‑explored, often borrowing methods from other domains that ignore stock‑specific characteristics such as non‑stationarity and contextual information.

Problem Definition

Given a set of N stocks, each with M price features over the past ΔT days, the goal is to predict the one‑day return r_{i,t} for each stock on day t and select the stock with the highest expected profit. The model f with parameters w_f predicts returns R_t for all stocks.
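To make the setup concrete, here is a minimal sketch of the two pieces involved: computing a one-day return and picking the stock with the highest predicted return. The simple-return definition (p_t - p_{t-1}) / p_{t-1}, the function names `one_day_return` and `select_top_stock`, and the tickers are assumptions for illustration; this summary does not spell out the paper's exact formula.

```python
def one_day_return(prev_close, close):
    """Simple one-day return (assumed definition): (p_t - p_{t-1}) / p_{t-1}."""
    return (close - prev_close) / prev_close


def select_top_stock(predicted_returns):
    """Return the ticker with the highest predicted return.

    predicted_returns: dict mapping ticker -> predicted one-day return.
    """
    return max(predicted_returns, key=predicted_returns.get)


# Hypothetical predictions R_t for N = 3 stocks on one day.
predictions = {"AAA": 0.004, "BBB": 0.012, "CCC": -0.003}
best = select_top_stock(predictions)
```

In a real pipeline the dictionary of predictions would come from the model f; here it is hard-coded to keep the example self-contained.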

Proposed Pre‑training Tasks

Stock Code Classification (SCC) and Sector Classification (SSC) rest on the assumption that different stocks and sectors exhibit distinguishable price patterns. Price sequences are sliced into equal‑length segments and mixed across stocks or sectors, and the model is trained to identify which stock (SCC) or which sector (SSC) each slice came from.
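The slicing step can be sketched as follows. This is an illustration rather than the authors' data pipeline: the helper name `make_scc_samples`, the use of non-overlapping segments, and the toy prices are all assumptions.

```python
import random


def make_scc_samples(series_by_stock, segment_len, seed=0):
    """Cut each stock's series into non-overlapping segments and shuffle them.

    series_by_stock: dict mapping ticker -> list of feature values.
    Returns (segments, labels), where labels[i] is the integer class of the
    stock that segments[i] was cut from.
    """
    tickers = sorted(series_by_stock)
    samples = []
    for label, ticker in enumerate(tickers):
        series = series_by_stock[ticker]
        # Drop the incomplete tail so every segment has the same length.
        for start in range(0, len(series) - segment_len + 1, segment_len):
            samples.append((series[start:start + segment_len], label))
    random.Random(seed).shuffle(samples)  # mix segments across stocks
    segments = [s for s, _ in samples]
    labels = [y for _, y in samples]
    return segments, labels


prices = {
    "AAA": [10.0, 10.2, 10.1, 10.4, 10.6, 10.5, 10.7],
    "BBB": [50.0, 49.5, 49.8, 49.1, 48.7, 48.9],
}
segments, labels = make_scc_samples(prices, segment_len=3)
```

The same helper would produce SSC samples if the label were a sector index instead of a stock index.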

Moving‑Average Prediction (MAP) addresses price volatility and non‑stationarity by predicting the moving average of a masked time window, inspired by traditional moving‑average indicators.
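A minimal sketch of how a MAP training sample might be built is shown below. The helper name `make_map_sample`, the choice to overwrite the masked window with a constant `mask_value`, and the toy closing prices are assumptions; the summary only states that the target is the moving average of the masked window.

```python
def make_map_sample(series, mask_start, mask_len, mask_value=0.0):
    """Mask a window of the series and return (masked_input, target).

    The target is the mean of the original values inside the masked window,
    i.e. the moving average the model is asked to predict.
    """
    window = series[mask_start:mask_start + mask_len]
    if len(window) != mask_len:
        raise ValueError("mask window falls outside the series")
    target = sum(window) / mask_len
    masked = list(series)  # copy so the caller's series is left untouched
    masked[mask_start:mask_start + mask_len] = [mask_value] * mask_len
    return masked, target


closes = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
masked, target = make_map_sample(closes, mask_start=2, mask_len=3)
```

Predicting an average rather than each masked value gives the model a smoother target, which is the intuition the text attributes to moving-average indicators.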

Model Architecture

The authors design a Stock‑Specific Pre‑training Transformer (SSPT) using a standard two‑layer Transformer. Apart from task‑specific classification or prediction heads, no additional structures are added.

Task‑Specific Losses

SCC and SSC use cross‑entropy loss with separate classification heads w_{scc} and w_{ssc}. MAP employs a regression loss for the moving‑average target, with an additional head parameter w_{map}.
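For readers who want to see the two loss types side by side, here is a small pure-Python sketch. Cross-entropy is what the text names for SCC and SSC; mean squared error is used for MAP only as an assumed stand-in, since the summary says "regression loss" without naming the exact form. The logits and predictions are made-up numbers.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed in a numerically stable way."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[label]


def mse(predictions, targets):
    """Mean squared error, used here as an assumed regression loss for MAP."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


scc_loss = cross_entropy([2.0, 0.5, -1.0], label=0)  # hypothetical SCC head logits
map_loss = mse([13.2, 9.8], [13.0, 10.0])            # hypothetical MAP predictions
```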

Multi‑task Pre‑training

The total pre‑training loss is a weighted sum of the three task losses, with coefficients α, β, γ controlling each task’s influence.
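Written out, the weighted sum takes the form below. The pairing of α with SCC, β with SSC, and γ with MAP follows the order in which the tasks are listed in this summary and is an assumption; the loss symbols are introduced here for illustration.

```latex
\[
\mathcal{L}_{\text{pre}}
  = \alpha\,\mathcal{L}_{\text{scc}}
  + \beta\,\mathcal{L}_{\text{ssc}}
  + \gamma\,\mathcal{L}_{\text{map}}
\]
```

Setting a coefficient to zero removes the corresponding task from pre-training.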

Fine‑tuning for Stock Selection

After pre‑training, the task‑specific pre‑training heads are replaced with a profit‑rate prediction head. Fine‑tuning optimises a combined loss comprising profit‑rate regression, profit ranking, and a stock‑selection loss, balanced by the hyperparameter ε. Model parameters are split into three groups: frozen early layers, fine‑tuned middle layers, and a newly initialised prediction head.
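The sketch below illustrates how a regression term and a ranking term can be combined. It uses mean squared error plus a pairwise hinge-style ranking penalty, a formulation common in the stock-ranking literature; whether the paper uses exactly this form is not confirmed by this summary. The stock-selection term is left out because the summary does not define it, and the way `eps` weights the ranking term is likewise an assumption.

```python
def regression_loss(pred, true):
    """Mean squared error between predicted and realised returns."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)


def pairwise_ranking_loss(pred, true):
    """Hinge-style penalty for every stock pair whose predicted order
    disagrees with the realised order, averaged over all ordered pairs."""
    n = len(true)
    total = 0.0
    for i in range(n):
        for j in range(n):
            # A positive product means the pair is ranked correctly: no penalty.
            total += max(0.0, -(pred[i] - pred[j]) * (true[i] - true[j]))
    return total / (n * n)


def fine_tune_loss(pred, true, eps=1.0):
    """Regression term plus eps times the ranking term (assumed combination)."""
    return regression_loss(pred, true) + eps * pairwise_ranking_loss(pred, true)


true_returns = [0.02, -0.01, 0.00]
good_order = [0.03, -0.02, 0.01]   # same ordering as the realised returns
bad_order = [-0.02, 0.03, 0.01]    # best and worst stocks swapped
```

The ranking term is zero for `good_order` even though its values are off, which shows why a ranking loss complements plain regression when only the top pick matters.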

Datasets and Evaluation

Five historical stock price datasets from NASDAQ (2013‑2017), NYSE (2013‑2017), FTSE‑100 (2013‑2017), TOPIX‑100 (2016‑2020), and a recent NASDAQ slice (2018‑2022) are used. Each is split chronologically into 3‑year training, 1‑year validation, and 1‑year test sets, with min‑max normalisation applied.
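A chronological split with min-max scaling can be sketched as below. The helper names and the toy data (one price per year) are hypothetical. Fitting the scaling bounds on the training years only is an assumption made here to avoid look-ahead; the summary does not say how the authors fit the normalisation.

```python
def split_by_year(rows, first_year):
    """Chronological 3/1/1-year split. rows: list of (year, value) pairs."""
    train = [v for y, v in rows if first_year <= y < first_year + 3]
    val = [v for y, v in rows if y == first_year + 3]
    test = [v for y, v in rows if y == first_year + 4]
    return train, val, test


def fit_min_max(values):
    """Return (lo, hi) computed from the given values only."""
    return min(values), max(values)


def apply_min_max(values, lo, hi):
    """Scale with precomputed bounds; unseen extremes can fall outside [0, 1]."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Toy series: one closing price per year, 2013-2017.
rows = [(2013, 10.0), (2014, 12.0), (2015, 14.0), (2016, 13.0), (2017, 16.0)]
train, val, test = split_by_year(rows, first_year=2013)
lo, hi = fit_min_max(train)  # bounds from training data only (assumption)
train_scaled = apply_min_max(train, lo, hi)
test_scaled = apply_min_max(test, lo, hi)
```

Note that the test value scales to a number above 1 because it exceeds the training maximum, a normal side effect of fitting bounds on past data only.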

Performance is measured by the cumulative investment return ratio (IRR) and the Sharpe ratio (SR) under a daily buy‑hold‑sell strategy: buy the selected stock, hold it for one day, then sell.
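Both metrics are easy to compute from the series of daily returns earned by the selected stock. In the sketch below, IRR is taken as the plain sum of daily returns and the Sharpe ratio is left un-annualised with a zero risk-free rate; these conventions are assumptions, since the summary does not give the paper's exact definitions.

```python
import math


def cumulative_return(daily_returns):
    """IRR as the sum of daily returns of the selected stocks (assumed definition)."""
    return sum(daily_returns)


def sharpe_ratio(daily_returns, risk_free=0.0):
    """Mean excess return divided by its sample standard deviation (not annualised)."""
    excess = [r - risk_free for r in daily_returns]
    mean = sum(excess) / len(excess)
    var = sum((r - mean) ** 2 for r in excess) / (len(excess) - 1)
    return mean / math.sqrt(var)


# Hypothetical realised returns of the stock picked on each of four test days.
picked = [0.01, -0.005, 0.02, 0.003]
irr = cumulative_return(picked)
sr = sharpe_ratio(picked)
```

IRR rewards raw profit, while the Sharpe ratio penalises volatile profit, so reporting both guards against strategies that win on one measure by taking on excess risk.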

Baseline Comparisons

SSPT is compared against classification (CLF), regression (REG), reinforcement learning (RL), and ranking (RAN) baselines.

Experimental Results

Single‑task analysis finds that every task performs best with the full price feature set. The best learning rates are 10^{-3} for SCC and SSC and 10^{-4} for MAP. The best freezing strategy also differs by task: no freezing for SCC and MAP, and frozen embeddings for SSC.

Combined‑task analysis indicates that SCC+SSC consistently improves selection, while the benefit of adding MAP is sensitive to the loss‑weight settings; a balanced α=β=1 without MAP is the most robust configuration.

Comparison with existing methods shows SSPT achieving higher IRR and SR across all datasets, confirming that stock‑specific pre‑training extracts more useful knowledge from price sequences.

A simulation study using synthetic series generated from Wiener processes shows that SCC and SSC rely on distinct statistical properties of the raw sequences, supporting the hypothesis that these tasks provide valuable features for downstream stock selection.
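A synthetic series of this kind can be generated with a few lines of standard-library Python. The sketch below simulates a Wiener process with drift; the function name, the drift and volatility values, and the idea of contrasting a "calm" and a "wild" series are illustrative assumptions, since the summary does not list the paper's simulation parameters.

```python
import math
import random


def simulate_wiener_path(n_steps, drift, volatility, start=100.0, dt=1.0, seed=None):
    """Simulate a price path as a Wiener process with drift.

    Each step adds drift * dt plus volatility * sqrt(dt) * N(0, 1) noise.
    """
    rng = random.Random(seed)
    path = [start]
    for _ in range(n_steps):
        step = drift * dt + volatility * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(path[-1] + step)
    return path


# Two hypothetical "stocks" that differ only in their statistical properties.
calm = simulate_wiener_path(250, drift=0.05, volatility=0.5, seed=1)
wild = simulate_wiener_path(250, drift=0.05, volatility=3.0, seed=1)
```

Because the generating parameters are known, such series make it possible to check whether a classifier separates sequences by drift or volatility rather than by incidental features.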

Conclusion

The SSPT framework, with its three tailored pre‑training tasks and a simple two‑layer Transformer, consistently outperforms market baselines and existing methods in both cumulative return and Sharpe ratio, demonstrating the effectiveness of stock‑specific pre‑training for improving stock selection performance.

