SSPT: Custom Pre‑training Tasks for Stock Data Boost Stock Selection Performance
This article reviews the SSPT paper, which introduces three stock‑specific pre‑training tasks (stock code classification, sector classification, and moving‑average prediction) built on a two‑layer Transformer. Through extensive experiments on five market datasets, it shows that these tasks consistently improve cumulative return and Sharpe ratio over baselines.
Background
Stock selection aims to predict prices and identify the most profitable equities, a core challenge in finance. Existing work focuses on model architecture and graph construction, while pre‑training strategies remain under‑explored: they are usually borrowed from other domains and ignore stock‑specific characteristics such as non‑stationarity and contextual information.
Problem Definition
Given a set of N stocks, each described by M price features over the past ΔT days, the goal is to predict the one‑day return r_{i,t} of each stock i on day t and select the stock with the highest expected profit. A model f with parameters w_f maps this input to the predicted returns R_t of all N stocks.
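To make the setup concrete, here is a minimal Python sketch assuming the common close‑to‑close definition r_{i,t} = (p_{i,t} - p_{i,t-1}) / p_{i,t-1}, where p_{i,t} is stock i's closing price on day t. The review does not spell out the formula, and the helpers `one_day_returns` and `select_stock` are hypothetical.

```python
from typing import Sequence


def one_day_returns(closes: Sequence[Sequence[float]], t: int) -> list[float]:
    """Return r_{i,t} = (p_{i,t} - p_{i,t-1}) / p_{i,t-1} for every stock i.

    closes[i][t] is the (assumed) closing price of stock i on day t.
    """
    return [(row[t] - row[t - 1]) / row[t - 1] for row in closes]


def select_stock(returns: Sequence[float]) -> int:
    """Index of the stock with the highest (predicted) return."""
    return max(range(len(returns)), key=lambda i: returns[i])


closes = [
    [10.0, 10.5, 10.2],  # stock 0
    [20.0, 19.0, 19.8],  # stock 1
    [5.0, 5.0, 5.5],     # stock 2
]
realised = one_day_returns(closes, t=2)
print(realised)
print(select_stock(realised))  # 2
```

In the actual task, `select_stock` would receive the model's predicted returns R_t rather than the realised returns used here to keep the example small.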
Proposed Pre‑training Tasks
Stock Code Classification (SCC) and Stock Sector Classification (SSC) assume that different stocks and sectors exhibit distinguishable price patterns. Price sequences are sliced into equal‑length segments, the segments are mixed across stocks or sectors, and the model is trained to identify which stock (SCC) or sector (SSC) each slice came from.
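A minimal sketch of how SCC and SSC training examples could be produced follows. The slice length, the use of non‑overlapping slices, the shuffling, and the helper `make_classification_slices` are illustrative assumptions; the review states only that sequences are cut into equal‑length segments, mixed, and labelled by their source stock or sector.

```python
import random
from typing import Sequence


def make_classification_slices(
    series: dict[str, Sequence[float]],
    sector_of: dict[str, str],
    slice_len: int,
    seed: int = 0,
) -> list[tuple[list[float], int, int]]:
    """Cut every stock's price series into non-overlapping slices of equal length.

    Returns shuffled (slice, stock_label, sector_label) triples so that slices
    from different stocks and sectors are mixed. stock_label is the SCC target,
    sector_label the SSC target.
    """
    stock_ids = {code: i for i, code in enumerate(sorted(series))}
    sector_ids = {name: i for i, name in enumerate(sorted(set(sector_of.values())))}
    samples = []
    for code, values in series.items():
        # Stop early enough that every slice has exactly slice_len steps;
        # any trailing remainder shorter than slice_len is dropped.
        for start in range(0, len(values) - slice_len + 1, slice_len):
            chunk = list(values[start:start + slice_len])
            samples.append((chunk, stock_ids[code], sector_ids[sector_of[code]]))
    random.Random(seed).shuffle(samples)
    return samples


series = {
    "AAA": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
    "BBB": [9.0, 8.0, 7.0, 6.0, 5.0, 4.0],
    "CCC": [3.0, 3.0, 4.0, 4.0],
}
sector_of = {"AAA": "Tech", "BBB": "Energy", "CCC": "Tech"}
for chunk, scc_label, ssc_label in make_classification_slices(series, sector_of, 3):
    print(chunk, scc_label, ssc_label)
```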
Moving‑Average Prediction (MAP) addresses price volatility and non‑stationarity by predicting the moving average of a masked time window, inspired by traditional moving‑average indicators.
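One way to build a MAP training pair is sketched below, assuming a simple (arithmetic) moving average over the masked window and a fixed placeholder value at masked positions. The helper `make_map_sample`, the placeholder, and the window size are hypothetical; a learned mask embedding would be an equally plausible design.

```python
from statistics import fmean
from typing import Sequence


def make_map_sample(
    prices: Sequence[float], start: int, window: int, mask_value: float = 0.0
) -> tuple[list[float], float]:
    """Mask prices[start:start+window] and return (masked_input, target).

    The target is the simple moving average (arithmetic mean) of the masked
    prices, which the model must regress from the unmasked context.
    """
    if not 0 <= start <= len(prices) - window:
        raise ValueError("masked window must lie inside the sequence")
    target = fmean(prices[start:start + window])
    masked = list(prices)  # copy so the caller's sequence is left untouched
    masked[start:start + window] = [mask_value] * window
    return masked, target


prices = [10.0, 10.4, 10.1, 10.9, 11.3, 11.0, 11.6]
masked, target = make_map_sample(prices, start=2, window=3)
print(masked)  # [10.0, 10.4, 0.0, 0.0, 0.0, 11.0, 11.6]
print(target)
```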
Model Architecture
The authors design a Stock‑Specific Pre‑training Transformer (SSPT) using a standard two‑layer Transformer. Apart from task‑specific classification or prediction heads, no additional structures are added.
Task‑Specific Losses
SCC and SSC use cross‑entropy loss with separate classification heads w_{scc} and w_{ssc}. MAP employs a regression loss for the moving‑average target, with an additional head parameter w_{map}.
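The two loss types can be written out as follows. Cross‑entropy is as stated in the review; mean squared error is an assumed choice for the unspecified MAP regression loss, and per‑sample scalar functions stand in for the batched tensor operations a real implementation would use.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one sample, as used by the SCC and SSC heads."""
    m = max(logits)  # shift by the max logit so exp() cannot overflow
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error; an assumed choice for the MAP regression loss."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - y) ** 2 for p, y in zip(pred, target)) / len(pred)


# Two-class example with uninformative logits: the loss is ln 2.
print(round(cross_entropy([0.0, 0.0], label=0), 4))  # 0.6931
print(mse([1.0, 2.0], [1.5, 2.0]))  # 0.125
```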
Multi‑task Pre‑training
The total pre‑training loss is a weighted sum of the three task losses, with coefficients α, β, γ controlling each task’s influence.
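Written out, with α, β and γ attached to SCC, SSC and MAP respectively (the pairing implied by the α=β=1, no‑MAP configuration reported in the results), the objective is as below; the loss symbols are illustrative notation, not taken from the paper.

```latex
\mathcal{L}_{\text{pre}} = \alpha\,\mathcal{L}_{\text{SCC}} + \beta\,\mathcal{L}_{\text{SSC}} + \gamma\,\mathcal{L}_{\text{MAP}}
```

Setting γ = 0 removes MAP from pre‑training entirely.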
Fine‑tuning for Stock Selection
After pre‑training, the task‑specific pre‑training heads are replaced with a profit‑rate prediction head. Fine‑tuning optimises a combined loss comprising profit‑rate regression, profit ranking, and a stock‑selection loss, balanced by a hyperparameter ε. Model parameters are split into three groups: early layers that can be frozen, pre‑trained layers that are fine‑tuned, and the newly initialised prediction head.
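The sketch below illustrates the regression‑plus‑ranking part of the fine‑tuning objective. It assumes the pairwise hinge ranking term used in earlier relational stock‑ranking work (Feng et al.'s temporal relational ranking model), which may differ from SSPT's exact formulation; the review does not define the separate stock‑selection loss, so that term is left out. Function names are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def pairwise_ranking_loss(pred: Sequence[float], true: Sequence[float]) -> float:
    """Hinge penalty on stock pairs whose predicted order contradicts the true order.

    Each unordered pair (i, j) contributes max(0, -(pred_i - pred_j) * (true_i - true_j)),
    so pairs ranked in the correct order cost nothing.
    """
    total = 0.0
    for i, j in combinations(range(len(pred)), 2):
        total += max(0.0, -(pred[i] - pred[j]) * (true[i] - true[j]))
    return total


def fine_tune_loss(pred: Sequence[float], true: Sequence[float], eps: float) -> float:
    """Return-regression MSE plus an eps-weighted pairwise ranking term.

    The stock-selection loss mentioned in the review is not specified, so it is omitted.
    """
    mse = sum((p - y) ** 2 for p, y in zip(pred, true)) / len(pred)
    return mse + eps * pairwise_ranking_loss(pred, true)


true_returns = [0.02, 0.01, -0.01]
predicted = [0.01, 0.03, -0.02]  # stocks 0 and 1 are ranked in the wrong order
print(pairwise_ranking_loss(predicted, true_returns))
print(fine_tune_loss(predicted, true_returns, eps=1.0))
```

Larger ε pushes the model toward getting the relative order of stocks right, which matters more for top‑1 selection than the exact predicted return.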
Datasets and Evaluation
Five historical stock price datasets from NASDAQ (2013‑2017), NYSE (2013‑2017), FTSE‑100 (2013‑2017), TOPIX‑100 (2016‑2020), and a recent NASDAQ slice (2018‑2022) are used. Each is split chronologically into 3‑year training, 1‑year validation, and 1‑year test sets, with min‑max normalisation applied.
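A minimal sketch of the chronological split and scaling follows, assuming one min‑max range per series fitted on the training years only (the review does not say where the scaling statistics come from). The data are synthetic and the helper names are hypothetical.

```python
from datetime import date


def split_by_year(rows, train_years, val_years, test_years):
    """Partition (date, value) rows by calendar year, keeping them in time order."""
    rows = sorted(rows)

    def pick(years):
        return [row for row in rows if row[0].year in years]

    return pick(train_years), pick(val_years), pick(test_years)


def min_max_scale(values, lo, hi):
    """Map values linearly so that lo -> 0 and hi -> 1."""
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


# Illustrative data: two observations per year, 2013-2017.
rows = [
    (date(y, m, 1), float(10 * (y - 2012) + m))
    for y in range(2013, 2018)
    for m in (1, 7)
]
train, val, test = split_by_year(rows, {2013, 2014, 2015}, {2016}, {2017})

# Fit the scaling range on the training years only, then reuse it for val/test.
train_vals = [v for _, v in train]
lo, hi = min(train_vals), max(train_vals)
scaled_train = min_max_scale(train_vals, lo, hi)
scaled_val = min_max_scale([v for _, v in val], lo, hi)
print(scaled_train[0], scaled_train[-1])  # 0.0 1.0
print(scaled_val)
```

Fitting on the training years avoids leaking future price levels into training; the trade‑off, visible in the validation values above 1.0, is that later data can fall outside [0, 1].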
Performance is measured by cumulative investment return (IRR) and Sharpe ratio (SR) under a daily buy‑hold‑sell strategy.
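The two metrics can be sketched as below. The review does not give formulas, so several conventions here are assumptions: daily returns are compounded into the cumulative return, the Sharpe ratio uses a zero risk‑free rate and is annualised with 252 trading days, and the strategy holds only the single top‑ranked stock each day. All names are hypothetical.

```python
import math
from statistics import fmean, stdev
from typing import Sequence


def top1_daily_returns(
    predicted: Sequence[Sequence[float]], realised: Sequence[Sequence[float]]
) -> list[float]:
    """Daily buy-hold-sell: each day hold only the stock with the highest prediction.

    predicted[t][i] is the model's score for stock i on day t; realised[t][i]
    is the return that stock actually earned over the holding day.
    """
    picks = []
    for pred_day, real_day in zip(predicted, realised):
        best = max(range(len(pred_day)), key=lambda i: pred_day[i])
        picks.append(real_day[best])
    return picks


def cumulative_return(daily: Sequence[float]) -> float:
    """Compounded return from reinvesting each day: prod(1 + r_t) - 1."""
    growth = 1.0
    for r in daily:
        growth *= 1.0 + r
    return growth - 1.0


def sharpe_ratio(daily: Sequence[float], risk_free: float = 0.0, periods: int = 252) -> float:
    """Annualised Sharpe ratio: mean excess return / sample std, times sqrt(periods)."""
    excess = [r - risk_free for r in daily]
    return fmean(excess) / stdev(excess) * math.sqrt(periods)


predicted = [[0.01, 0.02], [0.03, -0.01], [0.00, 0.01]]
realised = [[0.005, 0.010], [0.020, -0.030], [0.001, -0.004]]
daily = top1_daily_returns(predicted, realised)
print(daily)  # [0.01, 0.02, -0.004]
print(cumulative_return(daily), sharpe_ratio(daily))
```

Summing daily returns instead of compounding them is another common convention for cumulative return; the two agree closely when daily returns are small.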
Baseline Comparisons
SSPT is compared against classification (CLF), regression (REG), reinforcement learning (RL), and ranking (RAN) baselines.
Experimental Results
Single‑task analysis shows that using the full price feature set, learning rates of 10^{-3} for SCC/SSC and 10^{-4} for MAP, and specific freezing strategies (no freezing for SCC/MAP, freezing embeddings for SSC) yield the best results.
Combined‑task analysis indicates that combining SCC and SSC consistently improves selection, whereas the benefit of adding MAP depends strongly on the loss weights; equal weights α = β = 1 with MAP switched off (γ = 0) is the most robust configuration.
Compared with existing methods, SSPT achieves higher IRR and SR on all datasets, which the authors take as evidence that stock‑specific pre‑training extracts more useful knowledge from price sequences.
A simulation study on synthetic series generated by a Wiener process shows that SCC and SSC rely on distinct statistical properties of the raw sequences, supporting the hypothesis that these tasks provide valuable features for downstream stock selection.
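The review does not give the simulation settings. The sketch below assumes Brownian motion with drift, with a per‑series drift μ and volatility σ, and shows that two series differing only in σ are separable by simple increment statistics, the kind of property SCC and SSC could exploit. Parameter values and function names are illustrative.

```python
import math
import random
from statistics import stdev


def wiener_path(n_steps, mu, sigma, dt=1.0, x0=0.0, rng=None):
    """Simulate X_t = x0 + mu*t + sigma*W_t on an evenly spaced time grid.

    Each increment is mu*dt + sigma*sqrt(dt)*Z with Z ~ N(0, 1), which samples
    Brownian motion with drift exactly at the grid points.
    """
    if rng is None:
        rng = random.Random()
    path = [x0]
    for _ in range(n_steps):
        step = mu * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(path[-1] + step)
    return path


def increment_std(path):
    """Sample standard deviation of the one-step increments."""
    return stdev([b - a for a, b in zip(path, path[1:])])


rng = random.Random(42)
# Two hypothetical "stocks" that differ only in volatility.
calm = wiener_path(5000, mu=0.0, sigma=0.5, rng=rng)
wild = wiener_path(5000, mu=0.0, sigma=2.0, rng=rng)
print(round(increment_std(calm), 2), round(increment_std(wild), 2))
```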
Conclusion
The SSPT framework, with its three tailored pre‑training tasks and a simple two‑layer Transformer, consistently outperforms market baselines and existing methods in both cumulative return and Sharpe ratio, demonstrating the effectiveness of stock‑specific pre‑training for improving stock selection performance.
