STORM: A Bidirectional Spatiotemporal Factor Model Achieving Sharpe Ratio >1

STORM introduces a bidirectional VQ‑VAE‑based spatiotemporal factor model that extracts fine‑grained time‑series and cross‑sectional features, uses discrete codebooks for orthogonal, diverse factor embeddings, and outperforms nine baselines on portfolio management and algorithmic trading tasks, delivering Sharpe ratios exceeding 1.

Bighead's Algorithm Notes

Background

Factor models are fundamental for asset pricing and capturing excess returns. Existing VAE‑based latent factor models suffer from three limitations: (1) inability to capture fine‑grained temporal patterns of individual stocks, (2) single‑valued factor representations that lack diversity, and (3) weak factor‑selection mechanisms, which lead to low factor quality and poor robustness.

Problem Definition

The goal is to design a factor model that overcomes these shortcomings, improves factor quality, and adapts to varying market conditions.

Method

STORM introduces a dual‑vector‑quantized VAE (VQ‑VAE) architecture that extracts stock features from temporal (TS) and cross‑sectional (CS) perspectives.

Data Chunking: In the TS module, data are split along the stock dimension so that each chunk contains one stock's data over p consecutive days. In the CS module, data are split along the time axis so that each chunk contains all stocks' features for a single trading day.
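
As a rough sketch, the two chunking schemes amount to array reshapes. The shapes and the patch length p below are illustrative values, not taken from the paper:

```python
import numpy as np

# Illustrative shapes: N stocks, T trading days, F features, patch length p.
N, T, F, p = 4, 20, 6, 5
data = np.random.rand(N, T, F)

# TS chunking: split along the stock dimension, so each chunk holds one
# stock's features over p consecutive trading days.
ts_chunks = data.reshape(N, T // p, p, F)   # (N, T/p, p, F)

# CS chunking: split along the time axis, so each chunk holds all stocks'
# features for a single trading day.
cs_chunks = data.transpose(1, 0, 2)         # (T, N, F)
```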

TS and CS Encoders: Stacked Transformer blocks encode the chunks (4 layers with 4 attention heads for TS; 2 layers with 8 heads for CS), capturing local and global dependencies while keeping the model lightweight.

Codebook Construction & Optimization: A learnable embedding space serves as the codebook. Continuous latent vectors are quantized to their nearest codebook entries under Euclidean distance. The discrete codebook provides two benefits: (a) it clusters similar factors, encouraging orthogonality and diversity, and (b) it enables explicit factor selection via token indices.
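
A minimal NumPy sketch of the nearest-neighbor quantization step. Dimensions and the random codebook initialization are illustrative; in the actual model the codebook is learned jointly with the encoders:

```python
import numpy as np

def quantize(z, codebook):
    """Map each latent vector to its nearest codebook entry (L2 distance).

    z: (batch, d) continuous latents; codebook: (K, d) learnable entries.
    Returns the quantized vectors and their discrete token indices.
    """
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (batch, K)
    indices = dists.argmin(axis=1)   # token indices enable explicit selection
    return codebook[indices], indices

rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 16))   # K=512, the size the paper found best
z = rng.normal(size=(8, 16))
z_q, idx = quantize(z, codebook)
```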

Diversity & Orthogonality Losses: A diversity loss L_{div} encourages balanced usage across the G=2 codebooks (each of size K). An orthogonality loss L_{ortho} pushes the Gram matrix of the L2-normalized codebook embeddings toward the K×K identity, reducing multi-collinearity among factors.
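
The two regularizers can be sketched as follows. This is a plain NumPy illustration of the stated ideas (identity-Gram penalty, uniform codebook usage), not necessarily the paper's exact formulation:

```python
import numpy as np

def ortho_loss(codebook):
    """Penalize deviation of the Gram matrix of L2-normalized codebook
    embeddings from the K x K identity (reduces multi-collinearity)."""
    e = codebook / np.linalg.norm(codebook, axis=1, keepdims=True)
    gram = e @ e.T                                   # (K, K)
    return float(((gram - np.eye(len(e))) ** 2).mean())

def div_loss(usage_counts):
    """KL divergence of empirical codebook usage from uniform; zero when
    every entry is selected equally often."""
    p = usage_counts / usage_counts.sum()
    p = np.clip(p, 1e-12, None)
    return float((p * np.log(p * len(p))).sum())
```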

TS and CS Decoders: Transformer decoders reconstruct the original data from the quantized latent vectors z_{ts}^q(x) and z_{cs}^q(x), producing x_{ts}' and x_{cs}'. The overall loss combines reconstruction, codebook, diversity, and orthogonality terms, with a stop-gradient operator sg applied where needed.
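
A hedged sketch of how reconstruction and codebook terms typically combine, following the standard VQ-VAE recipe (the commitment weight beta is an assumption). NumPy has no autograd, so sg is modeled as a plain copy; in a framework it would block gradient flow:

```python
import numpy as np

def vq_losses(x, x_rec, z_e, z_q, beta=0.25):
    """Reconstruction + codebook + commitment terms of a standard VQ-VAE
    objective; sg(.) is the stop-gradient operator."""
    sg = lambda a: np.array(a, copy=True)            # stand-in for stop-gradient
    rec = float(((x - x_rec) ** 2).mean())
    codebook = float(((sg(z_e) - z_q) ** 2).mean())  # pulls codes toward encoder outputs
    commit = float(((z_e - sg(z_q)) ** 2).mean())    # keeps encoder close to its code
    return rec + codebook + beta * commit
```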

Factor Module

Cross‑attention fuses TS and CS embeddings, emphasizing mutually informative regions and suppressing noise.
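
A single-head, projection-free sketch of the attention pattern, where TS embeddings attend over CS embeddings (the real module uses learned query/key/value projections and multiple heads):

```python
import numpy as np

def cross_attention(q_src, kv_src):
    """Minimal cross-attention: rows of q_src attend over rows of kv_src."""
    d = q_src.shape[-1]
    scores = q_src @ kv_src.T / np.sqrt(d)           # (n_q, n_kv)
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w = w / w.sum(-1, keepdims=True)                 # softmax over kv positions
    return w @ kv_src                                # weighted mix of kv rows
```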

Contrastive loss pulls together embeddings from similar market conditions and pushes apart those from different conditions, improving consistency across scenarios.
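
One common instantiation of such a loss is InfoNCE; the sketch below is an assumption about the general form, not the paper's exact objective (tau is an illustrative temperature):

```python
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE-style contrastive loss: pull the anchor embedding toward one
    from a similar market condition, push it from dissimilar ones."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    pos = np.exp(cos(anchor, positive) / tau)
    neg = sum(np.exp(cos(anchor, n) / tau) for n in negatives)
    return float(-np.log(pos / (pos + neg)))
```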

Prior‑posterior learning attaches the codebook embeddings as additional [CLS] tokens to the latent features, forming a combined factor embedding z_e(x). During training, a posterior layer estimates the expected return distribution z_{post} conditioned on future returns y. At inference, a prior layer predicts z_{prior} without future information.

A return predictor maps the expected factor returns to future stock returns, which can be fed directly into portfolio optimization or algorithmic‑trading policies.

Downstream Tasks

STORM is evaluated on two downstream tasks: (1) Portfolio Management (PM) using a TopK‑Drop daily portfolio construction with weekly turnover constraints, and (2) Algorithmic Trading (AT) where the factor embedding is incorporated into a PPO‑based reinforcement‑learning agent that interacts with the market environment.
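
A minimal sketch of one TopK-Drop rebalance step, following the Qlib-style rule this PM setup resembles (k and n_drop are illustrative values):

```python
def topk_drop(scores, holdings, k=5, n_drop=1):
    """One TopK-Drop rebalance step: sell at most n_drop holdings that fell
    out of the top-k predicted scores, then buy the best-ranked non-held
    names to return to k positions (limiting turnover)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    top_k = set(ranked[:k])
    # Holdings outside today's top-k, worst score first; sell at most n_drop.
    to_sell = sorted((s for s in holdings if s not in top_k),
                     key=scores.get)[:n_drop]
    kept = [s for s in holdings if s not in to_sell]
    # Fill vacancies with the best-ranked stocks not already held.
    buys = [s for s in ranked if s not in kept][: k - len(kept)]
    return kept + buys
```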

Experiments

Datasets: US equity markets SP500 and DJ30, covering 16 years (2008-04-01 to 2024-03-31) of daily Alpha158 features. Training set: 2008-04-01 to 2021-03-31; test set: 2021-04-01 to 2024-03-31.

Metrics: Six financial metrics are reported: annualized percentage yield (APY), cumulative wealth (CW), Calmar ratio (CR), annualized Sharpe ratio (ASR), maximum drawdown (MDD), and annualized volatility (AVO). For PM, RankIC and RankICIR additionally assess factor effectiveness.
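
Two of these metrics are easy to pin down concretely; the sketch below assumes daily simple returns and a 252-day trading year:

```python
import numpy as np

def annualized_sharpe(daily_returns, periods=252):
    """ASR: mean daily return over its standard deviation, annualized."""
    r = np.asarray(daily_returns, dtype=float)
    return float(r.mean() / r.std(ddof=1) * np.sqrt(periods))

def max_drawdown(daily_returns):
    """MDD: largest peak-to-trough decline of the cumulative wealth curve."""
    wealth = np.cumprod(1.0 + np.asarray(daily_returns, dtype=float))
    peaks = np.maximum.accumulate(wealth)
    return float((1.0 - wealth / peaks).max())
```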

Baselines: Nine methods spanning machine-learning (LGBM, LSTM, Transformer), factor-model (CAFactor, FactorVAE, HireVAE), and reinforcement-learning (SAC, PPO, DQN) families.

Results

Portfolio Performance: STORM improves average APY by 116.36% over the best baseline in PM and by 43.64% in AT. Although STORM's MDD and AVO on INTC and MSFT do not surpass the best baselines (attributed to extreme COVID-19 volatility), it achieves the highest ASR (+123.94%) and CR (+245.45%) in PM and leads in AT across three stocks.

Factor Quality: On both datasets, STORM raises RankIC by 14.69% (SP500) and 15.36% (DJ30) relative to six competing methods, indicating more consistent and generalizable signals. Codebook usage analysis shows uniform selection frequencies, confirming that diversity is maintained. A codebook size of 512 yields the best trade-off between expressive power and efficiency.

Ablation Study: Removing either the TS or CS branch (STORM-w/o-TS, STORM-w/o-CS) markedly degrades performance. The full model outperforms the simplified versions by 16.50% (RankIC) and 23.79% (RankICIR) on return-prediction tasks, and achieves superior APY while keeping volatility low.

Performance Discussion

Efficiency: The model runs on standard CPUs with only a few Transformer layers and ~15.2 M FP32 parameters, enabling real-time deployment on edge devices without specialized hardware.

Effectiveness: Across PM and AT, STORM consistently beats baselines that focus on a single domain, demonstrating balanced return and risk optimization.

Generalization: STORM performs well on both the broad, diversified SP500 and the industrial-heavy DJ30, and its modular encoder-decoder design can incorporate additional layers (e.g., Mamba) for other asset classes.

Reproducibility

DOI: https://doi.org/10.1145/3773966.3777972

Source code: https://github.com/DVampire/Storm

Tags: Transformer, VQ-VAE, portfolio management, quantitative finance, algorithmic trading, discrete codebook, factor model
Written by Bighead's Algorithm Notes, focused on AI applications in the fintech sector.