How ReVol’s Return‑Volatility Normalization Reduces Distribution Shift in Stock Price Prediction
The paper introduces ReVol, a three‑stage framework that estimates per‑sample return and volatility with an attention‑based module, normalizes price features with those estimates, and denormalizes the backbone's predictions, yielding consistent average improvements of more than 0.03 in IC and 0.7 in Sharpe ratio across multiple time‑series backbones.
Background
Distribution shift is a fundamental challenge in data mining and machine learning, and it is especially acute in stock‑price forecasting, where market conditions change over time and across assets, so statistical properties such as returns and volatility differ between training and test data.
Problem definition
The authors identify three shortcomings of existing work: (1) existing methods align only the mean and variance of features, ignoring the shape of the distribution; (2) simple arithmetic averaging fails to estimate true returns and volatility under market shocks; (3) removing sample‑specific characteristics without a way to restore them harms prediction accuracy, because the model cannot reconstruct the original price distribution.
Method
ReVol consists of three core modules:
Return‑Volatility Normalization (RVN): Inspired by geometric Brownian motion, historical open, high, low, and close prices \(S_{o_{T-w+1}:T},\dots,S_{c_{T-w+1}:T}\) are normalized into error terms \(\epsilon_{o_{T-w+1}:T},\dots,\epsilon_{c_{T-w+1}:T}\), removing sample‑specific characteristics while preserving a shared error distribution (a minimal sketch follows this list).
Return‑Volatility Estimator (RVE): An attention‑based module \(RVE_{\phi}\) estimates per‑sample returns and volatility. Each price‑feature vector \(x_t\) is passed through a fully connected layer with tanh activation and fed into an LSTM to obtain hidden states \(h_t\); attention weights are then computed as \(\alpha_t = \frac{\exp(w^\top h_t)}{\sum_{k}\exp(w^\top h_k)}\). The attention‑weighted average yields robust estimates that suppress outliers caused by external shocks (sketched after this list).
Backbone network and Return‑Volatility Denormalization (RVD): The backbone \(f_{\theta}\) (any time‑series predictor, such as an LSTM, GRU, or Transformer) receives the normalized error terms and predicts future error terms. The RVD module then denormalizes these predictions using the estimated returns and volatility to produce the final closing‑price forecast (sketched after this list).
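To make the normalization concrete, here is a minimal PyTorch sketch of an RVN‑style step. It assumes the GBM‑style parameterization \(\log(S_t/S_{t-1}) = \mu + \sigma\epsilon_t\), so each log‑return is standardized by the estimated return \(\hat{\mu}\) and volatility \(\hat{\sigma}\); the paper's exact handling of the separate open/high/low/close series may differ.

```python
import torch

def rvn_normalize(prices: torch.Tensor, mu: torch.Tensor, sigma: torch.Tensor,
                  eps: float = 1e-8) -> torch.Tensor:
    """Map raw prices to error terms under a GBM-style model (assumed form).

    prices: (batch, window) price series, e.g. closing prices S_{T-w+1:T}.
    mu:     (batch, 1) estimated per-sample return (from RVE).
    sigma:  (batch, 1) estimated per-sample volatility (from RVE).
    Returns (batch, window-1) normalized error terms epsilon_t.
    """
    log_returns = torch.log(prices[:, 1:] / prices[:, :-1])  # log(S_t / S_{t-1})
    # Under GBM, log-return = mu + sigma * eps_t, so invert for eps_t.
    return (log_returns - mu) / (sigma + eps)
```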
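The RVE module follows directly from the description above: a tanh projection, an LSTM, and a learned attention vector \(w\). How the attended summary becomes the \((\hat{\mu}, \hat{\sigma})\) pair is not fully specified here, so the weighted‑mean and weighted‑deviation heads below, and the assumption that the last input channel holds the per‑step log‑return, are illustrative choices rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class RVE(nn.Module):
    """Attention-based return/volatility estimator (sketch; head details assumed)."""

    def __init__(self, in_dim: int = 4, hidden_dim: int = 64):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.Tanh())
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.w = nn.Parameter(torch.randn(hidden_dim))  # attention query vector

    def forward(self, x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        # x: (batch, window, in_dim) price features per time step.
        h, _ = self.lstm(self.proj(x))        # hidden states h_t: (batch, window, hidden)
        scores = h @ self.w                   # w^T h_t -> (batch, window)
        alpha = torch.softmax(scores, dim=1)  # attention weights alpha_t
        r = x[..., -1]                        # per-step log-return (assumed channel)
        # Attention-weighted mean return; volatility as weighted deviation around it.
        mu = (alpha * r).sum(dim=1, keepdim=True)
        sigma = torch.sqrt((alpha * (r - mu) ** 2).sum(dim=1, keepdim=True) + 1e-8)
        return mu, sigma
```

Replacing the arithmetic mean with this learned weighting is what lets the estimator downweight shock days, which matches the attention analysis reported in the results.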
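RVD is then just the inverse of the normalization, applied with the same RVE estimates. A sketch under the same GBM assumption:

```python
import torch

def rvd_denormalize(eps_hat: torch.Tensor, last_price: torch.Tensor,
                    mu: torch.Tensor, sigma: torch.Tensor) -> torch.Tensor:
    """Invert RVN: turn a predicted error term into a closing-price forecast.

    eps_hat:    (batch, 1) backbone prediction for the next error term.
    last_price: (batch, 1) last observed closing price S_T.
    mu, sigma:  (batch, 1) RVE estimates used during normalization.
    """
    # GBM inverse of the normalization: S_{T+1} = S_T * exp(mu + sigma * eps)
    return last_price * torch.exp(mu + sigma * eps_hat)
```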
Loss function
The objective jointly optimizes the backbone parameters \(\theta\) and the RVE parameters \(\phi\):
\[
L = \frac{1}{N}\sum_{i}\left(\|\hat{r}_i - r_i\|^2 + \beta\,\|\hat{\mu}_i - \mu_i^{\text{log}}\|^2\right),
\]
where the first term is the mean‑squared error of the daily‑return predictions and the second is a guidance loss that pulls the estimated return \(\hat{\mu}_i\) toward the arithmetic mean of log‑returns \(\mu_i^{\text{log}}\), with \(\beta\) controlling its influence.
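A direct sketch of this objective in PyTorch (the default \(\beta\) below is a placeholder, not the paper's tuned value):

```python
import torch

def revol_loss(r_hat: torch.Tensor, r_true: torch.Tensor,
               mu_hat: torch.Tensor, log_returns: torch.Tensor,
               beta: float = 0.1) -> torch.Tensor:
    """Joint loss: return-prediction MSE plus the guidance term (beta assumed).

    r_hat, r_true: (batch,) predicted and realized daily returns.
    mu_hat:        (batch,) RVE return estimates.
    log_returns:   (batch, window) historical log-returns per sample.
    """
    mse = torch.mean((r_hat - r_true) ** 2)
    mu_log = log_returns.mean(dim=1)  # arithmetic mean of log-returns
    guidance = torch.mean((mu_hat - mu_log) ** 2)
    return mse + beta * guidance
```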
Experimental setup
Datasets from the US, China, UK, and South Korean markets are split chronologically into 70% training, 10% validation, and 20% test sets. Baselines include price‑ratio normalization and the LSTM, GRU, ALSTM, vanilla Transformer, DTML, and MASTER backbones. Evaluation metrics are the Information Coefficient (IC), RankIC, and annualized Sharpe Ratio (SR), averaged over 10 random seeds.
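These metrics are standard and can be sketched as follows: IC as the Pearson correlation between predicted and realized returns, RankIC as its Spearman analogue, and SR annualized from daily portfolio returns (a zero risk‑free rate is assumed here).

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def daily_metrics(pred: np.ndarray, real: np.ndarray) -> tuple[float, float]:
    """IC and RankIC for one day's cross-section of predicted vs. realized returns."""
    ic = pearsonr(pred, real)[0]        # Pearson correlation
    rank_ic = spearmanr(pred, real)[0]  # Spearman rank correlation
    return ic, rank_ic

def annualized_sharpe(daily_returns: np.ndarray, trading_days: int = 252) -> float:
    """Annualized Sharpe ratio of a daily return series, risk-free rate of 0 assumed."""
    return np.sqrt(trading_days) * daily_returns.mean() / daily_returns.std()
```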
Results
ReVol consistently improves all baselines: on average, IC increases by more than 0.03 and SR by more than 0.7. Compared with other normalization schemes, ReVol better aligns the shape of the training and test distributions, preserving feature relationships and yielding more stable predictions. Hyper‑parameter sensitivity tests show that ReVol's performance remains robust across window sizes and weight‑decay settings, unlike DTML, which exhibits large IC variance. Attention analysis reveals that time steps with high return magnitude receive lower attention weights, indicating implicit noise suppression.
Ablation study
Removing any of the three modules degrades performance; RVN contributes the most because it directly addresses distribution shift by normalizing sample‑specific features.
Overall, ReVol demonstrates that mitigating distribution shift through return‑volatility normalization and attention‑based estimation substantially enhances the accuracy and profitability of stock‑price forecasting models.