STEAM: Wavelet‑Enhanced Attention Model for Stock Price Prediction

STEAM combines a discrete wavelet transform, a wavelet‑enhanced attention mechanism, and a market‑index prefix within a Mamba‑2 encoder to capture multi‑frequency spatial and temporal dependencies in stock data. It reports state‑of‑the‑art results across multiple international markets, measured by IC, PnL, and Sharpe ratio.

Bighead's Algorithm Notes

Background

Accurate stock price prediction is crucial for investment decisions, risk reduction, and market efficiency. While deep learning has shown strong potential, the non‑stationarity of financial time series and the influence of multiple factors make direct modeling difficult. Existing work uses frequency‑domain analysis to separate high‑ and low‑frequency patterns but focuses on decomposing each sequence in isolation, ignoring cross‑stock frequency interactions and the informative role of market indices.

Problem Definition

The task is framed as a stock ranking problem. Given N stocks, each represented by a T‑step feature matrix X_i (F features per step), the goal is to predict the future u‑day return r_i for each stock. Market index data I (C features) of length T is also available.
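
To make the shapes concrete, here is a minimal NumPy sketch of the inputs and the ranking target. The sizes, the synthetic `close` prices, and the label definition (simple return from the last observed day to u days later) are all assumptions for illustration; the summary does not state how r_i is computed.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, F, C, u = 4, 8, 5, 3, 2     # illustrative sizes, not from the source

X = rng.normal(size=(N, T, F))            # per-stock feature windows X_i
index_feats = rng.normal(size=(T, C))     # market index I over the same window

# Hypothetical close prices covering the window plus the next u days.
log_steps = rng.normal(scale=0.01, size=(N, T + u))
close = 100.0 * np.exp(np.cumsum(log_steps, axis=1))

def forward_return(close, t, u):
    """Assumed label: simple return from day t to day t + u, one per stock."""
    return close[:, t + u] / close[:, t] - 1.0

r = forward_return(close, T - 1, u)       # targets r_i, shape (N,)
ranking = np.argsort(-r)                  # stock indices, highest return first
```

A model for this task only needs to get the ordering in `ranking` right, which is why correlation‑based losses and metrics appear later.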

Method

Overall Architecture

STEAM adopts an encoder‑only design. Input sequences are first embedded via an MLP to obtain representations H, which are processed by E stacked encoder layers. Each layer contains an AMamba module (integrating the proposed Wavelet‑Enhanced Attention, WEA) and a feed‑forward network, with residual connections.
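
The layer stacking described above can be sketched as follows. This is only a skeleton under stated assumptions: the two‑layer ReLU MLPs without biases, the absence of normalization, and the `mixer` callable standing in for AMamba are illustrative choices, not details confirmed by the source.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(d_in, d_hidden, d_out):
    """Random weights standing in for trained parameters."""
    return (rng.normal(scale=d_in ** -0.5, size=(d_in, d_hidden)),
            rng.normal(scale=d_hidden ** -0.5, size=(d_hidden, d_out)))

def mlp(x, params):
    w1, w2 = params
    return np.maximum(x @ w1, 0.0) @ w2     # ReLU, no biases: assumptions

def encoder(x, embed, ffns, mixer):
    """x: (N, T, F) -> (N, T, D). `mixer` is a placeholder for AMamba."""
    h = mlp(x, embed)                       # embedding MLP produces H
    for ffn in ffns:                        # E stacked encoder layers
        h = h + mixer(h)                    # AMamba sub-layer with residual
        h = h + mlp(h, ffn)                 # feed-forward sub-layer with residual
    return h

N, T, F, D, E = 4, 8, 5, 16, 2              # illustrative sizes
x = rng.normal(size=(N, T, F))
embed = init_mlp(F, D, D)
ffns = [init_mlp(D, 4 * D, D) for _ in range(E)]
out = encoder(x, embed, ffns, mixer=lambda h: np.zeros_like(h))
```

Passing the mixer as an argument keeps the residual structure visible without committing to any particular AMamba implementation.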

Spatial Dependency Modeling

Discrete Wavelet Transform (DWT) decomposes each time series into multi‑frequency components, reducing non‑stationarity. For a series X at scale m, approximation coefficients c_A^m capture long‑term trends and detail coefficients c_D^m capture short‑term variations. Circular padding preserves the original length. A learnable multi‑frequency wavelet transform (MFWT) treats wavelet bases as parameters, enabling end‑to‑end training.
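
As a rough illustration of a length‑preserving decomposition, the sketch below applies an undecimated two‑tap (Haar by default) filter pair with circular wrap‑around. Whether STEAM uses an undecimated transform or a decimated one with padding is not stated here, so treat this as one possible reading; `circular_dwt` and `circular_idwt` are hypothetical names. The `lo` and `hi` filters are plain arrays, which is the part a learnable variant would turn into trainable parameters.

```python
import numpy as np

def circular_dwt(x, levels, lo=None, hi=None):
    """Undecimated 2-tap wavelet decomposition along the last axis.

    Returns (approx, details); every array keeps the input length because
    the shift wraps around the series (circular padding).
    """
    s = 1.0 / np.sqrt(2.0)
    lo = np.array([s, s]) if lo is None else np.asarray(lo, dtype=float)
    hi = np.array([s, -s]) if hi is None else np.asarray(hi, dtype=float)
    approx, details = np.asarray(x, dtype=float), []
    for m in range(levels):
        shifted = np.roll(approx, 2 ** m, axis=-1)          # value 2**m steps earlier
        details.append(hi[0] * approx + hi[1] * shifted)    # short-term variation
        approx = lo[0] * approx + lo[1] * shifted           # smoother trend
    return approx, details

def circular_idwt(approx, details):
    """Inverse of circular_dwt for the default Haar filters only."""
    for d in reversed(details):
        approx = (approx + d) / np.sqrt(2.0)
    return approx

x = np.sin(np.linspace(0.0, 4.0 * np.pi, 16)) + 0.1 * np.arange(16)
c_A, c_D = circular_dwt(x, levels=3)        # c_A: trend, c_D: list of 3 detail bands
```

With the Haar defaults the decomposition is exactly invertible, so no information is lost before the attention stage.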

Wavelet‑Enhanced Attention (WEA)

Different frequency representations exhibit distinct spatial dependencies: low‑frequency tokens capture global relations, high‑frequency tokens capture local changes. WEA normalizes hidden states, applies MFWT along the time dimension, reshapes and concatenates frequency tokens, then performs a two‑step relay attention. Queries, keys, and values are derived from the combined representation, a relay token is computed via linear projection with compression ratio ρ, and a second attention step aggregates high‑level information. Learnable parameters E_q and E_v model global spatial dependencies.
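
The two‑step relay idea can be sketched as below, with heavy caveats. The summary does not specify where the compression is applied or how E_q and E_v enter, so this sketch assumes the relay queries are a learned linear mix of the L token queries (a hypothetical `w_relay` of shape (L // ρ, L)) and leaves E_q and E_v out entirely.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)   # subtract max for stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Plain scaled dot-product attention."""
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

def relay_attention(z, w_q, w_k, w_v, w_relay):
    """z: (L, D) frequency tokens; w_relay: (R, L) with R = L // rho."""
    q, k, v = z @ w_q, z @ w_k, z @ w_v
    relay_q = w_relay @ q                  # compress L queries into R relay queries
    relay = attention(relay_q, k, v)       # step 1: relay tokens gather from all tokens
    return attention(q, relay, relay)      # step 2: every token reads the relay summary

rng = np.random.default_rng(0)
L, D, rho = 12, 8, 4                       # illustrative sizes
z = rng.normal(size=(L, D))
w_q, w_k, w_v = (rng.normal(scale=D ** -0.5, size=(D, D)) for _ in range(3))
w_relay = rng.normal(scale=L ** -0.5, size=(L // rho, L))
out = relay_attention(z, w_q, w_k, w_v, w_relay)   # shape (L, D)
```

In this reading the score matrices have shapes (L/ρ, L) and (L, L/ρ) rather than (L, L), which is where the cost saving over full attention would come from.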

Temporal Dependency Modeling

Mamba‑2 provides linear‑complexity state‑space modeling of temporal patterns but has no explicit mechanism for interactions between stocks. AMamba replaces Mamba‑2’s convolution with the WEA module. Linear layers split the input H into a left and a right branch: the right branch passes through WEA, while the left branch, after a SiLU activation κ, produces gating signals that control information flow. LayerNorm and a final linear layer produce the AMamba output.
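
A minimal sketch of the gated two‑branch layout follows. The `mixer` callable is a placeholder for whatever the right branch does (WEA followed by the state‑space scan is an assumption), and applying LayerNorm to the gated product before the output projection mirrors the usual Mamba‑2 ordering rather than anything confirmed by this summary.

```python
import numpy as np

def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)    # no learned scale or shift here

def amamba_block(h, w_left, w_right, w_out, mixer):
    """h: (T, D). Weight matrices are hypothetical stand-ins for linear layers."""
    gate = silu(h @ w_left)                 # left branch: gating signal
    mixed = mixer(h @ w_right)              # right branch: placeholder for WEA + SSM
    return layer_norm(mixed * gate) @ w_out # gate, normalize, project

rng = np.random.default_rng(0)
T, D = 6, 8                                 # illustrative sizes
h = rng.normal(size=(T, D))
w_left, w_right, w_out = (rng.normal(scale=D ** -0.5, size=(D, D)) for _ in range(3))
y = amamba_block(h, w_left, w_right, w_out, mixer=lambda x: x)
```

Because the gate multiplies the right branch element‑wise, a gate near zero suppresses that feature regardless of what the mixer produced.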

Market‑Index Guided Prefix

The market index, reflecting macro trends, is concatenated as a prefix token to the stock representations in both spatial and temporal modules, providing global context. Adaptive aggregation compresses the T‑length index to a single token before feeding it to the selective SSM, reducing computation while preserving guidance.
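
One plausible form of the aggregation and prefixing is sketched below. The summary does not define "adaptive aggregation", so softmax‑weighted pooling with a learned scoring vector (`score_w`, a hypothetical parameter) is an assumption, as are the function names.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def aggregate_index(index_emb, score_w):
    """Compress a (T, D) embedded index sequence into a single (D,) token."""
    weights = softmax(index_emb @ score_w)  # one weight per time step, sums to 1
    return weights @ index_emb

def add_prefix(tokens, prefix):
    """Prepend the index token: (L, D) tokens -> (L + 1, D)."""
    return np.concatenate([prefix[None, :], tokens], axis=0)

rng = np.random.default_rng(0)
T, D, L = 8, 4, 5                           # illustrative sizes
index_emb = rng.normal(size=(T, D))         # market index after embedding
score_w = rng.normal(size=D)
prefix = aggregate_index(index_emb, score_w)
tokens = rng.normal(size=(L, D))            # stock (or time-step) tokens
with_prefix = add_prefix(tokens, prefix)    # index token sits at position 0
```

Pooling to one token means the downstream sequence grows by one position instead of T, which is the computational saving the text refers to.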

Prediction and Loss

The final encoder layer output passes through a linear layer to produce ranking scores. Training minimizes a combined loss: Pearson correlation loss L_{pearson} aligns predicted scores with true returns, and mean‑squared error L_{MSE} penalizes value deviations.
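
The combined objective might look like the sketch below. The summary only says the two terms are combined, so the 1 − correlation form of the Pearson term and the weighting factor `lam` are assumptions.

```python
import numpy as np

def pearson_loss(pred, target, eps=1e-12):
    """1 - Pearson correlation between predicted scores and true returns."""
    p = pred - pred.mean()
    t = target - target.mean()
    corr = (p * t).sum() / (np.sqrt((p ** 2).sum() * (t ** 2).sum()) + eps)
    return 1.0 - corr

def mse_loss(pred, target):
    return np.mean((pred - target) ** 2)

def combined_loss(pred, target, lam=1.0):
    """lam is a hypothetical weight balancing the two terms."""
    return pearson_loss(pred, target) + lam * mse_loss(pred, target)

returns = np.array([0.01, -0.02, 0.03, 0.00])
scores = 2.0 * returns + 1.0                # right ordering, wrong scale
ordering_term = pearson_loss(scores, returns)   # close to 0
value_term = mse_loss(scores, returns)          # clearly positive
```

The example shows why both terms are useful: scores that rank the stocks perfectly incur almost no Pearson loss, while the MSE term still penalizes their distance from the actual return values.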

Experiments

Setup

Datasets: CSI500, CSI800, CSI1000 (China) and NASDAQ, NYSE (US).

Baselines: RSR, MTGNN, WaveForM, FC‑STGNN, MICN, FilterNet, TimeMixer, Autoformer, Crossformer, iTransformer, MASTER, SAMBA.

Metrics: Information Coefficient (IC), PnL, Sharpe ratio.
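
For orientation, the three metrics can be computed roughly as follows. The portfolio rule behind PnL (equal‑weight, long‑only top‑k, no transaction costs, non‑compounded sum) and the 252‑day annualization are assumptions; the summary does not give the paper's exact backtest settings.

```python
import numpy as np

def information_coefficient(pred, ret):
    """Mean daily cross-sectional Pearson correlation; inputs are (days, N)."""
    p = pred - pred.mean(axis=1, keepdims=True)
    r = ret - ret.mean(axis=1, keepdims=True)
    daily = (p * r).sum(axis=1) / np.sqrt((p ** 2).sum(axis=1) * (r ** 2).sum(axis=1))
    return daily.mean()

def top_k_daily_returns(pred, ret, k):
    """Equal-weight return of the k highest-scored stocks on each day."""
    top = np.argsort(-pred, axis=1)[:, :k]
    return np.take_along_axis(ret, top, axis=1).mean(axis=1)

def pnl(daily_returns):
    return daily_returns.sum()              # cumulative, non-compounded

def sharpe_ratio(daily_returns, periods_per_year=252):
    return np.sqrt(periods_per_year) * daily_returns.mean() / daily_returns.std(ddof=1)

ret = np.array([[0.01, 0.03, -0.02],
                [0.02, -0.01, 0.04]])       # toy realized returns, 2 days x 3 stocks
pred = ret.copy()                           # a perfect forecaster, for illustration
daily = top_k_daily_returns(pred, ret, k=1) # picks 0.03 on day 1 and 0.04 on day 2
```

IC measures ranking quality directly, while PnL and Sharpe depend on the trading rule layered on top of the scores.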

Main Results

Across the five datasets and three metrics, STEAM achieves the best result on 14 of the 15 dataset‑metric pairs. Compared with the second‑best baseline, average PnL improves by 19.5% and Sharpe ratio by 21.2%, indicating higher returns and better risk‑adjusted performance.

Hyper‑parameter Sensitivity

On CSI500 and CSI1000, performance varies with the compression ratio ρ, decomposition scale M, hidden size D, Mamba‑2 state expansion factor, and dropout rate; the best value of each differs by dataset.

Ablation Studies

WEA components: Removing the relay mechanism or multi‑frequency combination degrades performance; omitting DWT causes the largest drop, confirming its importance.

Market‑index prefix: Excluding either temporal or spatial prefix reduces accuracy; removing both significantly harms prediction.

Mamba‑2 vs Mamba: Replacing Mamba‑2 with Mamba lowers IC and PnL, likely due to over‑parameterization.

AMamba architecture: The full AMamba design outperforms a decoupled spatio‑temporal learning structure, showing the benefit of integrated attention.

In‑Depth Analyses

Making the wavelet basis learnable improves performance for each basis tested (db1, db2, bior3.1); learnable db2 yields the best results.

Adaptive index aggregation outperforms other aggregation methods, reducing SSM computation while preserving index influence.

Relay attention reduces computational complexity compared with Crossformer’s learnable router, maintaining a small parameter count and faster training.

Visualization of simulated trading curves shows that stocks selected by STEAM achieve the highest cumulative returns, and weight‑matrix visualizations of the relay attention reveal distinct influence patterns among original stock tokens.
