MaGNet: Dual‑Hypergraph Mamba Network for Time‑Causal and Global Stock Trend Forecasting
MaGNet introduces a three-component architecture: a MAGE block combining bidirectional Mamba, adaptive gating, and a sparse mixture of experts; 2-D spatio-temporal attention; and a dual hypergraph framework pairing a time-causal hypergraph with a global probability hypergraph. It outperforms 17 baselines on six major stock indices in both prediction accuracy and risk-adjusted returns.
Background
Stock market prediction is crucial for profitable trading and portfolio management, but high volatility, non-stationarity, and complex cross-stock interactions make it challenging. Traditional time-series models (SVM, ARIMA) and deep models (CNN, RNN, Transformer) struggle with long-range dependencies and computational cost. Recent state-space models such as Mamba achieve near-linear complexity but lack contextual fusion, market-state adaptation, and global dependency modeling. Graph-based methods capture pairwise relations but ignore higher-order group dynamics; hypergraph neural networks (HGNNs) support higher-order groups but process all nodes uniformly and leave relation types ambiguous.
Problem Definition
The core challenges are:
Insufficient time‑series modeling: existing models treat each stock independently, ignore cross‑stock dynamics, and lack adaptive market‑state handling.
Limited relation modeling: hypergraph approaches cannot distinguish causal from instantaneous relations, or local from global ones, losing multi-scale market interactions.
The goal is to design an architecture that jointly captures fine‑grained temporal causality and global market patterns.
Method
3.1 MAGE Block
The MAGE block integrates four sub‑modules:
Bidirectional Mamba: standard Mamba processes sequences unidirectionally; MaGNet runs forward and backward passes on the stock embedding sequence Zₙ, then fuses them.
Adaptive Gating Mechanism: a gating network learns weights W_f and W_b to control the contribution of forward and backward representations.
Sparse Mixture‑of‑Experts (MoE): a top‑1 routing MoE layer adapts to market regimes (e.g., bull vs. bear) by assigning each token to the most suitable expert after capacity‑normalized scoring.
Multi‑head Self‑Attention: captures global dependencies across the sequence.
These components together provide context‑aware, regime‑adaptive, and globally aware temporal modeling.
Bidirectional Mamba processes the forward direction, H_f = Mamba_f(Zₙ), and the backward direction, H_b = reverse(Mamba_b(reverse(Zₙ))), in which the input sequence is reversed, scanned, and re-aligned to forward time. Adaptive gating then merges the two streams as Z = W_f ⊙ H_f + W_b ⊙ H_b, with the learned weights W_f and W_b controlling each direction's per-token contribution.
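A minimal sketch of this bidirectional pass with adaptive gating, using nn.GRU as a stand-in for the Mamba scan (the module name BiSSMGate, the sigmoid gate, and the per-token gating form are illustrative assumptions, not the paper's released code):

```python
import torch
import torch.nn as nn

class BiSSMGate(nn.Module):
    """Bidirectional sequence model with adaptive gating.

    nn.GRU stands in for the Mamba SSM here; swap in a real Mamba
    block if available. Input x has shape (batch, T, d_model).
    """
    def __init__(self, d_model: int):
        super().__init__()
        self.fwd = nn.GRU(d_model, d_model, batch_first=True)   # forward-time scan
        self.bwd = nn.GRU(d_model, d_model, batch_first=True)   # backward-time scan
        self.gate = nn.Linear(2 * d_model, 2 * d_model)         # produces W_f, W_b per token

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h_f, _ = self.fwd(x)                                    # H_f = Mamba_f(Z_n)
        h_b, _ = self.bwd(torch.flip(x, dims=[1]))              # scan the reversed sequence
        h_b = torch.flip(h_b, dims=[1])                         # re-align to forward time
        w = torch.sigmoid(self.gate(torch.cat([h_f, h_b], -1))) # learned gates in (0, 1)
        w_f, w_b = w.chunk(2, dim=-1)
        return w_f * h_f + w_b * h_b                            # Z = W_f ⊙ H_f + W_b ⊙ H_b
```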
Sparse MoE then routes each token to its top-1 expert, dynamically allocating model capacity across market states, and multi-head self-attention finally aggregates global dependencies across the sequence.
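The routing step can be sketched as follows; the expert count, the softmax router, and the gradient-carrying score scaling follow common sparse-MoE practice, and the capacity-normalized scoring detail is simplified away here:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top1MoE(nn.Module):
    """Top-1 sparse mixture-of-experts over token representations."""
    def __init__(self, d_model: int, n_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        tokens = x.reshape(-1, d)                        # route every token independently
        probs = F.softmax(self.router(tokens), dim=-1)   # routing distribution per token
        score, idx = probs.max(dim=-1)                   # top-1 expert per token
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                # scale by the routing score so the router stays differentiable
                out[mask] = score[mask].unsqueeze(-1) * expert(tokens[mask])
        return out.reshape(b, t, d)
```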
3.2 Feature‑level and Stock‑level 2‑D Spatio‑Temporal Attention
Unlike methods that flatten multivariate data, MaGNet preserves a three‑dimensional tensor (features × stocks × time). Feature‑level 2‑D attention treats each feature d as an N×T matrix and applies multi‑channel attention to learn inter‑feature dependencies. Stock‑level 2‑D attention applies the same mechanism along the stock dimension, enabling precise cross‑stock dependency fusion.
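A sketch of the feature-level branch under assumed shapes (the class name and the flatten-then-attend reading of "2-D attention" are interpretations of the description above; the stock-level branch would be the same construction with the stock and feature axes swapped):

```python
import torch
import torch.nn as nn

class Feature2DAttention(nn.Module):
    """Attention across the D feature channels of a (D, N, T) tensor.

    Each feature d is kept as an N x T matrix and flattened into one
    token, so attention learns inter-feature dependencies while the
    stock-by-time layout inside each feature is preserved.
    """
    def __init__(self, n_stocks: int, n_steps: int, n_heads: int = 1):
        super().__init__()
        # embed dim = N * T must be divisible by n_heads
        self.attn = nn.MultiheadAttention(n_stocks * n_steps, n_heads,
                                          batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        d, n, t = x.shape
        tokens = x.reshape(1, d, n * t)          # one token per feature channel
        out, _ = self.attn(tokens, tokens, tokens)
        return out.reshape(d, n, t)
```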
3.3 Dual Hypergraph Framework
Two complementary hypergraphs are constructed:
Time-Causal Hypergraph (TCH): uses causal multi-head attention (CausalMHA) with an upper-triangular mask, so each time step attends only to its past, to generate the attention matrix A, then applies top-K sparsification to retain only the strongest causal paths (sketched after this list).
Global Probability Hypergraph (GPH): flattens the stock‑level 2‑D attention output Z^{N2D}, passes it through a ReLU‑tanh activated feed‑forward network, column‑normalizes to obtain a soft hyperedge association matrix, and weights hyperedges by Jensen‑Shannon divergence to reflect uniqueness.
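A per-stock, single-head sketch of the TCH construction; the score scaling, mask convention, and top_k default are assumptions layered on the description above:

```python
import torch

def causal_topk_adjacency(q: torch.Tensor, k: torch.Tensor,
                          top_k: int = 8) -> torch.Tensor:
    """Causal attention scores with top-K sparsification.

    q, k: (T, d) query/key projections for one stock's sequence.
    Returns a sparse T x T matrix whose row t keeps only the top-K
    attention weights over time steps <= t (future steps are masked).
    """
    t, d = q.shape
    scores = q @ k.T / d ** 0.5
    future = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(future, float('-inf'))   # block attention to the future
    attn = torch.softmax(scores, dim=-1)                 # causal attention matrix A
    vals, idx = attn.topk(min(top_k, t), dim=-1)         # retain strongest causal paths
    return torch.zeros_like(attn).scatter_(-1, idx, vals)
```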
The ReLU-tanh activation in the hyperedge-generating FFN balances sparsity and stability: ReLU zeroes out weak associations while tanh bounds the surviving ones.
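Putting the GPH pieces together, a sketch under those definitions (the FFN composition shown in the docstring and the eps smoothing are assumptions; the JS-divergence weighting follows the description, not released code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def build_gph(z_flat: torch.Tensor, ffn: nn.Module, eps: float = 1e-8):
    """Soft hyperedge association matrix and JS-based hyperedge weights.

    z_flat: (N, d) flattened stock-level 2-D attention output Z^{N2D}.
    ffn:    maps d -> M non-negative hyperedge logits, e.g.
            nn.Sequential(nn.Linear(d, M), nn.Tanh(), nn.ReLU()).
    """
    h = ffn(z_flat)                                         # (N, M) raw associations
    assoc = (h + eps) / (h + eps).sum(dim=0, keepdim=True)  # column-normalize: each hyperedge
                                                            # becomes a distribution over stocks
    n, m = assoc.shape
    weights = torch.zeros(m, device=assoc.device)
    for i in range(m):                                      # JS divergence of edge i vs. the rest
        p = assoc[:, i:i + 1].expand(n, m - 1)              # (N, M-1) copies of column i
        q = torch.cat([assoc[:, :i], assoc[:, i + 1:]], dim=1)
        mix = 0.5 * (p + q)
        js = 0.5 * (F.kl_div(mix.log(), p, reduction='none').sum(0)
                    + F.kl_div(mix.log(), q, reduction='none').sum(0))
        weights[i] = js.mean()                              # more unique edges weigh more
    return assoc, weights
```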
Hypergraph convolution aggregates high‑order information and propagates it globally.
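The propagation step can follow the standard HGNN convolution, X' = σ(Dv^-1/2 H W De^-1 Hᵀ Dv^-1/2 X Θ); whether MaGNet uses exactly this operator is an assumption here:

```python
import torch

def hypergraph_conv(x: torch.Tensor, assoc: torch.Tensor,
                    edge_w: torch.Tensor, theta: torch.Tensor) -> torch.Tensor:
    """One hypergraph convolution step (textbook HGNN operator).

    x:      (N, d)  node (stock) features
    assoc:  (N, M)  hyperedge association matrix H
    edge_w: (M,)    hyperedge weights W (e.g., the JS-based weights above)
    theta:  (d, d') learnable projection
    """
    d_v = (assoc * edge_w).sum(dim=1)                 # weighted node degrees
    d_e = assoc.sum(dim=0)                            # hyperedge degrees
    dv_inv = d_v.clamp(min=1e-8).pow(-0.5)
    de_inv = d_e.clamp(min=1e-8).reciprocal()
    h_norm = dv_inv[:, None] * assoc                  # Dv^-1/2 H
    prop = (h_norm * edge_w * de_inv) @ h_norm.T      # node-to-node propagation operator
    return torch.relu(prop @ x @ theta)
```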
Experiments
4.1 Experimental Setup
Datasets: six major indices (DJIA, HSI, NASDAQ‑100, S&P‑100, CSI‑300, Nikkei‑225) from 2020‑2024, split 7:1:2 for train/validation/test. Features include Yahoo Finance basic data (close, high, etc.) and Qlib‑derived Alpha158/Alpha360 technical indicators, all Z‑Score normalized.
Baselines: 17 models covering stock‑specific predictors (SFM, Adv‑ALSTM), generic time‑series models (GRU, LSTM, DLinear), and graph models (GCN, GAT).
Metrics: prediction (ACC, PRE, REC, F1, AUC) and back‑testing (annualized return AR, Sharpe ratio SR, Calmar ratio CR, maximum drawdown MDD).
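For reference, the back-testing metrics reduce to short formulas over the daily return series; the 252-day annualization and zero risk-free rate below are conventional assumptions:

```python
import numpy as np

def backtest_metrics(daily_returns: np.ndarray, periods: int = 252):
    """Annualized return (AR), Sharpe ratio (SR), max drawdown (MDD), Calmar (CR)."""
    equity = np.cumprod(1.0 + daily_returns)                   # equity curve
    years = len(daily_returns) / periods
    ar = equity[-1] ** (1.0 / years) - 1.0                     # annualized return
    sr = np.sqrt(periods) * daily_returns.mean() / daily_returns.std()  # Sharpe, rf = 0
    peak = np.maximum.accumulate(equity)
    mdd = ((peak - equity) / peak).max()                       # maximum drawdown
    cr = ar / mdd if mdd > 0 else np.inf                       # Calmar ratio
    return ar, sr, mdd, cr
```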
4.2 Results
Prediction performance: MaGNet surpasses all baselines on every index, with top accuracies of 54.9 % (CSI-300), 54.19 % (HSI), and 54.02 % (Nikkei-225). Recall peaks at 97.00 % (S&P-100) and 96.14 % (NASDAQ-100), and AUC also leads, e.g., rising from 51.41 % (second-best baseline) to 52.24 % on NASDAQ-100.
Back‑testing performance: MaGNet achieves superior risk‑adjusted returns, e.g., AR = 22.6 % (CSI‑300), 19.92 % (DJIA), 19.58 % (Nikkei‑225); Sharpe ratios exceed 1.0 across markets (NASDAQ‑100 = 1.05, S&P‑100 = 1.40); maximum drawdown is limited to 5.19 % (CSI‑300) and 5.07 % (DJIA), markedly lower than baselines.
4.3 Ablation Study
Removing the MAGE block drops ACC (e.g., NASDAQ‑100 from 53.72 % to 52.97 %) and AR (17.09 % → 9.68 %), confirming its central role in temporal modeling.
Removing 2‑D spatio‑temporal attention reduces AR (S&P‑100 from 17.14 % to 13.61 %), highlighting its importance for feature and stock dependency fusion.
Removing TCH or GPH separately harms local causal performance (HSI AR 12.25 % → 8.59 %) or global pattern capture (NASDAQ‑100 AR 17.09 % → 8.44 %), validating the complementary nature of the dual hypergraph.
Overall, MaGNet demonstrates that integrating bidirectional state‑space modeling, adaptive expert routing, and dual hypergraph relational learning yields consistent gains in both predictive accuracy and financial risk management.