Paper Review: Hermes – Multi‑Scale Hypergraph for Stock Forecasting with Lead‑Lag Modeling

The Hermes framework introduces a moving‑aggregation module and a multi‑scale fusion module within a hypergraph network to capture industry lead‑lag interactions and multi‑scale stock relationships. Extensive experiments and ablations on three real US stock datasets show it outperforms existing state‑of‑the‑art methods in both accuracy and efficiency.

Bighead's Algorithm Notes

Background

Stock time‑series prediction is crucial for investors, regulators, and analysts. Existing approaches often treat each stock independently, ignoring rich industry relationships such as intra‑industry co‑movement and inter‑industry lead‑lag dynamics (e.g., technology breakthroughs influencing energy stocks). Traditional hypergraph methods capture intra‑industry links via hyperedges but suffer from two limitations: insufficient modeling of industry lead‑lag interactions and lack of multi‑scale information.

Problem Definition

The paper addresses two core challenges: (1) deep modeling of industry lead‑lag relationships within a hypergraph framework, and (2) effective fusion of multi‑scale information while preserving consistency across scales.

Method

Hermes consists of six components:

Multi‑scale feature extraction: One‑dimensional convolution layers down‑sample the raw time series X into S scales, followed by a causal MLP that ensures each time step depends only on past information. Features are projected to a hidden space of dimension d via a linear layer.
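The extraction step can be sketched as follows. This is a minimal numpy illustration, not the authors' code: strided average pooling stands in for the learned 1‑D convolutions, a running (cumulative) mean stands in for the causal MLP, and the projection matrix is a random stand‑in for the learned linear layer.

```python
import numpy as np

def multi_scale_features(X, num_scales=3, hidden_dim=8, rng=None):
    """Illustrative sketch of multi-scale extraction: down-sample X into
    `num_scales` scales, apply a causal transform, project to hidden_dim."""
    rng = rng or np.random.default_rng(0)
    N, T, F = X.shape                               # stocks, time steps, features
    W = rng.standard_normal((F, hidden_dim)) * 0.1  # stand-in linear projection
    scales = []
    for s in range(num_scales):
        stride = 2 ** s
        # down-sample: non-overlapping mean pooling with window `stride`
        T_s = T // stride
        Xs = X[:, :T_s * stride].reshape(N, T_s, stride, F).mean(axis=2)
        # causal step: each position sees only the running mean of the past
        causal = np.cumsum(Xs, axis=1) / np.arange(1, T_s + 1)[None, :, None]
        scales.append(causal @ W)                   # project to hidden dimension d
    return scales

X = np.random.default_rng(1).standard_normal((4, 16, 5))  # 4 stocks, T=16, 5 indicators
feats = multi_scale_features(X)
print([f.shape for f in feats])  # [(4, 16, 8), (4, 8, 8), (4, 4, 8)]
```

Each scale halves the temporal resolution, which is what lets later modules compare short‑ and long‑horizon behaviour of the same stocks.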

Adaptive spatio‑temporal hypergraph construction: Using a stock‑industry matrix H, a hypergraph is built for each scale i. Hyperedges E^i are computed from stock features and industry relations with learnable contribution scores, yielding continuous weighted industry hyperedges.
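One plausible reading of this construction is a soft, score‑weighted aggregation over each industry's members. The sketch below assumes that mechanism: `H` is the binary stock‑industry incidence matrix, and `score_W` is a stand‑in for the learnable parameters that produce each stock's contribution score, softmax‑normalized within its industry.

```python
import numpy as np

def build_hyperedges(node_feats, H, score_W):
    """Sketch of adaptive hyperedge construction: weight each stock's
    contribution to its industry hyperedge by a learned (here: stand-in)
    score, normalized within the hyperedge."""
    scores = node_feats @ score_W                       # (N,) raw score per stock
    masked = np.where(H > 0, scores[:, None], -np.inf)  # (N, E): scores only inside each edge
    weights = np.exp(masked - masked.max(axis=0, keepdims=True))
    weights = weights / weights.sum(axis=0, keepdims=True)  # softmax over member stocks
    return weights.T @ node_feats                       # (E, d) hyperedge representations

rng = np.random.default_rng(0)
N, d = 6, 4
H = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]])  # two industries
feats = rng.standard_normal((N, d))
edges = build_hyperedges(feats, H, rng.standard_normal(d))
print(edges.shape)  # (2, 4)
```

Because the weights are continuous rather than binary, a hyperedge becomes a convex combination of its member stocks, matching the paper's "continuous weighted industry hyperedges".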

Moving aggregation module: A fixed‑size sliding window (size k^i, stride 1) is applied to each hyperedge. The first k^i − 1 steps form the “lead” part, and the last step forms the “lag” part. Message passing produces a correlation matrix, which is aggregated and combined with an MLP and residual connection to update hyperedge representations V^{e^i}.
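The lead‑lag mechanics can be illustrated with a simplified sketch. This is an assumed, stripped‑down version (no MLP, attention in place of the paper's full correlation‑matrix message passing): slide a window of size k with stride 1, score how strongly the k − 1 leading steps correlate with the lagging step, and update the lag step with a residual connection.

```python
import numpy as np

def moving_aggregation(edge_seq, k):
    """Simplified lead-lag sketch: for each window, the first k-1 steps
    ('lead') attend into the last step ('lag') via correlation scores,
    with a residual update."""
    T, d = edge_seq.shape
    out = edge_seq.copy()
    for t in range(k - 1, T):
        lead = edge_seq[t - k + 1 : t]   # (k-1, d) leading steps
        lag = edge_seq[t]                # (d,)   lagging step
        corr = lead @ lag                # (k-1,) lead-lag correlation scores
        attn = np.exp(corr - corr.max())
        attn /= attn.sum()
        out[t] = lag + attn @ lead       # residual-style update
    return out

seq = np.random.default_rng(2).standard_normal((10, 4))
updated = moving_aggregation(seq, k=3)
print(updated.shape)  # (10, 4)
```

The stride‑1 window means every time step (past the first k − 1) is updated using its own recent history, which is how delayed inter‑industry influence enters the hyperedge representation.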

Multi‑scale fusion module: Hyperedges from all scales are up‑sampled to the original length T. A learnable Mahalanobis distance computes similarity between hyperedges, generating an adjacency matrix Q. Normalized message passing aggregates cross‑scale information and feeds the result back to node embeddings.
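A stand‑in implementation of the fusion step, under stated assumptions: up‑sampling is done by repetition, each scale is summarized by its temporal mean before the distance computation, and `M` plays the role of the learnable matrix in the Mahalanobis distance d(x, y) = (x − y)ᵀ M (x − y). The paper's exact pairing of hyperedges may differ.

```python
import numpy as np

def fuse_scales(edge_scales, T, M):
    """Sketch of multi-scale fusion: align all scales to length T,
    build adjacency Q from a Mahalanobis-style distance, then do
    row-normalized cross-scale message passing."""
    up = []
    for seq in edge_scales:
        reps = int(np.ceil(T / seq.shape[0]))
        up.append(np.repeat(seq, reps, axis=0)[:T])   # up-sample by repetition
    Z = np.stack(up)                                  # (S, T, d) aligned sequences
    mean = Z.mean(axis=1)                             # (S, d) one summary per scale
    diff = mean[:, None, :] - mean[None, :, :]
    dist = np.einsum('ijk,kl,ijl->ij', diff, M, diff) # pairwise Mahalanobis distances
    Q = np.exp(-dist)
    Q /= Q.sum(axis=1, keepdims=True)                 # row-normalized adjacency
    return np.einsum('st,tld->sld', Q, Z)             # cross-scale message passing

scales = [np.ones((16, 4)), np.ones((8, 4)) * 2, np.ones((4, 4)) * 3]
fused = fuse_scales(scales, T=16, M=np.eye(4))
print(fused.shape)  # (3, 16, 4)
```

Row‑normalizing Q keeps each fused representation a convex combination of the per‑scale representations, which is one way to preserve consistency across scales.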

Predictor: An MLP with residual connections predicts returns for each scale.

Optimization objective: The loss combines mean‑squared error (MSE) with a ranking loss, weighted by α.
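The combined objective can be written compactly. The pairwise hinge form of the ranking term below is an assumption (it is the standard choice in RSR‑style stock‑ranking models); the paper may use a different ranking formulation.

```python
import numpy as np

def hermes_loss(pred, true, alpha=1.0):
    """Sketch of the combined objective: MSE plus alpha times a pairwise
    ranking hinge that penalizes discordant predicted/true return orderings."""
    mse = np.mean((pred - true) ** 2)
    dp = pred[:, None] - pred[None, :]         # pairwise predicted return gaps
    dt = true[:, None] - true[None, :]         # pairwise true return gaps
    rank = np.mean(np.maximum(0.0, -dp * dt))  # > 0 only for discordant pairs
    return mse + alpha * rank

pred = np.array([0.3, 0.1, -0.2])
true = np.array([0.2, 0.0, -0.1])
print(round(hermes_loss(pred, true), 4))  # → 0.01 (ordering is correct, so only MSE remains)
```

With α = 1 (the value the paper's sensitivity study found best), correctness of the relative ordering of stocks is weighted as heavily as pointwise accuracy.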

Experiments

Datasets: Three real US‑stock datasets were used – NASDAQ (2013‑01‑02 to 2017‑12‑08, 1026 stocks, 113 industries), NYSE (same period, 1737 stocks, 130 industries), and S&P500 (2016‑01‑04 to 2022‑05‑25, 474 stocks, 11 industries). Each stock includes five technical indicators.

Baselines: Four families of models were compared – RNN (LSTM, ALSTM), GNN (RGCN, GAT, RSR‑I), HGNN (STHAN‑SR, ESTIMATE), and MLP (Linear, StockMixer).

Overall performance: Hermes outperformed all baselines on every dataset. For NASDAQ, it achieved IC (information coefficient) = 0.044 (second‑best StockMixer = 0.043) and SR (Sharpe ratio) = 2.161 (second‑best StockMixer = 1.465). Similar gains were observed on NYSE (IC = 0.032, SR = 1.655) and S&P500 (IC = 0.050, SR = 2.247).
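For readers unfamiliar with the metric, IC is conventionally computed as the cross‑sectional Pearson correlation between predicted and realized returns, averaged over trading days. A minimal sketch (assumed standard definition, not quoted from the paper):

```python
import numpy as np

def information_coefficient(pred, true):
    """Daily cross-sectional Pearson correlation between predicted and
    realized returns, averaged over days. pred/true: (days, stocks)."""
    ics = [np.corrcoef(p, t)[0, 1] for p, t in zip(pred, true)]
    return float(np.mean(ics))

# a model that ranks stocks perfectly on every day gives IC = 1.0
pred = np.array([[0.1, 0.2, 0.3], [0.3, 0.2, 0.1]])
true = pred * 2.0
print(round(information_coefficient(pred, true), 6))  # → 1.0
```

Against that ceiling of 1.0, an IC around 0.04–0.05 looks small, but it is typical of daily equity forecasting, where even weak cross‑sectional signal is exploitable.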

Ablation study: Removing the multi‑scale fusion module reduced IC (e.g., NASDAQ from 0.044 to 0.038), confirming its importance. Eliminating the entire multi‑scale pipeline (decomposition + fusion) caused a larger drop (NASDAQ IC = 0.030). Excluding the lead‑lag module lowered IC to 0.039, and removing residual connections caused a slight decrease to 0.041.

Efficiency analysis: Hermes required less training time, inference time, and memory than other HGNNs. On NASDAQ, training time was 215 ms (STHAN‑SR = 257 ms, ESTIMATE = 1267 ms), inference time 42 ms (STHAN‑SR = 52 ms, ESTIMATE = 163 ms), and memory 0.98 GB (STHAN‑SR = 2.15 GB, ESTIMATE = 9.62 GB).

Hyper‑parameter sensitivity: The look‑back window length T performed best around 16; shorter windows lacked information, while longer windows added redundancy. The hidden dimension d achieved optimal performance at small values (e.g., 8), while larger dimensions led to over‑fitting. Multi‑scale lead‑lag step sizes needed dataset‑specific tuning (NASDAQ optimal: [9, 6, 3]). The ranking‑loss weight α = 1 yielded the best and most stable results across all datasets.

Conclusion

Hermes effectively integrates moving aggregation and multi‑scale fusion within a hypergraph network to capture both industry lead‑lag dynamics and multi‑scale stock interactions, delivering superior predictive accuracy and computational efficiency on real‑world financial data.

Tags: financial time series, hypergraph neural network, lead‑lag interaction, multi‑scale modeling, stock forecasting
Written by

Bighead's Algorithm Notes

Focused on AI applications in the fintech sector
