Paper Review: DeltaLag – An End‑to‑End Deep Learning Framework for Dynamically Learning Lead‑Lag Patterns in Financial Markets
DeltaLag introduces a sparse cross‑attention mechanism that dynamically discovers pair‑specific, time‑varying lead‑lag relationships in US equity markets and uses them to construct interpretable trading signals, achieving significantly higher annualized returns, Sharpe ratios, and information coefficients than fixed‑lag, statistical, and other spatio‑temporal deep learning baselines.
Background
Lead‑lag effects—where price movements of one asset systematically precede another (e.g., large‑cap stocks leading small‑cap stocks)—are a strong predictive signal in quantitative finance. Traditional detection relies on linear statistics such as cross‑correlation and assumes a stationary pattern, which fails when market dynamics change.
Problem Definition
The task is to build an end‑to‑end deep learning model that (1) dynamically discovers pair‑specific, time‑varying lead‑lag relations and (2) uses these relations to generate high‑return, interpretable trading signals.
Method
Lead‑Lag Detection Model
Time Embedding – For each stock u, a rolling window of length L containing raw features (open, high, volume, etc.) is fed into a time encoder f_θ (e.g., LSTM) to produce hidden representations H_{u,t} of dimension N.
Query & Key Matrices – The query vector for target stock u is obtained from its last timestep embedding and transformed by a learnable matrix W^Q. For each candidate leading stock v, the last l_{max} timestep embeddings are transformed by W^K to form the key matrix.
Attention Scores & Top‑k Selection – Attention scores s_{u,v,τ} are computed for every candidate stock v and lag τ. All scores are stacked into a matrix; the top‑k highest scores and their positions (i_m, j_m) are selected, yielding the leading stocks v_{i_m} and corresponding lag values.
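The selection step above can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's implementation: the scaled dot‑product scoring, the function name `top_k_lead_lag`, and the random toy embeddings are all assumptions made here. Only the overall flow (score every candidate/lag pair, then keep the top‑k) comes from the description above.

```python
import math
import random

def top_k_lead_lag(query, keys, k):
    """Score every (candidate stock, lag) pair and return the k best.

    query: projected last-timestep embedding of the target stock u.
    keys:  keys[v][j] is the projected embedding of candidate v at lag j + 1.
    Returns a list of (score, candidate_index, lag_in_days), best first.
    """
    scale = math.sqrt(len(query))  # assumption: scaled dot-product scores
    scored = []
    for v, lag_keys in enumerate(keys):
        for j, key in enumerate(lag_keys):
            score = sum(q * x for q, x in zip(query, key)) / scale
            scored.append((score, v, j + 1))  # lag tau = j + 1 days
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]

# Toy data: 5 candidate leaders, l_max = 9 lags, 4-dimensional embeddings.
random.seed(0)
dim, n_candidates, l_max = 4, 5, 9
query = [random.gauss(0, 1) for _ in range(dim)]
keys = [[[random.gauss(0, 1) for _ in range(dim)] for _ in range(l_max)]
        for _ in range(n_candidates)]

selected = top_k_lead_lag(query, keys, k=2)
for score, v, tau in selected:
    print(f"leader v={v}, lag={tau} days, score={score:.3f}")
```

Because the top‑k is taken over the flattened score matrix, the same candidate stock can in principle be selected at two different lags; whether the paper prevents this is not stated here.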
Signal Construction & Prediction
Feature Extraction – For each selected leading stock v_m, raw features at time t‑τ_{v_m→u} are extracted.
Attention‑Weighted Aggregation – Extracted features are weighted by their attention scores s_{v_m,u} and summed.
MLP Prediction – The aggregated vector is fed into a multilayer perceptron (MLP) to predict the next‑day return of the lagging stock u.
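The three steps above (lagged feature lookup, attention‑weighted sum, MLP) might look roughly like the following. It is a sketch under stated assumptions: the weights are passed in as plain numbers, the MLP has a single ReLU hidden layer, and all names and toy values are hypothetical rather than taken from the paper.

```python
def aggregate_leader_features(features, t, selections):
    """Attention-weighted sum of leader features read at their lagged dates.

    features:   features[v][day] is the raw feature vector of stock v.
    t:          index of the current day.
    selections: list of (leader_index, lag_in_days, attention_weight).
    """
    dim = len(features[0][0])
    aggregated = [0.0] * dim
    for v, tau, weight in selections:
        if t - tau < 0:
            raise ValueError("lag reaches before the start of the data")
        lagged = features[v][t - tau]  # leader's features tau days ago
        for d in range(dim):
            aggregated[d] += weight * lagged[d]
    return aggregated

def mlp_predict(x, w1, b1, w2, b2):
    """Assumed one-hidden-layer MLP with ReLU; returns a scalar prediction."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2

# Toy data: 2 candidate leaders, 5 days, 2 raw features per day.
features = [
    [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]],
    [[0.1, 1.0], [0.2, 1.5], [0.3, 1.8], [0.5, 2.0], [0.6, 2.2]],
]
selections = [(0, 2, 0.7), (1, 1, 0.3)]  # (leader, lag, weight)

agg = aggregate_leader_features(features, t=4, selections=selections)
pred = mlp_predict(agg, w1=[[0.1, 0.0], [0.0, -0.1]], b1=[0.0, 0.0],
                   w2=[1.0, 1.0], b2=0.0)
print(agg, pred)
```

Reading raw features at `t - tau` rather than hidden states is what keeps the signal traceable to a specific leader and day.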
Loss Function Design
The model optimizes a ranking loss that emphasizes relative return ordering, combined with a tanh‑smoothed logistic regression term.
In the paper's formula, \hat{r}_i and r_i denote predicted and actual returns, and N is the number of stocks (not the embedding dimension of the same name). This loss encourages monotonic return relationships and stabilizes optimization.
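To make the idea of a ranking‑oriented objective concrete, here is a generic pairwise logistic ranking loss with a tanh‑squashed target. It is not the paper's formula, which is not reproduced in this review: the pairing scheme, the `temperature` parameter, and the use of tanh on the realized return gap are all assumptions chosen for illustration.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def pairwise_ranking_loss(pred, actual, temperature=0.01):
    """Generic pairwise logistic ranking loss (illustrative only).

    For each ordered pair (i, j), tanh squashes the realized return gap
    into a soft label in (-1, 1); the logistic term then penalizes
    predicted gaps whose sign disagrees with that label.
    """
    n = len(pred)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            target = math.tanh((actual[i] - actual[j]) / temperature)
            margin = target * (pred[i] - pred[j])
            total += softplus(-margin)
            pairs += 1
    return total / pairs

actual = [0.02, -0.01, 0.005]
well_ordered = [0.3, -0.2, 0.1]   # same ordering as actual
mis_ordered = [-0.2, 0.3, 0.0]    # reversed ordering

print(pairwise_ranking_loss(well_ordered, actual))
print(pairwise_ranking_loss(mis_ordered, actual))
```

A loss of this kind depends only on the ordering and spacing of predictions, not on their absolute level, which is the property a long‑short portfolio built from ranks cares about.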
Experiments
Data & Settings – US equity data from 2010‑2023 (train 2010‑2018, validation 2019‑2021, test 2022‑2023) covering S&P 500, NASDAQ, NYSE; stocks with market cap <$2B are excluded. Hyper‑parameters: k=2 (top‑k leading stocks) and l_{max}=9 (max lag).
Baselines – Fixed‑lag model (Lag1Net), self lead‑lag models (SelfLagNet / SelfLag1), statistical pre‑computed graph (LagAll CorrGraph), and spatio‑temporal deep models (LSTM, Crossformer, etc.).
Evaluation Metrics – Information Coefficient (IC), Annualized Return (AR) of a long‑short portfolio (top 10 % long, bottom 10 % short), and Sharpe Ratio (SR).
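For readers less familiar with these metrics, the sketch below shows one common way to compute them. The conventions are assumptions, not details confirmed by the paper: IC is taken as the Pearson correlation between predicted and realized cross‑sectional returns, the long‑short return is the long‑leg mean minus the short‑leg mean, and annualization uses 252 trading days without compounding.

```python
import math
import statistics

def information_coefficient(pred, actual):
    """Pearson correlation between predicted and realized returns."""
    mp, ma = statistics.fmean(pred), statistics.fmean(actual)
    cov = sum((p - mp) * (a - ma) for p, a in zip(pred, actual))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_a = sum((a - ma) ** 2 for a in actual)
    return cov / math.sqrt(var_p * var_a)

def long_short_return(pred, actual, frac=0.10):
    """Mean realized return of the top `frac` by prediction minus the bottom `frac`."""
    order = sorted(range(len(pred)), key=lambda i: pred[i])
    n_side = max(1, int(len(pred) * frac))
    short_leg = statistics.fmean([actual[i] for i in order[:n_side]])
    long_leg = statistics.fmean([actual[i] for i in order[-n_side:]])
    return long_leg - short_leg

def annualized_return(daily_returns, periods=252):
    """Simple (non-compounded) annualization of the mean daily return."""
    return statistics.fmean(daily_returns) * periods

def annualized_sharpe(daily_returns, periods=252):
    """Mean over sample standard deviation, scaled by sqrt(periods)."""
    mean = statistics.fmean(daily_returns)
    return mean / statistics.stdev(daily_returns) * math.sqrt(periods)

# One toy cross-section of 10 stocks where predictions rank returns perfectly.
pred = [float(i) for i in range(10)]
actual = [(i - 4.5) / 100 for i in range(10)]
ic = information_coefficient(pred, actual)
ls = long_short_return(pred, actual)

# A toy series of daily long-short returns.
daily = [0.001, 0.002, 0.003]
ar = annualized_return(daily)
sr = annualized_sharpe(daily)
print(ic, ls, ar, sr)
```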
Main Results
Dynamic lag advantage: DeltaLag’s dynamic lag yields AR = 0.2472 (S&P 500), 0.3333 (NASDAQ), 0.2301 (NYSE), markedly higher than Lag1Net (0.1662 / 0.2587 / 0.1458), confirming the importance of pair‑specific lag values.
Cross‑asset lead‑lag advantage: Using only cross‑asset pairs, DeltaLag outperforms self‑lead‑lag baselines (e.g., NASDAQ AR = 0.3333 vs. 0.2552 / 0.2475), indicating stronger predictive power from inter‑asset dependencies.
Adaptive mechanism vs. statistical pre‑computation: Across all three test sets, DeltaLag’s AR and SR surpass those of LagAll CorrGraph (whose AR is 0.1192 / 0.2912 / 0.0258 on S&P 500 / NASDAQ / NYSE), supporting the claim that the weak‑momentum nature of lead‑lag relations requires adaptive learning rather than a pre‑computed graph.
Outperforming spatio‑temporal models: DeltaLag’s cumulative returns grow steadily (≈10 bps per day) and exceed those of LSTM, Crossformer, etc.; the daily gain stays above realistic transaction costs (2‑5 bps).
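A quick back‑of‑the‑envelope check relates the ≈10 bps per day figure to the annualized returns reported above. The sketch assumes 252 trading days, simple (non‑compounded) annualization, and that the quoted 2‑5 bps acts as a flat daily cost; real costs depend on turnover, which is not given here.

```python
TRADING_DAYS = 252  # assumed number of trading days per year

def annualize_bps(daily_bps, periods=TRADING_DAYS):
    """Simple annualization of a daily figure expressed in basis points."""
    return daily_bps / 10_000 * periods

gross = annualize_bps(10.0)             # about 10 bps per day, before costs
net_low_cost = annualize_bps(10.0 - 2.0)   # assumed 2 bps daily cost
net_high_cost = annualize_bps(10.0 - 5.0)  # assumed 5 bps daily cost

print(f"gross: {gross:.4f}")
print(f"net at 2 bps: {net_low_cost:.4f}")
print(f"net at 5 bps: {net_high_cost:.4f}")
```

Under these assumptions the gross figure comes out close to the S&P 500 AR of 0.2472, so the two numbers are at least mutually consistent.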
Ablation Studies
Feature Selection – Raw price‑volume features achieve higher AR (0.2472 / 0.3333 / 0.2301) than time‑embedding features (0.1889 / 0.3211 / 0.2298), so the raw features are at least as predictive while also being easier to interpret.
Feature Robustness – Using only single‑day returns, DeltaLag still attains high AR (e.g., NASDAQ = 0.3918) versus Lag1Net’s 0.0015.
Loss Function Comparison – The proposed ranking loss yields superior AR (0.2472 / 0.3333 / 0.2301) compared with IC loss (0.2205 / 0.2496 / 0.1192) and MSE loss (negative returns), confirming the effectiveness of ranking‑oriented objectives.
Lead‑Lag Pattern Analysis
Lag Distribution – Selected lags from 1 to 9 days occur with roughly equal frequency (~10‑12 % each), so no single lag dominates, which is consistent with the model adapting its lag choice to varying market states.
Leading‑Stock Clustering – On average, 38.18 unique leading stocks are selected per day in the S&P 500 (22.14 when only the top‑1 pair is counted), matching statistical clustering results and suggesting that the model captures stock groups automatically.
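Both statistics in this section are simple tallies over the model's daily selections. The sketch below shows how they could be computed from a log of (day, leader, lag) records; the record layout, function names, and tickers are hypothetical. For reference, a perfectly uniform spread over 9 lags would give each lag a share of 1/9, or about 11.1 %.

```python
from collections import Counter

def lag_shares(selections):
    """Fraction of selections that use each lag value."""
    counts = Counter(lag for _, _, lag in selections)
    total = sum(counts.values())
    return {lag: counts[lag] / total for lag in sorted(counts)}

def mean_unique_leaders(selections):
    """Average number of distinct leading stocks selected per day."""
    leaders_by_day = {}
    for day, leader, _ in selections:
        leaders_by_day.setdefault(day, set()).add(leader)
    return sum(len(s) for s in leaders_by_day.values()) / len(leaders_by_day)

# Hypothetical selection log: (day, leading stock, lag in days).
log = [
    ("2022-01-03", "AAA", 1),
    ("2022-01-03", "AAA", 3),
    ("2022-01-03", "BBB", 1),
    ("2022-01-04", "CCC", 9),
]

shares = lag_shares(log)
unique_per_day = mean_unique_leaders(log)
print(shares, unique_per_day)
```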
