Paper Review: AlphaGAT’s Two‑Stage Learning for Adaptive Portfolio Selection

AlphaGAT introduces a two‑stage learning framework: it first mines robust alpha factors with a CATimeMixer model trained on a novel loss, then weights these factors dynamically with a graph attention network policy trained by reinforcement learning (PPO). The paper reports superior portfolio performance on the DJIA, HSI, CSI‑100 and cryptocurrency markets despite noisy data and distribution shifts.

Bighead's Algorithm Notes

Background

Portfolio selection is a core task in finance: capital must be allocated across assets to maximize returns while controlling risk. Traditional methods rely on fixed assumptions about price patterns and struggle with the low signal‑to‑noise ratio and distribution shifts inherent in market data. Recent machine‑learning approaches fall into supervised learning (SL) and reinforcement learning (RL), but both struggle in noisy, non‑stationary environments.

Problem Definition

The paper targets two main issues: (1) the low signal‑to‑noise ratio of raw market data, and (2) shifts in the distribution of market dynamics, which degrade model performance over time.

Method

3.1 AlphaGAT Overview

AlphaGAT consists of two key components: alpha‑factor mining and portfolio optimization.

3.2 Alpha‑Factor Mining

(1) Time‑feature extraction: Following the TimeMixer model, the raw input X_t is down‑sampled by average pooling into L+1 scales {X_0^t, …, X_L^t}, where X_0^t is the original resolution. Each scale is decomposed into trend and seasonal components, which are mixed across scales with Conv1D layers (top‑down for the trend features T_i^t and bottom‑up for the seasonal features S_i^t, as in TimeMixer); the two are then combined into H_i^t.
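
To make the multi‑scale step concrete, here is a small NumPy sketch of average‑pooling a series into L+1 scales and splitting each scale into trend and seasonal parts. The moving‑average decomposition follows the original TimeMixer; the pooling window of 2, the kernel size and the function names are assumptions, and the Conv1D cross‑scale mixing that produces H_i^t is left out.

```python
import numpy as np

def downsample_scales(x, num_levels, window=2):
    """Build L+1 scales [X_0, ..., X_L] by repeated non-overlapping average
    pooling along time. x: (time, features); X_0 is the input itself.
    Window size is an assumption, not taken from the paper."""
    scales = [x]
    for _ in range(num_levels):
        prev = scales[-1]
        usable = (prev.shape[0] // window) * window  # drop a ragged tail, if any
        pooled = prev[:usable].reshape(-1, window, prev.shape[1]).mean(axis=1)
        scales.append(pooled)
    return scales

def decompose(x, kernel=3):
    """Moving-average decomposition (odd kernel): trend = centered moving
    average, seasonal = residual. Edges are padded by repeating the first and
    last rows so the output length matches the input."""
    pad = kernel // 2
    padded = np.concatenate(
        [np.repeat(x[:1], pad, axis=0), x, np.repeat(x[-1:], pad, axis=0)]
    )
    trend = np.stack([padded[t:t + kernel].mean(axis=0) for t in range(x.shape[0])])
    seasonal = x - trend
    return trend, seasonal
```

With num_levels=2, an input of 8 time steps yields scales of length 8, 4 and 2; for a linear series the interior of the seasonal part is zero, because a centered moving average reproduces a straight line.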

(2) Correlation‑feature extraction: A multi‑head attention (MHA) module captures inter‑asset dependencies within each H_i^t, yielding E_i^t. The CATimeMixer architecture stacks three identical layers; each layer passes its MHA output E_i^t to the next layer as the input X_i^t, so temporal and cross‑asset features are interleaved layer by layer.
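
The cross‑asset attention can be sketched as ordinary scaled dot‑product multi‑head attention in which the tokens are assets rather than time steps. This assumes each asset's scale‑i features have been flattened into one d‑dimensional vector; the output projection of standard MHA is omitted, and the weight matrices are placeholders rather than AlphaGAT's parameters.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_asset_mha(h, wq, wk, wv, num_heads):
    """h: (assets, d), one flattened feature vector per asset (an assumption).
    wq, wk, wv: (d, d) placeholder projections. Every asset attends to every
    asset, so the output mixes information across assets."""
    n, d = h.shape
    dh = d // num_heads
    # Project, then split into heads: (heads, assets, dh)
    q = (h @ wq).reshape(n, num_heads, dh).transpose(1, 0, 2)
    k = (h @ wk).reshape(n, num_heads, dh).transpose(1, 0, 2)
    v = (h @ wv).reshape(n, num_heads, dh).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)  # (heads, assets, assets)
    attn = softmax(scores, axis=-1)                   # each row sums to 1
    out = attn @ v                                    # (heads, assets, dh)
    # Concatenate heads back to (assets, d); no output projection here
    return out.transpose(1, 0, 2).reshape(n, d), attn
```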

(3) Alpha‑factor generation: A multilayer perceptron (MLP) maps the multi‑scale features to alpha factors, and the per‑scale outputs are then aggregated across scales.
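
A hedged sketch of the generation step: one two‑layer MLP per scale maps that scale's per‑asset features to K factors, and the scale outputs are averaged. Averaging is an assumption, since the review only says the outputs are aggregated across scales; alpha_factors and its parameter layout are hypothetical names.

```python
import numpy as np

def mlp(x, w1, b1, w2, b2):
    """Two-layer perceptron with a ReLU hidden layer."""
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2

def alpha_factors(scale_feats, scale_params):
    """Hypothetical helper. scale_feats[i]: (assets, d_i) features of scale i;
    scale_params[i]: (w1, b1, w2, b2) of that scale's MLP, mapping d_i -> K.
    Returns Z: (assets, K), the mean of the per-scale outputs (assumed
    aggregation rule)."""
    outs = [mlp(f, *p) for f, p in zip(scale_feats, scale_params)]
    return np.mean(outs, axis=0)
```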

(4) Loss function: The loss has two parts: (a) the negative average information coefficient (IC), i.e. the correlation between the alpha factors Z_i^t and the true price movements y_t, so that minimizing it maximizes the factors’ predictive power; and (b) a regularization term L_{cov} that penalizes the off‑diagonal entries of the alpha‑factor covariance matrix C, encouraging diverse, weakly correlated factors. The total loss is Loss = -IC + λ·L_{cov}, where λ trades off prediction accuracy against factor diversity.
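
The loss can be sketched in NumPy as below, assuming IC is the cross‑sectional Pearson correlation and L_{cov} is the sum of squared off‑diagonal covariance entries; the review does not say how the penalty is aggregated, so that choice and the λ default are assumptions. In training, this would be written in an autodiff framework so that gradients reach the CATimeMixer weights; NumPy is used here only to show the arithmetic.

```python
import numpy as np

def ic(factor, target):
    """Cross-sectional Pearson correlation between one factor and price moves."""
    f = factor - factor.mean()
    y = target - target.mean()
    return float((f @ y) / (np.sqrt((f @ f) * (y @ y)) + 1e-12))

def alphagat_style_loss(Z, y, lam=0.1):
    """Hypothetical helper. Z: (assets, K) alpha factors; y: (assets,) true moves.
    Loss = -mean_k IC(Z[:, k], y) + lam * L_cov, with L_cov assumed to be the
    sum of squared off-diagonal entries of the factor covariance matrix C."""
    mean_ic = np.mean([ic(Z[:, k], y) for k in range(Z.shape[1])])
    # np.cov squeezes a single factor to a 0-d array, so force 2-D (K, K)
    C = np.atleast_2d(np.cov(Z, rowvar=False))
    off_diag = C - np.diag(np.diag(C))
    l_cov = float(np.sum(off_diag ** 2))
    return -mean_ic + lam * l_cov
```

A single perfectly predictive factor gives a loss of about -1, since there are no off‑diagonal terms to penalize; adding a second factor that is perfectly anti‑correlated with the first cancels the average IC and adds a covariance penalty.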

3.3 Portfolio Optimization

3.3.1 MDP modeling: Portfolio selection is cast as a Markov decision process (MDP). The state s_t is the set of mined alpha factors Z_t; the action a_t is a vector of weights that combines these factors into a trading signal; and the reward r_t reflects portfolio performance (e.g., cumulative wealth).
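
One MDP step might look like the following sketch. The review does not say how the combined signal becomes portfolio weights, so the softmax over action logits, the equal‑weighted top‑k holding rule and the log‑return reward are all assumptions. Summing log‑return rewards over an episode gives the log of cumulative wealth, which ties the reward to the CW metric.

```python
import numpy as np

def step_reward(Z_t, action_logits, next_returns, top_k=2):
    """Hypothetical MDP step.
    Z_t: (assets, K) alpha factors (the state s_t).
    action_logits: (K,) raw policy output (the action a_t before normalization).
    next_returns: (assets,) simple returns over the next period."""
    w = np.exp(action_logits - action_logits.max())
    w /= w.sum()                              # factor weights on the simplex (assumption)
    signal = Z_t @ w                          # combined trading signal per asset
    chosen = np.argsort(signal)[-top_k:]      # hold the top-k assets, equally weighted (assumption)
    portfolio_return = next_returns[chosen].mean()
    return float(np.log1p(portfolio_return)), w  # log-return reward
```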

3.3.2 Graph Attention Network (GAT): Each alpha factor is a node in a fully connected graph. The GAT computes attention coefficients between every pair of factors, normalizes them with a softmax over each node’s neighbors, and aggregates the neighbors’ features into updated factor representations, so the policy can account for inter‑factor relationships.
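
A single‑head GAT layer over a fully connected factor graph, following the standard GAT formulation e_ij = LeakyReLU(aᵀ[W h_i || W h_j]) with a softmax over neighbors j, can be written in a few lines. The feature sizes, the single head and the absence of an output nonlinearity are simplifications; AlphaGAT’s exact layer configuration is not given in the review.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(h, W, a):
    """Single-head GAT layer on a fully connected graph of K factor nodes
    (self-loops included). h: (K, F) node features; W: (F, F2) shared linear
    map; a: (2 * F2,) attention vector."""
    g = h @ W                                    # (K, F2) transformed features
    f2 = g.shape[1]
    # a^T [g_i || g_j] splits into a term for node i plus a term for node j
    src = g @ a[:f2]                             # (K,)
    dst = g @ a[f2:]                             # (K,)
    e = leaky_relu(src[:, None] + dst[None, :])  # (K, K) raw scores, all pairs
    e = e - e.max(axis=1, keepdims=True)         # stabilize the softmax
    p = np.exp(e)
    alpha = p / p.sum(axis=1, keepdims=True)     # softmax over neighbors j
    return alpha @ g, alpha                      # weighted neighbor aggregation
```

With a zero attention vector every score is equal, so each node simply averages all transformed features; learned attention vectors move that weight toward the most relevant factors.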

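3.3.3 Proximal Policy Optimization (PPO): PPO trains the policy π(a_t | s_t), which dynamically adjusts the factor weights. PPO’s clipped surrogate objective keeps each update close to the previous policy, which stabilizes training; together with its relatively good sample efficiency among policy‑gradient methods and its support for continuous, high‑dimensional action spaces, this makes it well suited to volatile financial markets.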
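
For reference, the clipped surrogate objective at the core of PPO is sketched below; it is maximized with respect to the new policy’s parameters. The advantage estimates, ε = 0.2, and how AlphaGAT’s GAT policy produces the log‑probabilities are outside the scope of this sketch.

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """PPO clipped surrogate objective (to be maximized).
    ratio = pi_new(a|s) / pi_old(a|s). Taking the minimum with the clipped
    ratio removes any gain from pushing the ratio outside [1 - eps, 1 + eps]."""
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return float(np.mean(np.minimum(unclipped, clipped)))
```

For a ratio of 1.5, a positive advantage is credited only up to 1.2 (the clip), while a negative advantage is charged the full 1.5, so the objective is pessimistic in both directions.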

Experiments

4.1 Experimental Setup

Datasets: Four markets, namely the US DJIA, Hong Kong HSI, China CSI‑100, and a cryptocurrency market, covering both equities across different regions and crypto assets. Each asset is described by eight raw features (open, close, high, low, volume, VWAP, turnover, and price change), and the data are split 8:1:1 into training, validation, and test sets.
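
Assuming the 8:1:1 split is chronological (the usual choice for financial time series, so that test data never precede training data; the review does not state this explicitly), it amounts to slicing along the time axis. The function name and the (time, assets, features) layout are illustrative.

```python
import numpy as np

def chronological_split(data, ratios=(8, 1, 1)):
    """Hypothetical helper. Split a (time, assets, features) array along time
    into train/val/test in order, so there is no look-ahead into the test period."""
    total = sum(ratios)
    n = data.shape[0]
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    return data[:n_train], data[n_train:n_train + n_val], data[n_train + n_val:]
```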

Baselines: Traditional strategies (buy‑and‑hold (BAH), DynamicCRP, OLMAR), SL‑based methods (ALSTM, AdaRNN), and RL‑based methods (RAT, PPN, FinRL‑Meta).

Metrics: Cumulative wealth (CW), annualized percentage yield (APY), annualized Sharpe ratio (ASR), and Calmar ratio (CR).
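
The four metrics can be computed from a series of per‑period portfolio returns as sketched below. The zero risk‑free rate, the default of 252 periods per year (365 would suit crypto), and defining the Calmar ratio as APY divided by the maximum drawdown over the test period are assumptions; the paper’s exact conventions may differ.

```python
import numpy as np

def portfolio_metrics(returns, periods_per_year=252):
    """returns: per-period simple portfolio returns; initial wealth is 1.
    Risk-free rate is assumed to be 0."""
    r = np.asarray(returns, dtype=float)
    wealth = np.cumprod(1.0 + r)
    cw = wealth[-1]                                              # cumulative wealth
    apy = cw ** (periods_per_year / len(r)) - 1.0                # annualized percentage yield
    asr = r.mean() / r.std(ddof=1) * np.sqrt(periods_per_year)   # annualized Sharpe ratio
    path = np.concatenate([[1.0], wealth])                       # include the starting wealth
    peak = np.maximum.accumulate(path)
    mdd = np.max(1.0 - path / peak)                              # maximum drawdown
    cr = apy / mdd if mdd > 0 else np.inf                        # Calmar ratio (assumed form)
    return {"CW": cw, "APY": apy, "ASR": asr, "CR": cr, "MDD": mdd}
```

For returns of +10%, -50% and +100% over three periods (with three periods per year), wealth ends at 1.1, the maximum drawdown is 50%, and the Calmar ratio is 0.1 / 0.5 = 0.2.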

4.2 Results

Across all four markets, AlphaGAT outperforms every baseline on all metrics. Traditional strategies often break even or lose money, particularly on DJIA and the volatile HSI and CSI‑100 markets, whereas AlphaGAT remains clearly profitable and also achieves the best results in the crypto market.

Alpha‑factor effectiveness: IC and RankIC are used to evaluate the first‑stage factors. Paired t‑tests show that the factors produced by the full CATimeMixer outperform those of its reduced variants: the base TimeMixer, CTimeMixer (TimeMixer with Conv1D), and ATimeMixer (TimeMixer with cross‑asset attention).
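
IC is commonly the cross‑sectional Pearson correlation between factor values and subsequent returns, and RankIC the same correlation computed on ranks (Spearman). Below is a small sketch of both, plus the paired t statistic one would compute on, say, daily IC series from two models; the helper names are illustrative, and in practice scipy.stats.spearmanr and scipy.stats.ttest_rel provide tested implementations.

```python
import numpy as np

def pearson(a, b):
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

def rank(x):
    """Ranks 0..n-1; ties are not averaged (acceptable for continuous data)."""
    r = np.empty(len(x))
    r[np.argsort(x)] = np.arange(len(x))
    return r

def ic_and_rank_ic(factor, returns):
    """IC = Pearson correlation; RankIC = Pearson correlation of the ranks."""
    return pearson(factor, returns), pearson(rank(factor), rank(returns))

def paired_t(x, y):
    """t statistic of the paired differences x - y (e.g. daily IC of two models)."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(d.mean() / (d.std(ddof=1) / np.sqrt(len(d))))
```

A factor that orders assets perfectly but not linearly, such as t against t², has a RankIC of exactly 1 and an IC slightly below 1, which is why papers often report both.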

Ablation study: Four simplified variants (an MLP policy in place of GAT, random weights, top‑IC weighting, and equal weighting) show that (a) learning the factor weights with RL markedly improves cumulative wealth over fixed weighting schemes, and (b) GAT’s modeling of the relationships among diverse factors yields better decisions than a plain MLP.
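
The three fixed weighting baselines in the ablation can be sketched as simple rules over K factors. The review names them without details, so interpreting top‑IC weighting as equal weight on the n factors with the highest past IC is an assumption, as are the function names.

```python
import numpy as np

def equal_weights(k):
    """Same weight on every factor."""
    return np.full(k, 1.0 / k)

def random_weights(k, rng):
    """Random non-negative weights normalized to sum to 1."""
    w = rng.random(k)
    return w / w.sum()

def top_ic_weights(past_ic, n):
    """Hypothetical reading of top-IC weighting: equal weight on the n factors
    with the highest past IC, zero elsewhere."""
    past_ic = np.asarray(past_ic, dtype=float)
    w = np.zeros(len(past_ic))
    w[np.argsort(past_ic)[-n:]] = 1.0 / n
    return w
```

Unlike the RL policy, none of these rules reacts to the current state, which is the gap the ablation is designed to expose.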

Case study: On the DJIA dataset, the RL agent tends to assign higher weights to the ten alpha factors with the largest IC values, suggesting that the model balances predictive accuracy with adaptability.

Conclusion

AlphaGAT’s two‑stage architecture, which combines robust alpha‑factor extraction via CATimeMixer with adaptive factor weighting by an RL‑trained GAT policy, effectively addresses noisy, non‑stationary financial data and delivers state‑of‑the‑art performance across multiple markets.

Written by Bighead's Algorithm Notes, focused on AI applications in the fintech sector.