
Which Loss Function Ranks Stocks Best? An Empirical Study with Transformer Models

This paper evaluates point‑wise, pair‑wise, and list‑wise loss functions for Transformer‑based stock‑return prediction on 110 S&P 500 stocks. Margin loss achieves the highest annualized return (16.23%) and Sharpe ratio (0.75), ListNet delivers strong returns with low volatility, and BPR minimizes maximum drawdown, which highlights how strongly loss design shapes ranking‑driven portfolio performance.

Bighead's Algorithm Notes

Feb 18, 2026


Background

Quantitative trading relies on accurate stock ranking to allocate capital. Traditional statistical models such as ARIMA have long been used for this purpose, while Transformer architectures are well suited to capturing long‑range dependencies in financial time series. However, how the choice of training loss affects a Transformer's ability to produce profitable rankings remains unclear.

Problem Definition

The study assesses how different loss functions affect what a Transformer learns about stock‑return patterns and, in turn, its portfolio decisions. The model predicts daily returns for 110 S&P 500 stocks (the 10 largest by market capitalization in each of the 11 GICS sectors); the predictions are then ranked, and an equal‑weight, long‑only portfolio is built from the top k = 5 stocks.
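The universe construction and top‑k selection just described can be sketched in plain Python. This is a minimal illustration, not the paper's code: the function names `build_universe` and `top_k_equal_weight`, the input layout (ticker → (sector, market cap)), and the alphabetical tie‑breaking are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_universe(info: Dict[str, Tuple[str, float]], per_sector: int = 10) -> List[str]:
    """Pick the `per_sector` largest stocks by market cap in each sector.

    `info` maps ticker -> (sector, market_cap); this layout is hypothetical.
    """
    by_sector: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for ticker, (sector, cap) in info.items():
        by_sector[sector].append((cap, ticker))
    universe: List[str] = []
    for sector in sorted(by_sector):
        # Largest market cap first; ties broken alphabetically for determinism.
        largest = sorted(by_sector[sector], key=lambda ct: (-ct[0], ct[1]))[:per_sector]
        universe.extend(ticker for _, ticker in largest)
    return universe


def top_k_equal_weight(predicted: Dict[str, float], k: int = 5) -> Dict[str, float]:
    """Rank tickers by predicted next-day return and give the top k equal weight."""
    if k <= 0:
        raise ValueError("k must be positive")
    ranked = sorted(predicted, key=lambda t: (-predicted[t], t))
    chosen = ranked[:k]
    return {t: 1.0 / len(chosen) for t in chosen}
```

With 11 sectors and `per_sector=10` this yields a 110‑stock universe; `top_k_equal_weight` would then be applied to each day's predictions, giving every selected stock a weight of 1/5.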

Method

3.1 Model Architecture

The PortfolioMASTER model combines alternating temporal self‑attention (processing each stock’s history independently) and spatial self‑attention (modeling inter‑stock relationships at each timestep). Input features (daily return and turnover) over a 20‑day look‑back window are projected to dimension D, enriched with positional encodings, and processed by a stack of encoder layers. The final attention‑based aggregation yields per‑stock representations used to predict next‑day returns.
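The alternating temporal/spatial attention pattern can be illustrated with a deliberately stripped‑down sketch. It assumes single‑head attention with identity query/key/value projections and a plain residual connection, and it omits positional encoding, layer normalization, and feed‑forward sublayers; the final pooling uses the last time step as the query. None of these details come from the paper, and `encoder_layer` and `aggregate` are hypothetical names. The sketch only shows which axis each attention step mixes over.

```python
import math
from typing import List

Vec = List[float]


def _softmax(scores: List[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def attend(query: Vec, keys: List[Vec]) -> Vec:
    """Scaled dot-product attention of one query over `keys` (keys also serve as values)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = _softmax(scores)
    return [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(d)]


def self_attention(seq: List[Vec]) -> List[Vec]:
    return [attend(q, seq) for q in seq]


def encoder_layer(x: List[List[Vec]]) -> List[List[Vec]]:
    """x[n][t] is the D-dim embedding of stock n at time step t."""
    n_stocks, n_steps = len(x), len(x[0])
    # Temporal attention: each stock attends only over its own history.
    out = []
    for n in range(n_stocks):
        mixed = self_attention(x[n])
        out.append([[a + b for a, b in zip(x[n][t], mixed[t])] for t in range(n_steps)])
    # Spatial attention: at each time step, every stock attends over all stocks.
    for t in range(n_steps):
        mixed = self_attention([out[n][t] for n in range(n_stocks)])
        for n in range(n_stocks):
            out[n][t] = [a + b for a, b in zip(out[n][t], mixed[n])]
    return out


def aggregate(history: List[Vec]) -> Vec:
    """Pool one stock's history into a single vector, using the last step as the query."""
    return attend(history[-1], history)
```

Stacking several `encoder_layer` calls and then applying `aggregate` to each stock's history gives one vector per stock. A learned output head, not shown here, would map that vector to a next‑day return prediction.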

3.2 Loss Functions

The paper evaluates loss functions grouped into point‑wise, point‑wise + pair‑wise, and list‑wise categories:

Point‑wise: Mean Squared Error (MSE) as baseline.

Point‑wise + pair‑wise: MSE combined with a pairwise component L_{PairwiseComponent} weighted by λ, including Hinge loss, Margin loss (with margin m), Bayesian Personalized Ranking (BPR), RankNet (with scaling α), and weighted Hinge variants (WHR1/WHR2).
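Here is a minimal sketch of the point‑wise + pair‑wise objective. It assumes the form L = L_MSE + λ · L_pair, with the pairwise term averaged over all stock pairs on a single day. The per‑pair formulas are the standard textbook forms: Margin is max(0, m − d), BPR is −log σ(d), and RankNet is the cross‑entropy on σ(α·d), where d is the predicted score gap between the truly better and the truly worse stock. The paper's exact definitions may differ. Hinge and the WHR1/WHR2 variants are left out because their exact form is not given here, and the defaults for `m`, `alpha`, and `lam` are placeholders, not tuned values.

```python
import math
from itertools import combinations
from typing import Callable, Iterator, Sequence


def softplus(x: float) -> float:
    """log(1 + e^x), computed stably for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    return sum((p - y) ** 2 for p, y in zip(pred, target)) / len(pred)


def score_gaps(pred: Sequence[float], target: Sequence[float]) -> Iterator[float]:
    """Yield d = s_better - s_worse for every pair whose true returns differ."""
    for i, j in combinations(range(len(pred)), 2):
        if target[i] > target[j]:
            yield pred[i] - pred[j]
        elif target[j] > target[i]:
            yield pred[j] - pred[i]


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs) if xs else 0.0


def margin_loss(pred, target, m: float = 0.1) -> float:
    # Zero once the better stock outscores the worse one by at least m.
    return _mean([max(0.0, m - d) for d in score_gaps(pred, target)])


def bpr_loss(pred, target) -> float:
    # -log(sigmoid(d)) == softplus(-d)
    return _mean([softplus(-d) for d in score_gaps(pred, target)])


def ranknet_loss(pred, target, alpha: float = 1.0) -> float:
    """Cross-entropy between target P(i > j) (1, 0, or 0.5 for ties) and sigmoid(alpha * gap)."""
    terms = []
    for i, j in combinations(range(len(pred)), 2):
        p_bar = 1.0 if target[i] > target[j] else 0.0 if target[i] < target[j] else 0.5
        z = alpha * (pred[i] - pred[j])
        terms.append(p_bar * softplus(-z) + (1.0 - p_bar) * softplus(z))
    return _mean(terms)


def combined_loss(pred, target, pairwise: Callable[..., float], lam: float = 0.5, **kwargs) -> float:
    """L = MSE + lam * pairwise term (assumed form)."""
    return mse(pred, target) + lam * pairwise(pred, target, **kwargs)
```

Passing `margin_loss`, `bpr_loss`, or `ranknet_loss` as `pairwise` swaps the ranking term while keeping the MSE anchor. For actual training, the same formulas would be written in an autodiff framework so the loss can be back‑propagated.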

List‑wise: ListNet loss, which converts the true scores and the predictions into probability distributions (a softmax with temperature τ) and minimizes the cross‑entropy between them.
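ListNet's temperature‑scaled cross‑entropy fits in a few lines. This sketch assumes the standard top‑one formulation, with one cross‑section of stocks per day and τ applied to both the true and the predicted scores. `listnet_loss` is an illustrative name, not the paper's implementation.

```python
import math
from typing import List, Sequence


def log_softmax(xs: Sequence[float], tau: float) -> List[float]:
    """Temperature-scaled log-softmax, shifted by the max for numerical stability."""
    scaled = [x / tau for x in xs]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


def listnet_loss(pred: Sequence[float], target: Sequence[float], tau: float = 1.0) -> float:
    """Cross-entropy between softmax(target / tau) and softmax(pred / tau)."""
    p_true = [math.exp(v) for v in log_softmax(target, tau)]
    log_p_pred = log_softmax(pred, tau)
    return -sum(p * lp for p, lp in zip(p_true, log_p_pred))
```

A smaller τ sharpens the target distribution, so the loss concentrates on the highest‑return stocks. A larger τ spreads the emphasis across the whole list.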

Dataset and Features

The data span 2015‑01‑03 to 2024‑12‑03 and cover daily returns and turnover for the 110 selected stocks. Features are normalized per stock using a scaler fitted on the training set only, so no statistics from the validation or test periods leak into training.
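Per‑stock normalization without look‑ahead can be sketched as follows, assuming a z‑score (mean/standard deviation) scaler. The summary says only that a training‑set scaler is used, so the scaler type and the helper names `fit_scalers` and `apply_scalers` are assumptions.

```python
from statistics import fmean, pstdev
from typing import Dict, List, Tuple

Series = Dict[str, List[float]]


def fit_scalers(train: Series) -> Dict[str, Tuple[float, float]]:
    """Compute each stock's mean and std on the training segment only."""
    scalers = {}
    for ticker, values in train.items():
        sd = pstdev(values)
        scalers[ticker] = (fmean(values), sd if sd > 0 else 1.0)  # guard constant series
    return scalers


def apply_scalers(series: Series, scalers: Dict[str, Tuple[float, float]]) -> Series:
    """Normalize any split (train, validation, or test) with the training statistics."""
    return {t: [(v - scalers[t][0]) / scalers[t][1] for v in vals] for t, vals in series.items()}
```

The scalers are fitted once on the training segment and only applied, never refitted, to validation and test data. This keeps future information out of the features.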

Training and Evaluation

Data are split chronologically into 70% training, 15% validation, and 15% test. Models are trained for up to 50 epochs with the AdamW optimizer (with weight decay), early stopping on validation loss, and learning‑rate scheduling. Hyperparameters (dropout, model dimension, learning rate, and the loss‑specific parameters λ, m, α, τ) are tuned by grid search separately for each loss.
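The chronological 70/15/15 split and early stopping on validation loss can be sketched as below. The `patience` value is a placeholder because the summary does not state one, and `chronological_split` and `EarlyStopping` are hypothetical helpers, not names from the paper. The key property is that the split never shuffles, so validation and test days always come after the training days.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(
    days: Sequence[T], train: float = 0.70, val: float = 0.15
) -> Tuple[List[T], List[T], List[T]]:
    """Split time-ordered samples into train/val/test without shuffling."""
    n = len(days)
    n_train = round(n * train)
    n_val = round(n * val)
    items = list(days)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


class EarlyStopping:
    """Stop when validation loss has not improved for `patience` consecutive epochs."""

    def __init__(self, patience: int = 5) -> None:
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop capped at 50 epochs, `step` would be called once per epoch with the validation loss. When it returns `True`, training stops and the best checkpoint is restored.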

Portfolio Simulation and Metrics

The portfolio is rebalanced daily into an equal‑weight, long‑only position in the top 5 ranked stocks. Performance is measured by Cumulative Return (CR), Annualized Return (AR), Annualized Volatility (AV), Sharpe Ratio (SR, with a 4.3% risk‑free rate), and Maximum Drawdown (MDD). Prediction quality is assessed by the Information Coefficient (IC), ICIR, and Precision@5 (P@5); test‑set MSE is also reported.
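The portfolio metrics can be computed from the series of daily portfolio returns. This sketch makes several assumptions: 252 trading days per year, a geometric annualized return, volatility from the sample standard deviation of daily returns, and SR = (AR − 4.3%) / AV. The paper may annualize or define the Sharpe ratio slightly differently, and the function names are illustrative.

```python
import math
from typing import Dict, Sequence

TRADING_DAYS = 252  # assumed annualization factor


def equal_weight_return(selected: Sequence[str], realized: Dict[str, float]) -> float:
    """One day's return of an equal-weight, long-only portfolio of `selected`."""
    return sum(realized[t] for t in selected) / len(selected)


def portfolio_metrics(daily: Sequence[float], rf_annual: float = 0.043) -> Dict[str, float]:
    n = len(daily)
    if n < 2:
        raise ValueError("need at least two daily returns")
    wealth, peak, mdd = 1.0, 1.0, 0.0
    for r in daily:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        mdd = min(mdd, wealth / peak - 1.0)  # drawdown from the running peak
    ar = wealth ** (TRADING_DAYS / n) - 1.0
    mean = sum(daily) / n
    av = math.sqrt(sum((r - mean) ** 2 for r in daily) / (n - 1)) * math.sqrt(TRADING_DAYS)
    sr = (ar - rf_annual) / av if av > 0 else float("nan")
    return {"CR": wealth - 1.0, "AR": ar, "AV": av, "SR": sr, "MDD": mdd}
```

MDD is reported as a negative fraction, matching the sign convention used in the results (for example −15.77%).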

Experimental Results

4.4.1 Portfolio Performance Analysis

Margin loss achieves the highest AR (16.23%) and SR (0.7529). ListNet follows closely (AR = 16.00%, SR = 0.7407) and also has the lowest AV (15.79%). BPR produces the smallest MDD (−15.77%), indicating better risk control despite a slightly lower SR (0.7200). The MSE baseline is outperformed by all ranking‑oriented losses in risk‑adjusted returns.

4.4.2 Prediction Quality vs. Portfolio Results

IC values are similar across losses (0.073–0.077), and P@5 stays at about 0.358–0.359. RankNet attains the highest IC (0.0767), yet its portfolio AR and SR are only moderate. Conversely, Margin and ListNet deliver the best portfolio metrics without a markedly higher IC. This suggests that loss design matters through how it penalizes ranking errors, which affects downstream profitability in ways that IC alone does not capture.
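The prediction‑quality metrics can be made concrete with a short sketch. It assumes that IC is the daily cross‑sectional Pearson correlation between predicted and realized returns, that ICIR is the mean daily IC divided by its standard deviation, and that P@5 is the overlap between the predicted and the realized top 5. Some studies use a rank (Spearman) IC instead, and the summary does not say which variant the paper uses.

```python
import math
from statistics import fmean, stdev
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def ic_and_icir(preds: Sequence[Sequence[float]], targets: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Mean daily IC and ICIR = mean(IC) / std(IC) across days."""
    ics: List[float] = [pearson(p, y) for p, y in zip(preds, targets)]
    mean_ic = fmean(ics)
    sd = stdev(ics) if len(ics) > 1 else 0.0
    return mean_ic, (mean_ic / sd if sd > 0 else float("nan"))


def precision_at_k(pred: Sequence[float], target: Sequence[float], k: int = 5) -> float:
    """Fraction of the predicted top-k stocks that are also in the realized top-k."""
    top_pred = set(sorted(range(len(pred)), key=lambda i: -pred[i])[:k])
    top_true = set(sorted(range(len(target)), key=lambda i: -target[i])[:k])
    return len(top_pred & top_true) / k
```

IC and P@5 measure agreement in ordering or membership, not the size of the returns the selected stocks actually deliver. This is why similar values of both can coexist with different portfolio outcomes.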

4.4.3 Impact of Loss Design

Pairwise losses that explicitly model stock preferences (Margin, BPR) prove effective: the margin term in Margin loss encourages confident separation of the top stocks, while BPR's focus on correctly ordering preferred items reduces drawdowns. ListNet's list‑wise objective captures global ranking patterns that benefit portfolio construction. Its test‑set MSE is higher because it does not directly optimize point‑wise return accuracy.
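The mechanical difference between the two pairwise penalties shows up in their gradients with respect to the score gap d (better stock minus worse stock). Under the standard forms assumed here, Margin loss max(0, m − d) stops pushing a pair once it is separated by m. BPR's −log σ(d), by contrast, keeps a small but never‑zero push on every pair. This only illustrates the mechanics and does not by itself explain the drawdown results; the value of `m` is a placeholder.

```python
import math


def margin_grad(d: float, m: float = 0.1) -> float:
    """Derivative of max(0, m - d) w.r.t. d (subgradient 0 taken at d == m)."""
    return -1.0 if d < m else 0.0


def bpr_grad(d: float) -> float:
    """Derivative of -log(sigmoid(d)) w.r.t. d, i.e. -sigmoid(-d), computed stably."""
    if d >= 0:
        e = math.exp(-d)
        return -e / (1.0 + e)
    return -1.0 / (1.0 + math.exp(d))


if __name__ == "__main__":
    for gap in (-0.2, 0.0, 0.05, 0.2, 1.0):
        print(f"d={gap:+.2f}  margin={margin_grad(gap):+.3f}  bpr={bpr_grad(gap):+.3f}")
```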

Conclusion

The choice of loss function substantially affects both ranking quality and portfolio performance when using Transformers for stock return prediction. Ranking‑oriented losses, especially Margin and ListNet, outperform plain MSE, and incorporating pairwise preferences can improve risk characteristics.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

Machine Learning Transformer Loss Functions portfolio optimization quantitative trading financial time series stock ranking

Written by

Bighead's Algorithm Notes

Focused on AI applications in the fintech sector
