ICLR2026 Quantitative Finance Paper Summaries

This article compiles recent ICLR2026 papers on quantitative finance, summarizing their titles, abstracts, and paper and code links. The highlighted works — AlphaBench, TiMi, STABLE, and AlphaSAGE — explore large language models, multi‑agent systems, and generative models for factor mining, portfolio allocation, and trading.

Bighead's Algorithm Notes

AlphaBench: Benchmarking Large Language Models in Formulaic Alpha Factor Mining

Paper link: https://openreview.net/pdf?id=d97Q8r7ZKZ

Code link: https://alphabench.cc

AlphaBench is the first systematic benchmark for evaluating large language models (LLMs) on formulaic alpha factor mining (FAFM), a core problem in quantitative investing that seeks interpretable formulas to extract predictive signals from historical financial series. The benchmark defines three core tasks that reflect typical quant‑research workflows:

Factor generation – prompting an LLM to produce candidate factor formulas.

Factor evaluation – assessing the predictive performance of generated formulas on out‑of‑sample data.

Factor search – iteratively refining prompts or model configurations to improve factor quality.
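The factor‑evaluation task above typically scores a formula by its information coefficient (IC): the cross‑sectional rank correlation between factor values and next‑period returns. A minimal sketch, using synthetic data rather than anything from the benchmark itself:

```python
import numpy as np

def information_coefficient(factor_values, forward_returns):
    """Rank IC: Spearman correlation between factor values and
    forward returns, a standard FAFM evaluation metric."""
    # Rank-transform both series, then take the correlation of ranks.
    fr = np.argsort(np.argsort(factor_values))
    rr = np.argsort(np.argsort(forward_returns))
    return np.corrcoef(fr, rr)[0, 1]

# Toy example: a factor with a weak predictive signal buried in noise.
rng = np.random.default_rng(0)
factor = rng.normal(size=100)
returns = 0.3 * factor + rng.normal(size=100)
ic = information_coefficient(factor, returns)
```

In a benchmark loop, each LLM‑generated formula would be parsed, evaluated on held‑out data with a metric like this, and the score fed back into the factor‑search stage.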

Beyond task‑level scores, AlphaBench studies how three LLM configuration dimensions affect results: model type (open‑source vs. closed‑source), prompting paradigm (zero‑shot, few‑shot, chain‑of‑thought), and inference strategy (temperature, top‑k, beam search). Experiments on a spectrum of models (e.g., Llama‑2‑70B, GPT‑4, Claude‑2) show that LLMs can automate factor generation and evaluation with performance comparable to handcrafted baselines, but they still suffer from limited robustness across market regimes, high computational cost for exhaustive factor search, and practical usability gaps such as inconsistent formula syntax.

TiMi: Trade in Minutes – A Rationality‑Driven Multi‑Agent System for Quantitative Financial Trading

Paper link: https://arxiv.org/pdf/2510.04787

TiMi proposes a rationality‑driven multi‑agent architecture that separates strategy development from minute‑level execution. The system consists of three stages:

Semantic analysis: an LLM parses market news, macro indicators, and historical price data to produce a high‑level trading hypothesis.

Code generation: the same LLM synthesizes Python trading scripts (e.g., back‑testing, risk‑adjusted position sizing) using chain‑of‑thought prompting.

Mathematical reasoning: a second LLM performs portfolio‑level optimization (e.g., mean‑variance, risk‑parity) and validates the generated code against quantitative constraints.
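The key design point is the offline/online split: LLM agents design and validate a strategy ahead of time, while the minute‑level loop only evaluates cheap rules. A hedged sketch of that split, where the `Strategy` fields, thresholds, and symbol are illustrative assumptions rather than TiMi's actual interfaces:

```python
from dataclasses import dataclass

@dataclass
class Strategy:
    symbol: str
    entry_threshold: float  # open a long when the signal exceeds this
    exit_threshold: float   # close it when the signal drops below this

def design_strategy_offline() -> Strategy:
    # In TiMi this stage involves LLM semantic analysis, code generation,
    # and mathematical validation; here we just return fixed parameters.
    return Strategy(symbol="BTC-USD", entry_threshold=0.6, exit_threshold=0.4)

def execute_tick(strategy: Strategy, signal: float, position: int) -> int:
    # Minute-level loop: pure rule evaluation, no LLM inference required.
    if position == 0 and signal > strategy.entry_threshold:
        return 1   # open long
    if position == 1 and signal < strategy.exit_threshold:
        return 0   # close
    return position

strategy = design_strategy_offline()
position = 0
for signal in [0.5, 0.7, 0.65, 0.3]:  # mock minute-bar signals
    position = execute_tick(strategy, signal, position)
```

Because the per‑tick path is free of model inference, decision latency depends only on rule evaluation, which is what makes the reported sub‑200 ms execution plausible.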

TiMi’s agents communicate through a shared knowledge base, allowing the strategy‑design agents to operate offline while the execution agents run on a minute‑level loop without continuous inference. Empirical evaluation on more than 200 stock and cryptocurrency trading pairs demonstrates stable profitability (average annualized return >10% across assets), high execution efficiency (latency <200 ms per decision), and effective risk control (maximum drawdown reduced by ~30% compared to baseline rule‑based agents) under volatile market dynamics.

STABLE: Shift‑Tolerant Allocation via Black–Litterman Using Conditional Diffusion Estimates

Paper link: https://openreview.net/pdf?id=VltZQpfarw

Code link: https://github.com/iclr26stable

STABLE integrates conditional diffusion generative models with an estimation‑based portfolio allocation module to address the challenge of shifting market styles. The method operates as follows:

Input a macro‑economic context vector (e.g., GDP growth, inflation) and asset‑specific signals (e.g., momentum, valuation).

Condition a diffusion model to generate single‑stock return trajectories that reflect the current macro regime.

From the generated trajectories, compute style‑aware predictive return distributions and a covariance matrix.

Feed these estimates into a Black‑Litterman framework to obtain risk‑diversified portfolio weights.
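Step 4 can be sketched with the standard Black–Litterman posterior, where the views and covariance would come from the diffusion model in STABLE; the numbers below are synthetic placeholders, not the paper's inputs:

```python
import numpy as np

def black_litterman_weights(pi, Sigma, P, q, Omega, tau=0.05, risk_aversion=2.5):
    """Blend equilibrium returns pi with views (P, q, Omega) into a
    posterior mean, then take unconstrained mean-variance weights."""
    inv_tS = np.linalg.inv(tau * Sigma)       # precision of the prior
    inv_Om = np.linalg.inv(Omega)             # precision of the views
    posterior_cov = np.linalg.inv(inv_tS + P.T @ inv_Om @ P)
    mu_bl = posterior_cov @ (inv_tS @ pi + P.T @ inv_Om @ q)
    w = np.linalg.inv(risk_aversion * Sigma) @ mu_bl
    return w / w.sum()                        # normalize to fully invested

# Three assets: equilibrium returns pi, a diagonal covariance Sigma,
# and one absolute view (asset 0 is expected to return 8%).
pi = np.array([0.04, 0.05, 0.06])
Sigma = np.diag([0.04, 0.09, 0.16])
P = np.array([[1.0, 0.0, 0.0]])
q = np.array([0.08])
Omega = np.array([[0.01]])
w = black_litterman_weights(pi, Sigma, P, q, Omega)
```

The bullish view on asset 0 tilts the allocation toward it relative to the equilibrium weights, which is exactly how STABLE's diffusion‑estimated return distributions would steer the portfolio as the macro regime shifts.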

Empirical results on major equity markets show that STABLE improves the Sharpe ratio by up to 122.9% relative to traditional mean‑variance baselines, reduces maximum drawdown, and achieves state‑of‑the‑art time‑series estimation performance (mean‑squared error 15.7% lower than competing generative baselines).

AlphaSAGE: Structure‑Aware Alpha Mining via GFlowNets for Robust Exploration

Paper link: https://arxiv.org/pdf/2509.25055

Code link: https://github.com/BerkinChen/AlphaSAGE

AlphaSAGE tackles three limitations of existing reinforcement‑learning (RL) approaches to automated alpha factor mining: sparse rewards, insufficient structural representation of factor formulas, and low diversity of generated alphas. The framework introduces:

A relational graph convolutional network (RGCN) encoder that captures the syntactic and semantic structure of candidate formulas.

A generative flow network (GFlowNet) that samples factor formulas proportionally to a learned reward, encouraging diverse exploration.

A dense multidimensional reward that jointly evaluates predictive performance, interpretability, and statistical significance.
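Unlike RL policies that maximize reward, a trained GFlowNet samples each terminal object with probability proportional to its reward, R(x)/Z, which is what drives the diversity claim. A toy illustration of that target distribution, with hypothetical formulas and reward values not taken from the paper:

```python
import numpy as np

# Candidate alpha formulas (hypothetical) and their scalar rewards,
# e.g. a collapsed multidimensional score per formula.
formulas = ["rank(close / delay(close, 5))",
            "ts_corr(volume, close, 10)",
            "sign(delta(vwap, 1))"]
rewards = np.array([0.8, 0.5, 0.2])

# The GFlowNet sampling target: probability proportional to reward.
probs = rewards / rewards.sum()
rng = np.random.default_rng(0)
samples = rng.choice(len(formulas), size=10_000, p=probs)
freqs = np.bincount(samples, minlength=3) / 10_000
```

A reward maximizer would collapse onto the top formula alone; sampling from R(x)/Z keeps the lower‑reward but still useful formulas in the portfolio, which is where the reported diversity and novelty gains come from.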

Experiments on benchmark financial datasets reveal that AlphaSAGE discovers factor portfolios that are more diverse (entropy increase of 0.42 nats), more novel (average pairwise Jaccard distance 0.31 vs. 0.12 for baselines), and achieve higher out‑of‑sample predictive power (average information coefficient improvement of 18%).

Tags: large language models, benchmark, multi-agent systems, quantitative finance, AlphaBench, factor mining, TiMi
Written by Bighead's Algorithm Notes, focused on AI applications in the fintech sector.