LLMs Write and Evolve Code to Redefine Quantitative Factor Mining – The CogAlpha ACL Paper

The CogAlpha framework upgrades Alpha discovery from static formulas to executable Python code, organizes a 7‑layer, 21‑agent research hierarchy, iteratively evolves factor candidates, and on CSI300 10‑day prediction outperforms 21 baselines with a 16.39% annual excess return and an IR of 1.8999, demonstrating that large models can actively participate in the discovery process.

Machine Heart

Problem

Alpha discovery in quantitative investing is difficult because market noise is high, data dimensionality is large, and truly useful signals are scarce. Manual factor engineering is slow, genetic programming often gets trapped in local optima, and deep‑learning models, while powerful, lack clear explanations and can become unstable across markets.

Key Innovation

CogAlpha replaces formula‑based factors with full Python code, dramatically expanding the search space. Large language models (LLMs) are used to generate, annotate, and execute candidate factor programs.
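To illustrate why code-based factors enlarge the search space beyond closed-form formulas, here is a minimal sketch (not from the paper; the column names and the specific signal logic are illustrative assumptions): a formula factor is a single expression, while a code factor can use control flow, intermediate state, and conditional gating.

```python
import numpy as np
import pandas as pd

# A formula-style alpha is limited to one closed-form expression, e.g.
#   alpha = -corr(rank(open), rank(volume), 10)
# A code-style alpha can branch, keep state, and gate its own output.

def code_alpha(df: pd.DataFrame, window: int = 10) -> pd.Series:
    """Hypothetical code-based factor: rolling trend strength,
    emitted only when volume is not abnormally elevated."""
    returns = df["close"].pct_change()
    trend = returns.rolling(window).mean()
    # z-score of volume relative to its own rolling distribution
    vol_z = (df["volume"] - df["volume"].rolling(window).mean()) / (
        df["volume"].rolling(window).std() + 1e-9
    )
    # Conditional gating: suppress the signal in high-volume regimes
    return trend.where(vol_z < 1.0, 0.0)
```

The conditional `where` gate is exactly the kind of logic a formula DSL cannot express in one expression, which is the expansion of expressiveness the framework exploits.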

Hierarchical Research Architecture

Layer 1 – Market structure and cycle analysis (e.g., long‑term trends).

Layer 2 – Extreme risk and fragility detection (tail‑risk, collapse signals).

Layer 3 – Price‑volume relationship and liquidity assessment.

Layer 4 – Trend continuation, short‑term reversals, and volatility clustering.

Layer 5 – Multi‑scale complexity such as drawdown structures and fractal roughness.

Layer 6 – Stability and state gating to activate signals only under suitable market conditions.

Layer 7 – Geometric feature extraction and fusion, including K‑line patterns, multi‑factor synthesis, and nonlinear transformations.
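The layered structure above can be sketched as a plain data structure. The layer descriptions follow the article; the even three-agents-per-layer split (7 × 3 = 21) is an assumption for illustration, since the article only states the totals.

```python
# Sketch of the 7-layer, 21-agent research hierarchy described above.
LAYER_FOCUS = [
    "market structure and cycle analysis",
    "extreme risk and fragility detection",
    "price-volume relationship and liquidity",
    "trend continuation, reversal, volatility clustering",
    "multi-scale complexity (drawdowns, fractal roughness)",
    "stability and state gating",
    "geometric feature extraction and fusion",
]

# Assumed split: 3 agents per layer, giving the stated 21 agents in total.
hierarchy = {
    f"layer_{i + 1}": {
        "focus": focus,
        "agents": [f"agent_{i + 1}_{j + 1}" for j in range(3)],
    }
    for i, focus in enumerate(LAYER_FOCUS)
}
```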

Evolutionary Workflow

The system iterates like a research team: generate a batch of candidate Alphas, verify that the Python code runs and the logic is sound, then evaluate each candidate with five metrics—IC, Rank‑IC, ICIR, Rank‑ICIR, and mutual information (MI). Candidates above the 65th percentile are accepted; those above the 80th percentile are deemed elite and enter the next evolution round. To prevent convergence to a narrow set of patterns, three diversification strategies are applied: mild rewrites for stability, moderate rewrites that inject natural variations, and creative rewrites that encourage the model to reinterpret the research direction.
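The percentile gates in this selection step can be sketched as follows. This is a simplified illustration: the article names the five metrics and the 65th/80th percentile thresholds, but how scores are aggregated across metrics is an assumption here (a simple mean).

```python
import numpy as np

def select_candidates(scores, accept_pct=65, elite_pct=80):
    """Percentile gates from the article: candidates at or above the
    65th percentile are accepted; those at or above the 80th are
    marked elite and enter the next evolution round. `scores` is
    assumed to be one aggregate score per candidate (e.g. a mean
    over IC, Rank-IC, ICIR, Rank-ICIR, and MI)."""
    scores = np.asarray(scores, dtype=float)
    accept_cut = np.percentile(scores, accept_pct)
    elite_cut = np.percentile(scores, elite_pct)
    accepted = [i for i, s in enumerate(scores) if s >= accept_cut]
    elite = [i for i, s in enumerate(scores) if s >= elite_cut]
    return accepted, elite
```

Because the elite cut sits above the acceptance cut, every elite candidate is also accepted; the elite subset seeds the rewrites of the next round.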

Experimental Results

Experiments on five datasets from China, the US, and Hong Kong show that CogAlpha consistently outperforms 21 baseline methods. On the CSI300 10-day prediction task, it achieves an annualized excess return of 16.39% and an information ratio of 1.8999. Closed-source LLMs do not dominate; some reasoning-oriented models even perform worse, indicating that the workflow's structure, rather than raw model capability, drives performance.

Interpretability

Each generated Alpha includes detailed comments and executable code. For example, one factor computes "price upward amplitude divided by volume" to capture liquidity impact: if price jumps sharply on low volume, the factor flags a thin market and a potential short-term profit opportunity.
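The described factor can be sketched as a few lines of commented code in the same spirit. The column names (`high`, `open`, `volume`) and the exact definition of "upward amplitude" are assumptions; the article gives only the verbal description.

```python
import pandas as pd

def amplitude_over_volume(df: pd.DataFrame) -> pd.Series:
    """Sketch of the liquidity-impact factor described above:
    upward price amplitude divided by traded volume. A large value
    means price jumped on thin volume, suggesting an illiquid market
    where the move may continue or revert sharply in the short term."""
    # Upward amplitude: how far price moved above the open (floored at 0)
    upward_amplitude = (df["high"] - df["open"]).clip(lower=0)
    # Small epsilon avoids division by zero on no-volume bars
    return upward_amplitude / (df["volume"] + 1e-9)
```

The paired comment-plus-code form is what makes each candidate auditable: a reviewer can read the economic rationale and run the exact computation it claims to perform.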

Limitations

Back‑testing is performed within the Qlib framework and may differ from live trading. LLM outputs are stochastic, and larger data scales increase computation time. Consequently, CogAlpha is positioned as a powerful research engine rather than a plug‑and‑play trading system.

Broader Impact

The agentic research paradigm could be applied to other high‑noise, low‑signal domains such as material discovery, strategy generation, experimental design, and complex industrial optimization.

Paper Details

Title: Cognitive Alpha Mining via LLM‑Driven Code‑Based Evolution

Authors: Fengyuan Liu, Yi Huang, Sichun Luo, Yuqi Wang, Yazheng Yang, Xinye Li, Zefa Hu, Junlan Feng, Qi Liu, Grace Investment Machine

Link: https://arxiv.org/abs/2511.18850

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: code generation, LLM, Interpretability, Quantitative Finance, Evolutionary Algorithms, Alpha Mining, ACL 2026
Written by Machine Heart, a professional AI media and industry service platform.
