Quantitative Finance Paper Summaries (Nov 29–Dec 5 2025)

This article summarizes four recent AI‑driven finance papers: a stress‑testing framework for LLM trading agents, an orchestration framework for financial agents, an event‑reflection memory model for stock forecasting, and a hybrid LLM‑Bayesian network architecture for options wheel strategies. Each summary covers the method and the main results the paper reports.

Bighead's Algorithm Notes

Dec 5, 2025

TradeTrap: Are LLM-based Trading Agents Truly Reliable and Faithful?

Paper link: https://arxiv.org/pdf/2512.02261v1

Code link: https://github.com/Yanlewen/TradeTrap

Authors: Lewan Yan, Jilin Mei, Tianyi Zhou, Lige Huang, Jie Zhang, Dongrui Liu, Jing Shao

LLM‑based trading agents are increasingly deployed in real‑world financial markets to perform autonomous analysis and execution, yet their reliability and robustness under adversarial or failure conditions remain largely untested. The authors introduce TradeTrap, a unified evaluation framework that systematically stress‑tests adaptive and programmable autonomous trading agents. TradeTrap targets four core components of an autonomous trading agent (market intelligence, strategy formulation, portfolio and accounting, and trade execution) and evaluates each one's robustness under controlled system‑level perturbations. All assessments run in a closed‑loop historical backtesting setting on real U.S. stock market data with identical initial conditions, so comparisons across agents and across attacks are fair and reproducible. Extensive experiments show that a minor disturbance to a single component can propagate through the decision loop and cause extreme position concentration, loss of control, and large portfolio drawdowns. The authors conclude that current autonomous trading agents can be systematically misled at the system level.
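To make the closed‑loop idea concrete, here is a minimal sketch of a perturbation test on the market‑intelligence component. It is not TradeTrap's code or API: the toy agent, the `backtest` loop, the `inflate_last_tick` perturbation, and the price series are all hypothetical. The agent sees tampered prices while its orders fill at the true prices, and both runs start from the same cash balance.

```python
def toy_agent(observed_prices):
    """Hypothetical stand-in for an agent: fully long if the last observed price rose, else flat."""
    if len(observed_prices) < 2:
        return 0.0
    return 1.0 if observed_prices[-1] > observed_prices[-2] else 0.0


def backtest(true_prices, perturb=None, cash=1000.0):
    """Closed-loop backtest: decisions use observed data, fills use true prices."""
    shares = 0.0
    for t in range(len(true_prices)):
        history = true_prices[: t + 1]
        # The perturbation only changes what the agent sees, not the market itself.
        observed = perturb(history) if perturb else history
        target_weight = toy_agent(observed)
        price = true_prices[t]
        equity = cash + shares * price
        target_shares = target_weight * equity / price
        cash -= (target_shares - shares) * price
        shares = target_shares
    return cash + shares * true_prices[-1]


def inflate_last_tick(history, factor=1.05):
    """Hypothetical data-feed attack: overstate only the most recent price by 5%."""
    return history[:-1] + [history[-1] * factor]


prices = [100.0, 99.0, 98.0, 97.0, 96.0]  # made-up, steadily falling market
clean = backtest(prices)
attacked = backtest(prices, perturb=inflate_last_tick)
print(f"clean final equity:    {clean:.2f}")
print(f"attacked final equity: {attacked:.2f}")
```

In this toy setup the clean run never buys, while the tampered feed keeps the agent fully invested in a falling market. Keeping the initial cash and the true price path identical across runs is what makes the two outcomes directly comparable.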

Orchestration Framework for Financial Agents: From Algorithmic Trading to Agentic Trading

Paper link: https://arxiv.org/pdf/2512.02227v1

Code link: https://github.com/Open-Finance-Lab/AgenticTrading

Authors: Jifeng Li, Arnav Grover, Abraham Alpuerto, Yupeng Cao, Xiao‑Yang Liu

The financial market is a crucial testbed for AI agents because of its temporal dynamics and low signal‑to‑noise ratio. Building an effective algorithmic trading system traditionally requires years of development and testing by a specialized team. The authors propose a collaborative framework for financial agents that aims to democratize financial intelligence for the general public. They map each component of a traditional algorithmic trading system to an agent: a planner, coordinator, alpha agent, risk agent, portfolio agent, backtest agent, execution agent, audit agent, and memory agent. Two in‑house trading examples are presented. For a stock‑trading task (hourly data from April 2024 to December 2024), the method achieves a 20.42% return, a Sharpe ratio of 2.63, and a maximum drawdown of –3.59%, compared with a 15.97% return for the S&P 500 index. For a BTC‑trading task (minute‑level data from July 27, 2025 to August 13, 2025), the method yields an 8.39% return, a Sharpe ratio of 0.38, and a maximum drawdown of –2.80%, while the BTC price rose 3.80% over the same period.
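For readers less familiar with the three metrics quoted above, the sketch below computes total return, maximum drawdown, and an annualized Sharpe ratio from an equity curve. The equity values are made up, and the annualization factor and zero risk‑free rate are assumptions; the summary does not say which conventions the paper uses.

```python
import math


def total_return(equity):
    """Return over the whole period, e.g. 0.08 for +8%."""
    return equity[-1] / equity[0] - 1.0


def max_drawdown(equity):
    """Largest peak-to-trough decline, reported as a negative number."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst


def sharpe_ratio(equity, periods_per_year, risk_free_per_period=0.0):
    """Annualized Sharpe ratio from per-period simple returns (sample std dev)."""
    returns = [b / a - 1.0 for a, b in zip(equity, equity[1:])]
    excess = [r - risk_free_per_period for r in returns]
    mean = sum(excess) / len(excess)
    variance = sum((r - mean) ** 2 for r in excess) / (len(excess) - 1)
    return mean / math.sqrt(variance) * math.sqrt(periods_per_year)


# Made-up equity curve for illustration only (not data from the paper).
equity = [100.0, 102.0, 99.0, 104.0, 103.0, 108.0]
print(f"total return: {total_return(equity):.2%}")
print(f"max drawdown: {max_drawdown(equity):.2%}")
print(f"Sharpe (assuming 252 periods per year): {sharpe_ratio(equity, 252):.2f}")
```

The Sharpe ratio depends heavily on the sampling frequency and the annualization factor, so figures from hourly and minute‑level backtests are not directly comparable unless the conventions match.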

StockMem: An Event‑Reflection Memory Framework for Stock Forecasting

Paper link: https://arxiv.org/pdf/2512.02720v1

Authors: He Wang, Wenyilin Xiao, Songqiao Han, Hailiang Huang

Stock price prediction is challenged by market volatility and sensitivity to real‑time events. While large language models (LLMs) offer new avenues for text‑based prediction, their application in finance is limited by noisy news data and by the fact that text rarely states an explicit answer about price direction. Generic memory architectures struggle to identify the key drivers of price movements. To address this, the authors propose StockMem, an event‑reflection dual‑layer memory framework. News is structured into events and mined along two dimensions: horizontal integration aggregates each day's events, while vertical tracking follows how an event evolves over time and extracts the incremental information that reflects shifts in market expectations. Together these build a temporal event knowledge base. By analyzing event‑price dynamics, the framework further constructs a reflective knowledge base that captures causal experience. For forecasting, StockMem retrieves similar historical scenarios and reasons over current events, incremental information, and past experience. Experiments show that StockMem outperforms existing memory architectures and, by tracing the information chain that influences prices, produces more accurate and more interpretable reasoning for financial prediction.
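The retrieval step can be pictured with a small sketch. The summary does not say how StockMem measures scenario similarity, so the Jaccard overlap on event labels used here is an assumption, and the memory records, event names, and reflections are invented for illustration.

```python
def jaccard(a, b):
    """Overlap between two sets of event labels, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def incremental_events(today, yesterday):
    """Events that are new today relative to the previous day (a crude 'incremental information')."""
    return today - yesterday


def retrieve(current_events, memory, k=2):
    """Return the k stored scenarios whose event sets best match the current one."""
    ranked = sorted(memory, key=lambda m: jaccard(current_events, m["events"]), reverse=True)
    return ranked[:k]


# Hypothetical memory: past daily event sets paired with a stored reflection.
memory = [
    {"id": "A", "events": {"earnings_beat", "guidance_raised"},
     "reflection": "Price rose; raised guidance was the main driver."},
    {"id": "B", "events": {"regulatory_probe"},
     "reflection": "Price fell; uncertainty dominated."},
    {"id": "C", "events": {"earnings_beat", "ceo_departure"},
     "reflection": "Price flat; leadership risk offset the beat."},
]

yesterday = {"earnings_beat"}
today = {"earnings_beat", "guidance_raised", "buyback"}

new_info = incremental_events(today, yesterday)
matches = retrieve(today, memory, k=2)
print("incremental events:", sorted(new_info))
for m in matches:
    print(m["id"], "->", m["reflection"])
```

In a full system the retrieved reflections, the current events, and the incremental events would be passed together to the LLM as context for its forecast.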

A Hybrid Architecture for Options Wheel Strategy Decisions: LLM‑Generated Bayesian Networks for Transparent Trading

Paper link: https://arxiv.org/pdf/2512.01123v1

Authors: Xiaoting Kuang, Boken Lin

Large language models excel at understanding context and qualitative nuances but struggle with the strict, transparent reasoning required in high‑stakes quantitative domains such as financial trading. The authors propose a model‑first hybrid architecture for the options “wheel” strategy that combines LLM strengths with the robustness of Bayesian networks. Rather than acting as a black‑box decision maker, the LLM serves as an intelligent model builder. For each trading decision, the LLM interprets the current market conditions (price, volatility, trend, and news) and constructs a context‑specific Bayesian network that hypothesizes relationships among key variables. The LLM then selects relevant historical data from an 18.75‑year, 8,919‑trade dataset to populate the network’s conditional probability tables, focusing on scenarios similar to the current context. The instantiated Bayesian network performs transparent probabilistic inference, producing explicit probability distributions and risk metrics to support the decision. A feedback loop lets the LLM analyze trade outcomes and iteratively refine subsequent network structures and data selections, learning from successes and failures. Empirically, over nearly 19 years of out‑of‑sample testing on the wheel strategy, the hybrid system achieves a 15.3% annualized return with a Sharpe ratio of 1.08 (market benchmark 0.62), a markedly smaller drawdown (‑8.2% vs. ‑60%), and a 0% assignment rate maintained through strategic option rolling. Crucially, each trading decision is fully explainable, with an average of 27 recorded decision factors (e.g., volatility level, option premium, risk metric, market backdrop).
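The "populate the conditional probability tables, then run inference" step can be illustrated with a toy network. This is a sketch under assumptions, not the authors' implementation: the two‑parent structure (volatility and trend both influencing a loss outcome), the Laplace smoothing, and the eight sample trades are all invented for illustration.

```python
from collections import Counter

# Hypothetical historical trades: (volatility, trend, trade ended in a loss).
trades = [
    ("high", "down", True),
    ("high", "down", True),
    ("high", "up", False),
    ("high", "up", True),
    ("low", "down", False),
    ("low", "down", True),
    ("low", "up", False),
    ("low", "up", False),
]


def estimate_cpt(trades, alpha=1.0):
    """Estimate P(loss=True | volatility, trend) by counting, with Laplace smoothing."""
    hits, totals = Counter(), Counter()
    for vol, trend, loss in trades:
        totals[(vol, trend)] += 1
        hits[(vol, trend)] += int(loss)
    return {key: (hits[key] + alpha) / (totals[key] + 2 * alpha) for key in totals}


def estimate_prior(trades, index):
    """Empirical distribution of one root variable (column `index` of each trade)."""
    counts = Counter(t[index] for t in trades)
    n = sum(counts.values())
    return {value: count / n for value, count in counts.items()}


def p_loss(cpt, trend_prior, volatility):
    """P(loss | volatility) with trend unobserved: sum over trend of P(trend) * P(loss | vol, trend)."""
    return sum(p * cpt[(volatility, trend)] for trend, p in trend_prior.items())


cpt = estimate_cpt(trades)
trend_prior = estimate_prior(trades, index=1)
risk_high = p_loss(cpt, trend_prior, "high")
risk_low = p_loss(cpt, trend_prior, "low")
print(f"P(loss | volatility=high) = {risk_high:.3f}")
print(f"P(loss | volatility=low)  = {risk_low:.3f}")
```

Every number in the output traces back to a count in the selected trades and a named edge in the network, which is the kind of transparency the hybrid design is after. In the paper's setup the LLM would choose both the structure and the subset of trades, whereas both are fixed by hand here.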

Key empirical results: 15.3% annualized return, Sharpe ratio of 1.08 (benchmark 0.62), maximum drawdown of ‑8.2% versus ‑60% for the benchmark, a 0% assignment rate via strategic option rolling, and an average of 27 decision factors per trade.
