WSDM 2026 Quantitative Research Papers: Summaries and Insights
This article summarizes three recent AI-driven finance papers: Diffolio, a diffusion-based framework for risk-aware portfolio optimization; STORM, a factor model built on dual vector-quantized VAEs; and AutoHypo-Fin, an autonomous framework for web-mined hypothesis generation and backtesting. Each summary covers the paper's motivation, method, and reported experimental gains.
Diffusion Models for Risk-Aware Portfolio Optimization
Link: https://doi.org/10.1145/3773966.3777955
Authors: Jihyeong Jeon, Jeongyoung Lee, U Kang
Abstract: This paper addresses the generation of diverse, high-quality portfolios conditioned on a user's risk preference and historical market data. Deterministic deep-learning methods cannot adapt to varying risk preferences, while stochastic approaches rely on multi-stage pipelines that are complex to train and misaligned with the end goal of producing optimal portfolios. The authors propose Diffolio, a diffusion-model-based framework that directly learns a pseudo-optimal portfolio distribution, addressing both the inflexibility of deterministic models and the complexity and misalignment of stochastic ones. Experiments on multiple real-market datasets show that Diffolio significantly outperforms existing baselines in return, risk control, and overall reliability, improving annualized return by up to 12.1 percentage points.
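To make the phrase "diffusion-model-based" more concrete, here is a minimal sketch of the forward (noising) half of a standard denoising diffusion model, applied to a vector of portfolio weights. It is the generic textbook formulation, not Diffolio's actual design: the linear noise schedule, the step count, the function names, and the example weights are all assumptions made for illustration, and the learned reverse (denoising) network and any risk-preference conditioning are omitted.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances, one per diffusion step (assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = product of (1 - beta_s) for all s <= t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def forward_noise(weights, t, alpha_bars, rng):
    """Sample x_t from N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    scale = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    return [scale * w + sigma * rng.gauss(0.0, 1.0) for w in weights]

rng = random.Random(0)                      # seeded for reproducibility
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
x0 = [0.5, 0.3, 0.2]                        # hypothetical target portfolio weights
x_early = forward_noise(x0, 0, alpha_bars, rng)    # almost unchanged
x_late = forward_noise(x0, 999, alpha_bars, rng)   # close to pure Gaussian noise
```

A generative model of this kind is trained to invert the noising process, so that sampling starts from noise and ends at a plausible portfolio. Note that noised vectors no longer sum to one; how a real system maps samples back to valid weights is a design choice this sketch does not cover.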
STORM: A Spatio‑Temporal Factor Model Based on Dual Vector Quantized Variational Autoencoders for Financial Trading
Link: https://doi.org/10.1145/3773966.3777972
Code: https://github.com/DVampire/Storm
Authors: Zhao Yilei, Zhang Wentao, Yang Tingran, Jiang Yong, Huang Fei, Lim Wei Yang Bryan
Abstract: Factor models are widely used in financial trading for asset pricing and for capturing excess returns from mispricing. Existing VAE-based latent factor models struggle to capture the temporal patterns of individual stocks, represent each factor as a single dimension, and produce factors that lack diversity, which leads to low-quality factors and poor robustness. The authors introduce STORM, a dual-vector-quantized VAE that extracts stock features from both spatial and temporal perspectives, fuses and aligns them at fine-grained and semantic levels, and represents factors as multi-dimensional embeddings. A discrete codebook clusters similar embeddings, ensuring orthogonal and diverse factors for better selection. Downstream experiments on two stock datasets for portfolio management and on six individual-stock trading tasks demonstrate STORM's flexibility and superior performance over baseline models.
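The codebook step can be illustrated with the nearest-neighbor lookup used in generic vector quantization. The sketch below is not taken from the STORM repository: the function names, the two-dimensional codebook, and the sample embeddings are hypothetical, and the training machinery of a VQ-VAE (encoder, decoder, codebook updates) is left out.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(embedding, codebook):
    """Return (index, code vector) of the codebook entry nearest to the embedding."""
    index = min(
        range(len(codebook)),
        key=lambda i: squared_distance(embedding, codebook[i]),
    )
    return index, codebook[index]

# Hypothetical codebook with three discrete codes.
codebook = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

# Hypothetical continuous factor embeddings produced by an encoder.
embeddings = [[0.9, 0.1], [0.8, -0.2], [0.1, 1.2], [-0.7, 0.3]]

assignments = [quantize(e, codebook)[0] for e in embeddings]
print(assignments)  # [0, 0, 1, 2]
```

The first two embeddings collapse onto the same code, which is the clustering effect the abstract describes: similar embeddings share one discrete entry, so distinct codes correspond to distinct factor groups.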
Web‑Mined Hypothesis Generation for Financial Markets: An Autonomous Backtesting Framework
Link: https://doi.org/10.1145/3773966.3785511
Authors: Jing Li, Jinliang Li
Abstract: Large-scale unstructured web data (e.g., regulatory documents, social media) has a growing influence on financial markets, yet traditional hypothesis generation relies on expert knowledge, which makes it slow, subjective, and inefficient. The authors propose AutoHypo-Fin, an autonomous framework that mines web data to generate and backtest financial hypotheses. The system integrates information extraction, knowledge graphs, retrieval-augmented generation, and optimized backtesting, enabling end-to-end hypothesis creation, testing, and refinement. Experiments covering 2019 to 2024 show that AutoHypo-Fin outperforms traditional strategies in risk-adjusted return, hit rate, and drawdown control, and ablation studies confirm the importance of each component.
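The three evaluation criteria named in the abstract have common textbook definitions, sketched below. The paper's exact formulas are not given in this summary, so treat these as assumptions: risk-adjusted return is taken to be an annualized Sharpe ratio with a zero risk-free rate, hit rate is the fraction of periods with a positive return, and drawdown is measured peak to trough on a compounded equity curve. The return series is made up for illustration.

```python
import math

def sharpe_ratio(returns, periods_per_year=252):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    mean = sum(returns) / len(returns)
    variance = sum((r - mean) ** 2 for r in returns) / (len(returns) - 1)
    return mean / math.sqrt(variance) * math.sqrt(periods_per_year)

def hit_rate(returns):
    """Fraction of periods with a strictly positive return."""
    return sum(1 for r in returns if r > 0) / len(returns)

def max_drawdown(returns):
    """Largest peak-to-trough loss of the compounded equity curve, as a fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst

# Hypothetical daily strategy returns.
daily = [0.01, -0.02, 0.015, 0.005, 0.01]

print(hit_rate(daily))                 # 0.8
print(round(max_drawdown(daily), 4))   # 0.02
print(sharpe_ratio(daily) > 0)         # True
```

A backtest that reports all three gives a fuller picture than return alone: a strategy can have a high hit rate and still suffer a deep drawdown from a few large losses.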
