WSDM 2026 Quantitative Research Papers: Summaries and Insights

This article summarizes three AI-driven finance papers from WSDM 2026: Diffolio’s diffusion-based risk-aware portfolio optimization, STORM’s dual vector-quantized VAE factor model, and AutoHypo-Fin’s autonomous web-mined hypothesis generation. Each summary covers the paper’s motivation, method, and reported experimental gains.

Bighead's Algorithm Notes

Apr 9, 2026

Diffusion Models for Risk-Aware Portfolio Optimization

Link: https://doi.org/10.1145/3773966.3777955

Authors: Jihyeong Jeon, Jeongyoung Lee, U Kang

Abstract: The paper addresses the problem of generating diverse, high-quality portfolios conditioned on a user’s risk preference and historical data. Deterministic deep-learning methods cannot adapt to varying risk preferences, while stochastic approaches rely on multi-stage pipelines that are complex to train and misaligned with the ultimate goal of generating optimal portfolios. The authors propose Diffolio, a diffusion-model-based framework that directly learns a pseudo-optimal portfolio distribution, which addresses both the inflexibility of deterministic models and the complexity and misalignment of stochastic ones. Experiments on multiple real-market datasets show that Diffolio significantly outperforms existing baselines in return, risk control, and overall reliability, with gains of up to 12.1 percentage points in annualized return.
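
For readers new to diffusion models, the following is a minimal sketch of the generic DDPM forward (noising) process applied to a portfolio weight vector. It is not Diffolio's implementation: the linear beta schedule, the 1000-step horizon, the function names, and the example weights are all assumptions chosen for illustration. A diffusion model is trained to reverse this corruption, so that sampling starts from noise and ends at a plausible portfolio.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (a common DDPM default, assumed here)."""
    if num_steps < 2:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_noise(x0, t, alpha_bars, rng):
    """Sample x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I) in closed form."""
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    return [signal * w + noise * rng.gauss(0.0, 1.0) for w in x0]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    target = [0.5, 0.3, 0.2]  # hypothetical target portfolio weights
    for t in (0, 500, 999):
        noisy = forward_noise(target, t, alpha_bars, rng)
        print(t, [round(v, 3) for v in noisy])
```

At small t the sample stays close to the original weights; by the final step almost no signal remains, which is why generation can start from pure Gaussian noise.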

STORM: A Spatio‑Temporal Factor Model Based on Dual Vector Quantized Variational Autoencoders for Financial Trading

Link: https://doi.org/10.1145/3773966.3777972

Code: https://github.com/DVampire/Storm

Authors: Yilei Zhao, Wentao Zhang, Tingran Yang, Yong Jiang, Fei Huang, Wei Yang Bryan Lim

Abstract: Factor models are widely used in financial trading for asset pricing and for capturing excess returns from mispricing. Existing VAE-based latent factor models struggle to capture the temporal patterns of individual stocks, represent each factor as a single dimension, and produce factors with little diversity, which leads to low-quality factors and poor robustness. The authors introduce STORM, a dual vector-quantized VAE that extracts stock features from both spatial and temporal perspectives, fuses and aligns them at fine-grained and semantic levels, and represents factors as multi-dimensional embeddings. A discrete codebook clusters similar embeddings, which keeps the factors orthogonal and diverse and supports better factor selection. Downstream experiments on two stock datasets, covering portfolio management and six individual-stock trading tasks, demonstrate STORM’s flexibility and superior performance over baseline models.
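
The codebook step can be illustrated with the generic vector-quantization lookup used in VQ-VAE-style models: each continuous embedding is replaced by its nearest codebook entry. This is a minimal sketch only, not code from the STORM repository; the function names, the two-dimensional toy codebook, and the sample embeddings are assumptions made for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(embedding, codebook):
    """Return (index, code vector) of the codebook entry nearest to the embedding."""
    index = min(
        range(len(codebook)),
        key=lambda i: squared_distance(embedding, codebook[i]),
    )
    return index, codebook[index]


def codebook_usage(embeddings, codebook):
    """Count how many embeddings are assigned to each code."""
    counts = [0] * len(codebook)
    for embedding in embeddings:
        index, _ = quantize(embedding, codebook)
        counts[index] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical toy codebook with three 2-D codes.
    codebook = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    # Hypothetical factor embeddings produced by an encoder.
    embeddings = [[0.9, 0.1], [0.8, -0.2], [0.1, 1.2], [-0.7, 0.1]]
    for embedding in embeddings:
        print(embedding, "->", quantize(embedding, codebook))
    print("usage per code:", codebook_usage(embeddings, codebook))
```

Because similar embeddings collapse onto the same code, the usage counts give a quick view of how evenly the codebook is being used, which relates to the factor diversity the abstract emphasizes.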

Web‑Mined Hypothesis Generation for Financial Markets: An Autonomous Backtesting Framework

Link: https://doi.org/10.1145/3773966.3785511

Authors: Jing Li, Jinliang Li

Abstract: Large-scale unstructured web data (e.g., regulatory documents, social media) increasingly influences financial markets, yet traditional hypothesis generation relies on expert knowledge, which makes it slow, subjective, and inefficient. The authors propose AutoHypo-Fin, an autonomous framework that mines web data to generate and backtest financial hypotheses. The system integrates information extraction, knowledge graphs, retrieval-augmented generation, and optimized backtesting, enabling end-to-end hypothesis creation, testing, and refinement. Experiments covering 2019 to 2024 show that AutoHypo-Fin outperforms traditional strategies in risk-adjusted return, hit rate, and drawdown control, and ablation studies confirm the importance of each component.
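
The three evaluation criteria named above can be computed from a series of per-period strategy returns. The sketch below uses common textbook definitions and is not taken from the paper: treating "risk-adjusted return" as the annualized Sharpe ratio, using 252 periods per year, and counting a "hit" as any strictly positive return are all assumptions.

```python
import math
import statistics


def sharpe_ratio(returns, periods_per_year=252, risk_free_per_period=0.0):
    """Annualized Sharpe ratio from per-period returns (sample standard deviation)."""
    excess = [r - risk_free_per_period for r in returns]
    spread = statistics.stdev(excess)
    if spread == 0:
        return float("nan")  # undefined when returns never vary
    return statistics.mean(excess) / spread * math.sqrt(periods_per_year)


def hit_rate(returns):
    """Fraction of periods with a strictly positive return."""
    return sum(1 for r in returns if r > 0) / len(returns)


def max_drawdown(returns):
    """Largest peak-to-trough loss of the compounded equity curve, as a fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


if __name__ == "__main__":
    # Hypothetical per-period returns of a backtested strategy.
    strategy_returns = [0.012, -0.004, 0.007, -0.015, 0.009, 0.003]
    print("Sharpe:", round(sharpe_ratio(strategy_returns), 2))
    print("Hit rate:", round(hit_rate(strategy_returns), 2))
    print("Max drawdown:", round(max_drawdown(strategy_returns), 4))
```

Reporting all three together matters because a strategy can post a high hit rate while still suffering a deep drawdown from a few large losses.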