AutoHypo-Fin: Tsinghua's Web-Mining Method to Auto-Generate and Backtest Market Hypotheses

AutoHypo‑Fin is an end‑to‑end framework that harvests large‑scale financial data from the web, extracts entities with large language models, builds a temporal knowledge graph, and combines retrieval‑augmented generation with statistical backtesting to automatically create, test, and iteratively optimize trading hypotheses. In experiments on 2019–2024 market data, it achieves superior risk‑adjusted returns compared with baseline strategies.

Bighead's Algorithm Notes

Background : Financial markets are increasingly influenced by massive unstructured web data such as regulatory filings and social media. Traditional hypothesis generation relies on expert knowledge, which is slow, subjective, and inefficient.

Problem Definition : The paper addresses the challenge of automatically generating hypotheses from noisy, heterogeneous web data and rigorously backtesting them to enable continuous optimization over time.

Method :

3.1 Web Data Acquisition : Collects financial data from online sources including EDGAR, GDELT, and platforms like X and Truth Social, followed by preprocessing and normalization.
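The paper does not publish its preprocessing code, but the normalization step can be sketched as mapping each source's raw records onto one shared event schema. The `Event` fields and the per-source field names below are illustrative assumptions, not the paper's actual schema:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical normalized record; the paper does not specify a schema.
@dataclass
class Event:
    source: str         # e.g. "EDGAR", "GDELT", "X"
    asset: str          # ticker the event refers to
    timestamp: datetime
    text: str

def normalize(raw: dict, source: str) -> Event:
    """Map a source-specific raw record onto the shared Event schema.
    The per-source field names are illustrative assumptions."""
    if source == "EDGAR":
        ts = datetime.strptime(raw["filed"], "%Y-%m-%d").replace(tzinfo=timezone.utc)
        return Event("EDGAR", raw["ticker"], ts, raw["form_type"])
    if source == "GDELT":
        ts = datetime.strptime(raw["date"], "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
        return Event("GDELT", raw["ticker"], ts, raw["headline"])
    raise ValueError(f"unknown source: {source}")

evt = normalize({"filed": "2023-05-01", "ticker": "AAPL", "form_type": "10-Q"}, "EDGAR")
```

Normalizing timestamps to UTC up front matters here, since the later temporal-graph queries depend on comparable event times across sources.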

3.2 Information Extraction & Knowledge Graph Construction : Uses large language models for named‑entity recognition and relation extraction, building a temporal event graph G_t that captures events (e.g., earnings calls, rating changes) linked to assets.
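A minimal sketch of the temporal event graph G_t, assuming (as the paper implies but does not detail) that each event is attached to an asset with a timestamp so later stages can query a time window:

```python
from collections import defaultdict
from datetime import datetime

class TemporalEventGraph:
    """Toy stand-in for G_t: asset nodes linked to timestamped events."""
    def __init__(self):
        # asset -> list of (timestamp, event_type, payload)
        self.edges = defaultdict(list)

    def add_event(self, asset, timestamp, event_type, payload=None):
        self.edges[asset].append((timestamp, event_type, payload))

    def events_in_window(self, asset, start, end):
        """Events linked to `asset` with start <= t < end."""
        return [e for e in self.edges[asset] if start <= e[0] < end]

g = TemporalEventGraph()
g.add_event("AAPL", datetime(2023, 5, 4), "earnings_call", {"surprise": 0.03})
g.add_event("AAPL", datetime(2023, 6, 1), "rating_change", {"from": "hold", "to": "buy"})
window = g.events_in_window("AAPL", datetime(2023, 5, 1), datetime(2023, 5, 31))
```

A production system would use a real graph store, but the window query is the operation the RAG stage below relies on.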

3.3 RAG‑Based Hypothesis Generation : Retrieves relevant context with retrieval‑augmented generation and graph reasoning, then constructs hypotheses in a predefined template specifying trigger conditions, asset sets, timing, and risk controls.
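The hypothesis template can be made concrete as a small structured record. The field names below mirror the components the paper lists (trigger conditions, asset sets, timing, risk controls) but are my own naming, not the paper's:

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    """Illustrative hypothesis template; field names are assumptions."""
    trigger: str               # condition the LLM fills in from retrieved context
    assets: list               # tickers the hypothesis applies to
    formation_window_days: int # how far back the trigger looks
    holding_period_days: int   # how long positions are held
    stop_loss: float           # risk control: max loss per position

h = Hypothesis(
    trigger="rating upgrade mentioned in >= 3 news events within 48h",
    assets=["AAPL", "MSFT"],
    formation_window_days=2,
    holding_period_days=5,
    stop_loss=0.05,
)
```

Constraining the LLM to emit this fixed structure is what makes every generated hypothesis mechanically backtestable in the next stage.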

3.4 Backtesting & Statistical Validation : Executes hypothesis backtests on historical market data, computing metrics such as Sharpe ratio, maximum drawdown, and hit rate, and applies statistical tests to assess significance.
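The three metrics named above have standard definitions; a minimal dependency-free sketch (daily simple returns assumed, 252 trading days per year):

```python
import math

def sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio of a series of per-period simple returns
    (risk-free rate assumed zero for simplicity)."""
    mu = sum(returns) / len(returns)
    var = sum((r - mu) ** 2 for r in returns) / (len(returns) - 1)
    return mu / math.sqrt(var) * math.sqrt(periods_per_year)

def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded equity curve."""
    equity, peak, mdd = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1 + r
        peak = max(peak, equity)
        mdd = max(mdd, 1 - equity / peak)
    return mdd

def hit_rate(returns):
    """Fraction of periods with a positive return."""
    return sum(1 for r in returns if r > 0) / len(returns)
```

These are the quantities reported in Section 4.2; the paper's backtester additionally charges transaction costs and execution delays before computing them.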

3.5 Iterative Optimization : Evaluates backtest results and refines hypotheses using Bayesian optimization or multi‑armed bandit algorithms to adjust parameters like formation window, holding period, and risk limits, forming a closed‑loop feedback system.
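The paper mentions both Bayesian optimization and multi-armed bandits; as one concrete instance, an epsilon-greedy bandit over candidate holding periods can be sketched as follows. The reward function here is a toy stand-in for a backtest score, not the paper's objective:

```python
import random

def epsilon_greedy(arms, reward_fn, rounds=300, eps=0.2, seed=0):
    """Epsilon-greedy bandit: explore a random arm with prob. eps,
    otherwise exploit the arm with the best running-mean reward."""
    rng = random.Random(seed)
    counts = {a: 0 for a in arms}
    values = {a: 0.0 for a in arms}
    for _ in range(rounds):
        if rng.random() < eps:
            arm = rng.choice(arms)
        else:
            arm = max(arms, key=lambda a: values[a])
        r = reward_fn(arm, rng)
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]  # incremental mean
    return max(arms, key=lambda a: values[a])

# Toy reward: pretend holding period 5 yields the best (noisy) backtest Sharpe.
def noisy_sharpe(holding_days, rng):
    return {1: 0.4, 5: 1.2, 10: 0.8}[holding_days] + rng.gauss(0, 0.1)

best_holding = epsilon_greedy([1, 5, 10], noisy_sharpe)
```

In the closed loop described above, each "pull" would be a fresh backtest of the hypothesis with that parameter setting.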

Experiments :

4.1 Experimental Setup : Uses market data from Jan 2019 to Dec 2024, comparing four strategies—Market baseline (S&P 500), Sentiment (news‑based), Handcrafted Event (rule‑based), and AutoHypo‑Fin (full pipeline).

4.2 Metrics & Performance : AutoHypo‑Fin (Full) achieves the highest annual return (0.25) and Sharpe ratio (1.56) with a relatively low max drawdown (0.12). The Market baseline’s Sharpe is 0.64 with a drawdown of 0.22. Other metrics (annual volatility, hit rate) also favor AutoHypo‑Fin.

4.3 Hypothesis Pipeline Quality : About 80 % of generated hypotheses are testable; after false‑discovery‑rate control (FDR @ 0.1), about 25 % of the testable hypotheses are statistically significant.

4.4 Robustness & Statistical Control : Evaluated across bull, bear, and volatile markets; the system remains stable during the 2022 global market downturn. The backtest incorporates transaction costs, execution delays, and market frictions, and uses Benjamini‑Hochberg correction to mitigate overfitting.
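The Benjamini‑Hochberg step-up procedure used for FDR control is simple enough to sketch directly; this is the textbook procedure, not code from the paper:

```python
def benjamini_hochberg(pvalues, q=0.1):
    """Return indices of hypotheses rejected at FDR level q.
    BH: sort p-values ascending, find the largest rank k with
    p_(k) <= q * k / m, and reject the k smallest p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= q * rank / m:
            k = rank
    return sorted(order[:k])

# Example: six backtested hypotheses, FDR controlled at q = 0.1.
rejected = benjamini_hochberg([0.041, 0.001, 0.6, 0.008, 0.2, 0.039], q=0.1)
```

With many auto-generated hypotheses tested against the same history, this correction is what keeps the ~25 % significance figure in 4.3 from being an artifact of multiple testing.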

Ablation Study : Removing individual components (knowledge graph, RAG, optimizer) degrades performance, confirming each module’s importance.

Conclusion : AutoHypo‑Fin provides a scalable, fully automated solution for generating and testing financial hypotheses, substantially improving risk‑adjusted returns over traditional expert‑driven and simple rule‑based strategies.

Tags: LLM, Retrieval-Augmented Generation, Knowledge Graph, quantitative finance, AutoHypo-Fin, financial backtesting
Written by Bighead's Algorithm Notes, focused on AI applications in the fintech sector.