FinSentLLM: A Multi‑LLM Framework for Financial Sentiment Prediction
FinSentLLM integrates multiple LLM experts with structured financial semantic signals, achieving 3‑6% higher accuracy and F1 on the Financial PhraseBank compared to baselines, while DCC‑GARCH and Johansen cointegration analyses confirm a statistically significant long‑term co‑movement between the predicted sentiment signals and stock market dynamics.
Background
Financial sentiment analysis (FSA) has evolved through three stages: early lexicon‑based methods; pre‑trained models such as FinBERT, which achieve strong benchmark results; and recent attempts to link sentiment signals with market behavior, which largely ignore modern LLM capabilities.
Problem Definition
Validate whether LLM‑derived sentiment signals are consistent with actual market movements.
Combine domain‑specific LLMs (FinBERT) with general‑purpose LLMs (RoBERTa‑sentiment) and structured financial semantic cues to improve prediction robustness.
Use econometric techniques (DCC‑GARCH, Johansen cointegration) to test long‑term co‑movement between sentiment and stock indices.
Method
FinSentLLM is a lightweight, no‑fine‑tuning integration framework consisting of three components:
Multi‑LLM expert panel: FinBERT (domain‑specific) and RoBERTa‑sentiment (trained on 58 M tweets) provide posterior probability distributions over positive, neutral, and negative classes.
Financial sentiment signal design:
Probability‑derived features: logits, maximum probability, margin, entropy.
Structured semantic flags extracted from domain patterns (e.g., “profit rose”, “loss narrowed”, “agreement”, “restructuring”).
Expert disagreement metrics: L1 distance and KL divergence between FinBERT and RoBERTa‑sentiment distributions.
Meta‑classifier: A lightweight classifier (logistic regression or XGBoost) aggregates the heterogeneous signals. XGBoost is preferred for modeling non‑linear interactions.
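As a rough illustration of the signal design above, the sketch below turns two experts' class probabilities and a headline into one feature dictionary that a meta‑classifier could consume. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the regex patterns (built only from the four example phrases in the text), the definition of margin as top‑1 minus top‑2 probability, and the KL direction (FinBERT relative to RoBERTa‑sentiment) are all assumptions. Raw logits are left out because they require the models' unnormalized outputs.

```python
import math
import re

# Hypothetical patterns, derived only from the example phrases in the text.
SEMANTIC_PATTERNS = {
    "profit_rose": re.compile(r"\bprofit\s+rose\b", re.IGNORECASE),
    "loss_narrowed": re.compile(r"\bloss\s+narrowed\b", re.IGNORECASE),
    "agreement": re.compile(r"\bagreement\b", re.IGNORECASE),
    "restructuring": re.compile(r"\brestructuring\b", re.IGNORECASE),
}


def probability_features(p):
    """Maximum probability, margin (top-1 minus top-2), and entropy in nats."""
    ranked = sorted(p, reverse=True)
    entropy = -sum(x * math.log(x) for x in p if x > 0.0)
    return {"max_prob": ranked[0], "margin": ranked[0] - ranked[1], "entropy": entropy}


def disagreement_features(p, q, eps=1e-12):
    """L1 distance and KL(p || q) between two experts' distributions."""
    l1 = sum(abs(a - b) for a, b in zip(p, q))
    # eps guards against division by zero when q assigns zero probability.
    kl = sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q) if a > 0.0)
    return {"l1": l1, "kl": kl}


def semantic_flags(text):
    """Binary flag per pattern: 1 if the pattern occurs in the text, else 0."""
    return {name: int(bool(rx.search(text))) for name, rx in SEMANTIC_PATTERNS.items()}


def build_features(text, p_finbert, p_roberta):
    """Merge per-expert, disagreement, and semantic features into one dict."""
    feats = {}
    for prefix, p in (("finbert", p_finbert), ("roberta", p_roberta)):
        for key, value in probability_features(p).items():
            feats[f"{prefix}_{key}"] = value
    feats.update(disagreement_features(p_finbert, p_roberta))
    feats.update(semantic_flags(text))
    return feats


# Probabilities are ordered (positive, neutral, negative); values are made up.
example = build_features(
    "Operating profit rose after the agreement.",
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
)
```

The resulting dictionary can be flattened into a fixed‑order vector and passed to whichever meta‑classifier is used; adding another expert would only add one more prefix and one more set of pairwise disagreement terms.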
The overall architecture is illustrated in the figure below.
Design Advantages
Complementarity: domain‑specific and general LLMs leverage each other’s strengths.
Domain knowledge injection via structured semantic flags enhances financial reasoning.
Efficiency: No large‑scale fine‑tuning required.
Scalability: New LLM experts can be added easily.
Experiments
Datasets & Pre‑processing
Financial PhraseBank (FPB): 14,780 news headlines with three‑class sentiment labels, grouped into subsets by annotator agreement level (50 % to 100 %).
FNSPID: 15,698,563 news items (1999‑2023) aligned with Yahoo Finance daily closing‑price log returns; FinBERT computes daily sentiment scores.
All features are Z‑score normalized and aligned with daily market indices.
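The pre‑processing described above can be sketched in a few lines. This is an illustrative sketch, not the authors' pipeline: the function names are hypothetical, dates are assumed to be ISO‑formatted strings (so that sorting them is chronological), and the Z‑score uses the population standard deviation, which the text does not specify.

```python
import math
import statistics


def log_returns(closes):
    """Daily log returns r_t = ln(P_t / P_{t-1}) from closing prices."""
    return [math.log(curr / prev) for prev, curr in zip(closes, closes[1:])]


def zscore(values):
    """Standardize to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant series carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def align_by_date(sentiment_by_date, return_by_date):
    """Keep only dates present in both series, in chronological order."""
    dates = sorted(set(sentiment_by_date) & set(return_by_date))
    return [(d, sentiment_by_date[d], return_by_date[d]) for d in dates]
```

Aligning on the intersection of dates drops days with news but no trading (weekends, holidays) and trading days without news; other choices, such as carrying sentiment forward to the next trading day, are equally plausible and would change the series.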
Baselines & Configuration
Individual experts: FinBERT (domain‑specific), RoBERTa‑sentiment (general‑purpose).
General LLMs: GPT‑4o mini, GPT‑5 (zero‑shot).
Meta‑classifiers: Logistic regression, XGBoost (hyper‑parameter tuned).
Evaluation: 5‑fold cross‑validation.
Main Results
Sentiment classification: RoBERTa‑sentiment achieves 66‑71 % accuracy, Macro‑F1 ≈ 0.55; FinBERT 88‑97 % accuracy, Macro‑F1 0.88‑0.96; GPT‑4o mini 83‑95 % accuracy, Macro‑F1 0.82‑0.94; GPT‑5 is slightly weaker on short texts.
FinSentLLM performance: With XGBoost as the meta‑classifier, accuracy reaches 91.0 % (Macro‑F1 0.904) on the noisy 50 % subset, 98.5 % (Macro‑F1 0.979) on the strict 100 % subset, and 98.2 % (Macro‑F1 0.982) on the full set. This is 2 to 5.8 percentage points above FinBERT and 6 to 10.3 percentage points above GPT‑4o mini.
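For readers unfamiliar with the two reported metrics, the sketch below computes accuracy and Macro‑F1 for a three‑class problem. It is a generic illustration with made‑up labels, not the paper's evaluation code; Macro‑F1 is taken here as the unweighted mean of per‑class F1 scores, with a class that has no true or predicted instances scored as 0.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels=("positive", "neutral", "negative")):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels, purely illustrative.
gold = ["positive", "positive", "neutral", "negative"]
pred = ["positive", "neutral", "neutral", "negative"]
print(accuracy(gold, pred))   # 0.75
print(macro_f1(gold, pred))   # (2/3 + 2/3 + 1) / 3, about 0.778
```

Because Macro‑F1 weights every class equally, it penalizes weak performance on a minority class more than accuracy does, which is why the two numbers can diverge on imbalanced subsets.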
Ablation Study
Removing RoBERTa‑sentiment drops overall accuracy from 98.2 % to 97.6 %, confirming the complementary role of the general LLM. Excluding structured semantic signals reduces Macro‑F1 from 0.982 to 0.981, indicating that the domain‑specific flags make a small but measurable contribution.
Sentiment‑Market Co‑movement Analysis
DCC‑GARCH
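Parameters: α (short‑term impact) 0.02‑0.06, β (persistence) > 0.90, average ρ 0.35‑0.45 (e.g., S&P 500 ρ = 0.4044). Low α means the dynamic correlation reacts only weakly to short‑term shocks, and high β means it is highly persistent; together with the positive average ρ, this indicates a stable, sustained positive correlation between sentiment and market returns.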
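To make the roles of α and β concrete, the sketch below runs the standard bivariate DCC(1,1) correlation recursion, Q_t = (1 − α − β) Q̄ + α z_{t−1} z_{t−1}′ + β Q_{t−1}, with ρ_t = q₁₂ / √(q₁₁ q₂₂). It is a minimal sketch under assumptions: the inputs are taken to be residuals already standardized by univariate GARCH models (that first stage is not shown), α and β are supplied rather than estimated, and the function name is hypothetical.

```python
import math


def dcc_correlations(z1, z2, alpha, beta):
    """Conditional correlation path from a bivariate DCC(1,1) recursion.

    z1, z2: standardized residuals (assumed output of univariate GARCH fits).
    alpha:  weight on the latest shock; beta: weight on the previous Q.
    """
    if alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("require alpha >= 0, beta >= 0, alpha + beta < 1")
    n = len(z1)
    # Q-bar: sample second moments of the standardized residuals.
    s11 = sum(a * a for a in z1) / n
    s22 = sum(b * b for b in z2) / n
    s12 = sum(a * b for a, b in zip(z1, z2)) / n
    q11, q22, q12 = s11, s22, s12  # initialize Q_0 at Q-bar
    w = 1.0 - alpha - beta
    rhos = []
    for a, b in zip(z1, z2):
        # rho_t uses Q_t, which depends only on information up to t-1.
        rhos.append(q12 / math.sqrt(q11 * q22))
        q11 = w * s11 + alpha * a * a + beta * q11
        q22 = w * s22 + alpha * b * b + beta * q22
        q12 = w * s12 + alpha * a * b + beta * q12
    return rhos
```

With α in the reported 0.02‑0.06 range, each new shock moves Q only slightly, while β above 0.90 carries most of the previous state forward, so the correlation path drifts slowly around its long‑run level.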
Johansen Cointegration Test
Both sentiment scores and log‑price series are I(1). The test rejects the null of no cointegration (H₀: r = 0) but fails to reject the null of at most one cointegrating relationship (H₀: r ≤ 1), indicating a single long‑run equilibrium between sentiment and the market.
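The sequential logic of the trace test can be sketched as follows: test H₀: r ≤ 0, then H₀: r ≤ 1, and so on, stopping at the first null that is not rejected. This is an illustrative sketch only. The function name is hypothetical, and the statistics and critical values in the example are made‑up placeholders, not figures from the study; in practice they would come from an estimation routine (for instance, the trace statistics and critical values reported by statsmodels' `coint_johansen`).

```python
def select_cointegration_rank(trace_stats, critical_values):
    """Return the first r for which H0: rank <= r is NOT rejected.

    trace_stats[r] and critical_values[r] correspond to H0: rank <= r.
    If every null is rejected, the rank equals the number of series.
    """
    for r, (stat, crit) in enumerate(zip(trace_stats, critical_values)):
        if stat <= crit:
            return r
    return len(trace_stats)


# Placeholder numbers for a two-series system (sentiment, log price).
rank = select_cointegration_rank([25.0, 2.1], [15.49, 3.84])
print(rank)  # 1: r = 0 is rejected, r <= 1 is not
```

A selected rank of 1 in a two‑variable system matches the pattern described above: exactly one cointegrating relationship, with both series individually non‑stationary.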
Conclusion
FinSentLLM demonstrates that a lightweight integration of multiple LLM experts, probability‑derived features, and structured financial semantic flags can substantially improve sentiment classification and, importantly, produce sentiment signals that exhibit statistically significant long‑term co‑movement with stock market dynamics.
