FinSentLLM: A Multi‑LLM Framework for Financial Sentiment Prediction

FinSentLLM integrates multiple LLM experts with structured financial semantic signals, achieving 3‑6% higher accuracy and F1 on the Financial PhraseBank compared to baselines, while DCC‑GARCH and Johansen cointegration analyses confirm a statistically significant long‑term co‑movement between the predicted sentiment signals and stock market dynamics.

Bighead's Algorithm Notes

Background

Financial sentiment analysis (FSA) has evolved through three stages: early lexicon‑based methods; pre‑trained models such as FinBERT, which achieve strong benchmark results; and recent attempts to link sentiment signals with market behavior, which largely overlook modern LLM capabilities.

Problem Definition

Validate whether LLM‑derived sentiment signals are consistent with actual market movements.

Combine domain‑specific LLMs (FinBERT) with general‑purpose LLMs (RoBERTa‑sentiment) and structured financial semantic cues to improve prediction robustness.

Use econometric techniques (DCC‑GARCH, Johansen cointegration) to test long‑term co‑movement between sentiment and stock indices.

Method

FinSentLLM is a lightweight, no‑fine‑tuning integration framework consisting of three components:

Multi‑LLM expert panel: FinBERT (domain‑specific) and RoBERTa‑sentiment (trained on 58 M tweets) provide posterior probability distributions over positive, neutral, and negative classes.

Financial sentiment signal design:

Probability‑derived features: logits, maximum probability, margin, entropy.

Structured semantic flags extracted from domain patterns (e.g., “profit rose”, “loss narrowed”, “agreement”, “restructuring”).

Expert disagreement metrics: L1 distance and KL divergence between FinBERT and RoBERTa‑sentiment distributions.

Meta‑classifier: A lightweight classifier (logistic regression or XGBoost) aggregates the heterogeneous signals. XGBoost is preferred for modeling non‑linear interactions.
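The three signal families listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names, the regular expressions in `FLAG_PATTERNS`, and the example probabilities are all assumptions made here, and the sketch works from class probabilities only (it does not reproduce the raw logits the text also mentions).

```python
import math
import re

# Hypothetical patterns modeled on the example phrases quoted in the text;
# the actual pattern set used by FinSentLLM is not given in the source.
FLAG_PATTERNS = {
    "profit_rose": re.compile(r"\bprofit\s+(rose|increased|grew)\b", re.I),
    "loss_narrowed": re.compile(r"\bloss\s+narrowed\b", re.I),
    "agreement": re.compile(r"\bagreement\b", re.I),
    "restructuring": re.compile(r"\brestructuring\b", re.I),
}


def probability_features(p):
    """Max probability, top-2 margin, and Shannon entropy (nats) of one distribution."""
    top = sorted(p, reverse=True)
    entropy = -sum(x * math.log(x) for x in p if x > 0)
    return {"max_prob": top[0], "margin": top[0] - top[1], "entropy": entropy}


def disagreement(p, q, eps=1e-12):
    """L1 distance and KL(p || q) between two experts' class distributions."""
    l1 = sum(abs(a - b) for a, b in zip(p, q))
    kl = sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q) if a > 0)
    return {"l1": l1, "kl": kl}


def semantic_flags(text):
    """Binary indicators: 1 if the pattern occurs in the text, else 0."""
    return {name: int(bool(rx.search(text))) for name, rx in FLAG_PATTERNS.items()}


def build_features(text, p_finbert, p_roberta):
    """Concatenate all signals into one flat feature dict for a meta-classifier."""
    feats = {}
    for prefix, p in (("finbert", p_finbert), ("roberta", p_roberta)):
        for key, value in probability_features(p).items():
            feats[f"{prefix}_{key}"] = value
    feats.update(disagreement(p_finbert, p_roberta))
    feats.update(semantic_flags(text))
    return feats


# Made-up probabilities in (positive, neutral, negative) order.
feats = build_features(
    "Operating profit rose to EUR 13.1 mn",
    p_finbert=[0.90, 0.08, 0.02],
    p_roberta=[0.50, 0.45, 0.05],
)
```

A flat dict like `feats` can be turned into one row of the design matrix that a logistic regression or XGBoost meta-classifier consumes. Large `l1` or `kl` values mark sentences where the two experts disagree, which is where a learned aggregator has the most room to help.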

The overall architecture is illustrated in the figure below.

FinSentLLM architecture

Design Advantages

Complementarity: domain‑specific and general‑purpose LLMs compensate for each other’s weaknesses.

Domain knowledge injection via structured semantic flags enhances financial reasoning.

Efficiency: No large‑scale fine‑tuning required.

Scalability: New LLM experts can be added easily.

Experiments

Datasets & Pre‑processing

Financial PhraseBank (FPB): 14,780 labeled news sentences with three‑class sentiment labels, organized into subsets by annotator agreement level (50 %–100 %).

FNSPID: 15,698,563 news items (1999‑2023) aligned with Yahoo Finance daily closing‑price log returns; FinBERT computes daily sentiment scores.

All features are Z‑score normalized and aligned with daily market indices.
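The pre-processing described above (log returns from closing prices, Z-score normalization, alignment by trading day) can be sketched as follows. The helper names, the toy prices, and the sentiment values are illustrative assumptions, and the population standard deviation is used here because the source does not say which variant was applied.

```python
import math
from statistics import mean, pstdev


def log_returns(prices):
    """Daily log returns r_t = ln(P_t / P_{t-1}) from a list of closing prices."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def zscore(values):
    """Standardize to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def align_by_date(sentiment, returns):
    """Inner-join two {date: value} dicts, keeping only dates present in both."""
    dates = sorted(set(sentiment) & set(returns))
    return [(d, sentiment[d], returns[d]) for d in dates]


# Toy data: closing prices and daily sentiment scores keyed by ISO date.
closes = {"2023-01-03": 100.0, "2023-01-04": 102.0, "2023-01-05": 101.0}
dates = sorted(closes)
rets = dict(zip(dates[1:], log_returns([closes[d] for d in dates])))
sent = {"2023-01-04": 0.3, "2023-01-05": -0.1, "2023-01-06": 0.2}

rows = align_by_date(sent, rets)  # only days with both a score and a return
```

The inner join drops days on which news arrived but the market was closed (here `2023-01-06`), which is one simple way to keep the two series on a common daily grid.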

Baselines & Configuration

Pre‑trained sentiment models: FinBERT (domain‑specific), RoBERTa‑sentiment (general‑purpose).

General LLMs: GPT‑4o mini, GPT‑5 (zero‑shot).

Meta‑classifiers: Logistic regression, XGBoost (hyper‑parameter tuned).

Evaluation: 5‑fold cross‑validation.

Main Results

Sentiment classification: RoBERTa‑sentiment achieves 66‑71 % accuracy, Macro‑F1 ≈ 0.55; FinBERT 88‑97 % accuracy, Macro‑F1 0.88‑0.96; GPT‑4o mini 83‑95 % accuracy, Macro‑F1 0.82‑0.94; GPT‑5 slightly weaker on short texts.

FinSentLLM performance: Using XGBoost, accuracy improves to 91.0 % (Macro‑F1 0.904) on the noisy 50 % subset, 98.5 % (Macro‑F1 0.979) on the strict 100 % subset, and 98.2 % (Macro‑F1 0.982) on the full set, outperforming FinBERT by 2‑5.8 % and GPT‑4o mini by 6‑10.3 %.
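Because the results report both accuracy and Macro‑F1, it may help to see how the two can diverge on imbalanced three-class data. The sketch below computes both metrics from scratch; the label names and the toy predictions are invented for illustration and are not taken from the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores, so every class counts equally."""
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


labels = ["positive", "neutral", "negative"]

# Toy, neutral-heavy gold labels and a classifier that always says "neutral".
y_true = ["neutral"] * 6 + ["positive"] * 3 + ["negative"] * 1
y_pred = ["neutral"] * 10

acc = accuracy(y_true, y_pred)          # 0.6
mf1 = macro_f1(y_true, y_pred, labels)  # 0.25
```

In this toy case accuracy is 0.6 while Macro‑F1 is only 0.25, since the two minority classes each score zero. A gap in the same direction, such as RoBERTa‑sentiment's 66‑71 % accuracy against a Macro‑F1 near 0.55, is the kind of pattern that uneven per-class performance produces.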

Ablation Study

Removing RoBERTa‑sentiment drops overall accuracy from 98.2 % to 97.6 %, confirming the complementary role of the general LLM. Excluding structured semantic signals reduces Macro‑F1 from 0.982 to 0.981, a marginal drop suggesting that the domain‑specific flags add only a small gain on this benchmark.

Ablation results

Sentiment‑Market Co‑movement Analysis

DCC‑GARCH

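Parameters: α (short‑term impact) 0.02‑0.06, β (persistence) > 0.90, average ρ 0.35‑0.45 (e.g., S&P 500 ρ = 0.4044). A low α means individual shocks shift the correlation only slightly, and a high β means the correlation is highly persistent; together they indicate a stable, long‑lasting positive correlation between sentiment and market returns.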

DCC-GARCH results
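To make the roles of α and β concrete, here is a minimal sketch of the standard bivariate DCC(1,1) correlation recursion, Q_t = (1 − α − β)·Q̄ + α·z_{t−1}z_{t−1}ᵀ + β·Q_{t−1}, with ρ_t = q₁₂ / √(q₁₁·q₂₂). It assumes the two input series are already standardized residuals from univariate GARCH fits (that step is omitted), and the simulated data and the chosen α = 0.04, β = 0.94 are illustrative values inside the reported ranges, not estimates from the paper.

```python
import math
import random


def dcc_correlations(z1, z2, alpha, beta):
    """Conditional correlation path from the DCC(1,1) recursion.

    z1, z2: standardized residuals (assumed to come from univariate GARCH fits).
    Returns rho_t for each t, where rho_t uses information up to t-1.
    """
    if alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need alpha >= 0, beta >= 0, alpha + beta < 1")
    n = len(z1)
    # Q-bar: unconditional second moments of the standardized residuals.
    s11 = sum(a * a for a in z1) / n
    s22 = sum(b * b for b in z2) / n
    s12 = sum(a * b for a, b in zip(z1, z2)) / n
    q11, q22, q12 = s11, s22, s12  # start the recursion at Q-bar
    w = 1.0 - alpha - beta
    rhos = []
    for a, b in zip(z1, z2):
        rhos.append(q12 / math.sqrt(q11 * q22))
        # alpha weights yesterday's shock, beta weights yesterday's Q.
        q11 = w * s11 + alpha * a * a + beta * q11
        q22 = w * s22 + alpha * b * b + beta * q22
        q12 = w * s12 + alpha * a * b + beta * q12
    return rhos


# Simulated residuals with a true correlation of 0.4 (illustrative only).
random.seed(0)
n_obs = 2000
z_sent = [random.gauss(0, 1) for _ in range(n_obs)]
z_ret = [0.4 * a + math.sqrt(1 - 0.4 ** 2) * random.gauss(0, 1) for a in z_sent]

rhos = dcc_correlations(z_sent, z_ret, alpha=0.04, beta=0.94)
mean_rho = sum(rhos) / len(rhos)
```

With a small α each new shock nudges Q only slightly, and with β close to 1 most of yesterday's Q carries over, so the resulting ρ path moves slowly around its long-run level instead of jumping from day to day.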

Johansen Cointegration Test

Both the sentiment scores and the log‑price series are I(1). The test rejects the null of no cointegration (H₀: r = 0) but fails to reject the null of at most one cointegrating relationship (H₀: r ≤ 1), indicating a single long‑run equilibrium between sentiment and the market.

Johansen test
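For readers less familiar with the procedure, the Johansen test can be written in its usual textbook form as below. This is a sketch under assumptions: Y_t stacks the sentiment score and the log price (so n = 2), deterministic terms are omitted, and the trace statistic is shown even though the source does not state whether the trace or the maximum-eigenvalue variant was used. The loading and cointegrating vectors are written A and B to avoid confusion with the DCC parameters α and β.

```latex
% Vector error-correction form; the rank of \Pi is the number of
% cointegrating relationships r.
\Delta Y_t = \Pi\, Y_{t-1} + \sum_{i=1}^{k-1} \Gamma_i\, \Delta Y_{t-i} + \varepsilon_t,
\qquad \operatorname{rank}(\Pi) = r, \qquad \Pi = A B^{\top}

% Trace statistic for H_0: at most r cointegrating relationships,
% where \hat{\lambda}_1 \ge \dots \ge \hat{\lambda}_n are the estimated eigenvalues
% and T is the number of observations.
\lambda_{\mathrm{trace}}(r) = -T \sum_{i=r+1}^{n} \ln\bigl(1 - \hat{\lambda}_i\bigr)
```

The hypotheses are tested in sequence: rejecting r = 0 and then failing to reject r ≤ 1 gives r = 1, meaning one linear combination Bᵀ Y_t of sentiment and log price is stationary even though each series on its own is I(1).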

Conclusion

FinSentLLM demonstrates that a lightweight integration of multiple LLM experts, probability‑derived features, and structured financial semantic flags can substantially improve sentiment classification and, importantly, produce sentiment signals that exhibit statistically significant long‑term co‑movement with stock market dynamics.
