FinZero: Multimodal Large‑Model Reasoning for Financial Time‑Series Forecasting

FinZero is a multimodal large model that pairs a 3‑billion‑parameter Qwen2.5‑VL backbone with UARPO fine‑tuning on the FVLDB dataset, enabling accurate financial time‑series prediction and uncertainty quantification; it outperforms larger models such as GPT‑4o by about 13.5% in its high‑confidence group.

Bighead's Algorithm Notes

Background

Financial time‑series forecasting is crucial but challenging because market dynamics are affected by macro‑ and micro‑level factors and because once a pattern is exploited it quickly loses predictive power.

Problem Definition

The work targets four core challenges: (1) information loss caused by standardizing time‑series before modeling; (2) limited scalability due to fixed window sizes or variable counts; (3) under‑utilization of large‑model reasoning capabilities for time‑series tasks; (4) lack of interpretability and uncertainty quantification in predictions.

Method

FinZero Model

FinZero is a multimodal large model for financial time‑series forecasting. It uses the 3‑billion‑parameter multimodal backbone Qwen2.5‑VL‑3B, fine‑tuned on the FVLDB dataset with the Uncertainty‑Adjusted Relative‑Policy Optimization (UARPO) method, enabling reasoning, prediction, and explicit uncertainty analysis.

UARPO

UARPO improves the original Group‑Relative Policy Optimization (GRPO) by adding Intra‑Group Relative Advantage (IGRA), Cross‑Group Relative Advantage (CGRA), and Uncertainty‑Adjusted Relative Advantage (UARA) to address non‑stationarity and uncertainty in financial series.

Optimization Objective

The objective involves the current and old policy models π_{θ} and π_{θ_old}, a sampled problem q with a group of outputs o_i, and the reward vector r = [r_0, r_1, …, r_G]. IGRA measures each output's reward relative to the rest of its group; CGRA extends that comparison across groups; and the uncertainty adjustment rescales the resulting advantage with a tunable coefficient α and the model's self‑reported confidence score. (The formal equations appear in the paper and are not reproduced here.)
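As a concrete sketch, the standard GRPO intra‑group baseline and a simple multiplicative confidence scaling can be written as follows. Treat both functional forms as assumptions: the paper defines IGRA, CGRA, and UARA with its own equations, and the `alpha=0.5` default is purely illustrative.

```python
import statistics

def intra_group_advantage(rewards):
    """GRPO-style baseline: normalize each reward against its own group.
    Assumed stand-in for IGRA; the paper's exact formula may differ."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(r - mu) / sigma for r in rewards]

def uncertainty_adjusted(advantage, score, alpha=0.5):
    """Illustrative UARA-style adjustment: amplify the advantage of
    outputs the model is confident about. The multiplicative form and
    the alpha default are assumptions, not the paper's definition."""
    return advantage * (1.0 + alpha * score)
```

For a group of rewards [1, 0, 1, 0] the normalized advantages are [1, −1, 1, −1]; a confident output with score 0.8 then has its advantage scaled from 1.0 to 1.4.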

Algorithm Flow

The iterative UARPO process (shown as a flow diagram in the paper) repeats the following loop: sample a group of outputs from the old policy, score each output with the reward functions, convert the rewards into intra‑ and cross‑group relative advantages, apply the uncertainty adjustment, and update the current policy.

Reward and Uncertainty Design

Accuracy reward: measures consistency between predicted and actual up/down movements.

Completion‑length reward: encourages longer reasoning text, increasing with completion length up to a cap of 200 tokens.

Format reward: constrains the model to output the required format.

Confidence score: the model outputs a confidence score that quantifies prediction uncertainty for risk assessment.
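A minimal sketch of the three rewards, assuming up/down directions are encoded as strings and the required output format uses hypothetical `<answer>`/`<confidence>` tags (the paper's actual prompt template is not reproduced here):

```python
def accuracy_reward(pred_direction, true_direction):
    """1.0 when the predicted up/down movement matches the realized one."""
    return 1.0 if pred_direction == true_direction else 0.0

def length_reward(num_tokens, cap=200):
    """Encourages longer reasoning: grows linearly, saturating at `cap` tokens."""
    return min(num_tokens, cap) / cap

def format_reward(text):
    """Checks that the required output fields are present. The tag names
    are hypothetical stand-ins for the paper's actual format constraints."""
    return 1.0 if "<answer>" in text and "<confidence>" in text else 0.0
```

The confidence score itself is not a reward but a field the model emits alongside its answer, which UARPO then feeds into the advantage adjustment.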

Dataset

FVLDB contains over 10,000 financial time‑series image‑text pairs covering global stock indices, Bitcoin, and other assets. Diversity includes asset types, prediction tasks (price, volatility), sequence lengths, frequencies, indicator types, and chart styles (candlestick, technical‑indicator charts).
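To make those axes of diversity concrete, a single FVLDB image‑text pair might look like the record below. This schema is hypothetical, inferred only from the description above, not from the released data.

```python
# Hypothetical FVLDB record; field names and values are illustrative.
record = {
    "image": "charts/btc_daily_candlestick_0001.png",  # chart rendering of the series
    "asset": "Bitcoin",               # global indices, Bitcoin, other assets
    "task": "price",                  # "price" or "volatility" prediction
    "frequency": "daily",             # sampling frequency of the series
    "window": 60,                     # sequence length shown in the chart
    "chart_style": "candlestick",     # candlestick or technical-indicator chart
    "label": "up",                    # realized up/down movement
}
```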

Experimental Setup

Backbone: Qwen2.5‑VL‑3B (3 B parameters). Baselines: the original Qwen2.5‑VL‑3B, Qwen2.5‑VL‑7B (7 B parameters), GPT‑4o, a GRPO‑fine‑tuned Qwen2.5‑VL‑3B, and a naive historical‑trend model. Training uses the Adam optimizer with a learning rate of 1e‑6, two epochs of fine‑tuning, and two 80 GB NVIDIA A100 GPUs.
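The reported hyperparameters can be collected into a single configuration. This is a sketch: the key names follow common fine‑tuning scripts, not the authors' released code.

```python
# Hyperparameters as reported in the experimental setup; key names are illustrative.
train_config = {
    "backbone": "Qwen2.5-VL-3B",       # 3 B-parameter multimodal model
    "method": "UARPO",
    "optimizer": "Adam",
    "learning_rate": 1e-6,
    "epochs": 2,
    "hardware": "2x NVIDIA A100 80GB",
}
```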

Results

Training Process

During UARPO fine‑tuning, format and completion‑length rewards rise quickly and stabilize; accuracy reward continues to grow; loss steadily declines, indicating the model learns the target format, reasoning depth, and predictive ability.

Prediction Performance

FinZero (3 B parameters) surpasses larger models such as GPT‑4o and Qwen2.5‑VL‑7B in average accuracy on both volatility and price prediction tasks, confirming the effectiveness of UARPO fine‑tuning.

Confidence‑Group Analysis

FinZero’s high‑confidence group improves accuracy by approximately 13.5% over GPT‑4o, and confidence scores exhibit a strong positive correlation with accuracy, demonstrating reliable uncertainty quantification for trustworthy financial decisions.

Fine‑tuning Trend Comparison

The accuracy growth curve of FinZero during fine‑tuning is markedly steeper than that of other models, further validating the advantage of the UARPO method.

Tags: multimodal LLM, financial time series, uncertainty quantification, GPT-4o comparison, FinZero, Qwen2.5-VL-3B, UARPO
Written by

Bighead's Algorithm Notes

Focused on AI applications in the fintech sector
