Bighead's Algorithm Notes
Sep 11, 2025 · Artificial Intelligence
Fin-PRM: Alibaba’s Dianjin Team Introduces a Domain-Specific Process Reward Model for Financial Reasoning
Fin‑PRM, a domain‑specific process reward model for financial reasoning introduced by Alibaba’s Dianjin team, employs dual‑level step and trajectory rewards to provide fine‑grained supervision, achieving up to 12.9% accuracy gains in supervised fine‑tuning and 5.1% improvements in Best‑of‑N inference on benchmarks such as CFLUE and FinQA.
CFLUE · Fin-PRM · FinQA
11 min read
