Dec 6, 2025 · Artificial Intelligence

FinEval‑KR: Diagnosing Knowledge vs. Reasoning Gaps in Financial Large Language Models

FinEval‑KR, a new EMNLP2025 evaluation framework co‑authored by Shanghai University of Finance and Economics and Ant Group, separates knowledge coverage from logical reasoning to reveal why financial LLMs often hallucinate on calculation tasks, introduces KS, RS, and CS metrics, and ranks 18 state‑of‑the‑art models on a rigorously curated finance dataset.

Knowledge vs reasoningLLM evaluationfinance AI

0 likes · 14 min read

FinEval‑KR: Diagnosing Knowledge vs. Reasoning Gaps in Financial Large Language Models