How Veri‑R1 Enables Reliable Online Fact‑Checking for Large Language Models

This article introduces Veri‑R1, a training framework that equips large language models with online claim‑verification capability by combining reinforcement learning, fine‑grained reward design, and high‑quality data filtering. Across multiple fact‑checking benchmarks, Veri‑R1 delivers superior performance.

Data Party THU

Online Claim Verification

Fact‑checking at web scale requires models to actively retrieve evidence, assess its credibility, and reason over multiple steps. Traditional "offline" verification provides the evidence together with the claim, while real‑world scenarios demand an "online" approach where the model must search for and cite evidence before forming a conclusion.

Veri‑R1 Training Framework

Veri‑R1 introduces an online claim verification paradigm for large language models. The system learns by interacting with a search engine, receiving reward feedback, and iteratively improving its verification pipeline.
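
To make the interaction loop concrete, here is a minimal sketch of a single online‑verification rollout. The `model.act` interface and the `search` wrapper are hypothetical stand‑ins for the policy model and the retrieval backend, not the paper's actual API.

```python
# Minimal sketch of one online-verification rollout. `model.act` and
# `search` are hypothetical stand-ins for the policy model and the
# retrieval backend; the real interfaces live in the Veri-R1 repo.
from dataclasses import dataclass, field

@dataclass
class VerificationState:
    claim: str
    evidence: list[str] = field(default_factory=list)  # snippets cited so far
    verdict: str | None = None  # SUPPORTED / REFUTED / NOT ENOUGH INFO

def rollout(model, search, claim: str, max_steps: int = 5) -> VerificationState:
    state = VerificationState(claim=claim)
    for _ in range(max_steps):
        action = model.act(state)                   # search again, or conclude
        if action.kind == "search":
            state.evidence += search(action.query)  # retrieve and cite evidence
        else:                                       # action.kind == "verdict"
            state.verdict = action.label
            break
    return state
```

During training, each completed rollout is scored by the fine‑grained rewards described below, and the policy is updated with reinforcement learning.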

Paper link: https://arxiv.org/abs/2510.01932

Code repository: https://github.com/H0key-22/Veri-R1


Figure 1: Offline verification resembles a closed‑book exam: the model works only with the evidence supplied alongside the claim. Online verification is an open‑book exam: the model must search for and retrieve evidence itself before reasoning.

Fine‑grained Reward Mechanisms

Label reward: encourages correct claim classification (support, refute, insufficient evidence).

Evidence reward: requires the model to locate and cite genuine evidence rather than guessing.

Format reward: enforces a strict output schema to keep responses consistent.

Effectiveness weight: quantifies the amount of valid evidence used, preventing shortcuts that rely on partial citations.
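
As a rough illustration of how these four signals could combine into a single scalar reward, consider the sketch below. The weights and the exact role of the effectiveness term are assumptions for illustration; the paper's precise formula may differ.

```python
# Illustrative composite reward combining the four signals described above.
# The coefficients and the exact way the effectiveness weight enters the
# score are assumptions, not the paper's formula.
def composite_reward(pred_label: str, gold_label: str,
                     cited: set[str], gold_evidence: set[str],
                     format_ok: bool) -> float:
    label_r = 1.0 if pred_label == gold_label else 0.0   # label reward
    # evidence reward: fraction of required evidence actually cited
    evid_r = len(cited & gold_evidence) / max(len(gold_evidence), 1)
    fmt_r = 1.0 if format_ok else 0.0                    # format reward
    # effectiveness weight: scales the label reward by how much valid
    # evidence was used, discouraging correct-but-unsupported guesses
    effectiveness = evid_r
    return 0.5 * effectiveness * label_r + 0.3 * evid_r + 0.2 * fmt_r
```

The key design point in this sketch is that the effectiveness weight gates the label reward, so a correct verdict earns little credit unless it is backed by valid citations.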

High‑Quality Training Data

Training samples are filtered with GPT‑4o: correctly labeled instances are retained, while ambiguous or erroneous examples are discarded. This reduces noise and focuses learning on reliable claim‑evidence associations.
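
A sketch of what such a filtering pass might look like, using the standard OpenAI chat‑completions client. The prompt wording and the keep criterion (GPT‑4o's verdict must agree with the dataset label) are illustrative assumptions.

```python
# Sketch of label-consistency filtering with GPT-4o: keep a training
# example only when the model's verdict, given the gold evidence, matches
# the dataset label. Prompt and agreement criterion are assumptions.
from openai import OpenAI

client = OpenAI()

def is_clean(claim: str, evidence: str, gold_label: str) -> bool:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": (f"Claim: {claim}\nEvidence: {evidence}\n"
                        "Answer with exactly one of: SUPPORTED, REFUTED, "
                        "NOT ENOUGH INFO."),
        }],
        temperature=0,
    )
    return resp.choices[0].message.content.strip() == gold_label

# filtered = [ex for ex in dataset if is_clean(ex.claim, ex.evidence, ex.label)]
```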


Figure 2: The loop consists of claim assessment, evidence retrieval, reasoning, and reward‑based feedback.

Experimental Evaluation

Veri‑R1 was evaluated on five mainstream verification benchmarks: FEVEROUS, EX‑FEVER, FEVER, HOVER, and SciFact. Models trained with the online‑RL variant consistently achieved the highest joint accuracy, demonstrating stronger robustness and generalization than pure supervised fine‑tuning.

Evaluation metrics

Joint Accuracy: both the predicted label and all retrieved evidence must be completely correct.

Verification Accuracy: the label is correct and the model retrieves all required evidence (extra evidence is allowed).

Label Accuracy: only the correctness of the predicted label is measured.
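
The three metrics differ only in how strictly evidence is checked. A small sketch, assuming gold and cited evidence are sets of snippet identifiers and interpreting "completely correct" for joint accuracy as an exact set match:

```python
def label_acc(pred_label: str, gold_label: str) -> bool:
    # Label Accuracy: only the verdict is checked
    return pred_label == gold_label

def verification_acc(pred_label: str, gold_label: str,
                     cited: set[str], gold_evidence: set[str]) -> bool:
    # Verification Accuracy: label correct AND every required snippet
    # retrieved; extra citations are tolerated
    return pred_label == gold_label and gold_evidence <= cited

def joint_acc(pred_label: str, gold_label: str,
              cited: set[str], gold_evidence: set[str]) -> bool:
    # Joint Accuracy: label correct AND cited evidence exactly matches
    # the required set (strictest reading of "completely correct")
    return pred_label == gold_label and cited == gold_evidence
```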

Key findings:

Online RL excels at multi‑hop reasoning, entity disambiguation, and numeric inference.

3B‑parameter Qwen and Llama models trained with Veri‑R1 rivaled GPT‑4o, challenging the notion that larger scale alone determines performance.


Figure 3: Veri‑R1’s advantage is most pronounced in joint accuracy.

Offline vs. Online Reinforcement Learning

Offline RL typically performs coarse keyword searches, often returning fragmented or irrelevant evidence that leads to incorrect conclusions. Online RL decomposes a claim into sub‑claims, designs targeted queries for each, retrieves precise evidence, and integrates the results, yielding accurate, well‑supported answers.
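
The online behavior described above can be pictured as the following flow, where `decompose`, `search`, and `judge` are hypothetical helpers and the aggregation rule (any refuted sub‑claim refutes the whole claim) is an assumption:

```python
# Illustrative online-RL-style verification flow: decompose a claim into
# sub-claims, issue one targeted query per sub-claim, then aggregate.
# `decompose`, `search`, and `judge` are hypothetical helpers.
def verify_online(claim: str, decompose, search, judge) -> str:
    sub_claims = decompose(claim)        # e.g. split a multi-hop claim
    evidence = {sub: search(sub) for sub in sub_claims}  # precise queries
    verdicts = [judge(sub, ev) for sub, ev in evidence.items()]
    # a single refuted sub-claim refutes the whole claim (assumed rule)
    if any(v == "REFUTED" for v in verdicts):
        return "REFUTED"
    if all(v == "SUPPORTED" for v in verdicts):
        return "SUPPORTED"
    return "NOT ENOUGH INFO"
```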


Figure 4: Example comparison shows offline RL’s vague evidence retrieval versus online RL’s precise, step‑wise verification.

Future Directions

Current models operate on static corpora. Extending Veri‑R1 with open‑world web access and real‑time knowledge bases could enable truly autonomous AI fact‑checkers. The framework demonstrates that integrating retrieval, reasoning, and evidence‑grounded answering leads to more realistic and reliable verification systems.
