How QuCo‑RAG Replaces Model Confidence with Objective Evidence to Cut Hallucinations

QuCo‑RAG is a dynamic retrieval‑augmented generation framework that quantifies uncertainty using pre‑training corpus statistics, replacing unreliable model confidence with objective frequency and co‑occurrence evidence. The result: millisecond‑level hallucination detection, stronger multi‑hop QA performance, and transferability across LLMs.


Background and Motivation

Large language models (LLMs) frequently produce hallucinations—confidently generated but factually incorrect content. Existing dynamic RAG methods rely on internal signals such as token probability, entropy, or attention, which are themselves unreliable because LLMs are poorly calibrated.

Core Pain Points

Internal uncertainty signals are not trustworthy, leading to "confident hallucinations".

Static RAG strategies cannot handle the emergent information needs of multi‑hop questions.

Current methods lack effective verification for long‑tail knowledge and inter‑entity relationships.

QuCo‑RAG Overview

QuCo‑RAG (Quantifying Uncertainty via Pre‑training Corpus for Dynamic RAG) replaces subjective confidence with objective statistical evidence mined from the pre‑training corpus. By checking entity frequency and co‑occurrence counts, it triggers retrieval only when the evidence is missing, enabling millisecond‑level hallucination detection and dynamic retrieval.

Core Idea: Two “Zero” Triggers

The system operates in two stages:

Pre‑generation check: If the question contains a low‑frequency entity (average corpus frequency below 1k, as measured by an Infini‑gram count query), relevant documents are retrieved before generation.

Runtime verification: During generation, each extracted triple (head, relation, tail) is checked; zero co‑occurrence (the head and tail never appear together in the corpus) flags a hallucination risk, triggering an immediate re‑retrieval and sentence rewrite.
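The two triggers can be sketched in a few lines of Python. This is an illustrative mock, not the paper's implementation: `corpus_count` and the dictionary-based corpus stand in for Infini‑gram queries, and only the 1k frequency threshold and the zero‑co‑occurrence rule come from the article.

```python
FREQ_THRESHOLD = 1_000  # retrieve when average entity frequency falls below this

def corpus_count(phrase, corpus):
    """Mock corpus-frequency lookup (stands in for an Infini-gram count query)."""
    return corpus.get(phrase, 0)

def needs_pre_retrieval(entities, corpus):
    """Pre-generation check: low average entity frequency triggers retrieval."""
    if not entities:
        return False
    avg = sum(corpus_count(e, corpus) for e in entities) / len(entities)
    return avg < FREQ_THRESHOLD

def is_hallucination_risk(head, tail, cooccurrence):
    """Runtime check: a zero head-tail co-occurrence count flags a risk."""
    return cooccurrence.get((head, tail), 0) == 0
```

Because both signals are plain counts rather than model logits, the decision is deterministic and easy to explain after the fact.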

System Workflow

Pre‑generation knowledge assessment: Extract entities → query their corpus frequency → low frequency triggers retrieval → inject the retrieved documents into the context.

Runtime declaration verification: After each generated sentence, extract triples → check head–tail co‑occurrence → zero co‑occurrence = hallucination risk → trigger retrieval and rewrite the sentence.

Millisecond‑level query engine: An Infini‑gram suffix‑array index over 4 trillion tokens supports sub‑millisecond n‑gram and co‑occurrence queries, making the overhead negligible.

Experimental Results

QuCo‑RAG consistently outperforms strong baselines on multi‑hop QA datasets. For example, on OLMo‑2‑7B with the 2WikiMultiHopQA dataset, the baseline EM is 25.3 while QuCo‑RAG achieves 32.7 (+7.4). Similar gains of +10–12 EM points are observed on larger models and HotpotQA.

Across LLMs of different families and scales (OLMo‑2‑7B/13B/32B, Qwen2.5‑32B, Llama‑3‑8B, GPT‑4.1, GPT‑5‑chat), QuCo‑RAG improves exact‑match scores by 4–14 points, demonstrating stable superiority over internal‑signal methods such as DRAGIN, ETC, and SeaKR, which often fluctuate or are outperformed by simple baselines.

Deep Analysis

Ablation: Removing runtime verification drops performance by 5.1 EM; removing pre‑generation checks drops it by 2.5 EM, confirming that the two stages are complementary.

Domain Generalization: On PubMedQA, QuCo‑RAG reaches 66.4% accuracy (+11.2 over the best baseline), while internal‑signal methods either over‑retrieve or fail to retrieve.

Entity Frequency Stratification: For low‑frequency entities (< 50 occurrences), QuCo‑RAG leads Wo‑RAG by 10–17 EM points, whereas internal‑signal methods perform almost on par with Wo‑RAG, indicating they cannot recognize their own ignorance.

Case Study: DRAGIN vs. QuCo‑RAG

Question: "Who is the mother of the director of film Polish‑Russian War?"

DRAGIN output: Generates a wrong director and mother, resulting in a completely incorrect answer because its internal confidence mistakenly marks the fabricated director as low‑uncertainty.

QuCo‑RAG output: ① Pre‑check discovers the rare entity "Polish‑Russian War" and retrieves the correct director "Xawery Żuławski". ② Runtime check finds zero co‑occurrence for "Xawery Żuławski + mother", triggers a second retrieval and correctly answers "Małgorzata Braunek". Both steps rely on objective corpus evidence rather than model self‑confidence.
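The verify‑and‑rewrite loop behind step ② can be sketched as follows. Every name here (the generator, triple extractor, retriever, rewriter) is a hypothetical stub, and this simplified version checks sentence by sentence rather than interleaving with decoding as the full system would.

```python
def generate_with_verification(question, generate, rewrite,
                               extract_triples, cooccur, retrieve):
    """Sketch of runtime verification: check each sentence's triples against
    corpus co-occurrence; a zero count triggers retrieval plus a rewrite."""
    context, answer = [], []
    for sentence in generate(question, context):
        risky = [(h, r, t) for (h, r, t) in extract_triples(sentence)
                 if cooccur(h, t) == 0]            # zero co-occurrence = risk
        if risky:
            context.extend(retrieve(risky))         # fetch evidence for risky triples
            sentence = rewrite(sentence, context)   # regenerate the flagged sentence
        answer.append(sentence)
    return " ".join(answer)
```

In the paper's setting, retrieved evidence also conditions later sentences; the key property shown here is that the rewrite is gated by an objective count, not by the model's own confidence.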

Limitations and Future Work

Synonyms/aliases: May cause false zero‑co‑occurrence triggers; a conservative strategy (extra retrieval) mitigates this.

Static corpus: Cannot cover facts emerging after 2025; periodic rebuilding of the Infini‑gram index is required.

Future directions: Incorporate entity linking and canonical names, use time‑stamped dynamic corpora, and develop incremental indexing for continuous updates.

Conclusion

QuCo‑RAG leverages the simplest statistics—entity frequency and co‑occurrence counts—from a 4 trillion‑token pre‑training corpus to provide an explainable, transferable, and deployable "objective evidence chain" for dynamic RAG, effectively replacing unreliable model confidence.

Stop trusting a model's internal "confidence"; let real corpus evidence tell it when to look up facts and when to stop hallucinating.

Resources

QuCo‑RAG: Quantifying Uncertainty from the Pre‑training Corpus for Dynamic Retrieval‑Augmented Generation
https://arxiv.org/pdf/2512.19134
https://github.com/ZhishanQ/QuCo-RAG
Tags: LLM · Retrieval‑Augmented Generation · Hallucination Detection · Dynamic Retrieval · Uncertainty Quantification
Written by

PaperAgent

Daily updates, analyzing cutting-edge AI research papers
