
Why Do Large Language Models Hallucinate? Uncovering the Root Causes and Practical Fixes

The article analyzes why large language models frequently generate confidently wrong answers, attributing hallucinations to statistical inevitability, data scarcity, and limited model expressiveness, and shows how RLHF exacerbates the problem by rewarding guesses, then proposes confidence‑threshold and "I don't know" strategies to mitigate it.

Data Party THU

Sep 14, 2025


Many users have seen large language models (LLMs) confidently give completely wrong answers to obscure questions, a phenomenon known as hallucination. For example, when asked for the birthday of Adam Tauman Kalai, a model returned three different dates across attempts, none of them correct.

Prompt: "Adam Tauman Kalai birthday? If you know, answer DD-MM." Responses: 03-07, 15-06, 01-01 (all wrong).

A recent OpenAI paper (linked in the Reference section) argues that hallucinations persist because training and evaluation reward guessing over acknowledging uncertainty.

Pre‑training Seeds Hallucination

Even if the pre-training data were entirely error-free, the density-estimation objective pushes the model to produce some answer for every prompt, effectively penalising "I don't know". Some errors are therefore statistically inevitable.

Post‑training Reinforcement Learning Amplifies the Issue

Post-training (e.g., RLHF) optimises models against a binary scoring scheme (1 for correct, 0 for wrong), in which a blank response also scores 0. Since a wrong answer costs nothing more than abstaining, a guess with any chance of being right has the higher expected score, and models learn to answer confidently even when unsure.
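The incentive under binary grading can be illustrated with a minimal sketch. It assumes a scheme where a correct answer scores 1 and everything else, including a blank response, scores 0; the function name and the probabilities are hypothetical and chosen only for illustration.

```python
def expected_binary_score(p_correct: float, abstain: bool) -> float:
    """Expected score under binary grading: 1 if correct, 0 otherwise."""
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must be in [0, 1]")
    # An abstention is graded like any other non-correct answer: 0 points.
    return 0.0 if abstain else p_correct


# Even a long-shot guess is never worse than leaving the question blank.
for p in (0.01, 0.25, 0.9):
    guess = expected_binary_score(p, abstain=False)
    blank = expected_binary_score(p, abstain=True)
    print(f"p={p:.2f}  guess={guess:.2f}  abstain={blank:.2f}")
```

Under these assumptions, guessing weakly dominates abstaining for every value of p, which is the pressure toward over-confidence described above.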

Root Causes of Hallucination

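Statistical inevitability: Generation can be reduced to a binary "Is-It-Valid?" classification task (is a candidate output valid or not?). Because generating valid outputs is at least as hard as classifying them, the model's generation error rate is bounded below in terms of its misclassification rate on that task (Theorem 1).

Data scarcity: Arbitrary facts that appear only once in the training corpus (singletons) give the model no pattern to learn from, so the hallucination rate on such facts is at least roughly the proportion of singletons (Theorem 2). For example, if 20% of birthday facts appear exactly once, a base model should be expected to hallucinate on at least about 20% of birthday questions.

Insufficient model expressiveness: If the model family cannot represent the relevant pattern (e.g., a trigram model facing a dependency longer than its context), the hallucination rate has a lower bound regardless of data quality (Theorem 3).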
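To make the data-scarcity point concrete, the sketch below estimates the proportion of singleton facts in a toy corpus. It is an illustration only: the function name and the data are hypothetical, and it normalises by the number of distinct facts, which is an assumption; the paper's formal definition may normalise differently.

```python
from collections import Counter
from typing import Hashable, Iterable


def singleton_fraction(fact_mentions: Iterable[Hashable]) -> float:
    """Fraction of distinct facts that are mentioned exactly once."""
    counts = Counter(fact_mentions)
    if not counts:
        return 0.0
    singletons = sum(1 for n in counts.values() if n == 1)
    return singletons / len(counts)


# Toy corpus: each entry stands for one mention of a person's birthday.
mentions = [
    "einstein", "einstein",
    "curie", "curie", "curie",
    "kalai",
    "lovelace", "lovelace",
]
print(singleton_fraction(mentions))  # 0.25: only "kalai" appears once
```

Read against the bound stated above, a corpus like this one would suggest a hallucination rate of at least roughly a quarter on questions about these facts.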

Evaluation Practices Punish Uncertainty

Ten mainstream benchmarks (including MMLU-Pro, GPQA, SWE-bench, and WildBench) were reviewed. They give little or no credit for "I don't know" responses, which rewards confident guessing and pushes models toward hallucination.

Proposed Remedy: Make "Blank Answer" an Option

Instead of forcing the model to answer, introduce a confidence threshold t (0 < t < 1) and state it explicitly in the prompt:

"Only answer when your confidence > t; otherwise reply 'I don't know'. Correct answers receive 1 point, wrong answers are penalised by t/(1-t) points, and 'I don't know' receives 0 points."

When the model's true confidence is below t, answering has a negative expected score, so "I don't know" (0 points) is the best choice. Above t, answering is worthwhile.
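A short derivation shows why t is the break-even point. It assumes a correct answer is worth 1 point and writes p for the model's probability of being correct; p is notation introduced here, not taken from the article.

```latex
\begin{align*}
\mathbb{E}[\text{score} \mid \text{answer}]
  &= p \cdot 1 - (1 - p)\,\frac{t}{1 - t}, \\
\mathbb{E}[\text{score} \mid \text{abstain}]
  &= 0, \\
p - (1 - p)\,\frac{t}{1 - t} > 0
  &\iff p\,(1 - t) > (1 - p)\,t
  \iff p > t .
\end{align*}
```

Answering therefore beats abstaining exactly when p exceeds t, and at p = t the two options tie.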

1. Explicit Confidence Threshold

Include a clear rule in the prompt that the model should answer only if its confidence exceeds a predefined value.

2. Make "I don’t know" the Optimal Strategy

By assigning zero points to "I don't know" and a penalty of t/(1-t) to wrong answers, the scoring rule makes abstaining the best choice whenever the model's confidence is below t, which reduces the incentive to hallucinate.
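The two steps can be sketched together in Python. This is a minimal illustration, not an implementation from the paper: all function names are hypothetical, a correct answer is assumed to be worth 1 point, and the confidence value p is assumed to be available from somewhere (in practice, obtaining a calibrated confidence is its own problem).

```python
def wrong_answer_penalty(t: float) -> float:
    """Penalty t/(1-t) for a wrong answer at confidence threshold t."""
    if not 0.0 <= t < 1.0:
        raise ValueError("t must satisfy 0 <= t < 1")
    return t / (1.0 - t)


def build_prompt(question: str, t: float) -> str:
    """Append the explicit confidence rule to a question (step 1)."""
    penalty = wrong_answer_penalty(t)
    return (
        f"{question}\n\n"
        f"Only answer if your confidence is above {t:.0%}; otherwise reply "
        f"\"I don't know\". Correct answers receive 1 point, wrong answers "
        f"lose {penalty:g} points, and \"I don't know\" receives 0 points."
    )


def score(outcome: str, t: float) -> float:
    """Grade one response; outcome is 'correct', 'wrong', or 'abstain'."""
    if outcome == "correct":
        return 1.0
    if outcome == "abstain":
        return 0.0
    if outcome == "wrong":
        return -wrong_answer_penalty(t)
    raise ValueError(f"unknown outcome: {outcome!r}")


def should_answer(p_correct: float, t: float) -> bool:
    """True if answering has a higher expected score than abstaining (step 2)."""
    expected = p_correct * 1.0 - (1.0 - p_correct) * wrong_answer_penalty(t)
    return expected > 0.0


print(build_prompt("Adam Tauman Kalai birthday? If you know, answer DD-MM.", 0.75))
print(should_answer(0.9, 0.75))  # confident enough: answer
print(should_answer(0.5, 0.75))  # below threshold: abstain
```

With t = 0 the penalty vanishes and the rule reduces to the binary scheme, where guessing is never worse than abstaining; raising t makes wrong answers progressively more costly.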

These changes shift the "exam mechanism" used in post-training and evaluation from rewarding guesses to rewarding honest uncertainty. Because they only change how existing benchmarks are scored, they can mitigate hallucination without requiring new benchmarks.

Reference

OpenAI – Why Language Models Hallucinate
https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4aaa5/why-language-models-hallucinate.pdf




Written by

Data Party THU

Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.
