Why Do Large Language Models Hallucinate? Uncovering the Root Causes and Practical Fixes

The article analyzes why large language models frequently generate confidently wrong answers, attributing hallucinations to statistical inevitability, data scarcity, and limited model expressiveness, and shows how RLHF exacerbates the problem by rewarding guesses, then proposes confidence‑threshold and "I don't know" strategies to mitigate it.

Data Party THU

Many users have seen large language models (LLMs) confidently give completely incorrect answers to obscure questions, a phenomenon known as hallucination. For example, when asked for Adam Tauman Kalai's birthday, a model returned three different dates (03-07, 15-06, 01-01), none of them correct:

"Adam Tauman Kalai birthday? If you know, answer DD-MM." Answers: 03-07, 15-06, 01-01 (all wrong).

OpenAI's recent paper (link at the end) argues systematically that hallucinations persist because training and evaluation reward guessing over acknowledging uncertainty.

Pre‑training Seeds Hallucination

Even if the pre‑training data is 100% correct, the density‑estimation objective forces the model to generate an answer for every prompt, effectively penalising “I don’t know”. This creates a statistical inevitability of errors.

Post‑training Reinforcement Learning Amplifies the Issue

Post-training with RLHF is typically graded with a binary scheme (1 for correct, 0 for wrong). Because a blank or "I don't know" response also scores 0, a guess can never score worse than abstaining and sometimes scores better, so models learn to answer confidently even when they are unsure.

Root Causes of Hallucination

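Statistical Inevitability: Viewing generation as a binary "Is-It-Valid?" classification problem shows that generating valid outputs is at least as hard as classifying them: a model that cannot reliably separate valid from invalid outputs will also produce invalid ones (Theorem 1).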

Statistical inevitability diagram

Data Scarcity: Facts that appear only once in the training corpus (singletons) cannot be learned reliably; the hallucination rate is at least the proportion of such singletons (Theorem 2).

Data scarcity illustration
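
To make the singleton bound concrete, the sketch below counts what fraction of entries in a toy corpus are facts mentioned exactly once. The corpus, the (person, birthday) encoding, and the function name are made up for illustration; the paper's formal definition may differ in detail.

```python
from collections import Counter

def singleton_rate(facts):
    """Fraction of corpus entries whose fact appears exactly once."""
    counts = Counter(facts)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(facts)

# Hypothetical corpus: each entry is one mention of a (person, birthday) fact.
corpus = [
    ("Person A", "01-02"), ("Person A", "01-02"), ("Person A", "01-02"),
    ("Person B", "03-04"), ("Person B", "03-04"),
    ("Person C", "05-06"),  # mentioned once
    ("Person D", "07-08"),  # mentioned once
    ("Person E", "09-10"),  # mentioned once
]

rate = singleton_rate(corpus)
print(f"singleton rate: {rate:.3f}")
```

In this toy corpus three of the eight entries are singletons, so under the bound described above a model trained on it would be expected to get at least that share of such birthday questions wrong.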

Insufficient Model Expressiveness: If the model family cannot represent a pattern (e.g., a trigram model, which conditions only on the two preceding tokens), the hallucination rate has a lower bound regardless of data quality (Theorem 3).

Model expressiveness diagram
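
As a rough illustration of an expressiveness limit, the sketch below trains a count-based trigram model, which predicts each token from only the two preceding ones. The training sentences and helper names are invented for this example and are not taken from the paper; the point is only that two prompts sharing their last two tokens must receive the same prediction, so at least one of them is answered wrongly.

```python
from collections import Counter, defaultdict

def train_trigram(sentences):
    """Count next-token frequencies keyed by the two preceding tokens."""
    model = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
            model[(a, b)][c] += 1
    return model

def predict(model, prefix):
    """Most frequent next token given only the last two tokens of the prefix.

    Assumes the two-token context was seen during training.
    """
    context = tuple(prefix.split()[-2:])
    return model[context].most_common(1)[0][0]

# Hypothetical training data: the correct continuation depends on the first word.
training = [
    "she was out of her mind",
    "he was out of his mind",
]
model = train_trigram(training)

pred_she = predict(model, "she was out of")
pred_he = predict(model, "he was out of")
print(pred_she, pred_he)  # identical: the model only sees ("out", "of")
```

No amount of extra clean data fixes this, because the information needed to choose between the two continuations lies outside the model's two-token window.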

Evaluation Practices Punish Uncertainty

Ten mainstream benchmarks (including MMLU-Pro, GPQA, SWE-bench, and WildBench) were reviewed. Under their predominantly binary grading, an "I don't know" response earns no more than a wrong answer, which effectively rewards confident guesses and drives models toward hallucination.

Evaluation penalty illustration
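
A small tally shows how binary grading can rank a guessing model above an honest one. The two result lists below are hypothetical and the outcome labels are an assumption of this sketch, not a format used by any of the benchmarks named above.

```python
def binary_score(outcomes):
    """Binary grading: 1 point for 'correct', 0 for 'wrong' or 'idk'."""
    return sum(1 for o in outcomes if o == "correct")

# Hypothetical results on 10 questions; both models genuinely know 6 answers.
# The guesser guesses the remaining 4 and gets 1 right by luck.
guesser = ["correct"] * 6 + ["correct"] + ["wrong"] * 3
# The abstainer answers "I don't know" on the remaining 4.
abstainer = ["correct"] * 6 + ["idk"] * 4

guesser_score = binary_score(guesser)
abstainer_score = binary_score(abstainer)
guesser_errors = guesser.count("wrong")
abstainer_errors = abstainer.count("wrong")

print(f"guesser:   score={guesser_score}, wrong answers={guesser_errors}")
print(f"abstainer: score={abstainer_score}, wrong answers={abstainer_errors}")
```

The guesser tops the leaderboard with 7 points against 6, even though it produced three confident wrong answers and the abstainer produced none.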

Proposed Remedy: Make "Blank Answer" an Option

Instead of forcing a model to answer, introduce a confidence threshold t and state the scoring rule explicitly in the prompt:

"Answer only if your confidence is greater than t; otherwise reply 'I don't know'. Correct answers earn 1 point, wrong answers are penalised by t/(1-t) points, and 'I don't know' earns 0 points."

When the model’s true confidence is below t, saying "I don’t know" yields the highest expected score, while fabricating an answer incurs a penalty.
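
The reasoning can be checked with a short calculation. If the model's answer is correct with probability p, answering has expected score p - (1 - p) * t/(1 - t), which is positive exactly when p > t, while abstaining always scores 0. The sketch below assumes a correct answer earns 1 point; the function names are illustrative only.

```python
def penalty(t):
    """Points lost for a wrong answer at confidence threshold t (0 < t < 1)."""
    return t / (1 - t)

def expected_score_if_answering(p, t):
    """Expected score when answering: +1 with probability p, -t/(1-t) otherwise."""
    return p * 1.0 - (1 - p) * penalty(t)

def should_answer(p, t):
    """Abstaining scores 0, so answer only when the expected score is positive."""
    return expected_score_if_answering(p, t) > 0

t = 0.75  # a wrong answer costs 0.75 / 0.25 = 3 points
for p in (0.5, 0.75, 0.9):
    score = expected_score_if_answering(p, t)
    print(f"p={p}: expected score {score:+.2f}, answer={should_answer(p, t)}")
```

At p equal to t the expected score is zero, so the threshold is exactly the break-even point between answering and abstaining.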

1. Explicit Confidence Threshold

Include a clear rule in the prompt that the model should answer only if its confidence exceeds a predefined value.

2. Make "I don’t know" the Optimal Strategy

By assigning zero points to "I don’t know" and a proportional penalty to wrong answers, the model is incentivised to abstain when uncertain, reducing hallucinations.
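
The explicit threshold rule from step 1 can be sketched as a prompt template. The wording of the template and the `build_prompt` helper are assumptions made for this example, not a format prescribed by the paper.

```python
IDK = "I don't know"

def build_prompt(question, t):
    """Append an explicit confidence-threshold scoring rule to a question."""
    if not 0 < t < 1:
        raise ValueError("t must be strictly between 0 and 1")
    wrong_penalty = t / (1 - t)
    return (
        f"{question}\n\n"
        f"Answer only if you are more than {t:.0%} confident. "
        f"A correct answer earns 1 point, a wrong answer loses "
        f"{wrong_penalty:g} points, and '{IDK}' earns 0 points."
    )

prompt = build_prompt("What is Adam Tauman Kalai's birthday? Answer DD-MM.", 0.75)
print(prompt)
```

Stating the penalty in the prompt lets the model see the same scoring rule the grader applies, so abstaining under uncertainty is a rational choice rather than a lost point.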

These changes shift the post‑training “exam mechanism” from rewarding guesses to encouraging honest uncertainty, thereby mitigating hallucination without needing new benchmarks.

Reference

OpenAI – Why Language Models Hallucinate
https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4aaa5/why-language-models-hallucinate.pdf

Written by

Data Party THU

Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.
