Why Do Large Language Models Hallucinate? Uncovering the Root Causes and Practical Fixes
This article analyzes why large language models often give confidently wrong answers. It traces hallucinations to statistical inevitability, data scarcity, and limited model expressiveness. It then explains how post-training such as RLHF makes the problem worse by rewarding guesses, and outlines confidence-threshold and "I don't know" strategies to mitigate it.
Many users have seen large language models (LLMs) confidently give completely wrong answers to obscure questions, a phenomenon known as hallucination. For example, when asked for the birthday of Adam Tauman Kalai, a model gave three different dates, none of them correct:
Prompt: "Adam Tauman Kalai birthday? If you know, answer DD-MM." Responses: 03-07, 15-06, 01-01 (all wrong).
OpenAI's recent paper (link at the end) systematically argues that hallucinations persist because training and evaluation procedures reward guessing instead of acknowledging uncertainty.
Pre‑training Seeds Hallucination
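Even if the pre-training data were 100% correct, the density-estimation objective would still lead to errors. A model trained to match the data distribution must produce some continuation for every prompt, and nothing in that objective rewards answering "I don't know". Errors are therefore a statistical inevitability rather than merely a symptom of faulty data.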
Post‑training Reinforcement Learning Amplifies the Issue
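Post-training methods such as RLHF optimise models against binary scoring: 1 point for a correct answer, 0 for a wrong one. A blank or "I don't know" response also scores 0, so abstaining can never beat guessing. Any guess with a nonzero chance of being right has a higher expected score. Models therefore learn to answer every question confidently, much like students guessing on an exam that has no penalty for wrong answers.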
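To make the incentive concrete, here is a minimal Python sketch. It assumes a grader that awards 1 point for a correct answer and 0 for both a wrong answer and "I don't know". The function names are illustrative, not taken from the paper.

```python
def expected_score_guess(p_correct: float) -> float:
    """Expected score of answering under binary grading: 1 if right, 0 if wrong."""
    return p_correct * 1.0 + (1.0 - p_correct) * 0.0


def expected_score_abstain() -> float:
    """Under binary grading, "I don't know" earns the same 0 as a wrong answer."""
    return 0.0


# From a blind birthday guess (about 1 in 365) up to a fairly confident answer.
for p in (1 / 365, 0.2, 0.5, 0.9):
    print(f"p={p:.3f}  guess={expected_score_guess(p):.3f}  abstain={expected_score_abstain():.3f}")
```

Guessing is never worse than abstaining and is strictly better whenever p > 0, which is the incentive described above.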
Root Causes of Hallucination
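Statistical Inevitability: Framing generation as a binary "Is-It-Valid?" (IIV) classification problem shows that generating valid outputs is at least as hard as classifying them. The generative error rate is bounded below by roughly twice the IIV misclassification rate (Theorem 1).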
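Data Scarcity: Some facts, such as birthdays, are arbitrary and have no learnable pattern. Among these, facts that appear only once in the training corpus ("singletons") cannot be reliably learned. Up to small error terms, the hallucination rate is at least the fraction of such singletons (Theorem 2).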
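The singleton rate behind Theorem 2 can be estimated by counting how often each fact occurs. The sketch below is minimal and makes two assumptions. First, the corpus has already been reduced to one identifier per fact mention, which is a hypothetical preprocessing step. Second, the rate is taken as the fraction of mentions whose fact occurs exactly once, in the spirit of a Good-Turing missing-mass estimate.

```python
from collections import Counter


def singleton_rate(facts: list[str]) -> float:
    """Fraction of fact mentions whose fact appears exactly once in the corpus."""
    if not facts:
        return 0.0
    counts = Counter(facts)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(facts)


# Hypothetical corpus: one well-known birthday repeated, two obscure ones seen once.
corpus = ["einstein_bday", "einstein_bday", "einstein_bday", "kalai_bday", "obscure_bday"]
print(singleton_rate(corpus))  # 0.4
```

Here two of the five mentions are singletons. Under the theorem's assumptions, a model trained on this corpus should be expected to get roughly that share of such birthday questions wrong.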
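Insufficient Model Expressiveness: Some architectures cannot represent certain patterns. For example, a trigram model conditions only on the two preceding tokens, so it cannot capture longer-range dependencies. In such cases the hallucination rate is bounded below regardless of data quality (Theorem 3).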
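Here is a toy illustration of the expressiveness limit, assuming a count-based trigram model. This is a sketch in the spirit of Theorem 3, not the paper's construction. The two training sentences differ only in a possessive pronoun that depends on a word far to the left. Before that pronoun, the context ("out", "of") is identical in both sentences, so the model cannot pick the right one.

```python
from collections import Counter, defaultdict


def train_trigram(sentences: list[str]) -> dict:
    """Count next-token frequencies for each two-token context."""
    counts: dict = defaultdict(Counter)
    for s in sentences:
        toks = s.split()
        for a, b, c in zip(toks, toks[1:], toks[2:]):
            counts[(a, b)][c] += 1
    return counts


def next_token_probs(model: dict, a: str, b: str) -> dict:
    """Maximum-likelihood distribution over the token following (a, b)."""
    ctx = model[(a, b)]
    total = sum(ctx.values())
    return {tok: n / total for tok, n in ctx.items()}


corpus = [
    "she lost it and was completely out of her mind",
    "he lost it and was completely out of his mind",
]
model = train_trigram(corpus)
print(next_token_probs(model, "out", "of"))  # {'her': 0.5, 'his': 0.5}
```

Whichever sentence is being generated, the model puts probability 0.5 on the wrong pronoun. More data of the same kind cannot fix this, because the needed information lies outside the model's context window.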
Evaluation Practices Punish Uncertainty
The paper examines the grading schemes of ten influential benchmarks, including MMLU-Pro, GPQA, SWE-bench, and WildBench. All of them give "I don't know" little or no credit, so a confident guess scores at least as well as an honest abstention. This drives models toward hallucination.
Proposed Remedy: Make "Blank Answer" an Option
Instead of forcing a model to answer, introduce a confidence threshold t. The prompt can explicitly state:
"Answer only if you are more than t confident; otherwise reply "I don't know". Correct answers receive 1 point, wrong answers are penalised t/(1-t) points, and "I don't know" receives 0 points."
When the model’s true confidence is below t, saying "I don’t know" yields the highest expected score, while fabricating an answer incurs a penalty.
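The break-even point follows from a short derivation. Let p be the model's probability of being correct, and assume 0 < t < 1. Scoring follows the rule in the prompt: 1 for a correct answer, -t/(1-t) for a wrong one, and 0 for "I don't know". Multiplying by (1 - t) > 0 preserves the inequality.

```latex
\mathbb{E}[\text{score} \mid \text{answer}] = p \cdot 1 - (1 - p)\,\frac{t}{1 - t},
\qquad
\mathbb{E}[\text{score} \mid \text{abstain}] = 0,

\mathbb{E}[\text{score} \mid \text{answer}] > 0
\iff p(1 - t) > (1 - p)\,t
\iff p - pt > t - pt
\iff p > t.
```

The penalty t/(1-t) is therefore chosen so that answering pays off exactly when confidence exceeds t.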
1. Explicit Confidence Threshold
Include a clear rule in the prompt that the model should answer only if its confidence exceeds a predefined value.
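A minimal sketch of such a rule follows. It uses a hypothetical helper, `threshold_instruction`, that turns t into instruction text modeled on the quoted rule above. The exact wording and formatting are illustrative assumptions.

```python
def threshold_instruction(t: float) -> str:
    """Build an answer-only-if-confident instruction for confidence threshold t."""
    if not 0.0 < t < 1.0:
        raise ValueError("t must be strictly between 0 and 1")
    penalty = t / (1.0 - t)  # wrong-answer penalty that makes p > t the break-even point
    return (
        f"Answer only if you are more than {t:.0%} confident. "
        f"Correct answers receive 1 point, mistakes are penalised {penalty:g} points, "
        "and an answer of \"I don't know\" receives 0 points."
    )


question = "Adam Tauman Kalai birthday? If you know, answer DD-MM."
prompt = threshold_instruction(0.75) + "\n\n" + question
print(prompt)
```

With t = 0.75, the instruction states a 75% threshold and a 3-point penalty for mistakes.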
2. Make "I don’t know" the Optimal Strategy
By assigning zero points to "I don't know" and a penalty of t/(1-t) points to wrong answers, the scoring gives the model an incentive to abstain when its confidence is below t, reducing hallucinations.
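The scoring rule and the resulting decision can be sketched as follows. The sketch assumes that the grader receives a boolean correctness flag. It also assumes that the model's confidence is calibrated, which is a strong assumption in practice. The names `score`, `expected_answer_score`, and `should_answer` are hypothetical.

```python
IDK = "I don't know"


def score(answer: str, is_correct: bool, t: float) -> float:
    """Threshold grading: +1 if correct, -t/(1-t) if wrong, 0 for abstaining."""
    if answer == IDK:
        return 0.0
    return 1.0 if is_correct else -t / (1.0 - t)


def expected_answer_score(p: float, t: float) -> float:
    """Expected score of answering when the probability of being correct is p."""
    return p * 1.0 - (1.0 - p) * t / (1.0 - t)


def should_answer(p: float, t: float) -> bool:
    """Answer only if that beats the 0 points for abstaining (true when p > t)."""
    return expected_answer_score(p, t) > 0.0


t = 0.75
for p in (0.5, 0.9):
    print(p, round(expected_answer_score(p, t), 3), should_answer(p, t))
```

At t = 0.75, a 50%-confident answer has an expected score of -1.0, so abstaining is better. A 90%-confident answer has an expected score of about 0.6, so answering is better.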
These changes shift the post-training "exam mechanism" from rewarding guesses to encouraging honest uncertainty. Because they only change how existing benchmarks are scored, they can mitigate hallucination without requiring new benchmarks.
Reference
OpenAI – Why Language Models Hallucinate
https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4aaa5/why-language-models-hallucinate.pdf
This article has been distilled and summarized from source material, then republished for learning and reference.
Data Party THU
Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.