Why Do Large Language Models Hallucinate? Definitions, Causes, and Mitigation Strategies
This article defines hallucination in LLMs as a failure of faithfulness or factualness, explores data‑level and model‑level origins, reviews reference‑based and reference‑free evaluation metrics, and surveys current research on data‑centric and model‑centric mitigation techniques along with future directions.
Definition of Hallucination
A hallucination occurs when a language model generates text that does not follow the provided input (lack of faithfulness) or contradicts real‑world knowledge (lack of factualness).
A commonly cited definition in the natural language generation literature describes hallucination as "generated content that is nonsensical or unfaithful to the provided source content."
Different tasks tolerate hallucinations differently. Summarization and data‑to‑text require high faithfulness, while open‑domain dialogue may tolerate more factual errors.
Causes of Hallucination
Data‑Level Issues
Training corpora collected via crowdsourcing or web crawling often contain false statements, causing the model to memorize incorrect facts.
Excessive duplicate data biases the model’s knowledge; deduplication improves factual consistency.
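As a rough illustration of deduplication, the sketch below removes exact duplicates after light normalization. It is a simplification: the normalization rule and the helper names are assumptions made for this example, and production pipelines often also need near‑duplicate detection, which is not shown here.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash identically."""
    return re.sub(r"\s+", " ", text.strip().lower())


def deduplicate(docs: list[str]) -> list[str]:
    """Keep only the first occurrence of each normalized document."""
    seen: set[str] = set()
    unique: list[str] = []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


# Toy corpus: the second entry differs from the first only in case and spacing.
corpus = [
    "The Eiffel Tower is in Paris.",
    "the  eiffel tower is in paris.",
    "Mount Everest is the tallest mountain above sea level.",
]
deduped = deduplicate(corpus)
```

Here `deduped` keeps the first and third entries, so a repeated statement no longer counts twice toward what the model sees.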
Model‑Level Issues
Architecture: Weak backbones (e.g., RNNs) can amplify hallucinations, though modern LLMs mitigate this.
Decoding: Sampling methods that inject more randomness, such as top‑p (nucleus) sampling with a high threshold, increase hallucination risk; reducing randomness can improve faithfulness.
Exposure bias: The mismatch between training (conditioning on ground‑truth prefixes) and inference (conditioning on the model's own outputs) lets errors accumulate, especially in long‑form generation, and leads to hallucinations.
Parametric knowledge errors: Incorrect facts memorized during pre‑training manifest as hallucinations.
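To make the decoding point concrete, here is a minimal sketch of top‑p (nucleus) filtering over a toy next‑token distribution. The token names and probabilities are invented for illustration, and a real decoder would operate on model logits instead of a hand‑written dictionary.

```python
import random


def top_p_filter(probs: dict[str, float], p: float) -> dict[str, float]:
    """Keep the most probable tokens until their cumulative mass reaches p,
    then renormalize the kept probabilities so they sum to 1."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept: list[tuple[str, float]] = []
    total = 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        total += prob
        if total >= p:
            break
    return {token: prob / total for token, prob in kept}


def sample(probs: dict[str, float], p: float, rng: random.Random) -> str:
    """Draw one token from the nucleus defined by p."""
    nucleus = top_p_filter(probs, p)
    tokens = list(nucleus)
    weights = [nucleus[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical distribution for the prompt "The Eiffel Tower is in ..."
next_token_probs = {"Paris": 0.70, "Lyon": 0.15, "Berlin": 0.10, "Mars": 0.05}

narrow = top_p_filter(next_token_probs, p=0.5)  # only the top token survives
wide = top_p_filter(next_token_probs, p=0.9)    # lower-probability tokens stay eligible
```

With the smaller threshold only "Paris" can be sampled, while the larger threshold leaves "Lyon" and "Berlin" eligible. This is the diversity versus faithfulness trade‑off in miniature.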
Hallucination Evaluation
Reference‑Based Metrics
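Overlap metrics (ROUGE, BLEU) measure n‑gram overlap between the output and a reference, which gives a rough proxy for faithfulness but cannot evaluate factualness.
Source‑only metrics such as Knowledge F1 attempt to measure hallucination without a target reference by comparing the output against the source knowledge instead.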
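The limits of overlap scoring are easy to see in a small example. The sketch below computes a unigram F1 between an output and a knowledge sentence, in the spirit of Knowledge F1. The tokenizer is a simplifying assumption, and published implementations may differ in details such as stopword handling.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Very simple word tokenizer (an assumption made for this sketch)."""
    return re.findall(r"\w+", text.lower())


def unigram_f1(candidate: str, reference: str) -> float:
    """Harmonic mean of unigram precision and recall, counting repeated tokens."""
    cand, ref = tokenize(candidate), tokenize(reference)
    if not cand or not ref:
        return 0.0
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


knowledge = "The Eiffel Tower was completed in 1889."
faithful = "The Eiffel Tower was completed in 1889."
wrong_year = "The Eiffel Tower was completed in 1925."

score_faithful = unigram_f1(faithful, knowledge)  # identical text scores 1.0
score_wrong = unigram_f1(wrong_year, knowledge)   # 6 of 7 tokens still match
```

The output with the wrong year still scores 6/7, because only one token differs. A high overlap score therefore does not rule out a factual error.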
Reference‑Free Metrics
Information‑Extraction (IE) based: Extract triples from the output and verify them; limited by IE errors and triple‑only knowledge.
Question‑Answering (QA) based: Generate QA pairs from the output, answer them conditioned on the source, and compare; suffers from QA/QG model errors and incomplete world knowledge.
Natural Language Inference (NLI) based: Use NLI models to test whether the source entails the generated text; off‑the‑shelf NLI struggles with world‑knowledge‑required hallucinations.
Factualness classification: Train a classifier on annotated hallucination data to predict hallucination likelihood.
Human evaluation remains the most reliable approach. It is sometimes augmented with LLM‑based scoring (e.g., GPT‑4), with the caveat that LLMs themselves can hallucinate.
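The IE‑based idea can be sketched as a set comparison over (subject, relation, object) triples. The sketch assumes an upstream extractor has already produced the triples. That extractor is not shown, and the triples below are written by hand for illustration.

```python
Triple = tuple[str, str, str]


def unsupported_triples(output: set[Triple], source: set[Triple]) -> set[Triple]:
    """Triples asserted by the output that the source does not contain."""
    return output - source


def hallucination_rate(output: set[Triple], source: set[Triple]) -> float:
    """Fraction of output triples with no exact match in the source."""
    if not output:
        return 0.0
    return len(unsupported_triples(output, source)) / len(output)


# Hand-written triples standing in for the result of an IE system.
source_triples = {
    ("Eiffel Tower", "located_in", "Paris"),
    ("Eiffel Tower", "completed_in", "1889"),
}
output_triples = {
    ("Eiffel Tower", "located_in", "Paris"),
    ("Eiffel Tower", "completed_in", "1925"),
}

rate = hallucination_rate(output_triples, source_triples)
```

Exact matching is deliberately strict here. It also shows why the approach inherits IE errors: a correct fact extracted with a slightly different relation name would be flagged as unsupported.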
Mitigation Strategies
Data‑Centric Approaches
High‑quality dataset construction:
Manual annotation of factuality‑focused datasets (e.g., GO FIGURE, Wiki‑based factuality benchmarks).
Task‑specific training data for LLMs.
Fine‑grained benchmark creation for hallucination analysis.
Automatic filtering:
Score training samples with a model and discard those likely to induce hallucinations.
Weight high‑faithfulness sources (e.g., Wikipedia) more heavily during pre‑training.
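Automatic filtering can be sketched as scoring each (source, target) training pair and dropping low scorers. The scorer below is a crude stand‑in, the share of target tokens that also appear in the source. In practice the score would come from a trained model, and the 0.8 threshold is an arbitrary assumption.

```python
import re


def support_score(source: str, target: str) -> float:
    """Share of target tokens that also occur in the source (a crude proxy)."""
    src = set(re.findall(r"\w+", source.lower()))
    tgt = re.findall(r"\w+", target.lower())
    if not tgt:
        return 0.0
    return sum(tok in src for tok in tgt) / len(tgt)


def filter_pairs(
    pairs: list[tuple[str, str]], threshold: float = 0.8
) -> list[tuple[str, str]]:
    """Keep only pairs whose target is mostly supported by its source."""
    return [(s, t) for s, t in pairs if support_score(s, t) >= threshold]


source = "The company reported revenue of 5 million dollars in 2020."
pairs = [
    (source, "The company reported revenue of 5 million dollars."),
    (source, "The company doubled profits and hired 300 engineers."),
]
kept = filter_pairs(pairs)
```

The second pair is discarded because most of its target has no support in the source, which is the kind of sample likely to teach a model to add unsupported content.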
Model‑Centric Approaches
Architecture Improvements
Design encoders that better incorporate source information (e.g., graph neural networks).
Reduce randomness in decoding to trade diversity for faithfulness.
Retrieval‑augmented generation (e.g., with tools such as LlamaIndex) can substantially reduce hallucinations by grounding answers in retrieved documents.
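The retrieval‑augmented pattern can be outlined without any particular framework. The sketch below ranks passages by token overlap with the query and builds a grounded prompt. The scoring rule, the prompt wording, and the function names are assumptions for illustration, and the call to a language model is left out.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word set used for the toy relevance score."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, passages: list[str], k: int = 1) -> list[str]:
    """Return the k passages sharing the most tokens with the query."""
    q = tokens(query)
    ranked = sorted(passages, key=lambda p: len(q & tokens(p)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: list[str], k: int = 1) -> str:
    """Place the retrieved passages in front of the question."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages, k))
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\nAnswer:"
    )


passages = [
    "TruthfulQA contains 817 questions.",
    "HaluEval contains 35k annotated examples.",
]
query = "How many questions does TruthfulQA contain?"
prompt = build_prompt(query, passages)
```

A real system would replace the overlap score with a sparse or dense retriever, but the structure stays the same: retrieve first, then condition generation on what was retrieved.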
Training Techniques
Controllable text generation: treat hallucination level as a controllable attribute.
Sketch‑to‑content planning before generation.
Reinforcement learning with hallucination‑penalty rewards (e.g., RLHF).
Multi‑task learning to incorporate auxiliary tasks that discourage hallucination.
Post‑processing models that specifically correct hallucinated content.
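One way to picture the controllable‑generation idea from the list above is a control token attached to each training input. The sketch assumes a faithfulness score is already available for every training pair. The token strings and the threshold are hypothetical choices, not taken from a specific paper.

```python
def control_token(faithfulness: float, threshold: float = 0.8) -> str:
    """Map a faithfulness score to one of two hypothetical control tokens."""
    return "<faithful>" if faithfulness >= threshold else "<unfaithful>"


def tag_example(source: str, target: str, faithfulness: float) -> dict[str, str]:
    """Prefix the training input with the control token for this pair."""
    return {
        "input": f"{control_token(faithfulness)} {source}",
        "target": target,
    }


def inference_input(source: str) -> str:
    """At inference time, always request the faithful behaviour."""
    return f"<faithful> {source}"


train_example = tag_example(
    source="Revenue was 5 million dollars in 2020.",
    target="Revenue doubled to 10 million dollars.",
    faithfulness=0.3,
)
```

The unfaithful pair stays in the training data but is labeled as such, so the model can learn to associate each token with a behaviour instead of imitating every target equally.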
Future Directions
Metric Design
Develop finer‑grained evaluation distinguishing intrinsic vs. extrinsic hallucinations.
Classify hallucinations by cause (knowledge retrieval error vs. missing knowledge).
Move from sentence‑level to token/phrase‑level assessment.
Build a comprehensive taxonomy of hallucination types.
Knowledge Definition and Editing
Beyond Wikipedia: the internet contains both valuable and false information; methods are needed to detect and edit erroneous memorized facts.
Model‑editing techniques such as ROME and MEMIT can directly correct specific factual errors.
Hallucination Removal
Retrieval‑augmented generation.
Reinforcement learning from human feedback (RLHF).
Knowledge injection and grounding.
LLM‑Specific Research
Evaluation Benchmarks
TruthfulQA: 817 handcrafted questions testing factual answering. In the original paper, the best model was truthful on only 58% of questions, compared with 94% for humans. Paper: https://aclanthology.org/2022.acl-long.229/
HaluEval: 35k annotated examples covering QA, knowledge‑grounded dialogue, and summarization. Shows strong LLMs still struggle to detect hallucinations. Paper: https://arxiv.org/pdf/2305.11747.pdf
Retrieval‑augmented evaluation studies demonstrate that smaller fine‑tuned models can outperform larger zero‑shot models on attribution tasks.
Detection and Repair
Zero‑resource detection methods (e.g., SelfCheckGPT) treat the model as a black box.
Factuality‑enhanced language models for open‑ended generation.
Self‑critique frameworks (CRITIC) and multi‑agent debate approaches aim to improve factual consistency.
Post‑generation correction methods such as contrastive candidate selection and denoising (PURR).
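The black‑box, sampling‑based idea behind zero‑resource detection can be sketched as follows: draw several responses to the same prompt and check how well each sentence of the main response is supported by them. Token overlap is used here purely as a stand‑in scorer. SelfCheckGPT itself relies on stronger comparison methods, and the samples below are written by hand.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word set for the stand-in overlap scorer."""
    return set(re.findall(r"\w+", text.lower()))


def inconsistency(sentence: str, samples: list[str]) -> float:
    """1 minus the mean share of the sentence's tokens found in each sample.
    Higher values suggest the sentence is less consistently reproduced."""
    sent = tokens(sentence)
    if not sent or not samples:
        return 0.0
    support = [len(sent & tokens(s)) / len(sent) for s in samples]
    return 1.0 - sum(support) / len(support)


# Hand-written stand-ins for extra responses sampled from the same model.
samples = [
    "Marie Curie won two Nobel Prizes. She was born in Warsaw.",
    "Marie Curie won two Nobel Prizes and was born in Warsaw.",
    "Born in Warsaw, Marie Curie won two Nobel Prizes.",
]

stable = inconsistency("Marie Curie won two Nobel Prizes.", samples)
suspect = inconsistency("She was born in Vienna.", samples)
```

The sentence that every sample reproduces scores 0, while the sentence naming a different birthplace scores higher. The underlying assumption is that facts a model actually knows tend to recur across samples.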
Key Papers and Resources
OpenAI perspective on hallucinations: https://www.youtube.com/watch?v=hhiLw5Q_UFg
John Schulman on RLHF challenges: https://zhuanlan.zhihu.com/p/640144131
Adaptive Chameleon or Stubborn Sloth (knowledge conflicts): https://arxiv.org/abs/2305.13300
Multitask, Multilingual, Multimodal Evaluation of ChatGPT: https://arxiv.org/abs/2302.0402
Baobao Algorithm Notes
Author of the BaiMian large model, offering technology and industry insights.
