Why Do Large Language Models Hallucinate? Definitions, Causes, and Mitigation Strategies

This article defines hallucination in LLMs as a failure of faithfulness or factualness, explores data‑level and model‑level origins, reviews reference‑based and reference‑free evaluation metrics, and surveys current research on data‑centric and model‑centric mitigation techniques along with future directions.

Baobao Algorithm Notes

Aug 22, 2023

Definition of Hallucination

A hallucination occurs when a language model generates text that does not follow the provided input (lack of faithfulness) or contradicts real‑world knowledge (lack of factualness).

A widely quoted definition in the natural language generation literature describes hallucination as "generated content that is nonsensical or unfaithful to the provided source content."

Different tasks tolerate hallucinations differently. Summarization and data‑to‑text require high faithfulness, while open‑domain dialogue may tolerate more factual errors.

Causes of Hallucination

Data‑Level Issues

Training corpora collected via crowdsourcing or web crawling often contain false statements, causing the model to memorize incorrect facts.

Excessive duplicate data biases the model’s knowledge; deduplication improves factual consistency.
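As a rough illustration of the deduplication step, the sketch below removes exact duplicates after light normalization. The function names and the normalization rule are assumptions made for this example, not a method described in the sources; real pipelines usually add near‑duplicate detection on top of something like this.

```python
import hashlib
import re


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants hash identically."""
    return re.sub(r"\s+", " ", text.strip().lower())


def deduplicate(docs):
    """Keep the first occurrence of each normalized document, preserving order."""
    seen = set()
    unique = []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


# Toy corpus: the first two entries differ only in case and spacing.
corpus = [
    "The Eiffel Tower is in Paris.",
    "the eiffel tower   is in paris.",
    "Mount Everest is the highest mountain above sea level.",
]
print(deduplicate(corpus))
```

Only the first copy of each normalized document survives, so the two Eiffel Tower variants collapse into one entry.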

Model‑Level Issues

Architecture: Weak backbones (e.g., RNNs) can amplify hallucinations, though modern LLMs mitigate this.

Decoding: Sampling strategies that add randomness, such as top‑p (nucleus) sampling, increase hallucination risk; reducing randomness can improve faithfulness.

Exposure bias: During training the model conditions on ground‑truth prefixes, but at inference it conditions on its own outputs. This mismatch, especially in long‑form generation, lets early errors compound into hallucinations.

Parametric knowledge errors: Incorrect facts memorized during pre‑training manifest as hallucinations.
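To make the decoding point concrete, here is a minimal sketch of temperature scaling and top‑p (nucleus) filtering over a toy logit vector. All names and values are illustrative assumptions; a real decoder applies the same idea to model logits over a full vocabulary.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of tokens whose cumulative mass reaches top_p, renormalized."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    kept, mass = [], 0.0
    for index, p in ranked:
        kept.append((index, p))
        mass += p
        if mass >= top_p:
            break
    return {index: p / mass for index, p in kept}


def sample(logits, temperature=1.0, top_p=1.0, rng=random):
    """Draw one token index from the filtered distribution."""
    nucleus = top_p_filter(softmax(logits, temperature), top_p)
    indices = list(nucleus)
    weights = [nucleus[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.5, 0.1]  # hypothetical scores for four candidate tokens
print(top_p_filter(softmax(logits), top_p=0.5))   # only the most likely token remains
print(top_p_filter(softmax(logits), top_p=0.95))  # a wider nucleus keeps more candidates
```

A smaller `top_p` or a lower `temperature` concentrates probability on the most likely tokens, which is the "less randomness" end of the diversity versus faithfulness trade‑off.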

Hallucination Evaluation

Reference‑Based Metrics

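Overlap metrics (ROUGE, BLEU) compare the output with a target reference. They serve as a proxy for faithfulness but cannot evaluate factualness.

Source‑only metrics such as Knowledge F1 compare the output with the source or grounding knowledge, so they attempt to measure hallucination without a target reference.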
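The sketch below shows a unigram‑overlap F1 of the kind these metrics build on. It is a simplified stand‑in, not the official ROUGE, BLEU, or Knowledge F1 implementation; the tokenizer and the example sentences are assumptions chosen for illustration.

```python
import re
from collections import Counter


def tokens(text):
    """Very rough tokenizer: lowercase word characters only."""
    return re.findall(r"\w+", text.lower())


def unigram_f1(candidate, reference):
    """F1 over unigram counts shared by the candidate and the reference text."""
    cand, ref = Counter(tokens(candidate)), Counter(tokens(reference))
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Toy example: the response swaps a single word relative to the grounding text.
knowledge = "Marie Curie won the Nobel Prize in Physics in 1903"
response = "Marie Curie won the Nobel Prize in Chemistry in 1903"
print(unigram_f1(response, knowledge))
```

Nine of the ten tokens match, so the score stays high even though one word changed the meaning of the sentence. This is why overlap alone is a weak signal for factualness.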

Reference‑Free Metrics

Information‑Extraction (IE) based: Extract triples from the output and verify them; limited by IE errors and triple‑only knowledge.

Question‑Answering (QA) based: Generate QA pairs from the output, answer them conditioned on the source, and compare; suffers from QA/QG model errors and incomplete world knowledge.

Natural Language Inference (NLI) based: Use NLI models to test whether the source entails the generated text; off‑the‑shelf NLI models struggle with hallucinations that require world knowledge to detect.

Factualness classification: Train a classifier on annotated hallucination data to predict hallucination likelihood.
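The QA‑based procedure can be sketched as a small pipeline with pluggable components. Everything here is hypothetical: `generate_questions` and `answer` stand in for trained QG and QA models, and the toy versions below only handle "field: value" text so that the example runs on its own.

```python
def qa_faithfulness(source, output, generate_questions, answer):
    """Share of questions (generated from the output) that get the same
    answer from the source as from the output itself."""
    questions = generate_questions(output)
    if not questions:
        return 1.0  # assumption: nothing checkable counts as no detected mismatch
    matches = sum(
        answer(q, source).strip().lower() == answer(q, output).strip().lower()
        for q in questions
    )
    return matches / len(questions)


def toy_generate_questions(text):
    """Toy stand-in for a QG model: each 'field: value' pair yields the question 'field'."""
    return [part.split(":", 1)[0].strip() for part in text.split(";") if ":" in part]


def toy_answer(question, context):
    """Toy stand-in for a QA model: look the field up in the context."""
    for part in context.split(";"):
        if ":" in part:
            field, value = part.split(":", 1)
            if field.strip() == question:
                return value.strip()
    return ""


source = "capital: Paris; population: 2.1 million"
output = "capital: Paris; population: 5 million"
print(qa_faithfulness(source, output, toy_generate_questions, toy_answer))
```

The score drops because the population answer derived from the output disagrees with the one derived from the source. With real models, errors in either component feed directly into the score, which is the limitation noted above.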

Human evaluation remains the most reliable method. It is sometimes supplemented with LLM‑based scoring (e.g., GPT‑4), with the caveat that LLM judges can themselves hallucinate.

Mitigation Strategies

Data‑Centric Approaches

High‑quality dataset construction:

Manual annotation of factuality‑focused datasets (e.g., GO FIGURE, Wiki‑based factuality benchmarks).

Task‑specific training data for LLMs.

Fine‑grained benchmark creation for hallucination analysis.

Automatic filtering:

Score training samples with a model and discard those likely to induce hallucinations.

Weight high‑faithfulness sources (e.g., Wikipedia) more heavily during pre‑training.
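Both filtering ideas can be sketched in a few lines. The `score` field below stands in for the output of a hypothetical faithfulness scorer, and the source weights are arbitrary illustrative values, not settings taken from any paper.

```python
import random


def filter_samples(samples, score_fn, threshold):
    """Keep only samples whose quality score reaches the threshold."""
    return [s for s in samples if score_fn(s) >= threshold]


def draw_weighted(samples, source_weights, k, seed=0):
    """Draw k samples with replacement, favoring sources with larger weights."""
    rng = random.Random(seed)
    weights = [source_weights.get(s["source"], 1.0) for s in samples]
    return rng.choices(samples, weights=weights, k=k)


# Toy data: scores are made up to show the mechanics.
samples = [
    {"text": "Water boils at 100 C at sea level.", "source": "wikipedia", "score": 0.95},
    {"text": "Water boils at 50 C at sea level.", "source": "web", "score": 0.10},
    {"text": "Honey can stay edible for a very long time.", "source": "web", "score": 0.80},
]

kept = filter_samples(samples, score_fn=lambda s: s["score"], threshold=0.5)
batch = draw_weighted(kept, {"wikipedia": 3.0, "web": 1.0}, k=4)
print([s["text"] for s in kept])
```

Filtering removes the low‑scoring sample outright, while weighting only changes how often each remaining source is drawn. The two can be combined or used separately.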

Model‑Centric Approaches

Architecture Improvements

Design encoders that better incorporate source information (e.g., graph neural networks).

Reduce randomness in decoding to trade diversity for faithfulness.

Retrieval‑augmented generation (e.g., with frameworks such as LlamaIndex) can reduce hallucinations by grounding generation in retrieved documents.
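A minimal sketch of the retrieval‑augmented pattern follows. It deliberately avoids any framework API: the word‑overlap retriever, the prompt wording, and the document set are all assumptions for illustration, and the resulting prompt would be passed to whatever model is in use.

```python
import re


def tokens(text):
    """Lowercased set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query, documents, k=2):
    """Rank documents by word overlap with the query and return the top k."""
    query_tokens = tokens(query)
    ranked = sorted(documents, key=lambda d: len(query_tokens & tokens(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=2):
    """Assemble a prompt that asks the model to answer from retrieved context only."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\nAnswer:"
    )


documents = [
    "The Eiffel Tower is located in Paris.",
    "Mount Everest is in the Himalayas.",
    "Python is a programming language.",
]
query = "Where is the Eiffel Tower located?"
print(build_prompt(query, documents, k=1))
```

Production systems typically replace the overlap ranking with dense or hybrid retrieval, but the structure stays the same: retrieve, place the evidence in the prompt, then generate.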

Training Techniques

Controllable text generation: treat hallucination level as a controllable attribute.

Sketch‑to‑content planning before generation.

Reinforcement learning with hallucination‑penalty rewards (e.g., RLHF).

Multi‑task learning to incorporate auxiliary tasks that discourage hallucination.

Post‑processing models that specifically correct hallucinated content.
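The reward‑shaping idea behind the reinforcement learning item can be sketched as a base reward minus a hallucination penalty. The estimator below (share of output tokens absent from the source) is a deliberately crude, hypothetical proxy, and `penalty_weight` is an arbitrary value; a real setup would plug in a learned detector and a learned reward model.

```python
import re


def tokens(text):
    """Lowercased word tokens, in order."""
    return re.findall(r"\w+", text.lower())


def unsupported_fraction(source, output):
    """Toy hallucination estimate: share of output tokens that never appear in the source."""
    source_vocab = set(tokens(source))
    output_tokens = tokens(output)
    if not output_tokens:
        return 0.0
    return sum(t not in source_vocab for t in output_tokens) / len(output_tokens)


def shaped_reward(base_reward, source, output, penalty_weight=2.0):
    """Subtract a hallucination penalty from a base reward (e.g. a preference score)."""
    return base_reward - penalty_weight * unsupported_fraction(source, output)


source = "The meeting is on Monday at 10 am in room 4."
faithful = "The meeting is on Monday at 10 am."
unfaithful = "The meeting is on Friday at 3 pm."

print(shaped_reward(1.0, source, faithful))
print(shaped_reward(1.0, source, unfaithful))
```

With the same base reward, the output that introduces unsupported details ends up with the lower shaped reward, which is the signal a policy‑optimization step would then push against.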

Future Directions

Metric Design

Develop finer‑grained evaluation distinguishing intrinsic vs. extrinsic hallucinations.

Classify hallucinations by cause (knowledge retrieval error vs. missing knowledge).

Move from sentence‑level to token/phrase‑level assessment.

Build a comprehensive taxonomy of hallucination types.

Knowledge Definition and Editing

Beyond Wikipedia: the internet contains both valuable and false information; methods are needed to detect and edit erroneous memorized facts.

Model‑editing techniques such as ROME and MEMIT can directly correct specific factual errors.
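For intuition only, the sketch below shows the bare rank‑one update that this family of editing methods builds on: change a weight matrix W so that a chosen key vector k maps to a new value v, while leaving directions orthogonal to k untouched. It is not ROME or MEMIT themselves; it omits ROME's key‑covariance weighting, the procedure for choosing k and v, and MEMIT's multi‑layer batching. All names and numbers are hypothetical.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rank_one_edit(W, key, new_value):
    """Return W' = W + (v - W k) k^T / (k^T k), which guarantees W' k = v."""
    residual = [v - y for v, y in zip(new_value, matvec(W, key))]
    norm_sq = sum(k * k for k in key)
    if norm_sq == 0:
        raise ValueError("key vector must be non-zero")
    return [
        [w + r * k / norm_sq for w, k in zip(row, key)]
        for row, r in zip(W, residual)
    ]


W = [[1.0, 0.0], [0.0, 1.0]]   # toy weight matrix
key = [1.0, 2.0]               # hypothetical representation of the fact's subject
new_value = [3.0, -1.0]        # hypothetical representation of the corrected fact

edited = rank_one_edit(W, key, new_value)
print(matvec(edited, key))            # the key now maps to the new value
print(matvec(edited, [2.0, -1.0]))    # a vector orthogonal to the key is unaffected
```

The appeal of this style of edit is locality: only the behavior along the chosen key direction changes, which is why it is used to correct specific facts rather than to retrain the model.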

Hallucination Removal

Retrieval‑augmented generation.

Reinforcement learning from human feedback (RLHF).

Knowledge injection and grounding.

LLM‑Specific Research

Evaluation Benchmarks

TruthfulQA: 817 handcrafted questions testing truthful answering. The best model evaluated in the paper answered truthfully only ~58% of the time. Paper: https://aclanthology.org/2022.acl-long.229/

HaluEval: 35k annotated examples covering QA, knowledge‑grounded dialogue, and summarization. Shows strong LLMs still struggle to detect hallucinations. Paper: https://arxiv.org/pdf/2305.11747.pdf

Retrieval‑augmented evaluation studies demonstrate that smaller fine‑tuned models can outperform larger zero‑shot models on attribution tasks.

Detection and Repair

Zero‑resource detection methods (e.g., SelfCheckGPT) treat the model as a black box.

Factuality‑enhanced language models for open‑ended generation.

Self‑critique frameworks (CRITIC) and multi‑agent debate approaches aim to improve factual consistency.

Post‑generation correction methods such as contrastive candidate selection and denoising (PURR).
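The sampling‑consistency idea behind zero‑resource detectors such as SelfCheckGPT can be sketched as follows: draw several extra responses to the same prompt and flag sentences that the samples do not support. The token‑overlap measure here is a simple stand‑in chosen so the example runs without a model; it is not the scoring used in the paper, and the sentences are made up.

```python
import re


def tokens(text):
    """Lowercased set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def support(sentence, sample):
    """Share of the sentence's tokens that also appear in one sampled response."""
    sentence_tokens = tokens(sentence)
    if not sentence_tokens:
        return 1.0
    return len(sentence_tokens & tokens(sample)) / len(sentence_tokens)


def inconsistency_scores(sentences, samples):
    """Per-sentence score in [0, 1]; higher means less supported by the samples."""
    if not samples:
        raise ValueError("at least one sampled response is required")
    return [
        1.0 - sum(support(sentence, sample) for sample in samples) / len(samples)
        for sentence in sentences
    ]


# Sentences of the response under test, plus three hypothetical resampled responses.
sentences = ["Paris is the capital of France.", "It has 40 million residents."]
samples = [
    "Paris is the capital of France and has about 2 million residents.",
    "The capital of France is Paris.",
    "France's capital, Paris, has roughly 2 million residents.",
]
scores = inconsistency_scores(sentences, samples)
print(scores)
```

The second sentence receives the higher score because the resampled responses do not back its figure. The approach needs only the ability to sample from the model, which is what makes it usable with black‑box APIs.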

Key Papers and Resources

OpenAI perspective on hallucinations: https://www.youtube.com/watch?v=hhiLw5Q_UFg
John Schulman on RLHF challenges: https://zhuanlan.zhihu.com/p/640144131
Adaptive Chameleon or Stubborn Sloth (knowledge conflicts): https://arxiv.org/abs/2305.13300
Multitask, Multilingual, Multimodal Evaluation of ChatGPT: https://arxiv.org/abs/2302.0402
