Why LLMs Hallucinate and How to Mitigate the Problem
The article explains that hallucinations in large language models stem mainly from the supervised fine‑tuning stage, illustrates the issue with concrete examples, and presents mitigation techniques such as knowledge‑probing data generation and web‑search tool integration using special tokens.
Introduction
Large language models (LLMs) often produce seemingly plausible but factually incorrect or fabricated responses, a phenomenon known as hallucination. The article draws inspiration from Andrej Karpathy’s video on LLM cognition and sets out to explain why hallucinations occur and how they can be reduced.
Video link: https://www.bilibili.com/video/BV1kPNpeTEQ5
LLM Training Pipeline
Pretraining: The model consumes massive amounts of high‑quality internet text, learning general language patterns, grammar, and factual knowledge. The output of this stage is the “base model”, essentially a next‑token predictor.
Supervised Fine‑Tuning (SFT): A dialogue dataset (e.g., OpenAssistant/oasst1) containing hundreds of thousands of messages organized into multi‑turn conversations turns the base model into an assistant that can answer questions.
Reinforcement Learning from Human Feedback (RLHF): Human annotators rank model outputs; a separate reward model is trained to mimic these rankings and then guides the assistant toward human‑preferred behavior.
Dataset link: https://huggingface.co/datasets/OpenAssistant/oasst1
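The reward‑model step in the pipeline above is commonly trained with a pairwise preference objective. A minimal sketch of such a loss follows; the Bradley–Terry‑style formulation is an assumption on my part, since the article does not specify which objective is used:

```python
import math

# Sketch of a pairwise reward-model loss (Bradley-Terry style, an
# assumption): the reward model should score the human-preferred
# ("chosen") response higher than the rejected one.
def pairwise_loss(reward_chosen: float, reward_rejected: float) -> float:
    # -log sigmoid(r_chosen - r_rejected): the loss shrinks as the
    # chosen response's reward exceeds the rejected one's.
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

print(pairwise_loss(2.0, 0.0))  # small loss: preference respected
print(pairwise_loss(0.0, 2.0))  # large loss: preference violated
```

In practice the two scalar rewards come from a neural reward model evaluated on the full (prompt, response) pair, and the loss is averaged over a batch of ranked comparisons.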
Why Hallucinations Occur
Hallucinations primarily arise during the SFT stage. The model does not “know” facts; it predicts the most statistically likely token sequence based on its training data. Consequently, when asked about an unseen entity—e.g., the fabricated name "Zyler Vance"—the model confidently generates a false answer because the training data contains many similar question‑answer patterns that always return a confident response.
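A toy illustration of this failure mode, not a real LLM: the trivial pattern matcher below stands in for statistical pattern completion, and the names and template answers are purely illustrative. Because the "training data" contains only confident answers, an unseen name still gets a confident completion:

```python
# Toy illustration (NOT a real LLM): an SFT-style dataset in which every
# question receives a confident answer teaches the "model" (here a
# trivial pattern matcher) to answer confidently even for unseen names.
TRAINING_EXAMPLES = [
    ("Who is Marie Curie?", "Marie Curie was a physicist and chemist."),
    ("Who is Alan Turing?", "Alan Turing was a mathematician."),
]

def toy_assistant(question: str) -> str:
    """Mimics pattern completion: 'Who is X?' -> 'X was a ...'."""
    for q, a in TRAINING_EXAMPLES:
        if q == question:
            return a
    # No "I don't know" pattern exists in the training data, so the most
    # statistically likely continuation is still a confident template.
    name = question.removeprefix("Who is ").rstrip("?")
    return f"{name} was a well-known researcher."  # confident but fabricated

print(toy_assistant("Who is Zyler Vance?"))
# Hallucinated: "Zyler Vance was a well-known researcher."
```

The mitigation strategies below target exactly this gap: either add refusal patterns to the training data, or let the model fetch real information at inference time.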
Probing Model Knowledge
Meta’s 2024 paper "The Llama 3 Herd of Models" describes a knowledge‑probing technique to identify what the model knows and does not know. The workflow consists of:
Data extraction: pull text fragments from the pretraining corpus.
Question generation: prompt Llama 3 to create factual questions based on each fragment.
Answer sampling: collect multiple model responses for each question.
Accuracy evaluation: compare answers to the original fragment using the model itself as a judge.
Information‑density evaluation: assess how much useful information each answer contains.
Refusal generation: for consistently high‑information but incorrect answers, generate a refusal response (see paper p.27).
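The workflow above can be sketched as a small loop. `ask_model` and `judge_correct` below are hypothetical stand‑ins for real model calls, and the 0.5 accuracy threshold is my assumption, not a value from the paper:

```python
import random

# Hedged sketch of the Llama 3 knowledge-probing loop. `ask_model` and
# `judge_correct` are placeholders for actual LLM calls.
def ask_model(question: str) -> str:
    # Placeholder: in practice, sample from the LLM with temperature > 0.
    return random.choice(["Answer A", "Answer B"])

def judge_correct(answer: str, source_fragment: str) -> bool:
    # Placeholder: in practice, the model itself judges the sampled
    # answer against the original pretraining fragment.
    return answer in source_fragment

def build_sft_example(question: str, source_fragment: str, n_samples: int = 8):
    # Answer sampling: collect several responses per question.
    answers = [ask_model(question) for _ in range(n_samples)]
    # Accuracy evaluation: fraction of samples judged correct.
    accuracy = sum(judge_correct(a, source_fragment) for a in answers) / n_samples
    if accuracy < 0.5:  # threshold is an assumption
        # Refusal generation: model is consistently wrong, teach it to refuse.
        return {"prompt": question, "completion": "I don't know."}
    # Model appears to know this fact: keep a confident answer.
    return {"prompt": question, "completion": answers[0]}
```

Each generated `{prompt, completion}` pair is then folded back into the fine‑tuning data, so the model learns to refuse precisely on the questions it reliably gets wrong.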
Using the generated data, the model is encouraged to answer only when it is confident and to refuse or say it does not know otherwise. Experiments show a measurable reduction in hallucination rates for Llama 3.
Mitigation Strategies
Knowledge‑probing data augmentation: Insert training examples whose correct answer is “I don’t know” to teach the model to recognize the limits of its knowledge.
Web‑search tool integration: Introduce special tokens <SEARCH_START> and <SEARCH_END>. When the model emits <SEARCH_START>, generation pauses, the enclosed query is sent to a search engine, the retrieved text is inserted into the context window, and the model resumes generation using this fresh information.
This mechanism mirrors human behavior: when a fact is forgotten, we look it up. By providing thousands of dialogue examples that demonstrate when and how to invoke the search tool, the model learns to refresh its “memory” with up‑to‑date facts.
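The pause‑search‑resume loop can be sketched as follows. The `<SEARCH_START>`/`<SEARCH_END>` tokens come from the article; `model_step`, `web_search`, the `<EOS>` stop token, and the `[RESULT]` marker are hypothetical stand‑ins:

```python
# Hedged sketch of tool-augmented generation with special tokens.
SEARCH_START, SEARCH_END = "<SEARCH_START>", "<SEARCH_END>"

def generate_with_search(prompt, model_step, web_search, max_tokens=256):
    context = prompt
    for _ in range(max_tokens):
        token = model_step(context)  # next-token prediction (stand-in)
        if token == "<EOS>":         # hypothetical end-of-sequence token
            break
        context += token
        if context.endswith(SEARCH_END):
            # Extract the query the model wrote between the special tokens.
            start = context.rindex(SEARCH_START) + len(SEARCH_START)
            query = context[start:-len(SEARCH_END)]
            # Pause generation, fetch results, splice them into the context
            # window so subsequent tokens condition on fresh information.
            context += "\n[RESULT] " + web_search(query) + "\n"
    return context
```

Usage with a scripted stand‑in for the model shows the control flow: the model emits a query, the loop injects search results, and generation continues on top of them.

```python
script = iter(["<SEARCH_START>", "capital of France?", "<SEARCH_END>", " Paris.", "<EOS>"])
out = generate_with_search("Q: ", lambda ctx: next(script), lambda q: f"search({q})")
print(out)
```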
Conclusion
Hallucination is an inherent by‑product of the LLM training pipeline, especially the supervised fine‑tuning stage, because models aim to produce statistically plausible text rather than verified facts. While early models suffered heavily, strategies such as knowledge probing and web‑search tool integration have demonstrably mitigated the issue. Nevertheless, completely eliminating hallucinations remains an open research challenge as LLMs continue to evolve.
AI Algorithm Path
A public account focused on deep learning, computer vision, and autonomous driving perception algorithms, covering visual CV, neural networks, pattern recognition, related hardware and software configurations, and open-source projects.