Tackling Large Model Hallucinations: Causes, Detection, and Mitigation Strategies
This article provides a comprehensive analysis of large language model hallucinations, detailing their definitions, classifications, root causes, detection techniques, and a wide range of mitigation approaches—including RAG pipelines, decoding strategies, and model‑enhancement methods—to improve reliability and safety in real‑world AI applications.
1. What Is a Large Model Hallucination?
Hallucination refers to the phenomenon where a large model generates fluent, low‑perplexity text that is factually incorrect or unverifiable, often described as "talking nonsense with a straight face." Two definitions are commonly used: (1) content that contradicts established human knowledge, and (2) content that cannot be verified against any reliable source.
2. Types of Hallucinations
Factual hallucinations: include factual inconsistency and fabricated facts.
Faithfulness hallucinations: include failure to follow the instruction and failure to follow the provided context.
A practical classification flowchart starts by checking instruction compliance, then context compliance, and finally categorizes errors into knowledge‑fabrication, calculation errors, or logical inconsistencies.
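As a rough illustration, that flowchart can be written as a small decision function. The check names and error labels below are assumptions chosen for readability, not an implementation from the article:

```python
def classify_hallucination(follows_instruction: bool,
                           follows_context: bool,
                           error_kind: str) -> str:
    """Route an output through the checks in the order the flowchart gives."""
    if not follows_instruction:
        return "faithfulness: instruction violation"
    if not follows_context:
        return "faithfulness: context violation"
    # Remaining errors are factual; bucket them by kind.
    if error_kind in ("knowledge-fabrication", "calculation", "logic"):
        return f"factual: {error_kind}"
    return "no hallucination detected"
```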
3. Root Causes
Hallucinations stem from three major sources:
Data: missing, outdated, or out-of-scope knowledge.
Algorithm & Training: reliance on statistical shortcuts, decoder-only architecture limitations, exposure bias between training (MLE teacher forcing) and inference (autoregressive generation), and misaligned fine-tuning.
Inference: stochastic decoding (temperature, top-k, top-p; see the sampling sketch after this list), long-context attention dilution, and the softmax bottleneck.
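To make the inference-stage risk concrete, here is a minimal sampling sketch (NumPy only, toy logits; the function signature is an illustrative assumption). It shows how the three knobs interact: raising the temperature or loosening the truncation leaves more low-probability tokens in play, which is exactly where hallucinated continuations come from.

```python
import numpy as np

def sample_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=None):
    """Sample one token id from raw logits with temperature, top-k, and top-p."""
    rng = rng or np.random.default_rng()
    z = np.asarray(logits, dtype=float) / max(temperature, 1e-6)
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]          # token ids, most likely first
    keep = np.ones_like(probs, dtype=bool)
    if top_k > 0:
        keep[order[top_k:]] = False          # top-k: drop all but the k most likely
    cum = np.cumsum(probs[order])
    idx = np.searchsorted(cum, top_p)        # smallest prefix whose mass >= top_p
    keep[order[idx + 1:]] = False            # top-p (nucleus) truncation
    probs = np.where(keep, probs, 0.0)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```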
4. Detection Methods
Detection can be organized by knowledge certainty:
Identify non‑deterministic questions (subjective, philosophical, speculative) and treat them separately.
For deterministic questions, classify the model's knowledge state: it knows that it knows, knows that it doesn't know, doesn't know that it knows, or doesn't know that it doesn't know. The last two states indicate potential hallucination.
Additional techniques include:
Self‑consistency: generate multiple answers and cluster them semantically; high variance across clusters suggests hallucination (a sketch follows this list).
Cross‑answer verification: check for contradictions among multiple responses.
External tools: use search engines or code interpreters to fetch evidence and compare with model output.
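A minimal sketch of the self-consistency check, assuming a `generate` callable that draws one stochastic answer and an `embed` callable that returns a NumPy sentence vector; the greedy clustering and the 0.85 similarity threshold are illustrative choices, not values from the article:

```python
import numpy as np

def consistency_score(question, generate, embed, n=8, threshold=0.85):
    """Return the fraction of samples in the largest semantic cluster."""
    answers = [generate(question) for _ in range(n)]   # n stochastic samples
    vectors = [embed(a) for a in answers]
    clusters = []                                      # greedy cosine clustering
    for v in vectors:
        for cluster in clusters:
            ref = cluster[0]
            cos = np.dot(v, ref) / (np.linalg.norm(v) * np.linalg.norm(ref))
            if cos > threshold:
                cluster.append(v)
                break
        else:
            clusters.append([v])
    # A low agreement ratio (many small clusters) suggests hallucination.
    return max(len(c) for c in clusters) / n
```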
5. Mitigation Strategies
Mitigation spans the entire pipeline—from data to inference:
5.1 Data‑Level Solutions
Enrich training data with up‑to‑date knowledge, apply knowledge‑editing, and use data augmentation (e.g., Self‑QA to create <question, answer> pairs for FAQ indexing).
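As a sketch of the Self-QA idea, assuming a generic `llm` callable and a prompt of our own wording (the article names the technique but not a prompt):

```python
import json

SELF_QA_PROMPT = (
    "Read the passage and write {n} question-answer pairs that the passage "
    "fully answers. Return a JSON list of objects with keys "
    '"question" and "answer".\n\nPassage:\n{chunk}'
)

def self_qa(chunk, llm, n=3):
    """Turn one document chunk into <question, answer> pairs for FAQ indexing."""
    raw = llm(SELF_QA_PROMPT.format(chunk=chunk, n=n))
    pairs = json.loads(raw)                 # the prompt asks for JSON output
    # Drop malformed entries before adding anything to the FAQ index.
    return [p for p in pairs if p.get("question") and p.get("answer")]
```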
5.2 Retrieval‑Augmented Generation (RAG)
RAG workflow includes query preprocessing, semantic routing, indexing (structured, semi‑structured, unstructured), retrieval, re‑ranking, and context selection before generation. Sub‑query generation (few‑shot or SFT) helps break complex tasks into manageable steps.
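The stages compose naturally into a single function. Everything below (preprocess, route, retrieve, rerank, select, llm) is a stand-in callable, since the article describes the flow rather than a concrete stack:

```python
def rag_answer(query, preprocess, route, retrieve, rerank, select, llm,
               k=20, m=5):
    """One pass through the RAG stages described above."""
    q = preprocess(query)               # query preprocessing (rewrite, spell-fix)
    index = route(q)                    # semantic routing to the right index
    candidates = retrieve(index, q, k)  # first-stage retrieval, k candidates
    ranked = rerank(q, candidates)      # re-ranking by relevance
    context = select(q, ranked[:m])     # context selection before generation
    return llm(f"Answer using only this context:\n{context}\n\nQuestion: {q}")
```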
5.3 File Parsing & Data Augmentation
Parse PDFs, Word docs, and FAQs; extract tables, formulas, and images (using OCR). Convert images to captions via models like LLaVA for indexing. Apply Self‑QA to generate additional QA pairs, then augment with paraphrasing or summarization.
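A sketch of the image-to-caption indexing step, where `caption_model` stands in for a LLaVA-style vision-language model and the index is simply a list of (vector, caption) pairs; all names here are illustrative assumptions:

```python
def index_document_images(images, caption_model, embed, vector_index):
    """Caption each extracted image and store the caption as a retrievable entry."""
    for image in images:
        # caption_model is a stand-in for a LLaVA-style captioner.
        caption = caption_model(image, "Describe this image for search indexing.")
        vector_index.append((embed(caption), caption))
```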
5.4 Context Selection
Filter retrieved documents with a lightweight model; optionally present the candidates as a multiple-choice question so the large model selects the relevant snippets. Techniques such as small-to-big retrieval (match on small chunks, then expand to their parent passages) and extended context windows further improve relevance.
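The multiple-choice selection step might look like the following sketch; the prompt wording, the naive letter parsing, and the `llm` callable are all assumptions:

```python
import string

def select_snippets(question, snippets, llm):
    """Ask the model which lettered snippets are relevant; keep only those."""
    letters = string.ascii_uppercase[: len(snippets)]
    options = "\n".join(f"{l}. {s}" for l, s in zip(letters, snippets))
    reply = llm(
        f"Question: {question}\n\n"
        f"Which of these snippets help answer the question? "
        f"Reply with the letters only.\n\n{options}"
    )
    chosen = {ch for ch in reply if ch in letters}   # naive letter parsing
    return [s for l, s in zip(letters, snippets) if l in chosen]
```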
5.5 Decoding Strategies
Dynamic decoding parameters: adjust temperature, top-k, and related settings per task.
Contrastive decoding: compare the output distributions of a large model and a smaller reference model to prune unlikely tokens (see the sketch after this list).
Recitation-augmented generation: prompt the model to first recite relevant knowledge, then answer.
Gen-Critic-Edit: let the model critique its own output and revise it, optionally grounding the correction in external evidence.
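For the contrastive-decoding item, here is a minimal sketch following the published formulation: a plausibility cutoff `alpha` keeps only tokens the large model finds reasonably likely, then candidates are scored by how much more the large model prefers them than the small model does. Which models play the large and small roles is an assumption left to the reader.

```python
import numpy as np

def contrastive_scores(logp_large, logp_small, alpha=0.1):
    """Per-step contrastive decoding scores over the vocabulary."""
    logp_large = np.asarray(logp_large)
    logp_small = np.asarray(logp_small)
    # Plausibility constraint: keep tokens within a factor alpha of the
    # large model's most likely token.
    keep = logp_large >= np.log(alpha) + logp_large.max()
    # Score kept tokens by the large-minus-small log-probability gap.
    return np.where(keep, logp_large - logp_small, -np.inf)
```

In a real decoder this runs once per step; the next token is the argmax of the scores (or a sample from their renormalized softmax).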
5.6 Model‑Enhancement Techniques
Pre-training: inject missing knowledge from external corpora (e.g., Common Crawl) and continuously update with time-sensitive data.
Fine-tuning & Alignment: use SFT and DPO to align model behavior; ensure training data includes both positive and negative examples derived from the model itself (a DPO loss sketch follows this list).
RARR: rewrite queries, retrieve evidence, and let the model revise its answer wherever it conflicts with that evidence.
FAVA: train an editing model on noise-augmented (synthetically corrupted) text so it learns to detect and correct hallucinations.
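A minimal sketch of the DPO objective mentioned above, assuming the per-sequence (summed token) log-probabilities have already been computed for the chosen and rejected answers under both the policy and a frozen reference model:

```python
import torch
import torch.nn.functional as F

def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss over batches of sequence log-probabilities (1-D tensors)."""
    # Implicit reward margin: how much more the policy prefers the chosen
    # answer over the rejected one, relative to the reference model.
    margin = (pol_chosen - ref_chosen) - (pol_rejected - ref_rejected)
    return -F.logsigmoid(beta * margin).mean()
```

The positive/negative pairs the article mentions map directly onto the chosen/rejected arguments here.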
6. Real‑World Applications by 360
360 applied the above solutions to content safety detection, achieving top rankings in the China Academy of Information and Communications Technology (CAICT) AI Safety Benchmark. The same pipeline powers 360AI Search and 360AI Browser, demonstrating practical reductions in hallucination for user‑facing products.
7. Future Exploration
Continuous benchmarking in real scenarios reveals gaps between synthetic tests and production data. Ongoing work focuses on building domain‑specific benchmarks, iterating on retrieval and reasoning components, and exploring new alignment techniques to further curb hallucinations.
8. Q&A Highlights
Q1: How is the binary classifier in the RAG workflow trained?
A1: A hallucination taxonomy is defined, data is harvested from online logs, auto‑annotation tools assist offline mining, and human annotators label the data. The resulting dataset trains a classification model.
Q2: What methods are used for re‑ranking mixed retrieval results?
A2: Keyword search, semantic search (e.g., bge‑rerank), relational DB search, and graph search each have dedicated re‑rankers. A final ensemble or learned re‑rank model combines the streams based on business needs.
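As one illustrative way to combine several ranked streams before a learned re-ranker (a common baseline, not necessarily 360's method), reciprocal rank fusion scores each document by its rank position in every stream:

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked doc-id lists (one per retrieval stream) into one ranking."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)   # standard RRF weighting
    return sorted(scores, key=scores.get, reverse=True)
```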