
How Task‑Aware Decoding and RAG Reduce Hallucinations in Large Language Models

This article reviews the hallucination problem in large language models, analyzes its data, training, and inference sources, and presents Task‑aware Decoding (TaD) and Retrieval‑Augmented Generation (RAG) as effective, plug‑and‑play solutions demonstrated through extensive experiments.

JD Cloud Developers

Jul 16, 2024


1. Background

Large language models (LLMs) such as ChatGPT have sparked a new AI wave, offering human‑like dialogue, reasoning, and planning capabilities. However, their tendency to generate inaccurate or misleading information—known as hallucination—poses serious risks in high‑stakes domains like medicine, law, and industrial automation.

This article explores approaches to mitigating LLM hallucination.

2. Related Research

LLMs are fundamentally language models: they are trained on massive corpora to predict the probability of the next token. Because they model statistical patterns in text rather than verified facts, hallucinations are inevitable. Prior work identifies three main sources of hallucination (data, training, and inference) and proposes mitigation strategies for each.

2.1 Data‑Induced Hallucination

Low‑quality, incomplete, or outdated training data can cause hallucinations. Simple data cleaning and expanding high‑quality factual corpora help but cannot fully eliminate the problem due to inherent knowledge boundaries.

Two mainstream approaches address knowledge boundaries: knowledge editing (modifying model parameters) and Retrieval‑Augmented Generation (RAG), which injects external knowledge without changing the model.
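To make the RAG idea concrete, here is a minimal sketch of the retrieve-then-prompt pattern. It is an illustration only: the bag-of-words retriever, the `DOCS` knowledge base, and the prompt wording are all assumptions made for this example, and a production system would typically use a vector index and a real LLM call in place of the final prompt string.

```python
import math
import re
from collections import Counter

# Hypothetical in-memory knowledge base (illustrative data only).
DOCS = [
    "The warranty period for the X100 router is 24 months.",
    "Returns are accepted within 7 days of delivery.",
    "The X100 router supports Wi-Fi 6.",
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query, documents, k=1):
    """Place retrieved passages in the prompt; model weights are untouched."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\nAnswer:"
    )

if __name__ == "__main__":
    print(build_prompt("How long is the warranty for the X100 router?", DOCS))
```

The point of the design is that knowledge lives in `DOCS`, not in the model: updating a fact means editing a document rather than retraining or editing parameters.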

2.2 Training‑Induced Hallucination

LLM training suffers from unidirectional (left‑to‑right) modeling, attention deficiencies, exposure bias, and misalignment introduced during alignment stages such as SFT and RLHF. Optimizing architecture, attention mechanisms, or training objectives can alleviate hallucinations, yet these methods often lack generality and practical applicability.

2.3 Inference‑Induced Hallucination

Decoding strategies such as high‑temperature sampling increase hallucination risk, while attention shortcomings further degrade factuality. DoLa (Decoding by Contrasting Layers) mitigates this by contrasting the output distributions of later layers with those of earlier layers, emphasizing higher‑layer factual knowledge over lower‑layer linguistic patterns, though it may introduce grammatical errors and repetition.
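The effect of temperature can be seen directly on a toy distribution. The sketch below applies a temperature-scaled softmax to three made-up logits (the values are assumptions chosen for illustration); it does not implement DoLa, only the sampling-temperature effect described above.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature flattens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative logits: the first token is the well-supported one,
# the last token is an unlikely continuation.
logits = [4.0, 2.0, 0.5]

cold = softmax(logits, temperature=0.5)
hot = softmax(logits, temperature=2.0)

if __name__ == "__main__":
    print("T=0.5:", [round(p, 3) for p in cold])
    print("T=2.0:", [round(p, 3) for p in hot])
```

At the higher temperature, probability mass moves from the top token to the tail, so sampling is more likely to pick a poorly supported continuation.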

3. Technical Breakthroughs

Combining RAG with a novel Task‑aware Decoding (TaD) method, developed by JD.com and Tsinghua University and published at IJCAI 2024, offers a plug‑and‑play solution that reduces LLM hallucination across various models, fine‑tuning techniques, tasks, and data regimes.

Task‑aware Decoding (TaD) compares the output probability distribution of an LLM before fine‑tuning with that of the same model after fine‑tuning. The shift between the two distributions is used to construct a knowledge vector that steers generation toward task‑specific, factual outputs.

The principle is illustrated in Figure 3: after fine‑tuning, the model assigns higher probability to task‑relevant tokens (e.g., “catalyze”) while reducing probability for generic tokens (e.g., “engage”).

The knowledge vector captures this distribution change, enhancing the model’s ability to incorporate downstream domain knowledge, especially when training data are scarce.
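The following sketch shows one plausible way to turn that distribution shift into a decoding adjustment. It is not the formulation from the TaD paper: the log-probability difference used as the knowledge vector, the `alpha` strength parameter, and the toy probabilities are all assumptions made for illustration.

```python
import math

def normalize_from_log(scores):
    """Convert unnormalized log-scores into a probability distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def knowledge_vector(base_probs, tuned_probs):
    """Per-token shift in log-probability caused by fine-tuning (assumed form)."""
    return [math.log(t) - math.log(b) for b, t in zip(base_probs, tuned_probs)]

def task_aware_distribution(base_probs, tuned_probs, alpha=1.0):
    """Push the fine-tuned distribution further along the shift direction.

    alpha is a hypothetical strength parameter; alpha=0 returns the
    fine-tuned distribution unchanged.
    """
    shift = knowledge_vector(base_probs, tuned_probs)
    scores = [math.log(t) + alpha * s for t, s in zip(tuned_probs, shift)]
    return normalize_from_log(scores)

# Toy next-token distributions over three candidate tokens (made-up numbers).
tokens = ["catalyze", "engage", "the"]
base = [0.2, 0.5, 0.3]    # before fine-tuning
tuned = [0.5, 0.3, 0.2]   # after fine-tuning

adjusted = task_aware_distribution(base, tuned, alpha=1.0)

if __name__ == "__main__":
    for tok, b, t, a in zip(tokens, base, tuned, adjusted):
        print(f"{tok:>9}: base={b:.2f} tuned={t:.2f} adjusted={a:.2f}")
```

In this toy setup the task-relevant token "catalyze" gains probability and the generic token "engage" loses it, which mirrors the direction of the shift described above. Because the adjustment only needs the two output distributions, a scheme of this kind can be applied at inference time without further training.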

4. Deployment Cases

In JD’s general‑purpose knowledge question‑answering system, TaD is combined with RAG to inject proprietary factual knowledge, lowering hallucination rates across more than 6,000 business scenarios.

5. Reflections and Outlook

Future work should explore more integrated system architectures (RAG + agents), deeper fusion of external knowledge with LLM reasoning, and continued development of low‑hallucination LLM techniques like TaD.

6. Conclusion

Mitigating LLM hallucination requires a multi‑level approach; while no single method solves the problem completely, TaD provides a practical, model‑agnostic way to improve factuality, especially under limited data conditions.


