
Task‑Aware Decoding (TaD): A Plug‑and‑Play Method to Mitigate Hallucinations in Large Language Models

TaD, a task‑aware decoding technique jointly developed by JD.com and Tsinghua University and presented at IJCAI 2024, leverages the differences between pre‑ and post‑fine‑tuned LLM output distributions to construct a knowledge vector. It significantly reduces hallucinations across a variety of models, tasks, and data‑scarce scenarios, especially when combined with RAG.

JD Tech Talk

TaD: Task‑aware Decoding is a plug‑and‑play technique proposed by JD.com and Tsinghua University to mitigate hallucinations in large language models (LLMs); the work was accepted at IJCAI 2024.

RAG: Retrieval‑augmented Generation (RAG) is a widely used systematic solution for mitigating LLM hallucinations.

1. Background

Recent generative LLMs such as ChatGPT have sparked a new AI wave, achieving near‑human level dialogue quality but also exhibiting hallucinations—incorrect or misleading outputs—that hinder reliable deployment in high‑stakes domains.

This article explores solutions to the LLM hallucination problem.

2. Related Research

LLMs are fundamentally language models that predict the probability of the next token given the preceding context. Their hallucinations stem from the data, training, and inference stages, and cannot be completely eliminated.
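To make this next‑token view concrete, here is a toy sketch of the chain‑rule factorization an autoregressive LM implements; the vocabulary and conditional probabilities below are invented purely for illustration:

```python
import math

# Toy next-token distributions P(token | context), invented for illustration.
cond_prob = {
    (): {"The": 0.5, "A": 0.5},
    ("The",): {"cat": 0.6, "dog": 0.4},
    ("The", "cat"): {"sat": 0.7, "ran": 0.3},
}

def sequence_log_prob(tokens):
    """Chain rule: log P(x_1..x_T) = sum over t of log P(x_t | x_<t)."""
    logp = 0.0
    for t, tok in enumerate(tokens):
        logp += math.log(cond_prob[tuple(tokens[:t])][tok])
    return logp

print(round(math.exp(sequence_log_prob(["The", "cat", "sat"])), 3))  # 0.5*0.6*0.7 = 0.21
```

The model never checks facts; it only scores token sequences, which is why confidently wrong continuations (hallucinations) can receive high probability.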

2.1 Data‑induced Hallucination

Low‑quality, incomplete, or outdated training data cause hallucinations. Strategies include data cleaning, knowledge editing, and RAG.

Data Cleaning – Collect more high‑quality factual data and clean existing corpora.

Knowledge Editing – Adjust model parameters or add external modules to bridge knowledge gaps. Editing can be parameter‑based or external‑module‑based, but both approaches carry risks and limitations, so RAG is often preferred.

RAG – Introduces an external retrieval step, feeding retrieved documents together with the prompt to the LLM, thereby reducing hallucinations.
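The retrieve‑then‑generate flow described above can be sketched minimally as follows; the corpus, the keyword‑overlap scoring, and the prompt template are illustrative assumptions, not any production pipeline:

```python
# Minimal RAG sketch: retrieve the most relevant document by keyword overlap,
# then prepend it to the prompt before calling the LLM.
CORPUS = [
    "TaD was presented at IJCAI 2024 by JD.com and Tsinghua University.",
    "RAG feeds retrieved documents to the LLM together with the prompt.",
]

def retrieve(query, corpus):
    """Return the document sharing the most words with the query."""
    def overlap(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return max(corpus, key=overlap)

def build_prompt(query, corpus):
    """Assemble the augmented prompt that would be sent to the LLM."""
    context = retrieve(query, corpus)
    return f"Context: {context}\nQuestion: {query}\nAnswer:"

print(build_prompt("Where was TaD presented?", CORPUS))
```

Real systems replace the keyword overlap with dense embeddings and a vector index, but the contract is the same: ground the prompt in retrieved evidence so the model has less room to invent facts.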

2.2 Training‑induced Hallucination

LLM training suffers from unidirectional representation limits, attention defects, and exposure bias. Fine‑tuning methods (SFT, RLHF) can also introduce hallucinations when labeled data exceed the model's knowledge.

2.3 Inference‑induced Hallucination

Decoding randomness (e.g., a high sampling temperature) and attention deficiencies during inference increase hallucination risk.
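The temperature effect is easy to see numerically: dividing logits by a temperature above 1 flattens the softmax, putting more mass on unlikely tokens. A small sketch (the logits are invented):

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; higher temperature flattens the distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

logits = [4.0, 2.0, 1.0]                 # invented logits for three candidate tokens
sharp = softmax(logits, temperature=0.5)
flat = softmax(logits, temperature=2.0)
# At high temperature the top token dominates less, so low-probability
# (potentially hallucinated) tokens get sampled more often.
print(max(sharp) > max(flat))  # True
```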

DoLa (Decoding by Contrasting Layers) – Computes the difference in logits between upper and lower transformer layers to amplify factual knowledge and suppress hallucinations.
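The core contrast in DoLa can be sketched as a log‑probability difference between a mature (upper) layer and a premature (lower) layer; this omits details of the full method such as dynamic layer selection and the adaptive plausibility constraint, and the logits below are invented:

```python
import math

def log_softmax(logits):
    m = max(logits)
    z = math.log(sum(math.exp(l - m) for l in logits)) + m
    return [l - z for l in logits]

def dola_scores(mature_logits, premature_logits):
    """Score tokens by the difference of log-probabilities between an upper
    (mature) and a lower (premature) layer, amplifying knowledge that only
    emerges in the upper layers."""
    return [m - p for m, p in zip(log_softmax(mature_logits),
                                  log_softmax(premature_logits))]

# Invented logits: token 0 gains probability only in the upper layer,
# suggesting it reflects factual knowledge rather than surface statistics.
mature = [3.0, 1.0, 1.0]
premature = [1.0, 1.0, 1.0]
scores = dola_scores(mature, premature)
print(scores.index(max(scores)))  # 0
```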

3. Technical Breakthrough

RAG is effective but still relies on LLM output quality. JD Retail and Tsinghua University propose Task‑aware Decoding (TaD), a plug‑and‑play method that compares the output distributions of an LLM before and after fine‑tuning to build a knowledge vector, improving factuality without altering the base model.

TaD Principle – After fine‑tuning, the probability of task‑relevant tokens (e.g., “catalyze”) increases while generic tokens (e.g., “engage”) decrease, indicating better alignment with downstream tasks.

Knowledge Vector – Represents the shift in conditional probability distributions between pre‑ and post‑fine‑tuned models, capturing the adaptation from general to task‑specific knowledge.
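The knowledge‑vector idea can be sketched as follows, assuming (for illustration only) that the final distribution is extrapolated along the direction from the pre‑fine‑tuned to the post‑fine‑tuned probabilities; the token set, probabilities, adjustment form, and strength `lam` are all invented rather than the paper's exact formulation:

```python
# Knowledge-vector sketch: v = p_ft - p_base captures the shift from general
# to task-specific knowledge; extrapolating along v sharpens it further.
def tad_adjust(p_base, p_ft, lam=1.0):
    v = [ft - b for b, ft in zip(p_base, p_ft)]            # knowledge vector
    raw = [max(ft + lam * vi, 0.0) for ft, vi in zip(p_ft, v)]
    z = sum(raw)
    return [r / z for r in raw]                            # renormalize

# Invented distributions over ["catalyze", "engage", "other"]:
p_base = [0.2, 0.5, 0.3]   # before fine-tuning: generic token dominates
p_ft   = [0.5, 0.3, 0.2]   # after fine-tuning: task-relevant token rises
p_tad  = tad_adjust(p_base, p_ft)
print(p_tad[0] > p_ft[0])  # True: task-relevant token boosted further
```

Because the adjustment only needs the two models' output distributions at decoding time, nothing in the base model's weights changes, which is what makes the method plug‑and‑play.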

3.1 Experimental Results

TaD was evaluated on multiple LLMs with parameter‑efficient fine‑tuning methods such as LoRA and Adapter, across a range of tasks. Tables 1–4 (shown as images in the original post) demonstrate consistent performance gains, especially when training data are scarce.

4. Deployment Cases

Combining RAG with TaD yields a low‑hallucination LLM that powers JD's knowledge question‑answering system, now serving over 6,000 business scenarios.

5. Outlook

Future work may involve more complex systems (RAG + Agent + tools), deeper integration of external knowledge with LLM reasoning, and continued research on low‑hallucination LLMs such as TaD.

6. Conclusion

Mitigating LLM hallucination requires a multi‑level approach; while no single solution is definitive, techniques like TaD and RAG together substantially improve reliability and enable broader AI adoption.

References

[1] Hallucination is Inevitable: An Innate Limitation of Large Language Models

[2] A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions

[3] Unveiling the Causes of LLM Hallucination and Overcoming LLM Hallucination

[4] Editing Large Language Models: Problems, Methods, and Opportunities

[5] ACL 2023 Tutorial: Retrieval‑based Language Models and Applications

[6] Theoretical Limitations of Self‑Attention in Neural Sequence Models

[7] Sequence Level Training with Recurrent Neural Networks

[8] Discovering Language Model Behaviors with Model-Written Evaluations

[9] DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models

[10] BERT Rediscovers the Classical NLP Pipeline

[11] Retrieval‑Augmented Generation for Large Language Models: A Survey

[12] TaD: A Plug‑and‑Play Task‑Aware Decoding Method to Better Adapt LLM on Downstream Tasks

[13] Inference‑time intervention: Eliciting truthful answers from a language model

[14] Beyond RAG: Building Advanced Context‑Augmented LLM Applications

Written by JD Tech Talk

Official JD Tech public account delivering best practices and technology innovation.