Taming Hallucinations and Multi‑Turn Failures in RAG Systems

This article breaks down the final‑mile challenges of Retrieval‑Augmented Generation—hallucinations, broken multi‑turn dialogue, prompt design, citation, and feedback loops—and provides concrete, layered solutions ranging from hard‑coded prompts and few‑shot examples to query rewriting, history management, post‑processing filters, and self‑check mechanisms.

Wu Shixiong's Large Model Academy

1. Hallucination Problem: Correct Retrieval, Wrong Answer

Even when the retrieved document fragments are accurate, large language models may generate statements that never appear in the source because they continue writing based on pre‑trained knowledge, leading to fabricated content (hallucinations). In high‑risk domains such as insurance, this can cause serious compliance issues.

How to suppress hallucinations?

Layer 1 – Prompt‑level hard constraints. Instruct the model to answer only using the provided material and to explicitly state when the information is insufficient.

Answer the user's question based only on the material provided below. If the material is insufficient, say so honestly.
Material:
[1] {Document 1 title} {text fragment...}
[2] {Document 2 title} {text fragment...}
Question: {user question}
Answer:

Numbered fragments ([1], [2]) both guide the model and make source attribution easy.
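The template above can be assembled programmatically. Below is a minimal sketch; the function name and fragment schema are illustrative assumptions, not a fixed API:

```python
# Sketch: assemble a grounded prompt from ranked, retrieved fragments.
# Fragments are numbered [1], [2], ... so the model can cite sources.

def build_prompt(question: str, fragments: list[dict]) -> str:
    """fragments: [{"title": ..., "text": ...}, ...], already ranked by relevance."""
    lines = [
        "Answer the user's question based only on the material provided below.",
        "If the material is insufficient, say so honestly.",
        "Material:",
    ]
    for i, frag in enumerate(fragments, start=1):
        lines.append(f"[{i}] {frag['title']} {frag['text']}")
    lines.append(f"Question: {question}")
    lines.append("Answer:")
    return "\n".join(lines)
```

Keeping the instruction, numbered material, and question in fixed positions makes the prompt easy to audit and the citations easy to map back later.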

Layer 2 – Few‑shot example guidance. Include one or two example Q&A pairs that demonstrate answering from the supplied material and citing the source numbers.

Layer 3 – Post‑processing filter. After generation, run a rule‑based or classifier filter to detect statements not supported by the retrieved documents; replace or flag them, and optionally trigger a second retrieval for verification.
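A rule-based version of this filter can be as simple as flagging answer sentences with low lexical overlap against the retrieved documents. The threshold and word-level tokenization below are illustrative assumptions; a production system would typically use an entailment classifier instead:

```python
# Sketch of a rule-based post-filter: flag answer sentences whose word
# overlap with the retrieved documents falls below a threshold.
import re

def unsupported_sentences(answer: str, documents: list[str],
                          min_overlap: float = 0.5) -> list[str]:
    doc_words = set(re.findall(r"\w+", " ".join(documents).lower()))
    flagged = []
    for sent in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = set(re.findall(r"\w+", sent.lower()))
        if not words:
            continue
        overlap = len(words & doc_words) / len(words)
        if overlap < min_overlap:
            flagged.append(sent)  # candidate hallucination: verify, flag, or remove
    return flagged
```

Flagged sentences can then be replaced, annotated, or routed to a second retrieval pass for verification.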

Hallucination mitigation diagram

2. Multi‑Turn Dialogue: Drift from the Second Turn

In a multi‑turn scenario, the model often loses the referent of pronouns like "this". For example, after the system answers "ABC Life Insurance covers…", the follow‑up question "How do I apply for this?" carries no explicit subject, so the retrieval module issues an unrelated query.

How to solve multi‑turn breakage?

Detect follow‑up questions. Identify pronouns or omitted subjects and flag the turn as a continuation.

Query rewriting. Combine the current question with the previous topic, e.g., rewrite "How do I apply for this?" to "How do I apply for ABC Life Insurance?" before sending it to the retriever.

Maintain dialogue history. Store each turn’s question, answer, and key topics; for long conversations, summarize recent turns or keep only the last 3‑5 rounds to avoid exceeding the prompt length.
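The detection and rewriting steps above can be sketched with simple heuristics. The pronoun list and regex substitution below are illustrative assumptions; production systems often delegate rewriting to a dedicated LLM call:

```python
# Sketch: detect a follow-up turn and rewrite it with the previous topic.
import re

PRONOUNS = {"this", "that", "it"}

def is_followup(question: str) -> bool:
    """Flag the turn as a continuation if it contains a bare pronoun."""
    words = set(re.findall(r"\w+", question.lower()))
    return bool(words & PRONOUNS)

def rewrite_query(question: str, last_topic: str) -> str:
    """Replace standalone pronouns with the previous turn's topic."""
    if not is_followup(question):
        return question
    pattern = r"\b(" + "|".join(PRONOUNS) + r")\b"
    return re.sub(pattern, last_topic, question, flags=re.IGNORECASE)
```

The rewritten, self-contained query is what goes to the retriever; the original user phrasing can still be shown in the chat transcript.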

3. Prompt Construction: More Than Simple Concatenation

Context quantity and order

Only the top‑ranked 5‑10 fragments should be included, placed in order of relevance, and each fragment should be clearly labeled with a title or number to help the model differentiate sources.

Explicit answer format constraints

Specify the desired style in the prompt, such as "summarize in three sentences" or "use professional terminology for financial practitioners", to steer the model away from its default style.

Handling conflicting information

If retrieved fragments contain contradictory statements, guide the model to prioritize the earlier‑ranked source or filter out low‑confidence fragments during retrieval.

4. Citation: Making Answers Traceable

In regulated domains, users need to know where each answer originates.

Implementation steps

Retrieve fragments with metadata. Store document name, page number, slide number, or video timestamp alongside each snippet.

Number fragments in the prompt. Pass them as [1], [2], … and ask the model to cite these numbers when referencing information.

Post‑process mapping. Replace the model’s numeric citations with human‑readable references like "【Source: ABC Insurance Handbook, p. 10】" for display.
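The mapping step is a straightforward substitution over the generated text. A minimal sketch, assuming the metadata fields `doc` and `page` (both illustrative names):

```python
# Sketch: replace numeric citations like [1] with human-readable sources.
import re

def render_citations(answer: str, sources: list[dict]) -> str:
    """sources[i] corresponds to citation [i+1] in the prompt."""
    def repl(match: re.Match) -> str:
        idx = int(match.group(1)) - 1
        if 0 <= idx < len(sources):
            src = sources[idx]
            return f"【Source: {src['doc']}, p. {src['page']}】"
        return match.group(0)  # leave out-of-range numbers untouched

    return re.sub(r"\[(\d+)\]", repl, answer)
```

Leaving unknown citation numbers untouched (rather than deleting them) makes it easy to spot when the model cites a fragment that was never supplied.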

This citation requirement also acts as a deterrent to hallucination because the model cannot fabricate a citation it does not have.

Citation workflow diagram

5. Feedback and Self‑Check: Adding a Safety Net

After the initial answer, add a second prompt that asks the model to verify whether the response fully relies on the supplied material and whether any unsupported claims remain.

If unsupported parts are found, the model can either amend the answer or append a disclaimer such as "This information was not found in the provided material."

For critical applications, the self‑check can trigger a new retrieval (feedback‑style retrieval) and a second generation pass, forming a closed loop that improves accuracy at the cost of an extra LLM call.
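The closed loop described above can be sketched as a short control function. Here `generate` and `retrieve` stand in for your LLM and retriever calls; both names and the verification prompt are assumptions for illustration, not a fixed interface:

```python
# Skeleton of the self-check loop: answer, verify, and on failure
# trigger feedback-style retrieval plus a second generation pass.

def answer_with_self_check(question, fragments, generate, retrieve, max_rounds=2):
    answer = ""
    for _ in range(max_rounds):
        answer = generate(question, fragments)
        verdict = generate(
            "Does the answer below rely only on the given material? "
            "Reply SUPPORTED or UNSUPPORTED.\n"
            f"Material: {fragments}\nAnswer: {answer}",
            fragments,
        )
        if "UNSUPPORTED" not in verdict:
            return answer
        # Feedback-style retrieval: fetch fresh evidence and regenerate.
        fragments = retrieve(question)
    return answer + "\n(Note: some claims could not be verified against the material.)"
```

Each round costs one extra LLM call for verification, which is the accuracy-versus-latency trade-off mentioned above.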

Interview Guidance

When asked about hallucinations despite correct retrieval, structure the answer as follows:

Define the problem (LLM hallucination).

Present the three‑layer defense: prompt hard constraints & few‑shot examples, post‑processing filter, citation mechanism.

Explain multi‑turn handling via query rewriting and history maintenance.

Describe prompt engineering details (context ordering, answer format, conflict resolution).

Conclude with self‑check and feedback‑style retrieval for high‑risk scenarios.

Conclusion

RAG optimization is a systems engineering effort: query understanding, knowledge base quality, retrieval coverage, and generation fidelity must all work together. Retrieval quality is necessary but not sufficient; hallucinations, multi‑turn drift, and missing citations each require the dedicated strategies outlined above.

Tags: prompt engineering, RAG, hallucination mitigation, multi‑turn dialogue, citation, feedback retrieval, self‑check