Can Adversarial Training Make Retrieval‑Augmented Generators More Robust?

Recent arXiv work introduces ATM, an adversarially‑tuned multi‑agent system that iteratively pits a fake‑knowledge attacker against a generator, dramatically improving retrieval‑augmented language models’ resistance to hallucinated content and boosting performance on knowledge‑intensive benchmarks, even with noisy or irrelevant documents.

Baobao Algorithm Notes

Jun 3, 2024

Background

Large language models (LLMs) generate fluent text but often fail on knowledge‑dense tasks that require accurate factual answers. Retrieval‑augmented generation (RAG) mitigates this by supplying external documents to the generator, yet the generator can still be misled by hallucinated or irrelevant passages, leading to incorrect responses.
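
To make the pipeline concrete, here is a minimal sketch of the retrieve-then-prompt step. The word-overlap scorer, the function names `retrieve` and `build_prompt`, and the prompt template are all illustrative assumptions; neither this article nor the ATM paper prescribes them.

```python
def retrieve(query, corpus, k=2):
    """Rank passages by word overlap with the query (toy scorer, an assumption)."""
    q_words = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda passage: len(q_words & set(passage.lower().split())),
        reverse=True,
    )
    return scored[:k]


def build_prompt(query, passages):
    """Concatenate the retrieved passages ahead of the question."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


corpus = [
    "Paris is the capital of France.",
    "The Nile is a river in Africa.",
    "France borders Spain and Italy.",
]
top_passages = retrieve("What is the capital of France?", corpus)
prompt = build_prompt("What is the capital of France?", top_passages)
```

With this toy scorer the second retrieved passage is the one about the Nile, which is irrelevant to the question. The generator still receives it as context, which is exactly the situation the rest of the article is concerned with.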

Why BERT Is Unsuitable for Generation

BERT (Bidirectional Encoder Representations from Transformers) is an encoder‑only model pre‑trained with Masked Language Modeling (MLM) and Next Sentence Prediction (NSP). Its architecture and objectives are designed for understanding rather than autoregressive generation, which creates several practical limitations:

Decoder directionality : BERT has no decoder of its own. Its encoder attends to both left and right context, whereas any decoder added on top must generate left to right, token by token, without access to future tokens.

Pre‑training objectives : MLM and NSP do not directly optimize for coherent, fluent text generation.

Generation efficiency : Token‑wise generation is slow, especially for long passages.

Generation quality : Without generation‑focused fine‑tuning, outputs tend to be less diverse and less fluent than models such as GPT.

Context length : BERT accepts a fixed maximum input length (512 tokens in the original release), limiting its ability to incorporate long‑range information.

Optimization cost : Adapting BERT for generation requires substantial compute for fine‑tuning.

BERT can still be used for generation in specific scenarios, through fine‑tuning or in combination with other models, but overall it was not designed for generation tasks.
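
The directionality point can be illustrated with attention masks. The sketch below is a simplification with illustrative function names: real models apply such masks inside the attention computation, but the contrast between the two shapes is the essential difference between an encoder like BERT and a left-to-right decoder.

```python
def bidirectional_mask(n):
    """Encoder-style mask: every position may attend to every position."""
    return [[1] * n for _ in range(n)]


def causal_mask(n):
    """Decoder-style mask: position i may attend only to positions 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


for row in causal_mask(4):
    print(row)
```

A model pre-trained under the bidirectional mask has always seen tokens to the right of each position, so it has never learned to predict the next token from the left context alone.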

Limitations of Current RAG Systems

In typical RAG pipelines the retriever and generator are optimized separately. A weak retriever may return noisy or hallucinated documents, and the generator often trusts these documents too much, causing fragile performance when the retrieved set contains irrelevant or fabricated content.

Existing Robustness Method (RetRobust)

RetRobust improves robustness by injecting unrelated (noise) documents into the training data, encouraging the generator to ignore completely irrelevant context. This helps when the distractors are unrelated, but it struggles when hallucinated documents are topically similar to the query, because the model cannot easily distinguish useful from harmful information.
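
As a rough illustration of noise injection, the sketch below adds randomly sampled unrelated passages to a training example. It is an assumption about how such augmentation could be written, not RetRobust's actual code; the function name, the dictionary layout, and the parameters are hypothetical.

```python
import random


def add_noise_documents(example, noise_pool, num_noise=2, seed=0):
    """Return a copy of `example` whose document list also holds random unrelated passages."""
    rng = random.Random(seed)
    noise = rng.sample(noise_pool, k=min(num_noise, len(noise_pool)))
    docs = example["docs"] + noise  # new list, the original example is not mutated
    rng.shuffle(docs)  # avoid teaching the model that noise always comes last
    return {**example, "docs": docs}


example = {
    "question": "Who wrote Hamlet?",
    "docs": ["Hamlet is a tragedy written by William Shakespeare."],
    "answer": "William Shakespeare",
}
noise_pool = [
    "The Pacific is the largest ocean.",
    "Python was first released in 1991.",
    "Mount Everest is in the Himalayas.",
]
augmented = add_noise_documents(example, noise_pool)
```

Because the noise is sampled at random, it is almost always off-topic. That is why this kind of augmentation says little about fabricated passages that discuss the same entity as the question.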

Adversarial Training Framework (ATM)

The paper “ATM: Adversarial Tuning Multi‑agent System Makes a Robust Retrieval‑Augmented Generator” proposes a two‑player co‑evolutionary framework:

Attacker : Generates deliberately misleading (hallucinated) passages and inserts them into the retrieved set.

Generator : Trained to produce correct answers even when the retrieved set contains such adversarial content.

The Attacker is optimized with Direct Preference Optimization (DPO) to maximize the Generator’s error. The Generator is trained with a dual objective: a standard supervised fine‑tuning (SFT) loss on correct answers, plus a token‑level consistency loss that penalizes differences between its outputs on clean and adversarial inputs.
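
DPO has a standard closed-form loss over preference pairs, and the sketch below computes it for a single pair from sequence log-probabilities. How ATM builds its pairs is not described in this article. Treating the fabricated passage that hurts the Generator more as the "chosen" sample is an assumption here, as are the function name and the default `beta`.

```python
import math


def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair, given sequence log-probabilities."""
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), written in a numerically stable form
    return math.log1p(math.exp(-abs(logits))) + max(-logits, 0.0)


# Policy and reference agree exactly: the loss equals log(2).
baseline = dpo_loss(-5.0, -5.0, -5.0, -5.0)

# Policy favors the chosen passage more than the reference does: the loss drops.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

Minimizing this loss pushes the policy to assign relatively more probability to the chosen sample than the frozen reference model does, without training a separate reward model.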

Training Loop

# Pseudocode: attacker, generator, concatenate, cross_entropy and token_l2
# are placeholders, not calls into a real library.
for iteration in range(num_iterations):
    # 1. Attacker step: sample adversarial passages that maximize generator error
    adv_docs = attacker.sample(maximize=generator.error)

    # 2. Generator step: mix genuine and adversarial documents
    mixed_docs = concatenate(genuine_docs, adv_docs)
    pred_mixed = generator(mixed_docs)
    pred_clean = generator(genuine_docs)

    #   a) Standard SFT loss (cross-entropy) on the target answer
    loss_sft = cross_entropy(pred_mixed, target_answer)

    #   b) Token-level consistency loss between clean and adversarial inputs
    #      (token_l2 stands in for whatever distance is used)
    loss_consistency = token_l2(pred_clean, pred_mixed)

    #   c) Total loss (lambda_consistency balances the two terms)
    loss = loss_sft + lambda_consistency * loss_consistency
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
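
The loss computation in the loop can be made concrete with toy numbers. The sketch below uses only the standard library and takes `token_l2` to mean the mean squared distance between per-token output distributions. That reading, the helper names, and the logits are all assumptions for illustration, not the paper's implementation.

```python
import math


def softmax(logits):
    """Convert one token position's logits into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(token_logits, target_ids):
    """Mean negative log-likelihood of the target tokens."""
    nll = [-math.log(softmax(logits)[t]) for logits, t in zip(token_logits, target_ids)]
    return sum(nll) / len(nll)


def token_l2(logits_a, logits_b):
    """Mean squared distance between per-token output distributions."""
    total = 0.0
    for a, b in zip(logits_a, logits_b):
        pa, pb = softmax(a), softmax(b)
        total += sum((x - y) ** 2 for x, y in zip(pa, pb))
    return total / len(logits_a)


def generator_loss(logits_clean, logits_mixed, target_ids, lambda_consistency=0.5):
    """SFT loss on the mixed input plus a consistency penalty against the clean input."""
    loss_sft = cross_entropy(logits_mixed, target_ids)
    loss_consistency = token_l2(logits_clean, logits_mixed)
    return loss_sft + lambda_consistency * loss_consistency


# Two answer tokens over a three-word vocabulary (made-up numbers).
logits_clean = [[2.0, 0.1, 0.1], [0.1, 2.0, 0.1]]
logits_mixed = [[1.0, 0.8, 0.1], [0.1, 1.5, 0.9]]
loss = generator_loss(logits_clean, logits_mixed, target_ids=[0, 1])
```

When the adversarial passages do not change the generator's output distribution, the consistency term is zero and the total reduces to the plain SFT loss. The penalty only grows when the fabricated documents actually shift the prediction.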

Experimental Results

Evaluation on several knowledge‑intensive benchmarks shows that a 7B‑parameter ATM model consistently outperforms multiple 13B baselines. Key observations:

As the proportion of hallucinated documents increases, accuracy degrades far less than in baseline RAG systems.

The approach is model‑agnostic: it remains effective when the retriever or the model that produces the hallucinated documents changes.

Without additional fine‑tuning, the 7B ATM model achieves strong performance on PopQA simply by retrieving relevant passages.

Adversarial training also makes the generator tolerant to variations in the ordering of retrieved documents.
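
Order tolerance is straightforward to probe. The sketch below measures how often an answer survives reordering of the retrieved documents. The metric name and the two toy "generators" are hypothetical stand-ins for a real model call, and this is not the evaluation protocol used in the paper.

```python
from itertools import permutations


def order_consistency(answer_fn, question, docs):
    """Fraction of document orderings that yield the same answer as the original order."""
    reference = answer_fn(question, list(docs))
    orderings = list(permutations(docs))
    matches = sum(answer_fn(question, list(order)) == reference for order in orderings)
    return matches / len(orderings)


def first_doc_reader(question, docs):
    """Toy order-sensitive 'generator': answers with the last word of the first document."""
    return docs[0].rstrip(".").split()[-1]


def majority_reader(question, docs):
    """Toy order-insensitive 'generator': answers with the most common last word."""
    last_words = [d.rstrip(".").split()[-1] for d in docs]
    return max(sorted(set(last_words)), key=last_words.count)


docs = [
    "The capital of France is Paris.",
    "France's capital city is Paris.",
    "The capital of France is Lyon.",
]
print(order_consistency(first_doc_reader, "Capital of France?", docs))
print(order_consistency(majority_reader, "Capital of France?", docs))
```

The number of orderings grows factorially with the number of documents, so for realistic retrieval depths it would be more practical to sample a fixed number of random shuffles than to enumerate every permutation.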

Conclusion

Retrieval‑augmented generation is a powerful way to inject external knowledge into LLMs, but it is vulnerable to hallucinated content from other models. The adversarial training framework introduced in ATM demonstrates that iterative co‑training of a malicious Attacker and a robust Generator can substantially improve resistance to misleading retrieved documents and raise real‑world question‑answering accuracy.
