Do LLMs Silence Human Voices? Unveiling the ‘Spiral of Silence’ in Retrieval‑Augmented Generation

This article reviews the ACL 2024 paper that investigates how large language model‑generated text influences retrieval‑augmented generation pipelines, revealing short‑term retrieval gains but a long‑term “spiral of silence” that marginalizes human‑generated content and homogenizes open‑domain QA results.

Baobao Algorithm Notes

Research Background

Problem: Investigate how LLM‑generated text influences Retrieval‑Augmented Generation (RAG) pipelines, especially whether synthetic text gradually replaces human‑written documents, creating a “spiral of silence” in open‑domain QA.

Challenges: rapid diffusion of LLM output, impact on indexing and retrieval, measuring short‑ and long‑term effects, and preventing misinformation amplification.

Related work: prior studies on RAG, AI‑generated content (AIGC), and the spiral‑of‑silence theory.

Methodology

The authors construct an iterative simulation that starts from a purely human‑generated corpus and progressively injects LLM‑generated passages.

RAG formalization

RAG is modeled as a function RAG(Q, D, K, G) = Gen(Ret(Q, D ∪ G)), where:

Q: set of queries.

D: original document collection.

K: LLM knowledge base (used only in generation).

G: set of LLM‑generated texts that are added to the index.

Ret: retrieval function (e.g., BM25, Contriever).

Gen: generation function (LLM).
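To make the composition concrete, here is a minimal Python sketch of Gen(Ret(Q, D ∪ G)) for a single query. The names `rag`, `overlap_retrieve`, and `echo_generate` are hypothetical stand-ins, not the paper's code: the retriever simply counts shared words and the "generator" returns the top passage. K has no explicit argument here because it lives inside the generator.

```python
from typing import Callable, List, Sequence

# Hypothetical type aliases for the two components in the formula.
Retriever = Callable[[str, Sequence[str], int], List[str]]
Generator = Callable[[str, List[str]], str]


def rag(query: str, human_docs: Sequence[str], llm_docs: Sequence[str],
        retrieve: Retriever, generate: Generator, k: int = 5) -> str:
    """Compose Gen(Ret(q, D ∪ G)) for one query q."""
    index = list(human_docs) + list(llm_docs)  # D ∪ G
    context = retrieve(query, index, k)        # Ret
    return generate(query, context)            # Gen


def overlap_retrieve(query: str, docs: Sequence[str], k: int) -> List[str]:
    """Toy retriever: rank documents by the number of words shared with the query."""
    q_terms = set(query.lower().split())
    ranked = sorted(docs,
                    key=lambda d: len(q_terms & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]


def echo_generate(query: str, context: List[str]) -> str:
    """Toy generator: return the top-ranked passage (stand-in for an LLM call)."""
    return context[0] if context else ""


D = ["Paris is the capital of France", "Berlin is the capital of Germany"]
G = ["The capital of France is Paris"]
answer = rag("capital of France", D, G, overlap_retrieve, echo_generate, k=2)
```

Swapping in a real retriever or LLM only requires replacing the two callables; the composition itself stays the same.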

Iterative simulation pipeline

1. Establish a baseline: run the RAG pipeline on the original human corpus and record retrieval (Acc@5, Acc@20) and generation (Exact Match) scores.

2. Introduce zero‑shot LLM text: generate synthetic passages for a subset of queries without any in‑context examples and add them to the corpus, forming a new document set D′ = D ∪ G.

3. Retrieval and re‑ranking: for each query, retrieve a candidate set (e.g., top‑k) with a retrieval model, then apply a re‑ranker (MonoT5‑3B, UPR‑3B, or BGE‑reranker).

4. Generation: use an LLM (GPT‑3.5‑Turbo, LLaMA2‑13B‑Chat, Qwen‑14B‑Chat, Baichuan2‑13B‑Chat, or ChatGLM3‑6B) to produce answer texts.

5. Post‑processing: strip any token that could reveal the synthetic origin of the text.

6. Index update: merge the newly generated texts into the document collection and rebuild the index.

Repeat steps 2‑6 for a predefined number of iterations (e.g., 10); the proportion of synthetic documents in the index grows with each iteration.
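The loop above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation: `toy_retrieve`, `toy_generate`, `postprocess`, and `AI_MARKER` are hypothetical, the retriever is plain word overlap, the re‑ranking stage is omitted, and the "LLM" just restates the query next to the top passage. It only shows how the synthetic share of the index grows as outputs are fed back in.

```python
from typing import List, Tuple

Doc = Tuple[str, str]  # (text, source), where source is "human" or "llm"

AI_MARKER = "As an AI language model, "  # hypothetical tell-tale phrase


def toy_retrieve(query: str, corpus: List[Doc], k: int) -> List[Doc]:
    """Rank documents by word overlap with the query (stand-in for BM25 etc.)."""
    q_terms = set(query.lower().split())
    ranked = sorted(corpus,
                    key=lambda d: len(q_terms & set(d[0].lower().split())),
                    reverse=True)
    return ranked[:k]


def toy_generate(query: str, passages: List[str]) -> str:
    """Stand-in for an LLM call: restate the query next to the top passage."""
    context = passages[0] if passages else ""
    return f"{AI_MARKER}{query} {context}".strip()


def postprocess(text: str) -> str:
    """Strip phrases that would reveal the synthetic origin of the text."""
    return text.replace(AI_MARKER, "").strip()


def simulate(queries: List[str], human_docs: List[str],
             iterations: int = 3, k: int = 5) -> Tuple[List[Doc], List[float]]:
    corpus: List[Doc] = [(d, "human") for d in human_docs]
    # Zero-shot seeding: generate without any retrieved context.
    corpus += [(postprocess(toy_generate(q, [])), "llm") for q in queries]
    llm_share: List[float] = []
    for _ in range(iterations):
        new_docs: List[Doc] = []
        for q in queries:
            top = toy_retrieve(q, corpus, k)                     # retrieval
            answer = toy_generate(q, [text for text, _ in top])  # generation
            new_docs.append((postprocess(answer), "llm"))        # post-processing
        corpus.extend(new_docs)                                  # index update
        llm_share.append(sum(src == "llm" for _, src in corpus) / len(corpus))
    return corpus, llm_share


corpus, share = simulate(
    ["who wrote hamlet", "capital of france"],
    ["Shakespeare wrote Hamlet", "Paris is the capital of France"],
    iterations=3, k=2,
)
```

With two human documents and two queries, each iteration adds two synthetic documents, so the synthetic share rises monotonically even though the human corpus never shrinks.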

Experimental Design

Datasets: Open‑domain QA benchmarks NQ, WebQ, TriviaQA, PopQA.

Metrics: Retrieval accuracy (Acc@5, Acc@20); generation quality (Exact Match, EM).
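As a rough guide to how these two metrics are typically computed, the sketch below assumes Acc@k counts a query as a hit when any of its top‑k passages contains a gold answer string, and EM compares the normalized prediction with each gold answer for strict equality. The normalization (lowercasing, dropping punctuation and English articles) follows common open‑domain QA practice and is an assumption here; the paper's exact evaluation script may differ, for example by checking containment rather than equality for long LLM outputs.

```python
import re
import string
from typing import List


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def acc_at_k(ranked_passages: List[List[str]], gold_answers: List[List[str]], k: int) -> float:
    """Fraction of queries whose top-k passages contain at least one gold answer."""
    hits = 0
    for passages, golds in zip(ranked_passages, gold_answers):
        norm_golds = [normalize(g) for g in golds]
        if any(g in normalize(p) for p in passages[:k] for g in norm_golds):
            hits += 1
    return hits / len(ranked_passages)


def exact_match(predictions: List[str], gold_answers: List[List[str]]) -> float:
    """Fraction of predictions equal to some gold answer after normalization."""
    hits = sum(
        any(normalize(pred) == normalize(g) for g in golds)
        for pred, golds in zip(predictions, gold_answers)
    )
    return hits / len(predictions)


ranked = [["Berlin is in Germany.", "Paris is the capital of France."],
          ["Tokyo is big."]]
golds = [["Paris"], ["Kyoto"]]
```

In this toy data the first query is only answered by its second‑ranked passage, so it counts as a miss at k = 1 and a hit at k = 2.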

Retrieval models: sparse (BM25) and dense (Contriever, BGE‑Base, LLM‑Embedder).
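For readers unfamiliar with the sparse baseline, here is a minimal sketch of Okapi BM25 scoring. It assumes whitespace tokenization, a non‑empty document list, the common defaults k1 = 1.2 and b = 0.75, and the smoothed IDF variant that adds 1 inside the logarithm; none of these settings are taken from the paper, and `bm25_scores` is a hypothetical helper rather than a library function.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score every document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


docs = ["the cat sat on the mat",
        "dogs chase cats in the park",
        "stock markets fell sharply today"]
scores = bm25_scores("cat mat", docs)
```

Because BM25 matches exact tokens, the second document ("cats") scores zero for the query term "cat". Dense retrievers such as Contriever instead compare learned embeddings and do not depend on exact token overlap.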

Re‑rankers: MonoT5‑3B, UPR‑3B, BGE‑reranker.

LLMs for generation: GPT‑3.5‑Turbo, LLaMA2‑13B‑Chat, Qwen‑14B‑Chat, Baichuan2‑13B‑Chat, ChatGLM3‑6B.

Results

Short‑term impact

Introducing synthetic text immediately improves retrieval metrics; for example, with BM25 on TriviaQA, Acc@5 rises by 31.2% and Acc@20 by 19.1%.

QA generation quality (EM) varies: some models benefit, others experience slight degradation.

Long‑term impact

As the proportion of LLM‑generated documents grows across iterations, retrieval effectiveness declines; on NQ, for example, Acc@5 drops by 21.4% on average between iteration 1 and iteration 10.

Generation quality (EM) remains relatively stable, fluctuating within a narrow range.

Spiral‑of‑silence phenomenon

Retrieval models increasingly rank LLM‑generated passages higher, pushing human‑written documents down the result list.

After ten iterations, human‑authored texts constitute less than 10% of top‑k results across all datasets.
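One simple way to track this effect is to label every indexed document with its origin and measure the human share of the top‑k results after each iteration. The sketch below assumes each ranked list is available as a sequence of "human" or "llm" labels; the function name and data layout are hypothetical.

```python
from typing import List


def human_share_at_k(ranked_sources: List[List[str]], k: int) -> float:
    """Fraction of top-k results, pooled over all queries, that are human-written."""
    total = 0
    human = 0
    for labels in ranked_sources:
        top = labels[:k]
        total += len(top)
        human += sum(1 for source in top if source == "human")
    return human / total if total else 0.0


# Rank-ordered source labels for two queries (illustrative values only).
rankings = [
    ["llm", "llm", "human", "human"],
    ["human", "llm", "llm", "llm"],
]
share_top2 = human_share_at_k(rankings, 2)
share_top4 = human_share_at_k(rankings, 4)
```

Plotting this share per iteration would show whether human documents are being pushed out of the visible result set, which is the trend the paper describes.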

This homogenization reduces both diversity and overall accuracy of retrieved information.

Conclusion

LLM‑generated text can boost short‑term retrieval performance, but sustained injection leads to a “spiral of silence” where synthetic content dominates the index, marginalizing human‑authored information and causing long‑term homogenization. Preserving diversity in digital information ecosystems requires careful monitoring of synthetic content growth in RAG systems.
