AI Pollution: How Generated Content Threatens the Internet and Model Training
The article examines how AI-generated misinformation spreads across platforms—from misleading answers on Bing and Stack Overflow to fabricated news stories—highlighting the resulting contamination of online information, the risks to model training, and emerging efforts to detect and curb such low‑quality AI output.
Recent observations show that AI systems such as Bing can provide seemingly reliable answers that are in fact unverified. One example is a user named "百变人生," who rapidly answered large numbers of questions with inaccurate information, spreading false details across Chinese platforms.
"AI Pollution Sources" Are Not Limited to One Platform
Beyond Bing, similar AI‑generated misinformation appears on Zhihu, where users receive answers tagged as "AI‑assisted creation," and on Reddit and other forums where ChatGPT bots answer questions without guaranteeing accuracy.
Fabricated news stories, such as a sensational account of a chicken‑shop murder in Zhengzhou and a false report of a train accident in Gansu, were traced back to individuals using AI tools like ChatGPT to mass-produce click‑bait for profit, prompting criminal enforcement by the authorities.
Internationally, the problem extends to Stack Overflow, which temporarily disabled AI‑generated answers because the high error rate of ChatGPT responses overwhelmed the community’s capacity to verify them.
Abuse of AI Also Undermines AI
Researchers from Cambridge and Edinburgh published an arXiv paper titled "The Curse of Recursion: Training on Generated Data Makes Models Forget," demonstrating that training new models on AI‑generated data degrades their performance and creates irreversible defects.
The paper warns that such data pollution will distort models' perception of reality, making future training on internet data increasingly difficult.
Experts like Daphne Ippolito (Google Brain) note that finding high‑quality, untainted data for future AI training will become ever harder.
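The degradation described in "The Curse of Recursion" can be illustrated with a toy simulation (an illustrative sketch, not code from the paper; the function name and parameters are invented here): each "generation" of a model is fit only to samples produced by the previous generation, and the fitted distribution steadily loses the variance of the original data.

```python
import random

def collapse_demo(generations=500, n=20, seed=0):
    """Toy model collapse: repeatedly fit a Gaussian to samples drawn
    from the previous generation's fitted Gaussian, so each generation
    'trains' only on the previous generation's output."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0              # generation 0: the "real" data distribution
    history = [sigma]
    for _ in range(generations):
        # Generate synthetic "training data" from the current model.
        data = [rng.gauss(mu, sigma) for _ in range(n)]
        # Refit the model to its own output (maximum-likelihood estimates).
        mu = sum(data) / n
        sigma = (sum((x - mu) ** 2 for x in data) / n) ** 0.5
        history.append(sigma)
    return history

hist = collapse_demo()
print(f"initial sigma={hist[0]:.3f}, final sigma={hist[-1]:.6f}")
```

With a finite sample at every step, estimation noise compounds and the fitted spread drifts toward zero: the tails of the original distribution are "forgotten" first, which mirrors the paper's warning that recursively trained models lose their grip on the real data distribution.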
Despite the bleak outlook, some platforms are beginning to address the issue by developing detection technologies and implementing policies to limit low‑quality AI‑generated content.
Overall, the unchecked proliferation of AI‑generated misinformation threatens both the integrity of online information ecosystems and the reliability of future AI models.
IT Services Circle
Delivering cutting-edge internet insights and practical learning resources. We're a passionate and principled IT media platform.