Data Party THU
Dec 6, 2025 · Artificial Intelligence

Why Adding Toxic Data Can Make Language Models Safer and More Capable

A recent study reports that deliberately mixing a moderate amount of toxic content into large‑language‑model pre‑training sharpens the model’s internal representation of toxicity. This, in turn, lets post‑training interventions detoxify the model more effectively while preserving, or even improving, its general capabilities.

LLM · Model Alignment · Toxic Data