Data Party THU
Dec 6, 2025 · Artificial Intelligence
Why Adding Toxic Data Can Make Language Models Safer and More Capable
A recent study shows that deliberately including a moderate amount of toxic content in the pre-training data of a large language model sharpens the model's internal representation of toxicity. Post-training interventions can then detoxify the model more effectively while preserving, or even improving, its general capabilities.
LLM · Model Alignment · Toxic Data
