Tagged articles

Training Data Attribution

Machine Heart
Jun 28, 2026 · Artificial Intelligence

Which Training Data Shapes Large‑Model Abilities? Introducing Mechanistic Data Attribution (MDA)

The paper presents Mechanistic Data Attribution (MDA), a framework that traces specific internal mechanisms, such as induction heads, back to the particular training samples that give rise to them. It finds that repetitive "garbage" data, rather than high-quality text, drives the emergence of these mechanisms, and it validates this causal link through deletion and augmentation experiments. The authors present the approach as a scalable route to data-driven model improvement.

Causal Intervention · Data Augmentation · Induction Heads
12 min read