JD Retail Technology
Dec 4, 2025 · Artificial Intelligence

Twin Networks Reveal How to Optimize Data Mixtures for Large Language Models

This article presents TANDEM, a bi‑level data‑mixture optimization framework that uses twin networks to automatically adjust domain‑specific training data ratios, offering theoretical guarantees, broader applicability, and significant performance gains across pre‑training, fine‑tuning, and e‑commerce product‑understanding tasks.

NeurIPS · bi-level optimization · data mixture optimization