Dynamic Margin Selection for Efficient Deep Learning and Low-Resource Large Model Training
Jia Xing’s research introduces Dynamic Margin Selection (DynaMS), a technique that repeatedly refreshes a core set of samples close to the decision boundary so that large language models can be trained efficiently under limited resources, converging to a loss comparable to full‑data training. Combined with model distillation, the work enabled six‑fold compression and faster inference, and motivated a proposed exponential scaling law for data‑efficient AI.
Jia Xing, a 2021 JD PhD management trainee, joined JD Retail Technology after graduating from the Institute of Automation, Chinese Academy of Sciences. He tackled the challenging problem of training large models under low‑resource conditions, publishing four papers at top‑tier conferences, filing ten patents, and earning recognition as an outstanding talent.
His early work focused on building a same‑product identifier to automate pre‑review filtering of massive product listings. By adapting Llama‑7b into a binary MultiChoice model that outputs only “yes” or “no”, the system automatically filtered easy cases and routed low‑confidence samples to manual review, cutting manual effort by more than 50%.
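The routing logic described above can be sketched as a simple confidence threshold over the model's "yes"/"no" probabilities. The function name, labels, and threshold below are illustrative assumptions, not details from the article:

```python
def route_listing(p_yes: float, threshold: float = 0.9) -> str:
    """Route a product-pair prediction based on model confidence.

    p_yes is the model's probability that two listings are the same
    product. The 0.9 threshold is a hypothetical choice; in practice
    it would be tuned against review capacity and error tolerance.
    """
    p_no = 1.0 - p_yes
    confidence = max(p_yes, p_no)
    if confidence >= threshold:
        # High-confidence cases are filtered automatically.
        return "same" if p_yes > p_no else "different"
    # Low-confidence cases are sent for manual review.
    return "manual_review"
```

For example, `route_listing(0.97)` is filtered automatically as "same", while `route_listing(0.55)` falls below the threshold and is escalated to a human reviewer.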
To address the heavy computational cost of large language models, the team distilled the model, achieving more than six‑fold compression and roughly four‑fold inference speedup with negligible loss in accuracy. The solution was deployed across multiple business scenarios, saving substantial annotation costs.
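The article does not specify the distillation objective used, but a common choice is soft-label distillation in the style of Hinton et al., where the student matches the teacher's temperature-softened output distribution. The sketch below shows that generic loss; all names and constants are assumptions:

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable temperature-scaled softmax."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between teacher and student soft targets.

    A generic soft-label distillation objective; scaled by T**2 as is
    conventional so gradients keep the same magnitude across temperatures.
    """
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(t, s))
    return temperature ** 2 * kl
```

The loss is zero when the student exactly reproduces the teacher's logits and grows as the two distributions diverge, which is what drives the compressed student toward the larger model's behavior.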
Inspired by the margin concept of support vector machines, the research introduced Margin Selection (MS), which retains training samples close to the decision boundary and discards those far from it. Because the decision boundary shifts during training, a Dynamic Margin Selection (DynaMS) strategy was devised: every N epochs the core sample set is refreshed, and training continues on the updated set. The method is illustrated in the accompanying diagram.
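The refresh loop can be sketched as follows. This is a minimal illustration of the idea, not the paper's implementation: `compute_margins`, `train_epochs`, and the hyperparameters are all hypothetical placeholders.

```python
import numpy as np

def select_margin_core(margins: np.ndarray, keep_frac: float) -> np.ndarray:
    """Return indices of the samples closest to the decision boundary.

    `margins` holds each sample's |distance to boundary|; under margin
    selection, smaller means more informative.
    """
    k = max(1, int(keep_frac * len(margins)))
    return np.argsort(margins)[:k]

def dynams_train(data, compute_margins, train_epochs,
                 refresh_every=5, total_epochs=20, keep_frac=0.3):
    """Dynamic Margin Selection sketch: every `refresh_every` epochs,
    re-estimate margins (the boundary has moved), rebuild the core set,
    and continue training on it."""
    core_idx = None
    for _ in range(0, total_epochs, refresh_every):
        margins = compute_margins(data)              # distances to current boundary
        core_idx = select_margin_core(margins, keep_frac)
        train_epochs(data, core_idx, refresh_every)  # train only on the core set
    return core_idx
```

The key design point is that selection is repeated rather than done once: a sample far from the initial boundary may become boundary-close later, so a static core set would drift out of date.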
Convergence analysis shows that models trained with DynaMS converge to the same loss as training on the full dataset. The approach was accepted at the top‑tier conference ICLR.
The work also explores the power‑law scaling of large models, under which validation loss decreases only marginally even with massive increases in training data. By selecting only the most informative samples, the authors hypothesize a new exponential scaling law that could break the traditional power‑law limitation.
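The contrast between the two regimes can be made concrete with toy functional forms. The constants below are purely illustrative; the article does not give the exact form of the hypothesized law:

```python
import math

def power_law_loss(n, a=10.0, b=0.1):
    """Validation loss under a power law: L(n) = a * n**(-b).
    Doubling the data always shrinks the loss by the same fixed
    factor 2**(-b), about 7% here, regardless of scale."""
    return a * n ** (-b)

def exponential_law_loss(n, a=10.0, c=1e-6):
    """Hypothesized exponential law: L(n) = a * exp(-c * n).
    Each additional informative sample multiplies the loss by a
    constant factor, so loss falls far faster at large n."""
    return a * math.exp(-c * n)
```

Under the power law, going from one million to two million samples cuts the loss by only about 7%, which is why brute-force data scaling yields diminishing returns; an exponential law in the number of informative samples would not plateau this way.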
Overall, the story emphasizes that business problems can drive meaningful AI research, and that focusing on core, information‑dense samples enables efficient training of large models under resource constraints.
JD Retail Technology
Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.