Tagged articles
Machine Heart
May 18, 2026 · Artificial Intelligence

ICML 2026: From Single‑Threaded Thinking to Native Parallel Reasoning in Agents

The paper introduces Native Parallel Reasoner (NPR), a framework that lets language agents generate and maintain multiple reasoning paths using a three‑stage self‑distillation and parallel reinforcement‑learning training paradigm, achieving up to 4.6× speedup and significant accuracy gains across eight reasoning benchmarks.

AI reasoning · Large Language Models · Native Parallel Reasoner
18 min read
Machine Heart
May 13, 2026 · Artificial Intelligence

Zero‑Cost Upgrade: OneSearch‑V2 Launches Generative Search, Boosting Buyers and Orders

OneSearch‑V2 introduces a zero‑cost generative search upgrade that leverages latent‑reasoning‑enhanced self‑distillation, thought‑augmented query understanding, and behavior‑feedback preference alignment, delivering offline HitRate gains of up to 2.68% and online CTR, buyer, and order increases of roughly 4%, 2%, and 2%, respectively.

AI Ranking · Behavioral Feedback · Generative Search
24 min read
Machine Learning Algorithms & Natural Language Processing
Apr 29, 2026 · Artificial Intelligence

Dual Engine for Training and Inference: How Princeton’s SD‑ZERO and AggAgent Redefine Complex Reasoning

The article reviews two recent Princeton papers—SD‑ZERO, which introduces self‑revision training and on‑policy self‑distillation to turn a model’s own error traces into dense supervision, and AggAgent, which actively aggregates parallel long‑horizon trajectories—showing how internal trajectory mining can cut compute costs and boost accuracy on challenging math and code benchmarks.

AggAgent · Large Language Models · On-Policy Distillation
10 min read
Machine Learning Algorithms & Natural Language Processing
Feb 22, 2026 · Artificial Intelligence

What Is On-Policy Distillation? A Deep Dive into On-Policy and Self-Distillation

The article explains On-Policy Distillation, derives its forward and reverse KL gradients, and introduces Self‑Distillation, where the policy serves as its own teacher. It also discusses practical implementation tricks such as extra‑knowledge injection and EMA or trust‑region teacher stabilization, and highlights benefits like reduced catastrophic forgetting, fewer Aha moments, and a narrower train‑test gap, especially for larger models.
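
As a rough illustration of the terms in this summary (not code from the article), the sketch below computes forward and reverse KL between a teacher and a student distribution and applies an EMA update to a teacher. It assumes the common convention that forward KL is KL(teacher || student) and reverse KL is KL(student || teacher). The function names `kl` and `ema_update`, the toy probabilities, and the `decay` value are hypothetical choices made for this example.

```python
import math


def kl(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities.

    Assumes every q[i] is strictly positive wherever p[i] is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def ema_update(teacher, student, decay=0.99):
    """Move teacher values toward student values with an exponential moving average.

    `decay` close to 1 keeps the teacher stable; this value is an assumption.
    """
    return [decay * t + (1 - decay) * s for t, s in zip(teacher, student)]


# Toy next-token distributions over a three-token vocabulary (made-up numbers).
teacher_probs = [0.7, 0.2, 0.1]
student_probs = [0.5, 0.3, 0.2]

# Assumed convention: forward KL weights the log-ratio by the teacher,
# reverse KL weights it by the student (the on-policy side).
forward_kl = kl(teacher_probs, student_probs)
reverse_kl = kl(student_probs, teacher_probs)

# In self-distillation the teacher can be a slowly moving copy of the policy.
new_teacher = ema_update(teacher_probs, student_probs, decay=0.9)

print(f"forward KL: {forward_kl:.4f}, reverse KL: {reverse_kl:.4f}")
print("EMA teacher:", [round(x, 3) for x in new_teacher])
```

Both divergences are zero only when the two distributions match, and they generally differ from each other, which is why the choice of direction matters in a distillation objective.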

Catastrophic Forgetting · EMA · KL divergence
6 min read
HyperAI Super Neural
Feb 6, 2026 · Artificial Intelligence

Latest Advances in AI Agents: PaperBanana, SDPO, Lumine, Idea2Story, and Insight Agents

This weekly roundup highlights five recent AI agent papers—PaperBanana for automated academic illustration, SDPO's self‑distillation reinforcement learning, Lumine's open‑world generalist agent, Idea2Story's pipeline for turning research ideas into narratives, and Insight Agents' fast e‑commerce insights—showcasing diverse breakthroughs in multi‑agent frameworks, self‑feedback learning, and real‑world deployment.

AI agents · Reinforcement Learning · automated scientific narrative
8 min read
Alibaba Cloud Big Data AI Platform
Mar 19, 2024 · Artificial Intelligence

M2SD: Multiple Mixing Self-Distillation for Few-Shot Class-Incremental Learning

This paper introduces M2SD, a dual‑branch multiple‑mixing self‑distillation framework that expands feature space, mitigates overfitting and catastrophic forgetting, and achieves state‑of‑the‑art results on CIFAR‑100, CUB‑200 and miniImageNet for few‑shot class‑incremental learning.

Few‑Shot Learning · M2SD · class incremental learning
17 min read
AntTech
Sep 9, 2022 · Artificial Intelligence

Ant Security Lab Wins Two Golds and One Silver at KDD Cup 2022 with Advanced Keyword Extraction and Self‑Distillation for Product Search

Ant Security Lab's algorithm engineer Lin Jinzheng secured two gold medals and one silver at KDD Cup 2022, ranking first globally, by applying innovative keyword‑extraction and self‑distillation techniques to improve product search relevance and interactive risk‑control systems.

AI · KDD Cup · Product Search
4 min read