Tagged articles
2 articles
Machine Learning Algorithms & Natural Language Processing
Jun 3, 2026 · Artificial Intelligence

How 8 Agents Can Converge Stably: Trust‑Region Constraints Reshape Multi‑Agent LLM Workflows

The paper introduces TeamTR, a trust-region fine-tuning framework that mitigates compounding occupancy shift in multi-agent LLM workflows through fresh rollout sampling and token-level KL constraints. It reports stable performance gains of up to 7.1% overall and dramatic improvements on large-scale tasks such as AIME24.

AI coordination · TeamTR · fine-tuning
9 min read
Machine Learning Algorithms & Natural Language Processing
Feb 22, 2026 · Artificial Intelligence

What Is On-Policy Distillation? A Deep Dive into On-Policy and Self-Distillation

The article explains On-Policy Distillation, derives its forward and reverse KL gradients, and introduces Self-Distillation, where the policy serves as its own teacher. It also discusses practical implementation tricks such as extra-knowledge injection and EMA or trust-region teacher stabilization, and highlights benefits like reduced catastrophic forgetting, fewer Aha moments, and a narrower train-test gap, especially for larger models.

Catastrophic Forgetting · EMA · KL divergence
6 min read