How 8 Agents Can Converge Stably: Trust‑Region Constraints Reshape Multi‑Agent LLM Workflows
The paper introduces TeamTR, a trust‑region fine‑tuning framework for multi‑agent LLM workflows. It mitigates compounding occupancy shift through fresh rollout sampling and token‑level KL constraints, and it reports stable performance gains of up to 7.1% overall, with especially large improvements on large‑scale tasks such as AIME24.
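
To make the idea of a token‑level KL constraint concrete, here is a minimal sketch in plain Python. It is not TeamTR's implementation: the function names (`token_kl`, `within_trust_region`), the direction KL(old || new), the averaging over tokens, and the threshold `delta` are all assumptions chosen for illustration. It only shows how a per‑token divergence between an old and an updated policy could be measured and compared against a trust‑region bound.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one token's vocabulary logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def token_kl(old_logits, new_logits):
    """KL(old || new) at a single token position (hypothetical helper)."""
    old_lp = log_softmax(old_logits)
    new_lp = log_softmax(new_logits)
    return sum(math.exp(o) * (o - n) for o, n in zip(old_lp, new_lp))

def within_trust_region(old_seq, new_seq, delta):
    """Check whether the mean per-token KL stays under delta.

    old_seq / new_seq: lists of per-token logit vectors from the old and
    updated policies. delta is an assumed trust-region radius.
    Returns (is_within, mean_kl).
    """
    kls = [token_kl(o, n) for o, n in zip(old_seq, new_seq)]
    mean_kl = sum(kls) / len(kls)
    return mean_kl <= delta, mean_kl

# Usage: two token positions over a toy two-word vocabulary.
old = [[0.0, 0.0], [1.0, -1.0]]
new = [[2.0, 0.0], [1.0, -1.0]]  # only the first position changed
ok, mean_kl = within_trust_region(old, new, delta=0.1)
```

In this toy case the first position drifts noticeably while the second is unchanged, so an update like this would be rejected or penalized under a tight `delta`. In a multi‑agent workflow the same kind of check would presumably apply to each agent's policy, which is what keeps any single agent's update from shifting the inputs seen by the others too far.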
