Why Reasoning and Tool-Use Clash in Agentic RL—and How DART Solves It
Recent studies report that in agentic RL, jointly training reasoning and tool use on shared parameters creates a persistent negative interaction: the gradients of the two abilities are nearly orthogonal, which limits performance. DART, a disentangled tuning approach, trains a separate LoRA adapter for each ability, isolating the two and restoring gains across benchmarks.
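
To make the two ideas in this summary concrete, here is a minimal NumPy sketch, not the DART implementation. It shows (1) how the angle between two gradient vectors can be measured with cosine similarity, and (2) a linear layer with a frozen base weight and two LoRA adapters, one per ability. The class name `TwoAdapterLinear`, the adapter names, and the per-token boolean mask used for routing are all assumptions made for illustration; the summary does not say how DART assigns tokens to adapters.

```python
import numpy as np


def cosine_similarity(g_a, g_b):
    """Cosine of the angle between two gradients, flattened to vectors.

    Values near 0 mean the gradients are nearly orthogonal.
    """
    g_a = np.ravel(np.asarray(g_a, dtype=float))
    g_b = np.ravel(np.asarray(g_b, dtype=float))
    return float(g_a @ g_b / (np.linalg.norm(g_a) * np.linalg.norm(g_b)))


class TwoAdapterLinear:
    """Frozen base weight plus one low-rank (LoRA) update per ability.

    Hypothetical illustration: routing by a per-token boolean mask is an
    assumption, not something stated in the source.
    """

    def __init__(self, d_in, d_out, rank=4, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(d_in, d_out))  # frozen base weight
        # Standard LoRA init: A small random, B zero, so each update starts at 0.
        self.adapters = {
            name: {
                "A": rng.normal(scale=0.01, size=(d_in, rank)),
                "B": np.zeros((rank, d_out)),
            }
            for name in ("reasoning", "tool")
        }

    def _delta(self, x, name):
        # Low-rank update x @ A @ B for the named adapter.
        adapter = self.adapters[name]
        return x @ adapter["A"] @ adapter["B"]

    def forward(self, x, is_tool_token):
        """x: (tokens, d_in); is_tool_token: (tokens,) boolean mask."""
        mask = np.asarray(is_tool_token, dtype=bool)[:, None]
        # Each token receives the update of exactly one adapter, so a loss on
        # tool tokens never sends gradient into the reasoning adapter.
        delta = np.where(mask, self._delta(x, "tool"), self._delta(x, "reasoning"))
        return x @ self.W + delta
```

In this sketch, changing the `tool` adapter leaves the outputs for reasoning tokens untouched, which is the isolation property that separate adapters are meant to provide.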
