Machine Learning Algorithms & Natural Language Processing
Mar 6, 2026 · Artificial Intelligence
Why Reasoning and Tool-Use Clash in Agentic RL—and How DART Solves It
Recent studies reveal that in Agentic RL, jointly training reasoning and tool-use on shared parameters creates a persistent negative interaction: the two objectives' gradients are nearly orthogonal, which limits performance. A disentangled tuning approach (DART) isolates the two abilities in separate LoRA adapters and restores gains across benchmarks.
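As a minimal illustration of the "nearly orthogonal gradients" claim, the sketch below computes the cosine similarity between two per-objective gradient vectors. All names and dimensions here are hypothetical, not taken from the post; in high dimensions, unrelated gradient directions tend toward a cosine similarity near zero.

```python
import numpy as np

def cosine_similarity(g1: np.ndarray, g2: np.ndarray) -> float:
    """Cosine similarity between two flattened gradient vectors."""
    return float(np.dot(g1, g2) / (np.linalg.norm(g1) * np.linalg.norm(g2)))

# Hypothetical per-objective gradients for one shared parameter block.
rng = np.random.default_rng(0)
g_reason = rng.normal(size=4096)  # gradient from the reasoning objective
g_tool = rng.normal(size=4096)    # gradient from the tool-use objective

# Independent high-dimensional directions are nearly orthogonal,
# mirroring the near-zero similarity reported between the two objectives.
sim = cosine_similarity(g_reason, g_tool)
```

Monitoring this statistic during joint training is one simple way to surface the kind of gradient interference the post describes.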
Agentic RL · DART · Gradient Interference
12 min read
