How Large-Scale Reinforcement Learning Boosted KAT-Dev-72B-Exp to 74.6% on SWE‑Bench
The KwaiPilot team introduced KAT-Dev-72B-Exp, an open‑source LLM trained with large‑scale reinforcement learning that scored 74.6% on SWE‑Bench Verified, a record among open‑source models. The work relies on techniques such as Trie Packing, entropy‑aware advantage scaling, and a decoupled data‑environment architecture.
Large-scale reinforcement learning (RL) is a key pathway to unlocking complex reasoning and improving task generalization in large models. The KwaiPilot team recently released KAT-Dev-72B-Exp, which scored 74.6% on the SWE‑Bench Verified benchmark, a new record among open‑source models.
1. Trie Packing
The model is trained on SeamlessFlow, the team's in‑house industrial RL framework. SeamlessFlow uses a data‑plane architecture to decouple training logic from the agent, which lets it support multi‑agent and online RL scenarios. To handle the shared‑prefix trajectories that complex agent workloads produce, the team introduced a Trie Packing mechanism and restructured the training engine so that such trajectories can be trained on efficiently.
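To make the shared‑prefix idea concrete, here is a minimal Python sketch of the data layout only, under our own assumptions: trajectories are lists of token ids, tokens shared by several trajectories are stored once as trie nodes, and each packed token may attend only to itself and its ancestors. The function names (`pack_trajectories`, `tree_attention_mask`, `position_ids`) are hypothetical. This is not the SeamlessFlow implementation; it does not show the rewritten attention kernel or how per‑trajectory losses are attributed to shared tokens.

```python
def pack_trajectories(trajectories):
    """Merge token sequences that share prefixes into one trie-ordered sequence.

    Returns (tokens, parents, leaves): tokens[i] is the token at packed position i,
    parents[i] is the packed position of its predecessor (-1 for a first token),
    and leaves[k] is the packed position of the last token of trajectory k.
    """
    tokens, parents, children = [], [], []
    roots = {}  # first-token id -> packed position
    leaves = []
    for traj in trajectories:
        level, parent = roots, -1
        for tok in traj:
            if tok not in level:
                # New branch: append a node after all existing nodes,
                # so every parent precedes its children in packed order.
                level[tok] = len(tokens)
                tokens.append(tok)
                parents.append(parent)
                children.append({})
            parent = level[tok]
            level = children[parent]
        leaves.append(parent)
    return tokens, parents, leaves


def tree_attention_mask(parents):
    """mask[i][j] is True iff packed token i may attend to j (j is i or an ancestor of i)."""
    n = len(parents)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        j = i
        while j != -1:
            mask[i][j] = True
            j = parents[j]
    return mask


def position_ids(parents):
    """Depth of each node, i.e. its index in the original (flattened) trajectory."""
    depth = []
    for p in parents:
        depth.append(0 if p == -1 else depth[p] + 1)
    return depth


trajs = [[1, 2, 3, 4], [1, 2, 3, 5, 6], [1, 2, 7]]
tokens, parents, leaves = pack_trajectories(trajs)
print(sum(map(len, trajs)), "flattened tokens ->", len(tokens), "packed tokens")
print(tokens, parents, position_ids(parents))
```

Here three trajectories with 12 flattened tokens pack into 7 unique tokens, so the work on the prefix `[1, 2]` (shared by all three) and `[1, 2, 3]` (shared by the first two) is done once. Position ids follow tree depth, so each branch sees the same positions it would have in its own flattened sequence.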
In large‑scale agentic LLM training, token trajectories form tree structures: many rollouts share a common prefix before they branch. Instead of flattening these trees into independent linear sequences, the team rewrote the training engine and attention kernel to merge the repeated backward computations on shared prefixes, achieving a 2.5× speedup.
2. Entropy‑Aware Advantage Scaling
The team also proposed an entropy‑based advantage scaling method: each rollout's policy entropy is normalized and used as a multiplier on its advantage. This amplifies high‑entropy (exploratory) samples and suppresses low‑entropy ones, improving the balance between exploration and exploitation.
The method integrates with GRPO’s group‑wise optimization while enhancing policy exploration.
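The scaling step can be sketched in Python as below. The source does not specify the normalization, so this sketch assumes one plausible choice: each rollout's mean token entropy is divided by the mean over its GRPO group, so the multipliers average to about 1 within a group, and the result multiplies the standard group‑normalized GRPO advantage. The function names and the `eps` constant are hypothetical.

```python
import math


def token_entropy(logits):
    """Shannon entropy (nats) of softmax(logits), computed stably."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return -sum((e / z) * math.log(e / z) for e in exps if e > 0)


def rollout_entropy(step_logits):
    """Mean per-token policy entropy over one rollout's generated tokens."""
    return sum(token_entropy(l) for l in step_logits) / len(step_logits)


def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantage: (r - group mean) / (group std + eps)."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


def entropy_scaled_advantages(rewards, entropies, eps=1e-6):
    """Scale each GRPO advantage by the rollout's entropy relative to the group mean (assumed normalization)."""
    advantages = grpo_advantages(rewards, eps)
    mean_h = sum(entropies) / len(entropies)
    scales = [h / (mean_h + eps) for h in entropies]
    return [a * s for a, s in zip(advantages, scales)]


# One GRPO group of four rollouts: two succeed, two fail.
rewards = [1.0, 0.0, 1.0, 0.0]
entropies = [2.0, 1.0, 1.0, 2.0]
print([round(a, 3) for a in entropy_scaled_advantages(rewards, entropies)])
```

Both successful rollouts start with the same GRPO advantage (about 1.0), but the higher‑entropy one ends up near 1.333 and the lower‑entropy one near 0.667. Under this assumed normalization the penalty on high‑entropy failures is amplified as well; the source does not say whether negative advantages are treated differently.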
3. Summary and Outlook
Efficient, scalable data environments are crucial for successful agentic RL training. The team is building a large‑scale data‑environment management system that fully decouples training data, sandbox, and framework, allowing independent expansion of data sources, safe isolated testing, and flexible algorithm iteration. This modular design accelerates data expansion across code, mathematics, games, and more, improving model generalization, robustness, and real‑world applicability.
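As a rough illustration of this kind of decoupling, the sketch below puts task sources, sandboxes, and the policy behind small interfaces so that the rollout loop never depends on a concrete implementation. All class and function names are hypothetical, and the toy sandbox does an exact string match rather than running code in isolation; the team's actual system is not described at this level of detail.

```python
from dataclasses import dataclass
from typing import Iterator, List, Protocol, Tuple


@dataclass
class Task:
    task_id: str
    prompt: str


class TaskSource(Protocol):
    """Any training-data provider (code, math, games, ...)."""
    def tasks(self) -> Iterator[Task]: ...


class Sandbox(Protocol):
    """Isolated environment that checks an answer and returns a reward."""
    def evaluate(self, task: Task, answer: str) -> float: ...


class Policy(Protocol):
    def act(self, prompt: str) -> str: ...


def collect_rewards(source: TaskSource, sandbox: Sandbox, policy: Policy) -> List[Tuple[str, float]]:
    """Rollout loop that depends only on the three interfaces, not on their implementations."""
    return [(t.task_id, sandbox.evaluate(t, policy.act(t.prompt))) for t in source.tasks()]


# Toy implementations; swapping any one of them leaves collect_rewards unchanged.
class ListSource:
    def __init__(self, tasks):
        self._tasks = list(tasks)

    def tasks(self):
        return iter(self._tasks)


class ExactMatchSandbox:
    def __init__(self, answers):
        self._answers = answers

    def evaluate(self, task, answer):
        return 1.0 if self._answers[task.task_id] == answer else 0.0


class ConstantPolicy:
    def __init__(self, reply):
        self._reply = reply

    def act(self, prompt):
        return self._reply


source = ListSource([Task("t1", "2+2"), Task("t2", "3+3")])
sandbox = ExactMatchSandbox({"t1": "4", "t2": "6"})
print(collect_rewards(source, sandbox, ConstantPolicy("4")))  # [('t1', 1.0), ('t2', 0.0)]
```

Adding a new data domain then means writing another `TaskSource` (and, if needed, another `Sandbox`) without touching the training loop.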
Free trial of KAT‑Coder: https://www.streamlake.ai/product/kat-coder
Open‑source repository: https://huggingface.co/Kwaipilot/KAT-Dev-72B-Exp
