
How Large-Scale Reinforcement Learning Boosted KAT-Dev-72B-Exp to 74.6% on SWE‑Bench

The KwaiPilot team introduced KAT-Dev-72B-Exp, an open‑source LLM trained with large‑scale reinforcement learning that scored 74.6% on SWE‑Bench Verified, a record among open‑source models. Key techniques include Trie Packing, entropy‑aware advantage scaling, and a decoupled data‑environment architecture.

Kuaishou Large Model

Oct 11, 2025

Large-scale reinforcement learning (RL) is a key route to unlocking complex reasoning in large models and improving how well they generalize across tasks. The KwaiPilot team recently released KAT-Dev-72B-Exp, which scored 74.6% on the SWE‑Bench Verified benchmark, a new record among open‑source models.

1. Trie Packing

The model is trained on SeamlessFlow, the team's in‑house industrial RL framework. SeamlessFlow uses a data‑plane architecture to decouple the training logic from the agent, which lets it support multi‑agent and online RL scenarios. To handle the trajectory structures that complex agent tasks produce, the team introduced a Trie Packing mechanism and restructured the training engine so that it trains efficiently on trajectories that share prefixes.
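The source does not publish the Trie Packing implementation. The following is a minimal sketch of the general idea, assuming trajectories are lists of token ids and that prefixes are shared only on exact token match; the function names `trie_pack`, `tree_attention_mask`, and `unpack` are hypothetical. Each token is stored once per distinct prefix, keeps a pointer to its parent and its original position index, and may attend only to its ancestors.

```python
def trie_pack(trajectories):
    """Pack token trajectories so that every shared prefix is stored only once.

    Returns four lists:
      tokens[i]    : token id at packed position i
      parents[i]   : packed position of the preceding token (-1 for a first token)
      positions[i] : original position index, so positional encodings are unchanged
      leaves[k]    : packed position of the last token of trajectory k
    """
    tokens, parents, positions, leaves = [], [], [], []
    node_of = {}  # (parent packed position, token id) -> packed position
    for traj in trajectories:
        pos = -1
        for tok in traj:
            key = (pos, tok)
            if key not in node_of:  # first time this prefix + token is seen
                node_of[key] = len(tokens)
                tokens.append(tok)
                parents.append(pos)
                positions.append(0 if pos == -1 else positions[pos] + 1)
            pos = node_of[key]
        leaves.append(pos)
    return tokens, parents, positions, leaves


def tree_attention_mask(parents):
    """mask[i][j] is True iff packed position j is i itself or one of its ancestors."""
    n = len(parents)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        j = i
        while j != -1:  # walk up the trie to the root
            mask[i][j] = True
            j = parents[j]
    return mask


def unpack(tokens, parents, leaf):
    """Recover one original trajectory by walking from its leaf to the root."""
    out = []
    j = leaf
    while j != -1:
        out.append(tokens[j])
        j = parents[j]
    return out[::-1]


# Usage: three trajectories that share the prefixes [1] and [1, 2].
trajs = [[1, 2, 3], [1, 2, 4], [1, 5]]
tokens, parents, positions, leaves = trie_pack(trajs)
mask = tree_attention_mask(parents)
```

Flattened, the three trajectories take 8 token slots; packed, they take 5, so any per-token work on the shared prefixes is done once. A production engine would most likely consume the parent and position arrays inside a fused attention kernel rather than materialize a dense n×n mask as this sketch does.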

In large‑scale LLM agentic training, token trajectories form tree structures: many sequences share a common prefix before branching. Instead of flattening them into independent linear sequences, the team rewrote the training engine and attention kernel to merge the repeated backward computation on shared prefixes, achieving a 2.5× training speedup.

2. Entropy‑Aware Advantage Scaling

The team also proposed an entropy‑based advantage scaling method: each rollout's policy entropy is normalized and used as a multiplier on its advantage. This amplifies high‑entropy (exploratory) samples and suppresses low‑entropy ones, improving the balance between exploration and exploitation.
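The source does not spell out how entropy is measured or normalized. One plausible formalization, assuming a rollout's entropy is its mean per‑token policy entropy and that normalization divides by the mean over the G rollouts in the same group (both are assumptions), is shown below, where q is the prompt, o_i is the i‑th rollout, \mathcal{H} is the entropy of the next‑token distribution, and A_i is the rollout's original advantage:

```latex
H_i = \frac{1}{|o_i|} \sum_{t=1}^{|o_i|} \mathcal{H}\big(\pi_\theta(\cdot \mid q, o_{i,<t})\big),
\qquad
\tilde{H}_i = \frac{H_i}{\frac{1}{G} \sum_{j=1}^{G} H_j},
\qquad
\hat{A}_i = \tilde{H}_i \, A_i
```

Under this reading, a rollout with above‑average entropy gets a scale above 1 and its advantage magnitude is amplified, while a below‑average rollout is damped.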

The method integrates with GRPO's group‑wise optimization: because it only rescales each rollout's advantage, it keeps GRPO's optimization structure while strengthening policy exploration.
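A minimal NumPy sketch of how entropy scaling could sit on top of GRPO's group‑relative advantages. It assumes GRPO's standard reward standardization within a group, a rollout entropy equal to the mean per‑token entropy, and normalization by the group's mean entropy; the function names are hypothetical and the normalization choice is an assumption, not KwaiPilot's published formula.

```python
import numpy as np


def mean_token_entropy(logits):
    """Average per-token policy entropy of one rollout; logits has shape (T, V)."""
    z = np.asarray(logits, dtype=np.float64)
    z = z - z.max(axis=-1, keepdims=True)  # numerically stable softmax
    log_p = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    p = np.exp(log_p)
    return float(-(p * log_p).sum(axis=-1).mean())


def grpo_advantages(rewards, eps=1e-6):
    """GRPO-style group-relative advantages: standardize rewards within one group."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)


def entropy_scaled_advantages(rewards, entropies, eps=1e-6):
    """Multiply each advantage by its rollout's entropy over the group mean entropy (assumed)."""
    adv = grpo_advantages(rewards, eps)
    h = np.asarray(entropies, dtype=np.float64)
    scale = h / (h.mean() + eps)  # > 1 for above-average entropy, < 1 below
    return adv * scale


# Usage: one group of four rollouts sampled for the same prompt.
rewards = [1.0, 0.0, 1.0, 0.0]
entropies = [2.0, 1.0, 1.0, 2.0]
scaled = entropy_scaled_advantages(rewards, entropies)
```

Here the two successful rollouts start with the same advantage, but the more exploratory one (entropy 2.0) ends up with roughly twice the scaled advantage of the other. If every rollout in a group has the same entropy, the scale is 1 and plain GRPO advantages are recovered.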

3. Summary and Outlook

Efficient, scalable data environments are crucial for successful agentic RL training. The team is building a large‑scale data‑environment management system that fully decouples the training data, the sandboxes, and the training framework, so that data sources can be expanded independently, tests run safely in isolation, and algorithms can be iterated on flexibly. This modular design speeds up data expansion across code, mathematics, games, and other domains, improving the model's generalization, robustness, and real‑world applicability.
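The source describes this system only at the architectural level. As a hypothetical sketch of what "fully decoupled" can mean in code, assuming the decoupling is expressed as narrow interfaces (the names `Task`, `DataSource`, `Sandbox`, and `collect_rewards` are illustrative, not KwaiPilot's), the training side depends only on abstract data and sandbox interfaces, so either side can be replaced without touching the loop:

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Protocol, Tuple


@dataclass
class Task:
    task_id: str
    prompt: str


class DataSource(Protocol):
    """Yields tasks; a new domain (code, math, games) plugs in as a new source."""
    def tasks(self) -> Iterator[Task]: ...


class Sandbox(Protocol):
    """Checks an answer for a task and returns a scalar reward."""
    def evaluate(self, task: Task, answer: str) -> float: ...


def collect_rewards(source: DataSource, sandbox: Sandbox,
                    policy: Callable[[str], str]) -> List[Tuple[str, float]]:
    """Training-side loop: depends only on the two interfaces, not on concrete classes."""
    return [(t.task_id, sandbox.evaluate(t, policy(t.prompt))) for t in source.tasks()]


# Toy implementations for illustration only; a real sandbox would run code in isolation.
class ListSource:
    def __init__(self, tasks: List[Task]):
        self._tasks = tasks

    def tasks(self) -> Iterator[Task]:
        return iter(self._tasks)


class ExactMatchSandbox:
    def __init__(self, expected: Dict[str, str]):
        self.expected = expected

    def evaluate(self, task: Task, answer: str) -> float:
        return 1.0 if self.expected.get(task.task_id) == answer else 0.0


source = ListSource([Task("t1", "hello"), Task("t2", "world")])
sandbox = ExactMatchSandbox({"t1": "HELLO", "t2": "x"})
results = collect_rewards(source, sandbox, policy=str.upper)
```

Because `collect_rewards` sees only the `DataSource` and `Sandbox` protocols, adding a new data domain or swapping in a containerized sandbox changes no training code, which is the property the decoupled design is after.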

Free trial of KAT‑Coder: https://www.streamlake.ai/product/kat-coder

Open‑source repository: https://huggingface.co/Kwaipilot/KAT-Dev-72B-Exp

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

large language models Reinforcement Learning entropy scaling KAT-Dev-72B-Exp software engineering benchmark Trie Packing

Written by

Kuaishou Large Model

Official Kuaishou Account
