Jun 4, 2025 · Artificial Intelligence

KwaiCoder-AutoThink-preview: An Automatic‑Thinking Large Model Enhanced with Step‑SRPO Reinforcement Learning

The KwaiPilot team released the KwaiCoder‑AutoThink‑preview model, which introduces a novel automatic‑thinking training paradigm and a process‑supervised reinforcement‑learning method called Step‑SRPO, enabling the model to dynamically switch between thinking and non‑thinking modes, reduce inference cost, and achieve up to 20‑point gains on code and math benchmarks while handling large‑scale codebases.

AI researchLarge Language ModelModel Optimization

0 likes · 12 min read

KwaiCoder-AutoThink-preview: An Automatic‑Thinking Large Model Enhanced with Step‑SRPO Reinforcement Learning

automatic thinking

KwaiCoder-AutoThink-preview: An Automatic‑Thinking Large Model Enhanced with Step‑SRPO Reinforcement Learning