Can AI Models Think on Demand? Inside KAT‑V1 AutoThink’s Dynamic Reasoning
This article introduces KAT‑V1 AutoThink, a dual‑mode large language model that automatically switches between thinking and non‑thinking modes based on problem difficulty. It details the model's training paradigm, its reinforcement‑learning enhancements, and its benchmark performance against leading open‑source models, and lists open‑source resources for further research.
Introducing KAT‑V1 AutoThink
KAT‑V1 is a large language model released and open‑sourced by Kuaishou, featuring two versions (40B and 200B parameters) that can dynamically toggle between thinking and non‑thinking modes according to the difficulty of a query.
Performance Highlights
In automatic‑thinking mode, the 40B version matches the performance of DeepSeek‑R1 (685B parameters) released in May, while the 200B version surpasses flagship open‑source models such as Qwen, DeepSeek, and Llama on multiple benchmarks. Notably, on the competition‑grade LiveCodeBench Pro benchmark, the 40B model ranks alongside closed‑source systems and outperforms many open‑source counterparts.
Key Technical Innovations
The Kwaipilot team proposes a new long‑short thinking hybrid training paradigm and introduces an enhanced reinforcement‑learning algorithm called Step‑SRPO, built on the traditional GRPO method. These innovations increase the information density of the model's thinking tokens and improve its ability to decide when to activate the thinking mode.
During pre‑training, the team generated a large corpus of thinking and non‑thinking data. Non‑thinking data were sampled from a 5 TB token pool to ensure diverse, challenging examples, while thinking data were synthesized using an agentic framework consisting of a solver, thinker, and critic, producing high‑quality long‑chain‑of‑thought (long‑CoT) samples.
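The solver–thinker–critic framework above can be sketched as an iterative draft‑critique‑refine loop. The function and field names below are illustrative assumptions; the article does not specify the framework at this level of detail.

```python
# Minimal sketch of a solver/thinker/critic loop for synthesizing long-CoT
# samples. The role interfaces and acceptance rule are assumptions, not the
# Kwaipilot team's actual implementation.

def synthesize_long_cot(problem, solve, think, critique, max_rounds=3):
    """Iteratively draft, critique, and refine one chain-of-thought sample."""
    trace = think(problem)               # thinker drafts the reasoning steps
    answer = solve(problem, trace)       # solver derives an answer from them
    for _ in range(max_rounds):
        verdict = critique(problem, trace, answer)   # critic reviews the pair
        if verdict["accept"]:
            return {"problem": problem, "cot": trace, "answer": answer}
        trace = think(problem, feedback=verdict["feedback"])  # refine draft
        answer = solve(problem, trace)
    return None  # discard samples the critic never accepts
```

Rejected samples are dropped rather than repaired indefinitely, which is one plausible way such a pipeline keeps only high‑quality long‑CoT data.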
Approximately 34.8 % of the pre‑training data are thinking samples and 65.2 % are non‑thinking, covering domains such as science, code, mathematics, tool use, and general knowledge.
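The reported mixture can be expressed as weighted sampling from the two data pools. The 34.8 % ratio comes from the article; the pool representation and per‑example sampling granularity below are assumptions.

```python
import random

# Sketch of drawing a pre-training batch with the reported data mixture:
# ~34.8% thinking (long-CoT) samples, ~65.2% non-thinking samples.
THINKING_RATIO = 0.348

def draw_batch(thinking_pool, non_thinking_pool, batch_size, rng=random):
    """Sample each example from the thinking pool with probability 0.348."""
    batch = []
    for _ in range(batch_size):
        pool = thinking_pool if rng.random() < THINKING_RATIO else non_thinking_pool
        batch.append(rng.choice(pool))
    return batch
```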
Knowledge Distillation and MTP
The model is initialized via heterogeneous knowledge distillation from a large teacher model (Qwen2.5‑32B). Two loss components are used: a universal logits distillation loss (ULD) aligning token‑level logits, and a multi‑token prediction (MTP) module that enables the student model to predict several future tokens in a single forward pass, encouraging long‑term planning.
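The two loss components can be sketched as follows. The ULD formulation here follows the common approach of comparing sorted output probabilities, which makes the loss vocabulary‑agnostic across heterogeneous teacher and student; the exact losses used by the Kwaipilot team may differ in detail.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def uld_loss(student_logits, teacher_logits):
    """L1 distance between sorted probability vectors.

    Sorting removes the dependence on token identity, so the loss is
    defined even when teacher and student vocabularies differ."""
    s = np.sort(softmax(student_logits), axis=-1)[..., ::-1]
    t = np.sort(softmax(teacher_logits), axis=-1)[..., ::-1]
    size = min(s.shape[-1], t.shape[-1])   # truncate to the shared support
    return np.abs(s[..., :size] - t[..., :size]).sum(axis=-1).mean()

def mtp_loss(future_logits, future_targets):
    """Cross-entropy over k future tokens predicted in one forward pass.

    future_logits : (batch, k, vocab) array of logits
    future_targets: (batch, k) array of target token ids"""
    probs = softmax(future_logits)
    batch, k = future_targets.shape
    nll = -np.log(probs[np.arange(batch)[:, None],
                        np.arange(k)[None, :],
                        future_targets] + 1e-9)
    return nll.mean()
```

In training, the two terms would typically be combined with a weighting coefficient; the article does not report the weights.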
Step‑SRPO Reinforcement Learning
To address the over‑thinking problem, the team designed Step‑SRPO, which first evaluates the necessity of reasoning for each query. Two reward signals guide learning: a judge reward for correctly selecting the reasoning mode, and an answer reward for the quality of the final response. This dual‑reward scheme reduces unnecessary token generation and improves efficiency.
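The dual‑reward scheme can be sketched as a weighted combination of the two signals. The weights and score shapes below are assumptions for illustration; the article only states that both a judge reward and an answer reward are used.

```python
# Hedged sketch of Step-SRPO's dual reward. Weights are hypothetical.

def step_srpo_reward(chose_thinking, needs_thinking, answer_score,
                     judge_weight=0.3, answer_weight=0.7):
    """Combine a mode-selection (judge) reward with an answer-quality reward.

    chose_thinking : bool  -- reasoning mode the policy selected
    needs_thinking : bool  -- reasoning mode the judge deems necessary
    answer_score   : float -- quality of the final response in [0, 1]
    """
    judge_reward = 1.0 if chose_thinking == needs_thinking else 0.0
    return judge_weight * judge_reward + answer_weight * answer_score
```

Under this shaping, a correct answer reached without unnecessary thinking scores highest, which is precisely what discourages over‑thinking on simple queries.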
After RL training, the 40B model automatically switches to non‑thinking mode on simple queries, achieving token‑usage reductions of 20‑30 % while maintaining performance comparable to DeepSeek‑R1‑0528.
Complex Reasoning Benchmarks
On challenging benchmarks (e.g., AIME, LCB, GPQA), KAT‑V1 retains high accuracy, and on easier tasks it demonstrates a 10‑30 % performance boost due to selective deep reasoning.
Practical Applications
The model can be guided by explicit user intents to enable or disable thinking, integrates with multi‑agent scenarios, and supports code generation tasks where pre‑thinking yields superior planning and solution quality.
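One way explicit user intent could drive the mode is through a control flag in the request. The prefixes and request format below are hypothetical; the model's real interface may expose this differently (e.g., through its chat template).

```python
# Illustrative sketch of forcing, disabling, or auto-selecting thinking mode.
# The "/think" and "/no_think" prefixes are hypothetical control tokens.

def build_request(query, thinking=None):
    """thinking=True forces thinking mode, False disables it,
    None lets the model judge difficulty itself (AutoThink)."""
    if thinking is None:
        mode_hint = ""
    elif thinking:
        mode_hint = "/think "
    else:
        mode_hint = "/no_think "
    return {"prompt": mode_hint + query}
```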
Resources
Model open‑source address: https://huggingface.co/Kwaipilot/KAT-V1-40B
Technical report: https://arxiv.org/pdf/2507.08297
Overseas trial address: https://kwaipilot.ai/search
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Kuaishou Tech
Official Kuaishou tech account, providing real-time updates on the latest Kuaishou technology practices.