Can AI Models Think on Demand? Inside KAT‑V1 AutoThink’s Dynamic Reasoning

This article introduces KAT‑V1 AutoThink, a dual‑mode large language model that automatically switches between thinking and non‑thinking modes based on problem difficulty. It details the model's novel training paradigm, reinforcement‑learning enhancements, and benchmark performance against leading open‑source models, and provides open‑source resources for further research.


Introducing KAT‑V1 AutoThink

KAT‑V1 is a large language model released and open‑sourced by Kuaishou, featuring two versions (40B and 200B parameters) that can dynamically toggle between thinking and non‑thinking modes according to the difficulty of a query.
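
As a concrete starting point, here is a minimal sketch of querying the 40B model through the standard Hugging Face transformers chat interface. Since the auto‑think switch is learned behavior, the sketch assumes no special flag is required and simply lets the model decide whether to emit a reasoning trace first.

```python
# Minimal sketch, assuming the standard Hugging Face transformers chat
# interface; the auto-think switch is learned behavior, so the model
# itself decides whether to produce a reasoning trace before answering.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kwaipilot/KAT-V1-40B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# An easy query: the model is expected to answer in non-thinking mode.
messages = [{"role": "user", "content": "What is 17 * 23?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```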

KAT‑V1 AutoThink illustration

Performance Highlights

In automatic‑thinking mode, the 40B version matches the performance of DeepSeek‑R1‑0528 (685B parameters), released in May, while the 200B version surpasses flagship open‑source models such as Qwen, DeepSeek, and Llama on multiple benchmarks. Notably, on the competition‑grade LiveCodeBench Pro benchmark, the 40B model performs on par with several closed‑source systems while outperforming many open‑source counterparts.

Benchmark results

Key Technical Innovations

The Kwaipilot team proposes a novel long‑short thinking hybrid training paradigm and introduces Step‑SRPO, an enhanced reinforcement‑learning algorithm built on the traditional GRPO method. These innovations raise the model's thinking‑token density and sharpen its ability to decide when to activate thinking mode.
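
For intuition, the sketch below shows the group‑relative advantage computation at the heart of GRPO, which Step‑SRPO extends with an additional reasoning‑necessity assessment. The names and values are illustrative, not the team's implementation.

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """GRPO-style advantages: normalize each sampled response's reward
    against the mean and std of its own group (no learned value critic)."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Four responses sampled for one prompt, scored for correctness.
print(group_relative_advantages(np.array([1.0, 0.0, 1.0, 0.5])))
```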

During pre‑training, the team generated a large corpus of thinking and non‑thinking data. Non‑thinking data were sampled from a 5T‑token pool to ensure diverse, challenging examples, while thinking data were synthesized with an agentic framework consisting of a solver, a thinker, and a critic, producing high‑quality long chain‑of‑thought (long‑CoT) samples.
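
The solver‑thinker‑critic loop might look like the following sketch. The three roles and the acceptance criterion are assumptions inferred from the description above, with each role standing in for an LLM call.

```python
def synthesize_long_cot(question, solver, thinker, critic, max_rounds=3):
    """Illustrative solver-thinker-critic loop for long-CoT data synthesis.
    Each argument after `question` is a callable wrapping an LLM."""
    for _ in range(max_rounds):
        answer = solver(question)              # propose a candidate solution
        trace = thinker(question, answer)      # expand it into a long reasoning chain
        verdict = critic(question, trace)      # grade correctness and coherence
        if verdict.get("accept"):
            return {"question": question, "cot": trace, "answer": answer}
    return None  # discard samples the critic never accepts
```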

Approximately 34.8 % of the pre‑training data are thinking samples and 65.2 % are non‑thinking, covering domains such as science, code, mathematics, tool use, and general knowledge.
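
A trivial sketch of drawing examples at that ratio follows; the 34.8 % figure comes from the article, while the sampling mechanics are purely illustrative.

```python
import random

THINKING_RATIO = 0.348  # share of thinking samples reported above

def sample_example(thinking_pool, non_thinking_pool):
    """Pick a pre-training example at the reported thinking/non-thinking ratio."""
    pool = thinking_pool if random.random() < THINKING_RATIO else non_thinking_pool
    return random.choice(pool)
```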

Knowledge Distillation and MTP

The model is initialized from a Qwen2.5‑32B base via heterogeneous knowledge distillation from a larger teacher model. Two loss components are used: a universal logits distillation (ULD) loss that aligns token‑level output distributions across mismatched vocabularies, and a multi‑token prediction (MTP) module that lets the student predict several future tokens in a single forward pass, encouraging long‑term planning.
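
A minimal sketch of a ULD‑style loss follows. It captures the general idea of aligning distributions across mismatched tokenizers by sorting and padding the probability vectors; the exact formulation in the KAT‑V1 report may differ.

```python
import torch
import torch.nn.functional as F

def uld_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor) -> torch.Tensor:
    """Universal logits distillation sketch: both inputs are
    (batch, seq_len, vocab) with possibly different vocab sizes."""
    s = F.softmax(student_logits, dim=-1).sort(dim=-1, descending=True).values
    t = F.softmax(teacher_logits, dim=-1).sort(dim=-1, descending=True).values
    pad = s.size(-1) - t.size(-1)
    if pad > 0:                       # teacher vocab smaller: pad with zeros
        t = F.pad(t, (0, pad))
    elif pad < 0:                     # student vocab smaller: pad with zeros
        s = F.pad(s, (0, -pad))
    # L1 gap between the sorted distributions, averaged over batch and positions.
    return (s - t).abs().sum(dim=-1).mean()
```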

Distillation architecture

Step‑SRPO Reinforcement Learning

To address the over‑thinking problem, the team designed Step‑SRPO, which first evaluates the necessity of reasoning for each query. Two reward signals guide learning: a judge reward for correctly selecting the reasoning mode, and an answer reward for the quality of the final response. This dual‑reward scheme reduces unnecessary token generation and improves efficiency.
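
Concretely, the combined signal might be scored as in this sketch; the weighting and the binary reward definitions are illustrative assumptions, not the published reward design.

```python
def step_srpo_reward(chose_thinking: bool, should_think: bool,
                     answer_correct: bool, judge_weight: float = 0.3) -> float:
    """Dual-reward sketch: a judge term for picking the right mode plus
    an answer term for final response quality."""
    judge_reward = 1.0 if chose_thinking == should_think else 0.0
    answer_reward = 1.0 if answer_correct else 0.0
    return judge_weight * judge_reward + (1.0 - judge_weight) * answer_reward
```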

Step‑SRPO workflow

After RL training, the 40B model automatically switches to non‑thinking mode on simple queries, achieving token‑usage reductions of 20‑30 % while maintaining performance comparable to DeepSeek‑R1‑0528.

Complex Reasoning Benchmarks

On challenging benchmarks (e.g., AIME, LiveCodeBench, GPQA), KAT‑V1 retains high accuracy, and on easier tasks it demonstrates a 10‑30 % performance boost due to selective deep reasoning.

Practical Applications

The model can be guided by explicit user intents to enable or disable thinking, integrates with multi‑agent scenarios, and supports code generation tasks where pre‑thinking yields superior planning and solution quality.
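
For example, explicit intent can be expressed directly in the prompt, as in the hypothetical messages below; the article does not specify the exact directive wording the model responds to.

```python
# Hypothetical prompts expressing explicit user intent; the precise
# phrasing KAT-V1 was trained to follow is not specified in the article.
messages_no_think = [
    {"role": "user",
     "content": "Answer directly without step-by-step reasoning: what is the capital of France?"},
]
messages_think = [
    {"role": "user",
     "content": "Think through this carefully before answering: prove that sqrt(2) is irrational."},
]
```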

Resources

Model weights (open source): https://huggingface.co/Kwaipilot/KAT-V1-40B

Technical report: https://arxiv.org/pdf/2507.08297

Online trial (international): https://kwaipilot.ai/search
