Kimi K2-Thinking: 1T‑Parameter Agent Model Beats GPT‑5 on Humanity’s Last Exam
Kimi's open‑source K2‑Thinking model is a 1‑trillion‑parameter agent model with native INT4 quantization and a 256k‑token context window. It achieves state‑of‑the‑art results on benchmarks such as Humanity’s Last Exam, BrowseComp, and SEAL‑0, outperforming GPT‑5 and Grok‑4, and demonstrates complex tool‑driven reasoning on real‑world tasks.
Model Overview
K2‑Thinking is an open‑source large language model released by Kimi. It has 1 trillion total parameters with 32 billion active parameters, supports native INT4 quantization, and provides a 256k‑token context window.
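For intuition, INT4 quantization compresses weights into 4‑bit integers with per‑group scales. The article does not describe Kimi's exact scheme, so the sketch below shows only the generic symmetric per‑group technique, with all details (group size, scaling rule) chosen for illustration:

```python
import numpy as np

# Generic symmetric per-group INT4 quantization sketch.
# This is NOT Kimi's published scheme; group size and scaling are illustrative.

def quantize_int4(weights: np.ndarray, group_size: int = 32):
    """Quantize a 1-D float array to signed 4-bit integers with per-group scales."""
    w = weights.reshape(-1, group_size)
    # Signed INT4 range is [-8, 7]; scale each group by its max magnitude.
    scales = np.abs(w).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(w / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize_int4(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Recover approximate float weights from quantized values and scales."""
    return (q.astype(np.float32) * scales).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s)
err = float(np.abs(w - w_hat).max())  # worst-case rounding error
```

The appeal at 1T parameters is storage and bandwidth: each weight takes 4 bits plus a small per‑group scale, roughly a 4x reduction versus FP16.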
Model‑as‑Agent Design
The model follows a “model‑as‑Agent” paradigm, meaning it can initiate tool calls while reasoning. It can execute up to 300 tool calls in a single session and maintain stable multi‑turn reasoning without manually crafted control logic such as if/while constructs.
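The control flow can be pictured as a plain loop in which the model itself decides, at each step, whether to call a tool or finish. Everything below is an illustrative mock (the function names, message format, and mock model are invented for this sketch); in K2‑Thinking the decisions come from the model, not hand‑written logic:

```python
# Minimal model-as-agent loop sketch. All names and message formats here are
# hypothetical; only the 300-call budget comes from the article.

MAX_TOOL_CALLS = 300  # per-session tool-call budget reported for K2-Thinking

def run_agent(model_step, tools, task):
    """model_step: callable(history) -> dict containing either a 'tool'
    request or a 'final' answer. tools: dict mapping name -> callable."""
    history = [{"role": "user", "content": task}]
    for _ in range(MAX_TOOL_CALLS):
        action = model_step(history)
        if "final" in action:
            return action["final"]
        # Execute the requested tool and feed the result back to the model.
        result = tools[action["tool"]](**action.get("args", {}))
        history.append({"role": "tool", "name": action["tool"], "content": result})
    return None  # budget exhausted without a final answer

# Tiny mock model to show the control flow: search once, then answer.
def mock_model(history):
    if not any(m["role"] == "tool" for m in history):
        return {"tool": "search", "args": {"query": "Beijing lottery rules"}}
    return {"final": "answer based on " + history[-1]["content"]}

answer = run_agent(
    mock_model,
    {"search": lambda query: f"results for {query}"},
    "compute household score",
)
```

The point of the paradigm is that the branching (which tool, when to stop) lives inside the model's reasoning rather than in scaffolding code like this loop's caller.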
Benchmark Performance
Humanity’s Last Exam (HLE): 44.9% in tool‑enhanced mode, outperforming GPT‑5 (41.7%) and Grok‑4 (41.0%).
State‑of‑the‑art results were also reported on BrowseComp and SEAL‑0, demonstrating strong agentic search, programming, writing, and comprehensive reasoning capabilities.
Demonstration Tasks
Example 1 – Policy‑Rule Calculation
The task required calculating the total energy‑point score for a three‑person Beijing household based on detailed car‑license lottery participation rules. K2‑Thinking performed web browsing to retrieve policy details, interpreted the rules, carried out multi‑step verification, and produced a completely correct answer, whereas GPT‑5 gave an incorrect result.
Example 2 – Nvidia Market‑Cap Retrieval
The task asked the model to collect Nvidia’s month‑end market‑cap data from January to October 2025 and generate an animated line chart viewable in a browser. K2‑Thinking decomposed the problem into 11 sub‑tasks, fetched the required data, and generated the HTML/JavaScript code that renders a correct and visually appealing chart.
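To give a concrete sense of the kind of artifact described, the sketch below generates a self‑contained HTML page that animates a line chart on a `<canvas>`. The month labels and values are placeholders, not real Nvidia market‑cap figures, and the code is a generic illustration rather than K2‑Thinking's actual output:

```python
# Sketch of a self-contained animated line chart as an HTML file.
# Data values are PLACEHOLDERS, not real Nvidia market-cap numbers.

months = ["Jan", "Feb", "Mar"]   # placeholder labels
caps = [1.0, 1.1, 1.2]           # placeholder values

html = f"""<!DOCTYPE html>
<html><body><canvas id="chart" width="600" height="300"></canvas>
<script>
const labels = {months};
const data = {caps};
const ctx = document.getElementById("chart").getContext("2d");
let i = 0;
function step() {{
  // Redraw the line up to point i, then advance: a simple reveal animation.
  ctx.clearRect(0, 0, 600, 300);
  ctx.beginPath();
  for (let j = 0; j <= i; j++) {{
    const x = 50 + j * 150, y = 280 - data[j] * 200;
    if (j === 0) ctx.moveTo(x, y); else ctx.lineTo(x, y);
  }}
  ctx.stroke();
  if (++i < data.length) requestAnimationFrame(step);
}}
step();
</script></body></html>"""

with open("chart.html", "w") as f:
    f.write(html)
```

Opening `chart.html` in any browser plays the animation; the notable part of the demo is that the model both fetched the data and emitted working code like this unaided.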
Pricing and API
Input tokens: 4 CNY per million.
Output tokens: 16 CNY per million.
Cache‑hit input tokens: 1 CNY per million.
Turbo API throughput: up to 100 tokens/s.
Turbo API pricing: 8 CNY per million input tokens, 58 CNY per million output tokens, 1 CNY per million cache‑hit input tokens.
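The prices above make cost estimates simple arithmetic. The workload below (2M input tokens, 0.5M of them cache hits, 1M output tokens) is an arbitrary example, not a benchmark:

```python
# Cost arithmetic using the per-million-token CNY prices quoted above.
# The example token counts are arbitrary, chosen only for illustration.

PRICES = {
    "standard": {"input": 4, "output": 16, "cache_hit": 1},
    "turbo":    {"input": 8, "output": 58, "cache_hit": 1},
}

def cost_cny(tier, input_tok, output_tok, cached_tok=0):
    """Total cost in CNY; cache hits are billed at the cheaper cache rate."""
    p = PRICES[tier]
    fresh = input_tok - cached_tok
    return (fresh * p["input"]
            + cached_tok * p["cache_hit"]
            + output_tok * p["output"]) / 1_000_000

# Example workload: 2M input tokens (0.5M cache hits), 1M output tokens.
std = cost_cny("standard", 2_000_000, 1_000_000, cached_tok=500_000)
turbo = cost_cny("turbo", 2_000_000, 1_000_000, cached_tok=500_000)
# std -> 22.5 CNY, turbo -> 70.5 CNY
```

Note how output tokens dominate the Turbo bill: at 58 CNY per million, the 1M output tokens account for most of the total.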
Resources
Project repository: https://huggingface.co/moonshotai/Kimi-K2-Thinking
Technical blog: https://moonshotai.github.io/Kimi-K2/thinking.html
Baobao Algorithm Notes
Author of the BaiMian large model, offering technology and industry insights.
