Kimi K2-Thinking: 1T‑Parameter Agent Model Beats GPT‑5 on Humanity’s Last Exam

Kimi's open‑source K2‑Thinking model, a 1‑trillion‑parameter agent with native INT4 quantization and a 256k‑token context window, achieves state‑of‑the‑art results on benchmarks such as Humanity's Last Exam, BrowseComp, and SEAL‑0, outperforming GPT‑5 and Grok‑4, and demonstrates complex tool‑driven reasoning in real‑world examples.


Model Overview

K2‑Thinking is an open‑source large language model released by Kimi. It is a mixture‑of‑experts model with 1 trillion total parameters and 32 billion active parameters, supports native INT4 quantization, and provides a 256k‑token context window.

Model‑as‑Agent Design

The model follows a “model‑as‑Agent” paradigm, meaning it can initiate tool calls while reasoning. It can execute up to 300 tool calls in a single session and maintain stable multi‑turn reasoning without any manually scripted control flow such as hard‑coded if/while logic.
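
For concreteness, here is a minimal sketch of how a caller could drive such a tool‑call loop. It assumes Moonshot's OpenAI‑compatible chat completions API; the base URL, the kimi-k2-thinking model id, and the web_search tool are illustrative assumptions, not confirmed API details.

```python
# Minimal tool-call loop sketch. Assumptions: Moonshot's OpenAI-compatible
# endpoint, the "kimi-k2-thinking" model id, and a caller-provided
# web_search tool -- all illustrative, not confirmed details.
import json
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.moonshot.cn/v1",  # assumed endpoint
)

tools = [{
    "type": "function",
    "function": {
        "name": "web_search",  # hypothetical tool exposed by the caller
        "description": "Search the web and return top results as text.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

def web_search(query: str) -> str:
    """Placeholder: plug a real search backend in here."""
    return f"(search results for: {query})"

messages = [{"role": "user", "content": "Look up the policy and compute the score."}]

# The model decides when to call tools; the loop only relays results back.
# No if/while logic encodes the task itself -- the control flow below is
# just plumbing between the model and the tool.
while True:
    resp = client.chat.completions.create(
        model="kimi-k2-thinking",  # assumed model id
        messages=messages,
        tools=tools,
    )
    msg = resp.choices[0].message
    messages.append(msg)
    if not msg.tool_calls:
        print(msg.content)  # final answer
        break
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": web_search(**args),
        })
```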

Benchmark Performance

Humanity’s Last Exam (HLE): a score of 44.9% in tool‑enhanced mode, outperforming GPT‑5 (41.7%) and Grok‑4 (41.0%).

State‑of‑the‑art results were also reported on BrowseComp and SEAL‑0, demonstrating strong capabilities in agentic search, programming, writing, and general‑purpose reasoning.

Demonstration Tasks

Example 1 – Policy‑Rule Calculation

The task required calculating the total energy‑point score for a three‑person Beijing household under the city's detailed car license‑plate lottery rules. K2‑Thinking browsed the web to retrieve the policy details, interpreted the rules, carried out multi‑step verification, and produced a completely correct answer, whereas GPT‑5 gave an incorrect result.

[Figure: Policy‑Rule Calculation Result]
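
The arithmetic itself is simple once the rules are in hand; the hard part is the compute‑then‑verify pattern the paragraph above describes. The sketch below illustrates that pattern with hypothetical point rules standing in for the real policy, which the model retrieved at run time.

```python
# Illustrative only: the real scoring rules were retrieved by the model at
# run time; the stepped values below are hypothetical placeholders.
def member_points(years_in_pool: int) -> int:
    # hypothetical rule: one point per full year of lottery participation
    return max(1, years_in_pool)

def household_points(members_years: list[int]) -> int:
    # hypothetical rule: household score is the sum of member scores
    return sum(member_points(y) for y in members_years)

# multi-step verification: recompute member by member and cross-check
family = [12, 10, 3]  # hypothetical years each member has been in the pool
total = household_points(family)
breakdown = [member_points(y) for y in family]
assert total == sum(breakdown)
print(f"breakdown={breakdown}, total={total}")
```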

Example 2 – Nvidia Market‑Cap Retrieval

The task asked the model to collect Nvidia's month‑end market‑cap data from January to October 2025 and generate an animated line chart viewable in a browser. K2‑Thinking decomposed the problem into 11 sub‑tasks, fetched the required data, and generated HTML/JavaScript code that renders a correct, visually polished chart.

[Figure: Nvidia Market‑Cap Data Retrieval]
[Figure: Generated Chart]
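
For a sense of what the final rendering step involves, the following Python sketch produces a comparable browser‑viewable animated line chart. The model itself emitted hand‑written HTML/JavaScript; this sketch uses matplotlib's JS‑HTML export instead, and the market‑cap values are placeholders, not Nvidia's actual 2025 figures.

```python
# Sketch of a browser-viewable animated line chart. The values below are
# placeholder data, not real Nvidia market caps.
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct"]
caps = [3.0, 3.1, 2.9, 3.2, 3.4, 3.5, 3.6, 3.8, 3.9, 4.0]  # placeholder, $T

fig, ax = plt.subplots()
ax.set_xlim(-0.5, len(months) - 0.5)
ax.set_ylim(min(caps) - 0.2, max(caps) + 0.2)
ax.set_xticks(range(len(months)), months)
ax.set_ylabel("Market cap (trillion USD, placeholder)")
(line,) = ax.plot([], [], marker="o")

def update(frame):
    # reveal one more month per frame to animate the line
    line.set_data(range(frame + 1), caps[: frame + 1])
    return (line,)

anim = FuncAnimation(fig, update, frames=len(months), interval=400)

# to_jshtml() embeds the animation as self-contained HTML + JavaScript
with open("nvidia_market_cap.html", "w") as f:
    f.write(anim.to_jshtml())
```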

Pricing and API

Standard API pricing:

Input tokens: 4 CNY per million.

Output tokens: 16 CNY per million.

Cache‑hit input: 1 CNY per million.

Turbo API pricing (throughput up to 100 tokens/s):

Input tokens: 8 CNY per million.

Output tokens: 58 CNY per million.

Cache‑hit input: 1 CNY per million.
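
From these list prices, a per‑session cost estimate is straightforward; the helper below is a small sketch using the two tiers above.

```python
# Back-of-the-envelope cost estimate from the listed prices
# (CNY per million tokens); tier names mirror the list above.
PRICES = {
    "standard": {"input": 4.0, "output": 16.0, "cache_hit": 1.0},
    "turbo": {"input": 8.0, "output": 58.0, "cache_hit": 1.0},
}

def cost_cny(tier: str, input_toks: int, output_toks: int,
             cached_toks: int = 0) -> float:
    p = PRICES[tier]
    fresh = input_toks - cached_toks  # cache hits are billed at the lower rate
    return (fresh * p["input"]
            + cached_toks * p["cache_hit"]
            + output_toks * p["output"]) / 1_000_000

# e.g. 200k input tokens (half cache hits) + 50k output on the Turbo tier
print(f"{cost_cny('turbo', 200_000, 50_000, cached_toks=100_000):.2f} CNY")
# -> 3.80 CNY
```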

Resources

Project repository: https://huggingface.co/moonshotai/Kimi-K2-Thinking

Technical blog: https://moonshotai.github.io/Kimi-K2/thinking.html

Written by Baobao Algorithm Notes, author of the BaiMian large model, offering technology and industry insights.
