Kimi K2.7 Code: 1T MoE Model Cuts Tokens 30% and Beats Claude Opus on MCP Calls

The newly released Kimi K2.7 Code, a 1‑trillion‑parameter mixture‑of‑experts model that activates only 32 B parameters per inference, offers a 256 K context window, supports multimodal input, improves benchmark scores by up to 31.5 % over K2.6, reduces inference token usage by about 30 %, and achieves an 81.1 MCP tool‑call score surpassing Claude Opus 4.8, while providing a CLI installation command and usage guidelines.

AI Insight Log
AI Insight Log
AI Insight Log
Kimi K2.7 Code: 1T MoE Model Cuts Tokens 30% and Beats Claude Opus on MCP Calls

Kimi K2.7 Code has been released as the latest coding‑specialized model from Moonlight, open‑sourced on HuggingFace under a modified MIT license that permits commercial use. Within 24 hours the announcement tweet garnered 1.07 million views, underscoring strong interest in coding‑focused LLMs.

The model is not listed in the generic Kimi workspace; it is accessible only through the Kimi Code client and the Kimi API, a deliberate design choice to keep the coding agent separate from the general assistant.

Architecture: K2.7 Code is a Mixture‑of‑Experts (MoE) model with a total parameter count of 1 trillion, but each inference activates only 32 B parameters. It comprises 384 experts across 61 layers (including one dense layer) and provides a 256 K context window. The model accepts text, image, and video inputs, using the MoonViT visual encoder (400 M parameters). Activating 32 B parameters places the model in a mid‑to‑high range, avoiding the limitations of small models while avoiding the full cost of dense large models.

Compared with its predecessor K2.6, K2.7 Code shows notable gains on three benchmarks:

Kimi Code Bench v2: 62.0 vs 50.9 (+21.8%).

Program Bench: 53.6 vs 48.3 (+11.0%).

MLS Bench Lite: 35.1 vs 26.7 (+31.5%).

When measured against top‑tier models, K2.7 Code still lags on pure coding ability: GPT‑5.5 (xhigh) scores 69.1 and Claude Opus 4.8 scores 63.8 on Program Bench, while K2.7 Code records 53.6. However, on the MCP tool‑call benchmark—a critical metric for coding agents in real‑world workflows—K2.7 Code achieves 81.1 points, surpassing Claude Opus 4.8 (76.4) and falling only 1.8 points short of GPT‑5.5 (82.9).

Inference efficiency is another strength: K2.7 Code consumes roughly 30 % fewer inference tokens than K2.6, a reduction the developers describe as "reducing overthinking." Fewer tokens lower cost and speed up each Agent step, especially in long, multi‑tool tasks. A scatter plot in the original article illustrates simultaneous improvements in performance and token usage.

Kimi K2.7 Code release tweet
Kimi K2.7 Code release tweet

For developers, the recommended entry point is the Kimi Code CLI, installed with a single command:

curl -fsSL https://code.kimi.com/kimi-code/install.sh | bash

Existing projects using the OpenAI SDK can switch to K2.7 Code by changing the base URL and API key:

from openai import OpenAI

client = OpenAI(
    api_key="your_moonshot_api_key",
    base_url="https://api.moonshot.ai/v1"
)

The model identifier is kimi-k2.7-code. Several parameters are fixed: the "thinking" mode cannot be disabled, temperature is locked at 1.0, and top_p at 0.95. In multi‑step tool‑call scenarios, the reasoning_content field must be included in the context, otherwise an error is returned.

Official reply: K2.7 Code only available in Kimi Code and API
Official reply: K2.7 Code only available in Kimi Code and API

The team also mentions an upcoming "6× high‑speed mode," though a launch date has not been set.

Overall, K2.7 Code follows the same trajectory as Claude Code and Codex by extracting the coding agent from a general assistant and optimizing it. The MoE architecture reduces inference cost, specialized training boosts coding ability, and open‑sourcing enables community deployment. While pure coding benchmarks still trail the leading models, the strong MCP tool‑call performance demonstrates a compelling advantage for real‑world coding assistance.

Performance vs Token Usage Scatter Plot
Performance vs Token Usage Scatter Plot
Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

MCPMixture of ExpertsbenchmarkKimiCoding Modelinference efficiency
AI Insight Log
Written by

AI Insight Log

Focused on sharing: AI programming | Agents | Tools

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.