Kimi K2.7 Code Goes Open: 30% Token Savings and Major Coding Performance Boost
Kimi K2.7 Code, now open-source on HuggingFace, reduces token consumption by about 30% and raises coding benchmark scores: Kimi Code Bench v2 climbs from 50.9 to 62.0, Program-Bench from 48.3 to 53.6, and MLS Bench Lite from 26.7 to 35.1. The gains narrow the gap with GPT-5.5 and Claude Opus. The model is built on a 1-trillion-parameter MoE architecture with INT4 quantization and a 256K-token context.
Moonshot AI announced the release of Kimi K2.7 Code, an open-source code-generation model hosted on HuggingFace. Compared with its predecessor K2.6, the new model cuts average token consumption by roughly 30%, so it reaches higher performance while using fewer tokens.
Benchmark results show substantial gains: Kimi Code Bench v2 rises from 50.9 to 62.0 (+21.8%), Program‑Bench from 48.3 to 53.6 (+11%), and MLS Bench Lite from 26.7 to 35.1 (+31.5%). These improvements are illustrated in the benchmark tables below.
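The percentages quoted above are relative gains over the K2.6 score, not differences in points. A minimal sketch that recomputes them from the scores given in this article (the function name `relative_gain` is illustrative only):

```python
def relative_gain(old: float, new: float) -> float:
    """Return the percentage change from the old score to the new score."""
    return (new - old) / old * 100


# (K2.6 score, K2.7 Code score), as reported in the article.
scores = {
    "Kimi Code Bench v2": (50.9, 62.0),
    "Program-Bench": (48.3, 53.6),
    "MLS Bench Lite": (26.7, 35.1),
}

for name, (k26, k27) in scores.items():
    print(f"{name}: {k26} -> {k27} ({relative_gain(k26, k27):+.1f}%)")
```

Rounded to one decimal place, the results agree with the +21.8%, +11%, and +31.5% figures in the text.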
When compared with leading models, K2.7 Code narrows the performance gap: on Kimi Code Bench v2, GPT‑5.5 scores 69.0 and Claude Opus 4.8 scores 67.4, while K2.7 Code achieves 62.0. On MLS Bench Lite, K2.7 Code reaches 35.1, almost matching GPT‑5.5’s 35.5.
Agent-oriented benchmarks also improve. Across three agent tests, K2.7 Code outperforms K2.6 by about 10%: the Kimi Claw 24/7 Bench score rises from 42.9 to 46.9, MCP Atlas from 69.4 to 76.0, and MCP Mark Verified from 72.8 to 81.1, putting K2.7 Code ahead of Claude Opus 4.8 (76.4) in tool-calling scenarios.
The model retains the Mixture-of-Experts (MoE) architecture of K2.6: 1 trillion total parameters, 32 billion activated parameters per token, 384 experts with 8 selected per token plus one shared expert, and a context window of 256K tokens. The vision branch uses a MoonViT encoder with 400M parameters and supports image and video inputs.
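To make "8 of 384 experts selected per token" concrete, here is a generic top-k routing sketch. It is not Kimi's actual router: the softmax over the selected logits, the random logits, and the name `route_token` are all assumptions chosen for illustration. Only the expert counts and parameter totals come from the article.

```python
import math
import random

NUM_EXPERTS = 384  # routed experts (from the article)
TOP_K = 8          # experts selected per token (from the article)


def route_token(router_logits: list[float], top_k: int = TOP_K) -> dict[int, float]:
    """Pick the top_k experts for one token and return normalized gate weights."""
    # Indices of the top_k largest router logits.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:top_k]
    # Softmax over the selected logits only (one common convention; an assumption here).
    m = max(router_logits[i] for i in top)
    exps = {i: math.exp(router_logits[i] - m) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router output
gates = route_token(logits)

# The shared expert would run for every token in addition to these 8 routed experts.
print(f"routed experts used: {len(gates)} of {NUM_EXPERTS}")
print(f"activated share of parameters: {32e9 / 1e12:.1%}")
```

Because only a few experts run for each token, roughly 3.2% of the 1 trillion parameters (32 billion) are active per token, which is why a model of this size can be served at a cost closer to that of a much smaller dense model.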
K2.7 Code is quantized to native INT4, released under a Modified MIT License, and compatible with the vLLM, SGLang, and KTransformers deployment frameworks. Pricing remains the same as K2.6: ¥6.5 per 1M input tokens, ¥27 per 1M output tokens, and ¥1.3 per 1M cached input tokens.
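Since the price list is unchanged, the roughly 30% token reduction translates directly into a lower cost per task. The sketch below estimates this using the prices quoted above. The token counts are hypothetical, and applying the 30% saving uniformly to the whole bill is a simplifying assumption, since the article gives only an average figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Prices in yuan per 1M tokens, as quoted in the article."""
    input_per_m: float = 6.5
    output_per_m: float = 27.0
    cached_input_per_m: float = 1.3


def request_cost(p: Pricing, input_tokens: int, output_tokens: int,
                 cached_input_tokens: int = 0) -> float:
    """Return the cost in yuan for the given token counts."""
    total = (input_tokens * p.input_per_m
             + output_tokens * p.output_per_m
             + cached_input_tokens * p.cached_input_per_m)
    return total / 1_000_000


pricing = Pricing()

# Hypothetical agentic coding task on K2.6 (token counts are made up for illustration).
k26_cost = request_cost(pricing, input_tokens=200_000, output_tokens=50_000,
                        cached_input_tokens=800_000)

# Assumption: the ~30% average token saving applies evenly across the bill.
TOKEN_SAVING = 0.30
k27_cost = k26_cost * (1 - TOKEN_SAVING)

print(f"K2.6: ¥{k26_cost:.2f}  K2.7 Code (estimated): ¥{k27_cost:.2f}")
```

In practice the saving may fall mostly on output tokens, which are the most expensive category, so the real effect on a bill depends on the workload.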
To achieve optimal performance, the model must run in Thinking mode; disabling it forces a fallback to K2.6. For non‑coding tasks, the team recommends using the more general K2.6 model.
On June 15, a high‑speed variant of K2.7 Code was launched. It delivers 5–6× faster output (≈180 tokens/s for typical coding, up to 260 tokens/s for short contexts) at roughly double the price of the standard version.
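The speed figures imply a rough baseline for the standard version. The following sketch works that out, assuming the 5–6× factor is measured against the standard version on the same typical coding workload (the article does not state the baseline explicitly). The 9,000-token response length is a hypothetical example.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Return the time needed to stream output_tokens at a steady rate."""
    return output_tokens / tokens_per_second


FAST_TPS = 180.0            # typical coding throughput of the high-speed variant (article)
SPEEDUP_RANGE = (5.0, 6.0)  # "5-6x faster" (article)

# Implied standard-version throughput under the assumption stated above.
implied_standard_tps = [FAST_TPS / s for s in SPEEDUP_RANGE]

response_tokens = 9_000     # hypothetical long coding response
fast_time = generation_seconds(response_tokens, FAST_TPS)
standard_times = [generation_seconds(response_tokens, tps) for tps in implied_standard_tps]

print(f"implied standard throughput: {implied_standard_tps[1]:.0f}-{implied_standard_tps[0]:.0f} tokens/s")
print(f"high-speed variant: {fast_time:.0f} s")
print(f"standard version: {standard_times[0]:.0f}-{standard_times[1]:.0f} s")
```

Under these assumptions, the high-speed variant turns a wait of four to five minutes into under a minute for about twice the price, a trade-off that favors interactive use over batch jobs.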
SuanNi
A community for AI developers that aggregates large-model development services, models, and compute power.
