How DeepSeek and Kimi’s Open‑Source Collaboration Is Redefining China’s AI Landscape

The article analyses DeepSeek V4’s technical report and the repeated “encounters” between DeepSeek and Kimi: shared MLA attention, a shared Muon optimizer, and divergent long-context strategies. It also highlights their open-source releases, hardware adaptations, and ecosystem impact, all of which dramatically lower deployment costs for Chinese AI.

Machine Heart

Multiple Encounters and a Shared Vision

Since early 2024, Chinese LLM developers have shipped a steady stream of models, including Qwen, Kimi, and releases from Xiaomi and Tencent. On Friday, DeepSeek followed with V4 in two versions, setting off a wave of discussion in the domestic AI community.

Both DeepSeek and Kimi have entered the trillion‑parameter club and opened their models to the public, with Xiaomi also promising to open‑source its upcoming trillion‑parameter model.

Behind the Repeated “Encounters”

DeepSeek-R1 and Kimi K1.5 were released only two hours apart; DeepSeek’s NSA and Kimi’s MoBA sparse-attention papers appeared almost simultaneously; and Kimi’s mathematical-reasoning model Kimina-Prover inspired DeepSeek-Prover V2. Most recently, DeepSeek-V4 and Kimi K2.6 launched in the same week, illustrating a pattern of near-simultaneous progress.

DeepSeek founder Liang Wenfeng and Moonshot AI (Kimi) founder Yang Zhilin share a belief in scaling laws and are both racing toward AGI, which helps explain the frequent technical cross-pollination.

MLA Attention: DeepSeek Innovates, Kimi Reuses

DeepSeek introduced Multi-head Latent Attention (MLA), first described in DeepSeek-V2 and carried forward in V3. MLA uses low-rank compression of the key-value cache to cut inference memory, making long-context inference far cheaper to serve. Kimi adopted the same MLA design in its own attention module, a sign of how quickly the technique has spread.
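To make the idea concrete, here is a minimal PyTorch sketch of low-rank KV compression in the spirit of MLA. It is an illustration only, not DeepSeek’s implementation: the layer names and dimensions (W_dkv, W_uk, W_uv, d_latent=512) are assumptions for the example, and real MLA adds further details such as a decoupled rotary-embedding path.

```python
import torch
import torch.nn as nn

class LowRankKVCompression(nn.Module):
    """Minimal sketch of MLA-style low-rank KV compression (illustrative only)."""
    def __init__(self, d_model=4096, d_latent=512, n_heads=32):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # Down-project hidden states to a small shared latent; only this is cached.
        self.W_dkv = nn.Linear(d_model, d_latent, bias=False)
        # Up-project the cached latent back to full-size keys and values.
        self.W_uk = nn.Linear(d_latent, d_model, bias=False)
        self.W_uv = nn.Linear(d_latent, d_model, bias=False)

    def forward(self, h):
        # h: (batch, seq, d_model)
        c_kv = self.W_dkv(h)                      # (batch, seq, d_latent) -> this is what gets cached
        k = self.W_uk(c_kv)                       # keys reconstructed on the fly
        v = self.W_uv(c_kv)                       # values reconstructed on the fly
        b, s, _ = h.shape
        k = k.view(b, s, self.n_heads, self.d_head)
        v = v.view(b, s, self.n_heads, self.d_head)
        return c_kv, k, v                         # cache c_kv instead of full k and v

# Per-token cache shrinks from 2 * d_model floats to d_latent floats (8192 -> 512 in this sketch).
```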

Second‑Order Optimizer: Muon Validated by Kimi, Adopted by DeepSeek

In February 2025, Kimi published “Muon is Scalable for LLM Training,” validating the Muon optimizer on its Moonlight models as a practical replacement for the decade-old Adam optimizer. Muon was first deployed at scale in Kimi K2 (July 2025) and later incorporated into DeepSeek V4, improving training stability.
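The core of Muon is easy to sketch: each 2-D weight matrix gets a momentum-accumulated gradient that is approximately orthogonalized with a few Newton-Schulz iterations before being applied. The snippet below uses the commonly published quintic Newton-Schulz coefficients and is a simplified single-matrix sketch, not Kimi’s or DeepSeek’s production optimizer; production versions add refinements such as Nesterov momentum and per-shape learning-rate scaling.

```python
import torch

def newton_schulz_orthogonalize(g, steps=5, eps=1e-7):
    """Approximately map a matrix to the nearest (semi-)orthogonal one."""
    a, b, c = 3.4445, -4.7750, 2.0315             # quintic iteration coefficients
    x = g / (g.norm() + eps)                       # normalize so the iteration converges
    transposed = x.size(0) > x.size(1)
    if transposed:
        x = x.T                                    # keep the Gram matrix small
    for _ in range(steps):
        s = x @ x.T
        x = a * x + (b * s + c * s @ s) @ x
    return x.T if transposed else x

def muon_step(weight, grad, momentum_buf, lr=0.02, beta=0.95):
    """One Muon update for a single 2-D weight matrix (illustrative sketch)."""
    with torch.no_grad():
        momentum_buf.mul_(beta).add_(grad)         # plain momentum accumulation
        update = newton_schulz_orthogonalize(momentum_buf)
        weight.add_(update, alpha=-lr)             # apply the orthogonalized update
    return weight, momentum_buf
```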

Residual Connections: Divergent Solutions

DeepSeek’s V4 adds an mHC residual connection that reworks how multi-head attention outputs are concatenated, improving gradient flow and yielding roughly a 30% gain in training efficiency. Kimi introduced “Attention Residuals,” which streamline information flow across layers and drew praise from Andrej Karpathy, Jerry Tworek, and Elon Musk.

Long‑Context Inference: Sparse vs. Linear Attention

DeepSeek adopts sparse attention, which restricts each query to the most relevant parts of the input to cut computation. That makes million-token contexts far more affordable, though the sparsity patterns require careful design and tuning.
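As a rough illustration of the sparse idea, the toy sketch below limits each query to a local window of recent keys. Real sparse-attention designs (including DeepSeek’s NSA) combine several selection patterns and never materialize the full score matrix; this version does, so it only shows the masking pattern, not the efficiency gain.

```python
import torch

def local_window_attention(q, k, v, window=256):
    """Toy causal attention where each token attends only to its last `window` positions."""
    # q, k, v: (batch, heads, seq, d_head)
    n, d = q.size(-2), q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5
    pos = torch.arange(n, device=q.device)
    # Disallow future tokens and anything older than the local window.
    mask = (pos[None, :] > pos[:, None]) | (pos[:, None] - pos[None, :] >= window)
    scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v
```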

Kimi instead uses linear attention, which lowers the computational complexity of attention from O(n²) to O(n) in sequence length, trading the full expressiveness of softmax attention for much higher efficiency on long inputs.
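The complexity drop comes from reordering the attention product: with a positive feature map phi, (phi(Q) phi(K)^T) V can be computed as phi(Q) (phi(K)^T V), so cost grows linearly with sequence length. The non-causal sketch below follows the standard kernelized linear-attention recipe and is a generic illustration, not Kimi’s specific variant.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """Non-causal kernelized linear attention: O(n * d^2) instead of O(n^2 * d)."""
    # q, k, v: (batch, seq, dim); phi(x) = elu(x) + 1 keeps features positive
    q, k = F.elu(q) + 1.0, F.elu(k) + 1.0
    kv = torch.einsum("bnd,bne->bde", k, v)              # sum_n phi(k_n) v_n^T, shape (batch, d, d)
    norm = torch.einsum("bnd,bd->bn", q, k.sum(dim=1))   # phi(q_i) . sum_n phi(k_n)
    out = torch.einsum("bnd,bde->bne", q, kv)            # phi(q_i)^T (K^T V)
    return out / (norm.unsqueeze(-1) + eps)
```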

Both approaches provide complementary options for future long‑context research.

From Two Companies to a Shared Infrastructure

Unlike closed models such as GPT‑4 or Claude 3.5, DeepSeek and Kimi release fully open‑source trillion‑parameter models, allowing any developer or organization to obtain and fine‑tune them for free. This openness reduces private‑deployment costs to roughly one‑tenth of previous levels, enabling small‑ and medium‑size enterprises to run trillion‑parameter models on their own servers.

In the broader ecosystem, the two models rank in the top two by API call volume on the OpenRouter platform. Kimi is integrated into popular overseas coding tools, while DeepSeek powers Japan’s Rakuten AI 3.0.

Hardware Adaptation

DeepSeek V4 is the first to deeply integrate with Huawei’s Ascend chips for inference on domestic hardware. Kimi’s Prefill-as-a-Service framework splits the prefill and decode stages across heterogeneous domestic chips, reporting a 54% throughput increase and a 64% reduction in first-token latency.
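Conceptually, disaggregating inference means computing the prompt’s KV cache on one set of devices (prefill, compute-bound) and handing it to another set that generates tokens one by one (decode, memory-bound). The sketch below illustrates that split, assuming a HuggingFace-style causal LM interface and the legacy tuple KV-cache format; the function names and two-device layout are illustrative assumptions, not Kimi’s actual Prefill-as-a-Service API.

```python
import torch

def prefill(prefill_model, prompt_ids, device="cuda:0"):
    """Compute-bound prompt pass on the prefill device; returns the KV cache and last-step logits."""
    with torch.no_grad():
        out = prefill_model(prompt_ids.to(device), use_cache=True)
    return out.past_key_values, out.logits[:, -1:]

def decode(decode_model, kv_cache, last_logits, max_new_tokens=32, device="cuda:1"):
    """Memory-bound token-by-token generation on the decode device, reusing the shipped cache."""
    # In a real disaggregated deployment the cache crosses the interconnect; here we just .to(device).
    kv_cache = tuple(tuple(t.to(device) for t in layer) for layer in kv_cache)
    next_id = last_logits.argmax(dim=-1).to(device)   # greedy pick of the first new token
    tokens = []
    with torch.no_grad():
        for _ in range(max_new_tokens):
            out = decode_model(next_id, past_key_values=kv_cache, use_cache=True)
            kv_cache = out.past_key_values
            next_id = out.logits[:, -1:].argmax(dim=-1)
            tokens.append(next_id)
    return torch.cat(tokens, dim=-1)
```

Here prefill_model and decode_model stand in for two replicas of the same weights placed on different accelerators; in Kimi’s described setup those can be heterogeneous domestic chips.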

Industry Recognition and Impact

Meta’s Muse Spark blog compared Llama 4 with DeepSeek‑V3.1 and Kimi‑K2, while Nvidia CEO Jensen Huang highlighted DeepSeek and Kimi K2‑Thinking as benchmark models for Blackwell and Rubin chips during his CES keynote.

These acknowledgments underscore the growing global relevance of Chinese open‑source LLMs.

Conclusion: Collaborative Strength Fuels Chinese AI

The rapid, coordinated advances of DeepSeek and Kimi—driven by shared scaling‑law beliefs, open‑source releases, and mutual technology adoption—demonstrate that China’s AI momentum stems not from isolated competition but from collaborative “encounters” that accelerate innovation across the ecosystem.

Tags: AI, LLM, DeepSeek, MLA, Kimi, Open-source, Muon
Written by Machine Heart, a professional AI media and industry service platform.
