Kimi’s ‘Option Time Machine’: Interns Gain Equity While Building Cutting‑Edge AI

Kimi, a three-year-old AI-native unicorn valued at over 120 billion CNY, has launched a "Time Machine" option program that grants interns equity, a move that showcases its rapid valuation growth, record-breaking context lengths, the novel Kimi Linear architecture, token-efficiency gains, and open-source models that rival leading LLMs.

Kimi, a three-year-old AI-native startup now valued at over 120 billion CNY (≈$17 billion), has announced "穿越计划" (the "Time Machine" plan), a talent program that awards interns a pre-granted, locked-up stock-option allocation priced against the company's 2026 valuation. The scheme lets young engineers secure equity before the company's next valuation jump, effectively turning an internship into a ticket on that growth.

In under three years Kimi's valuation has nearly quadrupled, crossing the $18 billion mark and reaching a ten-billion-dollar valuation faster than either ByteDance (four years) or Pinduoduo (just over three years).

Technically, Kimi has delivered several milestones. In October 2023 it released a model supporting input of roughly 200,000 Chinese characters, a record context length at the time; by March 2024 it had extended support to 2 million characters, enabling single-pass processing of massive legal, medical, or code corpora.

Kimi Linear architecture diagram

In 2025 Kimi introduced Kimi Linear, a hybrid linear-attention architecture that drew industry attention as a key breakthrough for efficient long-context inference. The same year it released the open-source Kimi K2 Thinking model, claiming superiority over GPT-5 and Claude Sonnet 4.5 on several core capabilities; Hugging Face co-founder Thomas Wolf described it as "another DeepSeek moment."

Early 2026 brought the launch of K2.5, a multimodal model that adds video understanding and markedly improved coding abilities, while continuing the open-source strategy.

Kimi’s research agenda focuses on three pillars: token efficiency, context‑length scalability, and agent‑swarm intelligence.

For token efficiency, the team integrated a distributed implementation of the Muon optimizer together with QK-Clip, which rescales the query and key projections whenever attention logits grow too large, delivering roughly a 2× efficiency gain under identical parameter and data budgets.
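To make the optimizer concrete, here is a minimal single-device sketch of Muon's core step: orthogonalizing the momentum matrix with a Newton-Schulz iteration before applying it. The coefficients and scaling follow public open-source Muon implementations; Kimi's distributed variant and its exact QK-Clip thresholds are not reproduced here, and all names are illustrative.

```python
import math
import torch

def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5,
                                eps: float = 1e-7) -> torch.Tensor:
    # Quintic Newton-Schulz iteration that maps G toward the nearest
    # semi-orthogonal matrix; coefficients follow the public Muon
    # reference implementation.
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + eps)              # normalize so the iteration converges
    transposed = X.shape[0] > X.shape[1]
    if transposed:                        # iterate on the smaller Gram matrix
        X = X.mT
    for _ in range(steps):
        A = X @ X.mT
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.mT if transposed else X

@torch.no_grad()
def muon_step(param: torch.Tensor, grad: torch.Tensor, momentum: torch.Tensor,
              beta: float = 0.95, lr: float = 2e-2) -> None:
    # One simplified, single-device Muon step for a 2-D weight matrix.
    momentum.mul_(beta).add_(grad)                  # heavy-ball momentum
    update = newton_schulz_orthogonalize(momentum)  # orthogonalized direction
    # Shape-dependent scaling (one published choice: 0.2 * sqrt(max dim),
    # which roughly matches AdamW's update RMS).
    scale = 0.2 * math.sqrt(max(param.shape))
    param.add_(update, alpha=-lr * scale)
```

The key design point is that the orthogonalized update equalizes the scale of all directions in the gradient, which is where the reported sample-efficiency gains over AdamW-style optimizers come from.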

To escape the quadratic cost of full attention, Kimi Linear replaces most full-attention layers with an improved linear-attention mechanism, maintaining high decoding throughput even at context lengths up to 1 million tokens.
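Kimi Linear's published design pairs a gated delta-rule linear attention (Kimi Delta Attention) with periodic full-attention layers. The toy loop below shows only the generic delta-rule recurrence that such layers build on, with gating and multi-head details stripped out; it illustrates why decoding cost stays flat, and is not the production kernel.

```python
import torch

def delta_rule_decode(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                      beta: torch.Tensor) -> torch.Tensor:
    # q, k: (T, d_k); v: (T, d_v); beta: (T,) write strengths in [0, 1].
    # The recurrent state S is a fixed (d_k, d_v) matrix, so per-token cost
    # and memory stay constant however long the sequence grows -- unlike a
    # softmax KV cache, which grows linearly with T.
    d_k, d_v = k.shape[1], v.shape[1]
    S = torch.zeros(d_k, d_v)
    outputs = []
    for t in range(len(k)):
        kt, vt, bt = k[t], v[t], beta[t]
        # Delta rule: erase the value currently bound to k_t, write the new one.
        S = S - bt * torch.outer(kt, S.mT @ kt) + bt * torch.outer(kt, vt)
        outputs.append(S.mT @ q[t])   # read-out for this decoding step
    return torch.stack(outputs)
```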

The agent-swarm effort abandons handcrafted workflow pipelines in favor of a large-scale reinforcement-learning system with three reward tiers (instance, completion, and result) that teaches models to decompose tasks and coordinate in parallel while avoiding "serial collapse" and "false parallelism"; a toy sketch of such a tiered reward follows.
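The article does not spell out how the three tiers are combined, so the following is a purely hypothetical sketch: the weights, trace fields, and shaping are assumptions, meant only to show how instance-level progress, subtask completion, and the final result could enter one scalar reward that discourages serial or fake-parallel decompositions.

```python
from dataclasses import dataclass

@dataclass
class SubtaskTrace:
    progressed: bool   # did this parallel instance do real work (vs. idling)?
    completed: bool    # did it finish the subtask it was assigned?

def swarm_reward(traces: list[SubtaskTrace], task_succeeded: bool,
                 w_instance: float = 0.2, w_completion: float = 0.3,
                 w_result: float = 0.5) -> float:
    # Hypothetical three-tier reward (instance / completion / result).
    # Weights are illustrative assumptions, not Kimi's published values.
    n = max(len(traces), 1)
    instance = sum(t.progressed for t in traces) / n    # true parallelism, not idle clones
    completion = sum(t.completed for t in traces) / n   # decomposed subtasks finished
    result = 1.0 if task_succeeded else 0.0             # end-to-end outcome
    return w_instance * instance + w_completion * completion + w_result * result
```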

Recent research also explores "Attention Residuals," a 90° rotation of attention into the depth dimension, with variants that use block-wise aggregation to cut memory from O(L·d) to O(N·d), where L is the sequence length and N ≪ L is the number of blocks.
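If the O(L·d) to O(N·d) claim refers to keeping one aggregated vector per block of tokens instead of one per token, the bookkeeping might look like the toy function below; the pooling choice and names are assumptions, since the article gives no further detail.

```python
import torch

def blockwise_aggregate(cache: torch.Tensor, block_size: int) -> torch.Tensor:
    # Compress an (L, d) per-token cache into (N, d) per-block summaries,
    # N = ceil(L / block_size), cutting memory from O(L*d) to O(N*d).
    # Mean pooling is an illustrative stand-in for any learned aggregator.
    L, d = cache.shape
    n_blocks = -(-L // block_size)              # ceiling division
    pad = n_blocks * block_size - L
    if pad:                                     # zero-pad the last block
        cache = torch.cat([cache, cache.new_zeros(pad, d)])
    # (Zero padding slightly biases the last block's mean; fine for a sketch.)
    return cache.view(n_blocks, block_size, d).mean(dim=1)
```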

Kimi's culture emphasizes a flat, low-entropy organization: its 300-plus employees average under 30 years old, there are no formal departments or titles, and direct communication is unrestricted, forming a "Genius Swarm." Interns enjoy unlimited token resources, full research freedom, and authorship rights on papers.

From a market perspective, the AI era's surging demand for talent has created an absolute seller's market for top engineers. By locking in valuation-linked equity, Kimi signals that AGI-era entry tickets are no longer reserved for large VC-backed firms but are being offered directly to high-potential technologists.

Tags: Large Language Model · Token Efficiency · Context Length · Kimi · Attention Residuals · Agent Swarms · AI Talent Program
Written by Machine Heart

Professional AI media and industry service platform