Machine Heart
Apr 3, 2026 · Artificial Intelligence

Kimi’s ‘Option Time Machine’: Interns Gain Equity While Building Cutting‑Edge AI

Kimi, a three‑year‑old AI‑native unicorn valued at over $120 billion, launches a “Time‑Machine” option program that grants interns equity; the article sets the move against the company’s rapid valuation growth, record‑breaking context lengths, novel Kimi Linear architecture, token‑efficiency gains, and open‑source models that rival leading LLMs.

AI Talent Program · Agent Swarms · Attention Residuals
SuanNi
Mar 17, 2026 · Artificial Intelligence

How Attention Residuals Boost Transformer Efficiency and Scale

The article presents the Attention Residuals architecture, explains how it replaces uniform residual addition with learned attention‑based aggregation, details the full and block variants along with engineering tricks for distributed training, and reports extensive scaling‑law experiments in which the new design consistently improves validation loss and training efficiency across model sizes. (A minimal code sketch of the aggregation idea follows this entry.)

Attention Residuals · Transformer · deep learning
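The aggregation idea in the summary above is concrete enough to sketch. Below is a minimal, illustrative PyTorch rendering, assuming per‑layer learned logits softmaxed over the stack of earlier hidden states; the class name AttentionResidualBlock, the logit parameterization, and the MLP stand‑in sublayer are assumptions for exposition, not the Kimi team’s actual design.

import torch
import torch.nn as nn

class AttentionResidualBlock(nn.Module):
    """One layer whose residual input is a learned softmax-weighted mix
    of all earlier hidden states, replacing the uniform x + f(x) update.
    Illustrative sketch only, not the published formulation."""

    def __init__(self, dim: int, layer_index: int):
        super().__init__()
        # Stand-in sublayer; a real block would hold attention + MLP.
        self.f = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, 4 * dim),
            nn.GELU(),
            nn.Linear(4 * dim, dim),
        )
        # One learned logit per entry in the layer history (the embeddings
        # plus each earlier block's output): layer_index + 1 entries.
        self.mix_logits = nn.Parameter(torch.zeros(layer_index + 1))

    def forward(self, history: list[torch.Tensor]) -> torch.Tensor:
        # history[i]: hidden state after stage i, shape (batch, seq, dim).
        weights = torch.softmax(self.mix_logits, dim=0)
        # Convex combination of the history: selective, not uniform.
        mixed = sum(w * h for w, h in zip(weights, history))
        return mixed + self.f(mixed)

# Usage: each block reads the full history and appends its own output.
dim, n_layers = 64, 4
blocks = nn.ModuleList(AttentionResidualBlock(dim, i) for i in range(n_layers))
x = torch.randn(2, 16, dim)  # (batch, seq, dim) token embeddings
history = [x]
for block in blocks:
    history.append(block(history))
out = history[-1]

One plausible reading of the “block” variant the article mentions is that it restricts the attended history to a window of recent layers rather than the full stack, trading flexibility for memory; the article should be consulted for the actual definition.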
ShiZhen AI
Mar 17, 2026 · Artificial Intelligence

Kimi’s Attention Residuals Swap a Decade-Old Residual Trick for 1.25× Faster 48B MoE

The Kimi team introduces Attention Residuals, a softmax‑based replacement for the uniform residual connections Transformers have used for a decade. The design selectively aggregates each layer’s history, curbs hidden‑state growth, and achieves a 1.25× compute‑efficiency gain on a 48‑billion‑parameter MoE model with less than a 2% increase in inference latency. (The equations after this entry spell out the contrast with the standard residual stream.)

Attention Residuals · Compute Efficiency · MoE
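To make the contrast with the decade‑old residual trick concrete, here is one equation‑level reading of the summary, written in LaTeX; the exact parameterization in the Kimi paper may differ.

% Standard residual stream: unrolling the recursion shows every
% layer's output enters the final state with a fixed weight of 1,
% which is the "uniform" aggregation the teaser refers to.
h_{l+1} = h_l + F_l(h_l)
\quad\Longrightarrow\quad
h_L = h_0 + \sum_{l=0}^{L-1} F_l(h_l)

% Softmax-based reading of Attention Residuals (illustrative):
% layer l first forms a convex combination of its history, so the
% hidden state stays a bounded weighted average rather than an
% ever-growing sum, matching the reduced hidden-state growth claim.
\alpha_l = \operatorname{softmax}(a_l), \qquad
\tilde{h}_l = \sum_{i=0}^{l} \alpha_{l,i}\, h_i, \qquad
h_{l+1} = \tilde{h}_l + F_l(\tilde{h}_l)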