Architect
Jan 1, 2026 · Artificial Intelligence

How Manifold-Constrained Hyper-Connections Boost Large Model Training Efficiency

DeepSeek’s new paper introduces mHC, a manifold‑constrained version of Hyper‑Connections that replaces the single residual link with multiple learned pathways and constrains their mixing matrices to be doubly stochastic, stabilizing gradient flow. The method adds only 6.7% training overhead and enables reliable training of 27‑billion‑parameter models while improving benchmark performance by about 2%.
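The paper's exact parameterization isn't reproduced here, but a doubly stochastic constraint is commonly enforced with Sinkhorn normalization (alternating row and column normalization). A minimal NumPy sketch of the idea, with all names and dimensions hypothetical:

```python
import numpy as np

def sinkhorn(logits, iters=50):
    # Alternately normalize rows and columns of a positive matrix;
    # this converges toward a doubly stochastic matrix, i.e. every
    # row and every column sums to 1.
    M = np.exp(logits)  # ensure strictly positive entries
    for _ in range(iters):
        M = M / M.sum(axis=1, keepdims=True)  # rows sum to 1
        M = M / M.sum(axis=0, keepdims=True)  # columns sum to 1
    return M

# Hypothetical example: n residual streams mixed by a doubly
# stochastic matrix. Because rows and columns each sum to 1, the
# mixing preserves total signal mass, which is the property that
# keeps gradient magnitudes from exploding or vanishing across layers.
n = 4
H = sinkhorn(np.random.randn(n, n))
streams = np.random.randn(n, 8)  # hypothetical hidden states, dim 8
mixed = H @ streams              # mixed residual streams, same shape
```

This is only an illustration of the doubly stochastic constraint itself, not the full mHC layer from the paper.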

AI Architecture · Large-Scale Training · Manifold-Constrained
0 likes · 7 min read