Architect
Jan 1, 2026 · Artificial Intelligence
How Manifold-Constrained Hyper-Connections Boost Large Model Training Efficiency
DeepSeek’s new paper introduces mHC, a manifold‑constrained version of Hyper‑Connections that stabilizes gradient flow, adds only 6.7% training overhead, and enables reliable training of 27‑billion‑parameter models while improving benchmark performance by about 2%.
AI Architecture · Large-Scale Training · Manifold-Constrained
7 min read
