ArcThink
Apr 25, 2026 · Artificial Intelligence

DeepSeek V4’s Silent Launch: 1.6T Parameters, Triple Innovation, and Redefined Accessibility

DeepSeek V4 quietly debuted as a 1.6-trillion-parameter MoE model built on three innovations: CSA+HCA compressed attention, mHC manifold-constrained hyperconnections, and the Muon optimizer. The release pairs a 1M-token context at a quarter of V3’s cost with top Codeforces and LiveCodeBench scores, pricing at one-seventh of Opus, MIT open-source licensing, and dual-stack Ascend NPU/NVIDIA GPU support. A toy reading of the hyperconnection idea is sketched below.

DeepSeek V4 · Large Language Model · Manifold-constrained Hyperconnection
17 min read
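The launch summaries above lean on "manifold-constrained hyperconnections" (mHC) without unpacking the term. One hedged reading: hyper-connections keep several parallel residual streams and remix them with a learnable matrix, and the manifold constraint can be read as projecting that matrix onto the doubly-stochastic manifold so stream norms stay bounded. The toy PyTorch sketch below illustrates only that reading; every function and name here is illustrative, not DeepSeek’s implementation.

```python
# Toy sketch of a manifold-constrained residual mix, under the ASSUMPTION that
# mHC's "manifold constraint" means projecting the stream-mixing matrix toward
# the doubly-stochastic manifold (rows and columns summing to 1) so that
# remixing never blows up stream norms. Names are illustrative.
import torch

def sinkhorn(logits: torch.Tensor, n_iters: int = 5) -> torch.Tensor:
    """Project an (n, n) logit matrix toward a doubly-stochastic matrix."""
    M = logits.exp()
    for _ in range(n_iters):
        M = M / M.sum(dim=1, keepdim=True)  # normalize rows
        M = M / M.sum(dim=0, keepdim=True)  # normalize columns
    return M

def mix_streams(streams: torch.Tensor, mix_logits: torch.Tensor) -> torch.Tensor:
    """streams: (n_streams, batch, d_model); returns the remixed streams."""
    H = sinkhorn(mix_logits)                        # rows/cols each sum to ~1
    return torch.einsum('ij,jbd->ibd', H, streams)  # norm-controlled remix
```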
Machine Learning Algorithms & Natural Language Processing
Apr 25, 2026 · Artificial Intelligence

DeepSeek V4 Unveiled: 1M‑Token Context and New Architecture Challenge Closed‑Source LLMs

DeepSeek V4 introduces two flagship models, V4-Pro with 1.6T parameters and V4-Flash with 284B. Both offer a million-token context, mixed attention (CSA + HCA), manifold-constrained residuals, and the Muon optimizer, delivering open-source performance that rivals top closed-source LLMs while cutting inference cost dramatically.

1M context · DeepSeek · Large Language Model
10 min read
AI Explorer
Apr 24, 2026 · Artificial Intelligence

DeepSeek-V4 Raises the Bar: 1.6T‑Parameter Open‑Source Model Challenges Closed‑Source Giants

DeepSeek-V4 introduces two open-source LLMs, V4-Pro with 1.6 trillion total parameters and V4-Flash with 284 billion. Both offer a 1-million-token context window, hybrid attention, multi-head compression, and the new Muon optimizer, all released under an MIT license, with performance that rivals top closed-source models.

DeepSeek V4 · Hybrid attention · Large Language Model
6 min read
Machine Learning Algorithms & Natural Language Processing
Apr 4, 2026 · Artificial Intelligence

How Gram‑Newton‑Schulz Halves Muon Optimizer’s Compute Cost for Trillion‑Parameter Models

The article explains how the Gram-Newton-Schulz algorithm accelerates the Muon optimizer’s expensive Newton-Schulz orthogonalization: it cuts end-to-end orthogonalization time by 40-50%, delivers up to a 2× speed-up in large-scale LLM training, and resolves numerical stability issues through a restart strategy and custom GPU kernels. A baseline sketch of the iteration being accelerated follows below.

GPU kernels · Gram-Newton-Schulz · Muon optimizer
9 min read
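The iteration being accelerated is compact enough to show directly. Below is a minimal PyTorch sketch of the standard quintic Newton-Schulz orthogonalization that Muon applies to each 2-D momentum matrix, using Muon’s published coefficients; the article’s Gram-Newton-Schulz variant restructures this loop around the Gram matrix to cut the matmul cost, and its custom kernels are not reproduced here.

```python
# Minimal sketch of the quintic Newton-Schulz step Muon uses to map a momentum
# matrix to an approximately semi-orthogonal one. The (a, b, c) coefficients
# are Muon's published quintic coefficients; the Gram product A = X @ X.T is
# the cost that Gram-Newton-Schulz targets.
import torch

def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately orthogonalize a 2-D matrix G via Newton-Schulz iteration."""
    a, b, c = 3.4445, -4.7750, 2.0315           # Muon's quintic coefficients
    X = G / (G.norm() + 1e-7)                   # normalize so the iteration converges
    transposed = X.size(0) > X.size(1)
    if transposed:                              # iterate on the short side
        X = X.T
    for _ in range(steps):
        A = X @ X.T                             # Gram product: the dominant cost
        X = a * X + (b * A + c * (A @ A)) @ X   # quintic polynomial update
    return X.T if transposed else X
```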
Baobao Algorithm Notes
Jul 17, 2025 · Artificial Intelligence

How QK-Clip Tames MaxLogit Explosions in Trillion‑Parameter LLMs

The article introduces QK-Clip, a lightweight per-head weight-clipping technique that uses the MaxLogit signal to prevent uncontrolled logit growth in massive LLMs. It explains the design, compares it with prior methods, and shows that QK-Clip stabilizes training without harming model performance; a minimal sketch of the clipping rule follows below.

Attention stability · LLM training · MaxLogit
15 min read
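As a companion to the summary above, here is a minimal PyTorch sketch of per-head QK-Clip as described: after an optimizer step, any head whose tracked max attention logit exceeded a threshold τ has its query and key projection weights each scaled by sqrt(τ / MaxLogit), so that head’s logits shrink back under the threshold. The tensor layout, the tracked `max_logit_per_head` signal, and the default τ are illustrative assumptions, not the article’s exact implementation.

```python
# Sketch of per-head QK-Clip. ASSUMED layout: per-head Q/K projection weights
# of shape (num_heads, head_dim, d_model), and a (num_heads,) tensor of max
# attention logits observed during the forward pass. tau = 100.0 is an
# illustrative threshold.
import torch

def qk_clip_(w_q: torch.Tensor,                 # (num_heads, head_dim, d_model)
             w_k: torch.Tensor,                 # (num_heads, head_dim, d_model)
             max_logit_per_head: torch.Tensor,  # (num_heads,)
             tau: float = 100.0) -> None:
    """In-place clip: only heads whose max logit exceeded tau are touched."""
    gamma = (tau / max_logit_per_head).clamp(max=1.0)  # <=1; no-op for calm heads
    scale = gamma.sqrt().view(-1, 1, 1)                # split evenly across Q and K
    w_q.mul_(scale)                                    # q·k product shrinks by gamma
    w_k.mul_(scale)
```

Splitting the factor as sqrt(γ) on each of Q and K, rather than γ on one side, keeps the two projections at comparable scale while still shrinking their product by exactly γ.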