DeepSeek V4 Unveiled: Why Its Coding Power Beats Claude and GPT
DeepSeek's newly announced V4 model, the successor to its December 2024 V3 release, demonstrates superior coding abilities over Claude and GPT series, details its data composition, infrastructure, training costs, failed experimental attempts, expanded benchmark comparisons, and includes a comprehensive safety report.
Background
DeepSeek released the mHC paper at the start of the year, updated the DeepSeek‑R1 paper on January 4 (expanding it from 22 to 86 pages), and announced that its next‑generation model V4 will be released around Chinese New Year.
V4 Model Overview
V4 is the successor to the V3 model released in December 2024. Internal benchmark tests by DeepSeek staff indicate that V4 outperforms existing models on coding tasks, surpassing Anthropic’s Claude and OpenAI’s GPT series.
DeepSeek‑R1 Paper Update
The updated DeepSeek‑R1 paper is available at https://arxiv.org/abs/2501.12948. It provides detailed technical information, including:
Precise data formula : 26 k mathematics examples and 17 k code examples, with a description of the data creation process.
Infrastructure : Architecture diagram of vLLM combined with a DualPipe design.
Training cost breakdown : Approximately $294 k total, with the R1‑Zero component consuming 198 H800 GPU‑hours.
Failed attempts disclosed :
PRM (Process Reward Model) – scoring each step of problem solving proved inaccurate, easy to exploit, and required costly retraining.
MCTS – inspired by AlphaGo’s tree search, but the textual search space was too large and the value model could not be trained effectively, leading to abandonment.
Expanded comparison scope : Added evaluations against DS‑V3, Claude, and GPT‑4o (previously only compared against o1).
Safety report : A 10‑page report analyzing capability alignment and risk assessment.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
