DeepSeek Releases Math‑Specialized Large Model V2 and ProverBench Evaluation Suite
DeepSeek has quietly open‑sourced a new mathematics‑focused large language model, DeepSeek‑Prover‑V2 (available in 671B and 7B variants), achieving 88.9% on MiniF2F and strong results on PutnamBench, alongside the high‑quality ProverBench dataset and a novel recursive theorem‑proving pipeline.
DeepSeek recently open‑sourced a mathematics‑focused large model, DeepSeek‑Prover‑V2, which comes in two sizes: a 671‑billion‑parameter version and a 7‑billion‑parameter version.
The 671B model achieves an 88.9% pass rate on the challenging MiniF2F benchmark and solves 49 out of 658 problems in the PutnamBench suite, demonstrating strong mathematical capabilities.
In addition, DeepSeek released the ProverBench evaluation dataset, a high‑quality collection of math problems for benchmarking reasoning models.
Architecturally, V2‑671B builds on DeepSeek‑V3‑Base, while V2‑7B is derived from DeepSeek‑Prover‑V1.5‑Base with its context window extended to up to 32K tokens.
The model implements a unified mathematical reasoning framework that blends informal and formal reasoning. Complex problems are decomposed into sub‑goals, and V3’s step‑wise reasoning capability is used to connect informal reasoning with formal proof generation.
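To make the decomposition concrete, here is a toy Lean 4 proof (not from the paper; the theorem, names, and Mathlib lemmas are illustrative) written in the decomposed style the article describes: the main goal is split into named sub‑goals via `have` steps, each of which can be proved or searched for independently.

```lean
import Mathlib

-- Toy example of sub-goal decomposition: each `have` is a sub-goal
-- that a prover could attack in isolation before the final `exact`
-- step combines them into a proof of the top-level statement.
theorem sum_sq_nonneg (a b : ℝ) : 0 ≤ a ^ 2 + b ^ 2 := by
  have h1 : 0 ≤ a ^ 2 := sq_nonneg a   -- sub-goal 1
  have h2 : 0 ≤ b ^ 2 := sq_nonneg b   -- sub-goal 2
  exact add_nonneg h1 h2               -- combine the sub-goals
```

In the pipeline described above, the informal reasoning would produce the skeleton of such a proof, leaving each sub‑goal to be filled in formally.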
For cold‑start data generation, V2 employs a recursive theorem‑proving pipeline: V3 creates high‑level proof sketches, which are formalized in Lean 4; a smaller 7B prover model then searches for a proof of each sub‑goal, reducing computational cost.
During reinforcement learning, the model receives binary correct/incorrect feedback as the primary reward signal, further enhancing its ability to integrate informal reasoning with formal proof construction.
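The binary reward is simple enough to state as code. Below is a hedged sketch under the assumption that the verifier returns a plain accept/reject; `lean_check` is a hypothetical placeholder for an actual Lean 4 verification call, not a real API.

```python
# Sketch of the binary reward described above: a generated proof earns
# reward 1 only if the formal verifier accepts it, and 0 otherwise.

def lean_check(proof: str) -> bool:
    """Placeholder verifier: here, reject any proof containing `sorry`."""
    return "sorry" not in proof

def reward(proof: str) -> int:
    """Binary correct/incorrect reward signal used during RL."""
    return 1 if lean_check(proof) else 0

print(reward("by exact add_nonneg h1 h2"))  # accepted -> reward 1
print(reward("by sorry"))                   # incomplete -> reward 0
```

Because the signal is all‑or‑nothing at the whole‑proof level, partial progress earns nothing, which is what makes the sub‑goal decomposition above valuable: it turns one sparse reward into several denser ones.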
To evaluate performance comprehensively, DeepSeek built the ProverBench benchmark containing 325 problems: 15 recent AIME 24/25 competition problems and 310 problems sourced from textbooks, spanning high‑school to undergraduate mathematics across topics such as number theory, algebra, linear algebra, analysis, and probability.
Open‑source links: Model – https://huggingface.co/deepseek-ai/DeepSeek-Prover-V2-671B; Dataset – https://huggingface.co/datasets/deepseek-ai/DeepSeek-ProverBench.