Tag: mathematical reasoning


DevOps
May 5, 2025 · Artificial Intelligence

DeepSeek Releases Math‑Specialized Large Model V2 and ProverBench Evaluation Suite

DeepSeek has quietly open‑sourced a new mathematics‑focused large language model, DeepSeek‑Prover‑V2 (available in 671B and 7B variants), achieving 88.9% on MiniF2F and strong results on PutnamBench, alongside the high‑quality ProverBench dataset and a novel recursive theorem‑proving pipeline.

AI · DeepSeek · ProverBench
4 min read
Model Perspective
Mar 5, 2025 · Artificial Intelligence

Can AI Really Crack NP‑Hard Problems? Inside the DeepSeek‑R1 Breakthrough

Researchers from Nanjing University of Aeronautics, Nanjing University of Technology, and Oxford show that instruction-rich prompts dramatically boost large language models' mathematical reasoning, enabling DeepSeek‑R1 and Qwen2.5 to solve complex polynomial tasks and even produce a new counterexample related to Hilbert's 17th problem.

AI · DeepSeek · NP-hard
6 min read
Cognitive Technology Team
Oct 16, 2024 · Artificial Intelligence

Large Language Models Lack Formal Reasoning Ability: Five Pieces of Evidence from the GSM‑Symbolic Benchmark

Recent research by Apple's Iman Mirzadeh and colleagues introduces the GSM‑Symbolic benchmark, revealing that large language models, despite high scores on GSM8K, suffer significant performance drops when a problem's numerical values or names are changed or extra clauses are added, indicating a lack of true formal reasoning ability.

AI safety · Benchmark · GSM‑Symbolic
9 min read