Tagged articles
12 articles
Data Party THU
May 12, 2026 · Artificial Intelligence

MathForge: Leveraging Hard Problems in RL to Boost Large‑Model Mathematical Reasoning (ICLR 2026)

MathForge tackles the long‑standing question of which math problems deserve focus in reinforcement‑learning‑based training, introducing a difficulty‑aware optimizer (DGPO) and multi‑aspect question reformulation (MQR) that together prioritize harder‑but‑learnable questions, yielding consistent performance gains across model sizes and modalities.

DGPO · Difficulty‑Aware Optimization · MQR
0 likes · 11 min read
Machine Heart
Apr 28, 2026 · Artificial Intelligence

Can LLMs Answer More Accurately While Writing Less? Introducing SHAPE’s Reasoning Tax

The SHAPE framework (Stage‑aware Hierarchical Advantage via Potential Estimation) adds a milestone‑based “reasoning tax” to large language model inference, providing step‑wise correctness signals and penalizing verbosity, which yields an average 3% accuracy gain and a 30% reduction in token consumption across multiple math‑reasoning benchmarks.

ACL 2026 · LLM · Mathematical Reasoning
0 likes · 10 min read
Machine Heart
Apr 26, 2026 · Artificial Intelligence

How MathForge Uses Hard Problems to Boost Large‑Model Mathematical Reasoning via Reinforcement Learning

MathForge tackles the overlooked issue of training large language models on mathematically challenging yet learnable problems by introducing Difficulty‑Aware Group Policy Optimization (DGPO) and Multi‑Aspect Question Reformulation (MQR), achieving consistent gains across model sizes and modalities.

DGPO · Difficulty‑Aware Optimization · MQR
0 likes · 13 min read
Machine Heart
Apr 22, 2026 · Artificial Intelligence

Can LLMs Boost Reasoning Alone? Introducing SePT’s Simple Online Self‑Training

SePT (Self‑evolving Post‑Training) shows that a large language model can improve its mathematical reasoning ability by about ten percentage points using a reward‑free online self‑training loop that decouples generation temperature from standard SFT, matching or surpassing RL‑based methods without harming general performance.

LLM · Mathematical Reasoning · Online Learning
0 likes · 9 min read
Old Meng AI Explorer
Jan 8, 2026 · Artificial Intelligence

Why DeepSeek-Math-V2 Is the New Benchmark for Rigorous AI Math Reasoning

DeepSeek-Math-V2, an open‑source math‑reasoning model from DeepSeek, introduces a self‑verification mechanism, strong theorem‑proving ability, closed‑loop evolution, and record‑breaking competition scores, offering researchers, educators, and engineers a reliable tool for rigorous mathematical AI tasks.

AI Math · Mathematical Reasoning · Proof Assistant
0 likes · 13 min read
Old Meng AI Explorer
Dec 7, 2025 · Artificial Intelligence

Why DeepSeek-Math-V2 Is the New Benchmark for Rigorous AI Math Reasoning

DeepSeek-Math-V2, an open‑source math reasoning model from DeepSeek, introduces a self‑verification mechanism that checks step‑by‑step logical correctness. It achieves gold‑medal scores on IMO 2025 and CMO 2024 and near‑perfect results on the Putnam 2024 competition, while offering free, extensible deployment for research, training, and scientific computation.

AI Math · DeepSeek · Mathematical Reasoning
0 likes · 13 min read
HyperAI Super Neural
Dec 6, 2025 · Artificial Intelligence

Quick Look at This Week’s Frontier AI Papers: DeepSeekMath‑V2, MedSAM‑3, SAM 3D, Qwen3‑VL, and M²

This roundup surveys five recent AI papers: DeepSeekMath‑V2 (self‑verifiable mathematical reasoning), MedSAM‑3 (promptable medical image and video segmentation), SAM 3D (single‑image 3D reconstruction), Qwen3‑VL (a high‑capacity vision‑language model), and M² (the Meshed‑Memory Transformer for image captioning), highlighting their key methods, benchmarks, and code links.

3D reconstruction · Image Captioning · Mathematical Reasoning
0 likes · 6 min read
AI Frontier Lectures
Jun 5, 2025 · Artificial Intelligence

Bridging Thought Leaps: How CoT‑Bridge Boosts LLM Reasoning Accuracy

This paper introduces the Thought Leap Bridge task and the CoT‑Bridge model, which detect and fill missing intermediate steps in chain‑of‑thought reasoning, dramatically improving large language model performance on mathematical and logical benchmarks and enhancing downstream distillation and reinforcement‑learning pipelines.

Chain-of-Thought · CoT-Bridge · LLM
0 likes · 8 min read
DevOps
May 5, 2025 · Artificial Intelligence

DeepSeek Releases Math‑Specialized Large Model V2 and ProverBench Evaluation Suite

DeepSeek has quietly open‑sourced a new mathematics‑focused large language model, DeepSeek‑Prover‑V2 (available in 671B and 7B variants), achieving 88.9% on MiniF2F and strong results on PutnamBench, alongside the high‑quality ProverBench dataset and a novel recursive theorem‑proving pipeline.

AI · DeepSeek · Mathematical Reasoning
0 likes · 4 min read
Model Perspective
Mar 5, 2025 · Artificial Intelligence

Can AI Really Crack NP‑Hard Problems? Inside the DeepSeek‑R1 Breakthrough

Researchers from Nanjing University of Aeronautics, Nanjing University of Technology and Oxford show that high‑instruction prompts dramatically boost large language models' mathematical reasoning, enabling DeepSeek‑R1 and Qwen2.5 to solve complex polynomial tasks and even produce a new counterexample to Hilbert's 17th problem.

AI · DeepSeek · Mathematical Reasoning
0 likes · 6 min read
Cognitive Technology Team
Oct 16, 2024 · Artificial Intelligence

Large Language Models Lack Formal Reasoning Ability: Five Pieces of Evidence from the GSM‑Symbolic Benchmark

Recent research from an Apple team led by Iman Mirzadeh introduces the GSM‑Symbolic benchmark, revealing that large language models, despite high scores on GSM8K, suffer significant performance drops when a problem's numbers or names change or when extra clauses are added, indicating a lack of true formal reasoning ability.

AI Safety · Benchmark · GSM‑Symbolic
0 likes · 9 min read