Tagged articles

12 articles

May 12, 2026 · Artificial Intelligence

MathForge: Leveraging Hard Problems in RL to Boost Large‑Model Mathematical Reasoning (ICLR 2026)

MathForge tackles the long‑standing question of which math problems deserve focus in reinforcement‑learning‑based training, introducing difficulty‑aware group policy optimization (DGPO) and multi‑aspect question reformulation (MQR), which together prioritize harder‑but‑learnable questions and yield consistent performance gains across model sizes and modalities.

DGPO · Difficulty‑Aware Optimization · MQR

11 min read

Machine Heart

Apr 28, 2026 · Artificial Intelligence

Can LLMs Answer More Accurately While Writing Less? Introducing SHAPE’s Reasoning Tax

The SHAPE framework (Stage‑aware Hierarchical Advantage via Potential Estimation) adds a milestone‑based “reasoning tax” to large language model inference, providing step‑wise correctness signals and penalizing verbosity, which yields an average 3% accuracy gain and a 30% reduction in token consumption across multiple math‑reasoning benchmarks.

ACL 2026 · LLM · Mathematical Reasoning

10 min read

Machine Heart

Apr 26, 2026 · Artificial Intelligence

How MathForge Uses Hard Problems to Boost Large‑Model Mathematical Reasoning via Reinforcement Learning

MathForge tackles the overlooked issue of training large language models on mathematically challenging yet learnable problems by introducing difficulty‑aware group policy optimization (DGPO) and multi‑aspect question reformulation (MQR), achieving consistent gains across model sizes and modalities.

DGPO · Difficulty‑Aware Optimization · MQR

13 min read

Machine Heart

Apr 22, 2026 · Artificial Intelligence

Can LLMs Boost Reasoning Alone? Introducing SePT’s Simple Online Self‑Training

SePT (Self‑evolving Post‑Training) shows that a large language model can improve its mathematical reasoning ability by about ten percentage points using a reward‑free online self‑training loop that decouples generation temperature from standard SFT, matching or surpassing RL‑based methods without harming general performance.

LLM · Mathematical Reasoning · Online Learning

9 min read

Old Meng AI Explorer

Jan 8, 2026 · Artificial Intelligence

Why DeepSeek-Math-V2 Is the New Benchmark for Rigorous AI Math Reasoning

DeepSeek-Math-V2, an open‑source math‑reasoning model from DeepSeek, introduces a self‑verification mechanism, strong theorem‑proving ability, closed‑loop evolution, and record‑breaking competition scores, offering researchers, educators, and engineers a reliable tool for rigorous mathematical AI tasks.

AI Math · Mathematical Reasoning · Proof Assistant

13 min read

Old Meng AI Explorer

Dec 7, 2025 · Artificial Intelligence

Why DeepSeek-Math-V2 Is the New Benchmark for Rigorous AI Math Reasoning

DeepSeek-Math-V2, an open‑source math‑reasoning model from DeepSeek, introduces a self‑verification mechanism that checks each step for logical correctness. It achieves gold‑medal scores on IMO 2025 and CMO 2024 and near‑perfect results on the Putnam 2024 competition, while offering free, extensible deployment for research, training, and scientific computation.

AI Math · DeepSeek · Mathematical Reasoning

13 min read

HyperAI Super Neural

Dec 6, 2025 · Artificial Intelligence

Quick Look at This Week’s Frontier AI Papers: DeepSeekMath‑V2, MedSAM‑3, SAM 3D, Qwen3‑VL, and M²

This roundup surveys five cutting‑edge AI papers—DeepSeekMath‑V2’s self‑verifiable mathematical reasoning, MedSAM‑3’s promptable medical image and video segmentation, SAM 3D’s single‑image 3D reconstruction, Qwen3‑VL’s high‑capacity vision‑language model, and the M² memory‑mesh transformer for image captioning—highlighting their key methods, benchmarks, and code links.

3D reconstruction · Image Captioning · Mathematical Reasoning

6 min read

Baobao Algorithm Notes

Nov 18, 2025 · Artificial Intelligence

How LightReasoner Lets Small Models Teach Large Models to Reason Efficiently

The LightReasoner paper from the University of Hong Kong shows that small language models can guide large models on critical reasoning steps, achieving up to 90% faster inference and significant accuracy gains across multiple math benchmarks.

Contrastive Decoding · KL divergence · Mathematical Reasoning

9 min read

AI Frontier Lectures

Jun 5, 2025 · Artificial Intelligence

Bridging Thought Leaps: How CoT‑Bridge Boosts LLM Reasoning Accuracy

This paper introduces the Thought Leap Bridge task and the CoT‑Bridge model, which detect and fill missing intermediate steps in chain‑of‑thought reasoning, dramatically improving large language model performance on mathematical and logical benchmarks and enhancing downstream distillation and reinforcement‑learning pipelines.

Chain-of-Thought · CoT-Bridge · LLM

8 min read

DevOps

May 5, 2025 · Artificial Intelligence

DeepSeek Releases Math‑Specialized Large Model V2 and ProverBench Evaluation Suite

DeepSeek has quietly open‑sourced a new mathematics‑focused large language model, DeepSeek‑Prover‑V2 (available in 671B and 7B variants), achieving 88.9% on MiniF2F and strong results on PutnamBench, alongside the high‑quality ProverBench dataset and a novel recursive theorem‑proving pipeline.

AI · DeepSeek · Mathematical Reasoning

4 min read

Model Perspective

Mar 5, 2025 · Artificial Intelligence

Can AI Really Crack NP‑Hard Problems? Inside the DeepSeek‑R1 Breakthrough

Researchers from Nanjing University of Aeronautics, Nanjing University of Technology and Oxford show that high‑instruction prompts dramatically boost large language models' mathematical reasoning, enabling DeepSeek‑R1 and Qwen2.5 to solve complex polynomial tasks and even produce a new counterexample to Hilbert's 17th problem.

AI · DeepSeek · Mathematical Reasoning

6 min read

Cognitive Technology Team

Oct 16, 2024 · Artificial Intelligence

Large Language Models Lack Formal Reasoning Ability: Five Pieces of Evidence from the GSM‑Symbolic Benchmark

Recent research by Iman Mirzadeh’s team at Apple introduces the GSM‑Symbolic benchmark, revealing that large language models, despite high scores on GSM8K, show significant performance drops when a problem’s numbers or names change or when extra clauses are added, indicating a lack of true formal reasoning ability.

AI Safety · Benchmark · GSM‑Symbolic

9 min read
