Tagged articles
16 articles
Machine Heart
May 14, 2026 · Artificial Intelligence

Breaking Homogeneous Reasoning: I²B‑LPO Guides RLVR from Repeated Sampling to Effective Exploration

I²B‑LPO is an exploration‑enhancement framework for RLVR that branches rollouts at high‑entropy nodes, injects latent variables via pseudo self‑attention, and filters paths with an information‑bottleneck self‑reward, improving accuracy by up to 5.3% and diversity by up to 7.4% on multiple math reasoning benchmarks.

RLVR · entropy · exploration
14 min read
Machine Heart
May 6, 2026 · Artificial Intelligence

Can Adaptive Guidance Unlock Small Model Reasoning? Introducing G²RPO‑A

The paper identifies reward sparsity as the core obstacle for small language models in reinforcement‑learning‑based reasoning and proposes G²RPO‑A, which injects high‑quality thinking trajectories and dynamically adjusts guidance length. It reports large accuracy gains on math and code benchmarks; for example, Qwen3‑1.7B improves from 50.96% to 67.21% on MATH500 and from 46.08% to 75.93% on HumanEval.

G²RPO‑A · adaptive guidance · code-generation
10 min read
AI Engineering
May 6, 2026 · Artificial Intelligence

GPT-5.5 Instant Launch Cuts Hallucinations by 52.5% and Eliminates Fluff

OpenAI silently upgraded its default ChatGPT model to GPT-5.5 Instant, delivering self-correcting math reasoning, a 52.5% drop in hallucinations across medical and legal tests, 37.3% fewer user-marked errors, higher benchmark scores, shorter, fluff-free answers, and a new traceable memory feature, with a staged rollout to free and paid users.

AI model upgrade · GPT-5.5 · OpenAI
4 min read
PaperAgent
Apr 29, 2026 · Artificial Intelligence

Skill‑Driven Reasoning Cuts Tokens by Up to 59% While Boosting Accuracy

The article introduces the TRS (Thinking with Reasoning Skills) framework, which distills historical LLM reasoning traces into reusable skill cards, enabling offline skill‑base construction and online retrieval that dramatically reduces token consumption (6‑59%) and often improves accuracy on math and coding tasks.

Inference Optimization · Large Language Models · Reasoning Skills
13 min read
ShiZhen AI
Nov 28, 2025 · Artificial Intelligence

DeepSeekMath‑V2 Scores 118/120 on Putnam and Achieves Gold‑Level IMO Performance

DeepSeekMath‑V2, released open‑source on 27 Nov 2025, attains gold‑level results on IMO 2025, scores 118 out of 120 on the Putnam 2024 competition, introduces a generator‑verifier self‑verification architecture, uses GRPO training, and outperforms leading closed‑source models on IMO‑ProofBench.

Benchmark · DeepSeekMath-V2 · GRPO
7 min read
Meituan Technology Team
Nov 27, 2025 · Artificial Intelligence

AMO‑Bench: A New High‑Difficulty, Original Math Reasoning Benchmark for LLMs

AMO‑Bench, released by Meituan's LongCat team, is a 50‑question, IMO‑level math reasoning benchmark that combines original, high‑difficulty problems with automated scoring. It exposes the current limits of top large language models, whose best accuracy hovers around 52%, and offers a more discriminative evaluation tool for future model improvements.

AI Evaluation · AMO-Bench · Benchmark
12 min read
HyperAI Super Neural
Oct 21, 2025 · Artificial Intelligence

7 Essential Math Reasoning Datasets for AI: From Arithmetic to Visual Geometry

This article compiles seven prominent math reasoning datasets—including We‑Math2.0‑Standard, NuminaMath‑LEAN, T‑Wix, Nemotron‑Math‑HumanReasoning, Open‑Omega‑Atom‑1.5M, GSM8K, and VCBench—detailing their sizes, sources, associated papers, and unique features to support high‑quality AI research on mathematical problem solving.

AI · Benchmark · Datasets
9 min read
Data Party THU
Aug 23, 2025 · Artificial Intelligence

How MiroMind‑M1 Sets New Benchmarks in Open‑Source Math Reasoning

The article presents MiroMind‑M1, an open‑source math‑reasoning language model that combines a 719K high‑quality SFT dataset, a novel CAMPO reinforcement‑learning algorithm, and extensive evaluations on AIME24, AIME25, and MATH‑500, demonstrating state‑of‑the‑art performance while reducing token usage.

CAMPO · Evaluation · math reasoning
11 min read
Kuaishou Large Model
Aug 19, 2025 · Artificial Intelligence

How Klear-Reasoner Achieves SOTA Math & Code Reasoning with GPPO

Klear-Reasoner, built on Qwen3‑8B‑Base, introduces the Gradient‑Preserving Clipping Policy Optimization (GPPO) algorithm to overcome the limitations of traditional clipping, achieving state‑of‑the‑art performance on AIME2024/2025 and LiveCodeBench while providing detailed experimental analysis and data‑quality insights.

GPPO · Large Language Models · code reasoning
11 min read
Kuaishou Tech
Aug 18, 2025 · Artificial Intelligence

How Klear-Reasoner Achieves SOTA Math & Code Reasoning with GPPO Optimization

The Klear‑Reasoner model, built on Qwen3‑8B‑Base and powered by the novel Gradient‑Preserving Clipping Policy Optimization (GPPO) algorithm, surpasses same‑size open‑source baselines on challenging math (AIME) and code (LiveCodeBench) benchmarks, while revealing key insights on data quality, reward design, and clipping strategies for large‑language‑model reasoning.

GPPO · LLM · code reasoning
11 min read
AI Frontier Lectures
May 30, 2025 · Artificial Intelligence

Can Diffusion Chains Unlock More Creative Reasoning in Large Language Models?

Recent work from Westlake University's MAPLE Lab introduces a diffusion‑based “Divergent Thought Chain” that treats each intermediate denoising step of a diffusion language model as a reasoning step, using outcome‑based reinforcement learning to optimize non‑linear token generation and achieving state‑of‑the‑art performance on math and code tasks.

Chain-of-Thought · code-generation · diffusion language models
14 min read
Tencent Technical Engineering
May 23, 2025 · Artificial Intelligence

Can a 3B Open‑Source Multimodal Model Beat GPT‑4V in Math? A Deep Dive into VLR1‑3B

The preview release of the 3‑billion‑parameter VLR1‑3B multimodal model demonstrates state‑of‑the‑art reasoning on math benchmarks, outperforms many commercial closed‑source models, and shows promising results on geometry, physics, and general vision tasks, while also revealing typical hallucination issues.

Benchmark · Open-source · VLR1-3B
8 min read
AI Frontier Lectures
Mar 20, 2025 · Artificial Intelligence

Why Multimodal LLMs Still Struggle with Multi-Image Math Reasoning: Insights from MV‑MATH

This article introduces the MV‑MATH dataset, a large‑scale multi‑image math benchmark, and evaluates 24 open‑source and closed‑source multimodal large language models, revealing significant performance gaps, especially on complex visual dependencies and higher difficulty levels.

Dataset · Large Language Models · Model Evaluation
8 min read
NewBeeNLP
Sep 2, 2024 · Artificial Intelligence

Boosting Large Language Model Math Reasoning: Mixed Instructions, Synthetic Data, and Training Optimizations

This article presents a comprehensive technical walkthrough on enhancing large language model mathematical reasoning by reviewing model architectures, introducing mixed CoT‑PoT instructions, generating and filtering synthetic data, and applying multi‑stage training optimizations such as RFT, PPO, and DPO, with detailed experimental results and Q&A insights.

AI · Large Language Models · Reward model
17 min read
DataFunTalk
Aug 24, 2024 · Artificial Intelligence

Improving the Mathematical Reasoning Ability of Large Language Models: Overview, Mixed Instructions, Synthetic Data, and Training Optimization

This article presents a comprehensive approach to enhancing large language models' mathematical reasoning by reviewing model architectures, introducing mixed CoT‑PoT instructions, generating and filtering synthetic data, and applying multi‑stage training optimizations such as RFT, PPO, and DPO, with detailed experimental results and Q&A.

AI · Large Language Models · Reward model
16 min read
Baobao Algorithm Notes
Jul 9, 2024 · Artificial Intelligence

Why Step-Level DPO Is Revolutionizing LLM Math Reasoning

This article reviews recent step‑level DPO research, compares it with instance‑level DPO, explains the underlying Monte Carlo Tree Search formulation, and presents the author’s own replication experiments that demonstrate consistent performance gains across multiple LLM sizes on GSM8K and MATH benchmarks.

AI research · LLM alignment · MCTS
10 min read