Bridging Thought Leaps: How CoT‑Bridge Boosts LLM Reasoning Accuracy

This paper introduces the Thought Leap Bridge task and the CoT‑Bridge model, which detect and fill missing intermediate steps in chain‑of‑thought reasoning, improving large language model accuracy on mathematical and logical benchmarks and strengthening downstream distillation and reinforcement‑learning pipelines.

AI Frontier Lectures

Background

Chain‑of‑Thought (CoT) prompting enables large language models (LLMs) to perform step‑by‑step reasoning on structured tasks such as mathematics and logic. Many publicly available CoT datasets contain "Thought Leaps", i.e., omitted intermediate reasoning steps that are obvious to humans but create gaps for models.

Thought‑Leap Problem

A Thought Leap occurs when a reasoning chain skips essential transitions (e.g., missing the derivation of the number 15 or the justification for applying the pigeonhole principle). Experiments show that severe Thought Leaps can cause up to a 27.83 % drop in performance and slower convergence during training.

CoT‑Bridge Task

The authors define a Thought‑Leap Bridge task with two sub‑problems: (1) Leap detection – identify whether adjacent steps contain a logical jump; (2) Step completion – generate the missing intermediate reasoning to restore a coherent chain.
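As an illustration, the two sub‑problems can be wired together in a small post‑processing sketch: parse a bridge model's response into a detection verdict plus generated steps, then splice those steps back into the chain. The response format (`LEAP after step N: …`) is an assumption made here for illustration; the actual output schema used by CoT‑Bridge is defined in the authors' repository.

```python
import re

# Hypothetical response format (illustrative only; not the official schema):
#   "NO LEAP"                           -> chain is already coherent
#   "LEAP after step 2: <missing text>" -> one line per detected gap
LEAP_RE = re.compile(r"LEAP after step (\d+):\s*(.+)")

def parse_bridge_output(response: str):
    """Split a bridge-model response into the two sub-problem answers:
    (leap_detected: bool, completions: list of (step_index, text))."""
    completions = [(int(m.group(1)), m.group(2).strip())
                   for m in LEAP_RE.finditer(response)]
    return bool(completions), completions

def splice(steps, completions):
    """Insert generated steps back into the chain to restore coherence."""
    out = list(steps)
    # Insert from the back so earlier indices stay valid.
    for idx, text in sorted(completions, reverse=True):
        out.insert(idx, text)
    return out
```

With this convention, detection is simply whether any `LEAP` line is present, and completion is the generated text attached to each detected position.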

Dataset Construction (ScaleQM+)

ScaleQM+ is built from the high‑quality ScaleQuestMath dataset. For each example, a random subset of intermediate steps is deliberately removed, producing an incomplete reasoning chain paired with the removed steps. This creates supervised training pairs that teach a model to recognize incoherent structures and to generate appropriate completions.
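The construction described above can be sketched as follows. The function name, the cap on how many steps are removed, and the rule of keeping the first and last steps are illustrative assumptions; the paper's exact sampling procedure may differ.

```python
import random

def make_bridge_pair(steps, max_removed=2, seed=None):
    """Given a complete CoT (list of step strings), remove a random
    subset of *intermediate* steps to create one supervised pair:
    (incomplete chain, removed steps as the completion target).

    The first and last steps are kept here (an assumption) so the
    setup and the final answer remain anchored."""
    rng = random.Random(seed)
    if len(steps) < 3:
        return list(steps), []        # nothing intermediate to remove
    interior = list(range(1, len(steps) - 1))
    k = rng.randint(1, min(max_removed, len(interior)))
    removed = sorted(rng.sample(interior, k))
    incomplete = [s for i, s in enumerate(steps) if i not in removed]
    target = [steps[i] for i in removed]
    return incomplete, target
```

Each resulting pair teaches the model both skills at once: the incomplete chain exhibits a Thought Leap to detect, and the held-out steps are the completion it must generate.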

Model

CoT‑Bridge is an instruction‑tuned model based on Qwen2.5‑Math‑7B. It receives an incomplete CoT as input and outputs the missing steps, effectively bridging the Thought Leap.
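A minimal inference-side sketch: formatting an incomplete chain into a bridging prompt. The instruction wording below is hypothetical (the summary does not specify the template); in practice the prompt would be sent to the instruction-tuned Qwen2.5‑Math‑7B checkpoint, e.g. via Hugging Face `transformers`.

```python
def build_bridge_prompt(question: str, incomplete_steps: list[str]) -> str:
    """Format an incomplete CoT for a bridging model.

    The instruction text is a hypothetical stand-in for the actual
    CoT-Bridge template (see the authors' code repository)."""
    chain = "\n".join(f"Step {i}: {s}"
                      for i, s in enumerate(incomplete_steps, 1))
    return (
        "The solution below may skip essential intermediate steps.\n"
        f"Problem: {question}\n"
        f"{chain}\n"
        "Detect any Thought Leap between adjacent steps and write out "
        "the missing reasoning."
    )
```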

Experimental Results

Supervised fine‑tuning (SFT) on versions of MetaMathQA and NuminaMath whose Thought Leaps have been filled by CoT‑Bridge improves math‑benchmark accuracy by up to +5.87 % over training on the original data.

When applied to knowledge‑distillation data generated by a 72 B LLM, CoT‑Bridge adds +3.02 % accuracy.

In reinforcement‑learning pipelines, using bridge‑enhanced data as a cold‑start yields ≈ +3.1 % final accuracy on NuminaMath.

Out‑of‑distribution logical reasoning benchmarks (FOLIO, LogiQA, ProofWriter, ReClor, RuleTaker) see average gains of +2.99 % for Meta‑Llama3.1‑8B and +0.99 % for Qwen2.5‑Math‑1.5B, together with fewer invalid outputs.

Resources

Paper: https://arxiv.org/abs/2505.14684

Project page: https://zju-real.github.io/CoT-Bridge/

Code repository: https://github.com/ZJU-REAL/Mind-the-Gap

Written by AI Frontier Lectures