Bridging Thought Leaps: How CoT‑Bridge Boosts LLM Reasoning Accuracy
This paper introduces the Thought Leap Bridge task and the CoT‑Bridge model, which detect and fill missing intermediate steps in chain‑of‑thought reasoning data. Bridging these gaps improves large language model performance on mathematical and logical benchmarks and strengthens downstream distillation and reinforcement‑learning pipelines.
Background
Chain‑of‑Thought (CoT) prompting enables large language models (LLMs) to perform step‑by‑step reasoning on structured tasks such as mathematics and logic. Many publicly available CoT datasets contain "Thought Leaps", i.e., omitted intermediate reasoning steps that are obvious to humans but create gaps for models.
Thought‑Leap Problem
A Thought Leap occurs when a reasoning chain skips essential transitions (e.g., missing the derivation of the number 15, or the justification for applying the pigeonhole principle). Experiments show that severe Thought Leaps can degrade performance by up to 27.83 % and slow convergence during training.
CoT‑Bridge Task
The authors define a Thought‑Leap Bridge task with two sub‑problems: (1) Leap detection – identify whether adjacent steps contain a logical jump; (2) Step completion – generate the missing intermediate reasoning to restore a coherent chain.
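The split into detection and completion can be made concrete with a small sketch. This is illustrative only, not the authors' code: detect and complete are hypothetical callables standing in for the two sub‑problems.

```python
# Illustrative sketch of the Thought-Leap Bridge task; `detect` and
# `complete` are hypothetical stand-ins for the two sub-problems.
from typing import Callable, List

def bridge_chain(
    steps: List[str],
    detect: Callable[[str, str], bool],         # sub-problem 1: leap detection
    complete: Callable[[str, str], List[str]],  # sub-problem 2: step completion
) -> List[str]:
    """Scan adjacent steps; wherever a leap is detected, splice in the
    generated intermediate steps to restore a coherent chain."""
    if not steps:
        return []
    bridged = [steps[0]]
    for prev, nxt in zip(steps, steps[1:]):
        if detect(prev, nxt):
            bridged.extend(complete(prev, nxt))
        bridged.append(nxt)
    return bridged
```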
Dataset Construction (ScaleQM+)
ScaleQM+ is built from the high‑quality ScaleQuestMath dataset. For each example, a random subset of intermediate steps is deliberately removed, producing an incomplete reasoning chain paired with the removed steps. This creates supervised training pairs that teach a model to recognize incoherent structures and to generate appropriate completions.
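A rough sketch of this construction follows; the function name, removal policy, and cap on removed steps are assumptions, and the real pipeline operates on ScaleQuestMath.

```python
# Minimal sketch of the ScaleQM+ construction idea: randomly drop
# intermediate steps to create (incomplete chain, removed steps) pairs.
import random

def make_training_pair(steps, rng=random, max_removed=2):
    """Randomly drop a subset of *intermediate* steps (never the first or
    last) and pair the incomplete chain with the removed steps as targets."""
    if len(steps) < 3:
        return steps, []  # nothing intermediate to remove
    candidates = list(range(1, len(steps) - 1))
    k = rng.randint(1, min(max_removed, len(candidates)))
    removed_idx = sorted(rng.sample(candidates, k))
    incomplete = [s for i, s in enumerate(steps) if i not in removed_idx]
    removed = [steps[i] for i in removed_idx]
    return incomplete, removed  # (model input, supervision target)

steps = ["Let x be the unknown.", "Set up 3x + 5 = 20.",
         "Subtract 5: 3x = 15.", "Divide by 3: x = 5."]
incomplete_chain, target_steps = make_training_pair(steps)
```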
Model
CoT‑Bridge is an instruction‑tuned model based on Qwen2.5‑Math‑7B. It receives an incomplete CoT as input and outputs the missing steps, effectively bridging the Thought Leap.
Experimental Results
Supervised fine‑tuning (SFT) on bridged versions of the MetaMathQA and NuminaMath datasets improves math benchmark accuracy by up to +5.87 %.
When applied to knowledge‑distillation data generated by a 72 B LLM, CoT‑Bridge adds +3.02 % accuracy.
In reinforcement‑learning pipelines, using bridge‑enhanced data for the cold start yields an approximately +3.1 % gain in final accuracy on NuminaMath.
Out‑of‑distribution logical reasoning benchmarks (FOLIO, LogiQA, ProofWriter, ReClor, RuleTaker) see average gains of +2.99 % for Meta‑Llama‑3.1‑8B and +0.99 % for Qwen2.5‑Math‑1.5B, together with a reduction in invalid outputs.
Resources
Paper: https://arxiv.org/abs/2505.14684
Project page: https://zju-real.github.io/CoT-Bridge/
Code repository: https://github.com/ZJU-REAL/Mind-the-Gap
Code example
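A minimal inference sketch using a CoT‑Bridge‑style checkpoint with Hugging Face Transformers. The model id and prompt wording are assumptions; consult the code repository above for the released weights and the exact instruction template.

```python
# Hedged sketch: load a CoT-Bridge-style checkpoint and ask it to fill the
# missing steps in an incomplete chain. Model id and prompt format are
# assumptions, not the authors' released artifacts.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ZJU-REAL/CoT-Bridge"  # hypothetical id; see the repo for the real one
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

incomplete_cot = (
    "Question: A drawer has 14 socks in 7 colors. "
    "How many socks must be drawn to guarantee a matching pair?\n"
    "Step 1: Consider the worst case when drawing socks.\n"
    "Step 3: Therefore 8 socks guarantee a matching pair."
)
prompt = ("Identify any Thought Leaps in the reasoning below and generate "
          "the missing intermediate steps.\n\n" + incomplete_cot)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens (the bridged steps).
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```

Here the leap is the missing pigeonhole step between Step 1 and Step 3 (drawing 7 socks can yield one of each color, so the 8th forces a pair), which the model is expected to generate.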
