Can Multi‑Round Thinking Boost LLM Accuracy Without Extra Training?
A new study from the a‑m‑team introduces “Think Twice”, a test‑time multi‑round reasoning technique that repeatedly prompts a large language model to re‑answer a question, with no additional training or model changes. The authors report accuracy gains on benchmarks such as AIME, MATH‑500, GPQA‑Diamond and LiveCodeBench, along with shorter, more confident answers.
Background
Think Twice is a test‑time inference strategy that improves reasoning of large language models (LLMs) without any additional training or architectural changes.
Method: Multi‑round test‑time thinking
The model first generates an answer to a question. In each subsequent round, the model is prompted again with the original question plus only the final answer from the previous round; the earlier reasoning trace is discarded. Because no prior chain of thought is carried over, the model can “re‑answer” independently and correct earlier mistakes. This result‑driven self‑correction mitigates “cognitive inertia”, where the model sticks to its initial reasoning path.
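The loop described above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: `generate` is a hypothetical callable standing in for any LLM call, the wording of `REANSWER_TEMPLATE` is an assumption, and so are the choices to restate the original question next to the previous answer and to strip a `<think>...</think>` block before passing the answer on.

```python
import re
from typing import Callable, List

# Assumed re-answer prompt; the exact wording is illustrative only.
REANSWER_TEMPLATE = (
    "{question}\n"
    "The assistant's previous answer is: <answer>{previous}</answer>, "
    "and please re-answer."
)


def strip_reasoning(output: str) -> str:
    """Drop a <think>...</think> block, if present, and keep the final answer."""
    return re.sub(r"<think>.*?</think>", "", output, flags=re.DOTALL).strip()


def multi_round_answer(
    question: str, generate: Callable[[str], str], rounds: int = 2
) -> List[str]:
    """Return the final answer of every round, first round first."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    # Round 1: answer the question directly.
    answers = [strip_reasoning(generate(question))]
    # Later rounds: only the previous final answer is carried over.
    for _ in range(rounds - 1):
        prompt = REANSWER_TEMPLATE.format(question=question, previous=answers[-1])
        answers.append(strip_reasoning(generate(prompt)))
    return answers


def toy_model(prompt: str) -> str:
    """Stand-in for a real LLM: wrong on the first pass, corrected on re-answer."""
    if "previous answer" in prompt:
        return "<think>Rechecking the sum.</think>4"
    return "<think>Quick guess.</think>5"


print(multi_round_answer("What is 2 + 2?", toy_model, rounds=2))
```

To try this against a real model, `toy_model` would be replaced by a function that sends the prompt to an inference endpoint and returns the raw completion. Because each round sees only a short prompt, the per-round context stays small, but total inference cost grows roughly linearly with the number of rounds.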
Evaluation datasets
AIME 2024 (American Invitational Mathematics Examination)
MATH‑500 (a 500‑problem subset of the MATH competition‑mathematics benchmark)
GPQA‑Diamond (graduate‑level science question answering)
LiveCodeBench (programming tasks)
Results
Across four benchmarks, several state‑of‑the‑art models show consistent accuracy gains when using 2‑4 thinking rounds.
DeepSeek‑R1 on AIME: 79.7 % → 82.0 %
QwQ‑32B on AIME: 80.3 % → 83.1 %
Additional rounds further increase accuracy, indicating improved stability and reflective capability.
Language style analysis
Frequency analysis of discourse markers shows a reduction of uncertainty words (“but”, “maybe”, “wait”) and an increase of transitional terms (“therefore”) in later rounds, especially when the model corrects an error. Answers become shorter, more confident, and more logically structured.
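A frequency analysis of this kind is easy to reproduce on one's own model outputs. The sketch below is an illustration under stated assumptions, not the paper's procedure: the marker lists contain only the words named above, the tokenizer is a simple regular expression, and normalizing to occurrences per 1,000 words is a choice made here so that rounds of different length can be compared.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Marker lists taken from the words mentioned in the text; extend as needed.
UNCERTAINTY = ("but", "maybe", "wait")
TRANSITION = ("therefore",)


def marker_rates(text: str, markers: Iterable[str]) -> Dict[str, float]:
    """Return occurrences of each marker per 1,000 words of `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty input
    return {marker: 1000 * counts[marker] / total for marker in markers}


# Hypothetical responses from two rounds, written for illustration.
round_1 = "Wait, maybe the answer is 5. But wait, that ignores the carry."
round_2 = "The carry is 1. Therefore the answer is 4."

for name, text in (("round 1", round_1), ("round 2", round_2)):
    print(name, marker_rates(text, UNCERTAINTY + TRANSITION))
```

Run over many responses per round, the same function would show whether uncertainty markers fall and transitional markers rise as rounds progress.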
Practical advantages
The technique operates entirely at inference time, requires no extra training resources, and can be applied as a plug‑and‑play wrapper around deployed models. The authors also explored using multi‑round outputs as supervision for further fine‑tuning; early experiments show modest improvements, suggesting a path toward combined training‑and‑inference reflection.
Conclusion
Think Twice demonstrates that a simple multi‑round reflection loop can improve LLM accuracy by a few percentage points on hard benchmarks and produce more concise, confident answers without any model modification. It offers an immediate, lightweight optimization for deployed systems and opens research directions for integrated multi‑round reasoning mechanisms.
Paper: https://arxiv.org/abs/2503.19855
Code repository: https://github.com/a-m-team/a-m-models