Why Skipping the Thinking Step Makes Large Language Models More Accurate

UC Berkeley researchers found that forcing large language models to skip explicit reasoning—using a “NoThinking” mode—can achieve comparable or better accuracy with significantly fewer tokens, especially under token budget constraints, across math, coding, and theorem‑proving benchmarks.

AI Frontier Lectures

Study Overview

UC Berkeley researchers compare explicit reasoning ("Thinking") with a no‑thinking approach ("NoThinking") in large language models.
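The NoThinking setup works by pre-filling the model's response with an empty "thinking" block so that decoding starts directly at the final answer. The sketch below illustrates the idea; the exact template strings and the `nothinking` flag are assumptions for illustration, not the paper's verbatim implementation:

```python
# NoThinking sketch: pre-fill a dummy, already-closed thinking block so a
# DeepSeek-R1-style model skips explicit reasoning and answers directly.
# Template strings are illustrative assumptions.
THINKING_PREFILL = "<think>\n"
NOTHINKING_PREFILL = "<think>\nOkay, I think I have finished thinking.\n</think>\n"

def build_prompt(question: str, nothinking: bool) -> str:
    """Assemble a raw completion prompt; in NoThinking mode the reasoning
    block is closed before the model generates a single token."""
    prefill = NOTHINKING_PREFILL if nothinking else THINKING_PREFILL
    return f"<|User|>{question}<|Assistant|>{prefill}"
```

In Thinking mode the model must generate its own `</think>` closer; in NoThinking mode that closer is already in the prompt, so all generated tokens go toward the answer.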

Models

DeepSeek‑R1‑Distill‑Qwen‑32B (a Qwen‑32B model distilled from DeepSeek‑R1)

Baseline: Qwen‑32B‑Instruct

Smaller 7B and 14B distilled variants were also evaluated

Datasets

Mathematics: AIME 2024, AIME 2025, AMC 2023, OlympiadBench

Programming: LiveCodeBench v2

Theorem proving: MiniF2F, ProofNet

Evaluation without token budget

Pass@k and token usage were measured for both modes. On theorem proving, NoThinking matches Thinking while using only ~30% of the tokens; on the remaining benchmarks the accuracy gap narrows as k increases.
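Pass@k, the metric used here, is the probability that at least one of k samples is correct. It is usually computed with the unbiased estimator from the HumanEval paper (Chen et al., 2021), which can be sketched as:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: given n generations of which c are
    correct, return the probability that at least one of k samples
    drawn without replacement is correct."""
    if n - c < k:
        return 1.0  # too few incorrect samples to fill k slots
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Because the estimator rises with k whenever any samples are correct, a cheap mode like NoThinking can close the gap to Thinking simply by being sampled more often at the same cost.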

Token‑budget constrained experiments

Two budgets were tested: low (<3,000 tokens) and high (~3,500 tokens). Under the low budget, NoThinking consistently outperforms Thinking. Under the high budget, Thinking holds a slight edge at k = 1, but NoThinking overtakes it from k = 2 onward while also reducing latency.
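A common way to enforce such a budget is "budget forcing": cut generation off at the token limit and append a phrase that compels an immediate answer. The sketch below assumes a hypothetical `generate(prompt, max_tokens)` decoding function; the forcing phrase is an illustrative assumption:

```python
def budget_constrained_answer(question: str, budget: int, generate) -> str:
    """Budget-forcing sketch: spend at most `budget` tokens on the first
    pass; if no final answer emerged, force one with a short second pass.
    `generate(prompt, max_tokens)` is a hypothetical decoding function."""
    draft = generate(question, max_tokens=budget)
    if "Final Answer" in draft:
        return draft
    # Budget exhausted mid-reasoning: append a cue that forces an answer.
    return generate(draft + "\n**Final Answer**: ", max_tokens=64)
```

Under a tight budget, a Thinking model often burns its entire allowance inside the reasoning trace and gets truncated here, which is one plausible reason NoThinking wins in the low-budget regime.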

Parallel‑extension tests

For tasks with perfect validators (formal theorem proving), NoThinking reduces latency to 1/7 and token consumption to 1/4 without sacrificing accuracy. For tasks without validators (e.g., AMC 2023, OlympiadBench), NoThinking even exceeds full Thinking performance and cuts latency to 1/9.
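The parallel-extension setup samples many NoThinking completions at once and then aggregates. With a perfect verifier (a proof checker), any verified sample wins; without one, a simple fallback such as majority voting (self-consistency) can stand in for the paper's selection step. A minimal sketch, with majority voting as an assumed stand-in for the paper's confidence-based selection:

```python
from collections import Counter

def parallel_select(samples, verifier=None):
    """Aggregate parallel samples: accept the first verifier-approved
    sample when a perfect verifier exists (e.g., formal proof checking);
    otherwise fall back to majority voting over final answers."""
    if verifier is not None:
        for s in samples:
            if verifier(s):
                return s
        return None  # no sample passed verification
    # No verifier: pick the most frequent answer (self-consistency).
    answer, _ = Counter(samples).most_common(1)[0]
    return answer
```

Because the samples are independent, they can be generated concurrently, so wall-clock latency is governed by the longest single completion rather than one long sequential reasoning trace.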

Data‑contamination check

The experiments were reproduced on the newly released AIME 2025 dataset, confirming that the observed patterns are not caused by data leakage.

Key implications

The results suggest that lengthy explicit chain‑of‑thought generation is not always necessary for strong performance: skipping the reasoning step can yield substantial token and latency savings while maintaining, and in some settings improving, accuracy.

References

Paper: https://arxiv.org/abs/2504.09858

Related: https://www.anthropic.com/research/reasoning-models-dont-say-think

Discussion on Hacker News: https://news.ycombinator.com/item?id=43572374

Tags: reasoning, token efficiency, NoThinking