
Early‑Stopping Self‑Consistency (ESC): Reducing Sampling Cost for Large Language Model Reasoning

Early‑Stopping Self‑Consistency (ESC) dynamically halts sampling once the answer distribution within a sliding window reaches zero entropy, cutting the number of LLM reasoning samples required by 33.8%–84.2% across arithmetic, commonsense, and symbolic benchmarks while preserving accuracy, and offering a theoretically bounded, robust, budget‑adaptive alternative to traditional Self‑Consistency.

Xiaohongshu Tech REDtech

Large language models (LLMs) achieve strong reasoning abilities when guided by Chain‑of‑Thought (CoT) prompts, which simulate step‑by‑step human thinking. Self‑Consistency (SC) is a widely used decoding strategy that generates multiple reasoning paths and selects the majority answer, greatly improving performance on multi‑step tasks but incurring high sampling costs.

At ICLR 2024, the Xiaohongshu search algorithm team introduced Early‑Stopping Self‑Consistency (ESC), a simple and scalable sampling process that dramatically lowers SC’s cost without sacrificing accuracy. ESC dynamically stops sampling when the answer distribution within a sliding window has zero entropy (i.e., all samples agree), thereby truncating the decoding process.
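The stopping rule above can be sketched in a few lines. This is a minimal illustration of the idea, not the authors' implementation: `sample_answer` is a hypothetical stand-in for drawing one Chain-of-Thought sample from the model and parsing out its final answer, and the window size and sample cap are placeholder values.

```python
import math
from collections import Counter
from typing import Callable, List

def esc_decode(sample_answer: Callable[[], str],
               window_size: int = 5,
               max_samples: int = 40) -> str:
    """Sketch of Early-Stopping Self-Consistency (ESC).

    Draws reasoning samples in windows and stops as soon as one
    window's answer distribution has zero entropy (all agree).
    """
    answers: List[str] = []
    while len(answers) < max_samples:
        # Draw one window of reasoning paths.
        window = [sample_answer() for _ in range(window_size)]
        answers.extend(window)
        # Entropy of the answer distribution within this window.
        counts = Counter(window)
        entropy = -sum((c / window_size) * math.log(c / window_size)
                       for c in counts.values())
        # Zero entropy: every sample in the window agrees, so stop early.
        if entropy == 0.0:
            break
    # Majority vote over all samples drawn so far, exactly as in plain SC.
    return Counter(answers).most_common(1)[0][0]
```

When the model is confident, the loop exits after a single window; when answers disagree, sampling continues up to the budget and the result degenerates to ordinary Self-Consistency.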

Experiments were conducted on three representative reasoning tasks—arithmetic (MATH, GSM8K), commonsense (CommonsenseQA, StrategyQA), and symbolic (Last Letter Concatenation, Coin Flip)—using GPT‑4, GPT‑3.5‑Turbo, and LLaMA‑2 7B in a few‑shot setting. ESC reduced the average number of samples by 33.8%–84.2% across the six benchmarks while maintaining performance comparable to full SC.

The authors provide a theoretical analysis showing that the probability of inconsistency between ESC and SC is bounded by a negligible value (e.g., <0.002 when the window size is 8). A dynamic control scheme is derived to select optimal window sizes and maximum sampling numbers for different tasks and models, achieving a desirable performance‑cost trade‑off without any model‑specific tuning.
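To build intuition for why that bound is so small (this is a back-of-envelope simplification, not the paper's formal derivation): if each sample independently returns the eventual majority answer with probability p, a window can only mislead ESC when all w samples agree on a non-majority answer, which happens with probability at most (1 − p)^w.

```python
def unanimous_minority_prob(p: float, w: int) -> float:
    """Simplified model: probability that a window of size w is
    unanimously non-majority, when each sample hits the majority
    answer independently with probability p."""
    return (1.0 - p) ** w

# With p = 0.6 and a window of 8, the chance is 0.4**8 ≈ 0.00066,
# comfortably below the 0.002 figure cited above.
```

The exponential decay in w is what makes even modest window sizes safe, and it is why enlarging the window buys reliability far faster than it costs samples.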

Robustness studies demonstrate that ESC is stable across varying sampling budgets, temperature settings, top‑k values, and even in zero‑shot scenarios. Additional experiments on open‑ended generation (the MBPP code benchmark) confirm that ESC extends to tasks without a fixed answer format.

Overall, ESC offers a cost‑effective alternative to traditional SC, enabling large‑scale LLM inference with substantially fewer samples while preserving accuracy, and its dynamic control mechanism adapts to diverse budget and performance requirements.

AI · LLM · Chain-of-Thought · Self-Consistency · Early-Stopping · Inference · Sampling Efficiency
Written by

Xiaohongshu Tech REDtech

The official account of the Xiaohongshu tech team, sharing technical innovations and engineering insights.
