OptScale: Probabilistic Optimal Stopping for Inference‑Time Scaling
OptScale is a probabilistic framework that determines the optimal number of inference samples needed to meet a target accuracy at a given confidence level. It substantially reduces token usage while maintaining or improving performance across a range of large language model benchmarks.
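
To make the idea of a sample count chosen for a confidence target concrete, here is a minimal sketch under strong simplifying assumptions that are not taken from the summary above: samples are independent, each is correct with a known probability `p_single`, and a perfect selector picks a correct sample whenever one exists. Under those assumptions, N samples succeed with probability 1 - (1 - p_single)^N, and the stopping point is the smallest N that reaches the requested confidence. The function name `min_samples` and its parameters are hypothetical; this illustrates the general principle, not OptScale's actual estimator.

```python
from typing import Optional


def min_samples(p_single: float, confidence: float, max_n: int = 1024) -> Optional[int]:
    """Smallest N such that 1 - (1 - p_single)**N >= confidence.

    Assumes independent samples and a perfect selector (illustrative only).
    Returns None if the target is not reached within max_n samples.
    """
    if not 0.0 < p_single <= 1.0:
        raise ValueError("p_single must be in (0, 1]")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")

    fail = 1.0  # probability that all samples drawn so far are wrong
    for n in range(1, max_n + 1):
        fail *= 1.0 - p_single
        if 1.0 - fail >= confidence:
            return n  # stop here: further samples only add token cost
    return None


if __name__ == "__main__":
    # Easier questions (higher p_single) need fewer samples for the same target.
    for p in (0.3, 0.5, 0.8):
        print(p, min_samples(p, confidence=0.95))
```

In this toy model the budget depends on per-question difficulty, which is where the token savings come from: an easy question stops after a few samples rather than consuming a fixed budget.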
