Why Large Language Models Need Not Run CoT on Every Question: Tencent Hunyuan’s On‑Demand CoT Trigger

The paper analyzes the efficiency and reward‑signal shortcomings of conventional generative reward models (GRMs) and presents the E‑GRM framework. E‑GRM uses model‑internal uncertainty to trigger chain‑of‑thought reasoning only when it is needed, makes the routing decision by consensus, and scores with a mixed‑loss discriminative scorer. The paper reports significant speed‑ups and accuracy gains on benchmarks such as MATH, RM‑Bench, and RewardBench.
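The on‑demand trigger described above can be illustrated with a minimal sketch. It is not the paper's implementation: the summary does not say how uncertainty is measured, so this example assumes that disagreement among a few cheap sampled answers serves as the uncertainty signal. The names `consensus_ratio`, `should_trigger_cot`, `route`, `sample_fast`, `run_cot`, and the values `k=5` and `threshold=0.8` are all hypothetical.

```python
from collections import Counter
from typing import Callable, List


def consensus_ratio(answers: List[str]) -> float:
    """Fraction of sampled answers that match the most common answer."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / len(answers)


def should_trigger_cot(answers: List[str], threshold: float = 0.8) -> bool:
    """Assumed rule: low agreement means high uncertainty, so run CoT."""
    return consensus_ratio(answers) < threshold


def route(
    question: str,
    sample_fast: Callable[[str], str],  # hypothetical: cheap answer without CoT
    run_cot: Callable[[str], str],      # hypothetical: expensive CoT answer
    k: int = 5,
    threshold: float = 0.8,
) -> str:
    """Answer directly when the fast samples agree; otherwise fall back to CoT."""
    answers = [sample_fast(question) for _ in range(k)]
    if should_trigger_cot(answers, threshold):
        return run_cot(question)
    # Samples agree closely enough: return the majority answer directly.
    return Counter(answers).most_common(1)[0][0]
```

In this sketch the expensive path runs only when the cheap samples disagree, which is the source of the efficiency gain the summary refers to. The threshold trades speed against accuracy and would need tuning in practice.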

Chain-of-Thought · Dynamic Routing · Efficiency

