Kuaishou Large Model
Aug 19, 2025 · Artificial Intelligence
How Klear-Reasoner Achieves SOTA Math & Code Reasoning with GPPO
Klear-Reasoner, built on Qwen3‑8B‑Base, introduces the Gradient‑Preserving Clipping Policy Optimization (GPPO) algorithm to overcome the limitations of traditional clipping, achieving state‑of‑the‑art performance on AIME 2024, AIME 2025, and LiveCodeBench. The article also provides detailed experimental analysis and insights on data quality.
GPPO · code reasoning · gradient clipping
