PaperAgent
Dec 29, 2025 · Artificial Intelligence

Unveiling Bottom‑up Policy Optimization: Boosting LLM Reasoning with Internal Strategies

This article introduces Bottom‑up Policy Optimization (BuPO), a reinforcement‑learning framework that treats a large language model as a collection of internal policies at the layer and module level. The article reports that these internal policies show distinct inference‑entropy patterns in Llama and Qwen3, and that BuPO outperforms baseline methods on complex mathematical reasoning benchmarks.
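The idea of measuring the entropy of an internal policy can be illustrated with a short sketch. It assumes, as a hypothetical simplification not confirmed by this summary, that a layer's internal policy is obtained by projecting that layer's hidden state through the model's unembedding matrix and applying a softmax; its inference entropy is then the Shannon entropy of the resulting token distribution. The function names and toy matrices below are illustrative only and are not BuPO's actual implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def layer_policy_entropy(hidden, unembed):
    """Entropy of a hypothetical 'internal layer policy'.

    Assumption: the layer policy is softmax(hidden @ unembed), where
    `hidden` has length d and `unembed` has d rows of V columns each.
    """
    vocab_size = len(unembed[0])
    logits = [
        sum(h * unembed[i][j] for i, h in enumerate(hidden))
        for j in range(vocab_size)
    ]
    return entropy(softmax(logits))


# Toy usage: a zero hidden state yields uniform logits (maximum entropy),
# while a hidden state aligned with one vocabulary column yields a peaked,
# low-entropy distribution.
flat = layer_policy_entropy([0.0, 0.0], [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
peaked = layer_policy_entropy([10.0, 0.0], [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
print(f"flat: {flat:.4f} nats, peaked: {peaked:.4f} nats")
```

A flat distribution over V tokens reaches the maximum entropy, log V, while a sharply peaked one approaches zero; computing this quantity layer by layer is one plausible way such entropy patterns could be tabulated.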

AI research · Bottom-up Optimization · Internal Policy