Amap Tech
May 19, 2025 · Artificial Intelligence
Group Policy Gradient: Direct Objective Optimization for Faster Reinforcement Learning
The article introduces Group Policy Gradient (GPG), a reinforcement‑learning framework that eliminates surrogate loss functions and critic models, directly optimizes the original objective, reduces bias and variance, and achieves state‑of‑the‑art performance on both single‑modal and multimodal tasks.
AI researchLLM fine-tuningbias reduction
0 likes · 7 min read
