Tag: policy optimization

Tencent Technical Engineering
Feb 24, 2025 · Artificial Intelligence

Understanding GRPO: Group Relative Policy Optimization in Reinforcement Learning and Large Language Models

The article reviews reinforcement-learning fundamentals and the progression from policy-gradient to PPO, then introduces Group Relative Policy Optimization (GRPO)—a critic-free method that normalizes rewards across multiple sampled outputs to compute group-relative advantages—and shows how DeepSeek-R1 leverages GRPO with rule-based rewards to achieve strong reasoning performance.

GRPO · PPO · RLHF
0 likes · 16 min read
DataFunSummit
Mar 25, 2021 · Artificial Intelligence

An Overview of Reinforcement Learning: Concepts, Applications, Challenges, and Future Prospects

This overview explains reinforcement learning, a branch of artificial intelligence, covering its core concepts, successful case studies such as AlphaGo and AlphaStar, practical application workflows, current challenges, resources, and future outlook, as a guide for researchers and practitioners.

Deep Learning · applications · artificial intelligence
0 likes · 56 min read