Tagged articles

Policy Optimization

7 articles · Page 1 of 1

May 16, 2026 · Artificial Intelligence

GIPO: Overcoming Utilization Collapse for Efficient Large‑Model Reinforcement Learning

GIPO (Gaussian Importance Sampling Policy Optimization) replaces PPO’s hard clipping with a smooth Gaussian‑weighted trust region. The resulting objective is symmetric in log space and balances bias against variance, which mitigates policy lag and utilization collapse. The article reports superior stability and sample efficiency on GridWorld, LIBERO, MetaWorld, and 7‑billion‑parameter VLA experiments.

Bias-Variance Tradeoff · GIPO · Large‑Scale Training

0 likes · 17 min read

Machine Heart

Apr 2, 2026 · Artificial Intelligence

Dual Alignment Theory Redefines Cross-Domain Offline RL Transfer

The paper revisits cross-domain offline reinforcement learning, showing that aligning both dynamics and value of source data is essential for effective policy transfer, and introduces the DVDF framework that jointly filters source samples, achieving consistent performance gains across multiple robotic control benchmarks.

DVDF · Policy Optimization · cross-domain transfer

0 likes · 13 min read

Baobao Algorithm Notes

Jan 24, 2026 · Artificial Intelligence

What Advances Do GRPO, DAPO, GSPO, and SAPO Bring Over PPO?

After DPO, the typical research trajectory moves through GRPO, DAPO, GSPO, and SAPO, each introducing new optimization objectives, sampling strategies, and reward‑shaping techniques that aim to reduce memory usage, improve gradient stability, and enhance the efficiency of large‑model reinforcement learning.

DAPO · GRPO · GSPO

0 likes · 6 min read

Data Party THU

Nov 24, 2025 · Artificial Intelligence

Model-Free vs Model-Based RL: Core Concepts and Large-Model Applications

This article explains the fundamental architecture of reinforcement learning and contrasts model‑free with model‑based approaches, covering environment models, planning, data augmentation, expert iteration, and embedded planning. It then examines how large language models use policy‑based methods such as PPO, DPO, and GRPO for RLHF.

Model-Based · Model-free · Planning

0 likes · 13 min read

Baobao Algorithm Notes

Jun 13, 2025 · Artificial Intelligence

How GVPO Improves LLM Fine‑Tuning: Stable, Sample‑Rich Policy Optimization

The article introduces GVPO (Group Variance Policy Optimization), a method whose unique optimal solution is that of KL‑constrained reward maximization. It supports diverse sampling distributions and resolves the instability and inefficiency found in GRPO and traditional policy‑gradient approaches to large language model post‑training.

GVPO · KL constraint · Policy Optimization

0 likes · 9 min read

Tencent Technical Engineering

Feb 24, 2025 · Artificial Intelligence

Understanding GRPO: Group Relative Policy Optimization in Reinforcement Learning and Large Language Models

The article reviews reinforcement-learning fundamentals and the progression from policy-gradient to PPO, then introduces Group Relative Policy Optimization (GRPO)—a critic-free method that normalizes rewards across multiple sampled outputs to compute group-relative advantages—and shows how DeepSeek-R1 leverages GRPO with rule-based rewards to achieve strong reasoning performance.

GRPO · PPO · Policy Optimization

0 likes · 16 min read

DataFunSummit
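
The GRPO summary above describes normalizing rewards across several sampled outputs to obtain group-relative advantages. A minimal sketch of that step follows; it is an illustration only, not code from the article. The function name `group_relative_advantages`, the `eps` stabilizer, and the use of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group of sampled outputs.

    Assumption: the advantage is (reward - group mean) / group std,
    using the population standard deviation; eps avoids division by
    zero when every output in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rule-based rewards for four outputs sampled from one prompt:
# 1.0 for a correct answer, 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Because the baseline is the group mean rather than a learned value function, no critic network is needed, which is the sense in which the summary calls the method critic-free.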

Mar 25, 2021 · Artificial Intelligence

An Overview of Reinforcement Learning: Concepts, Applications, Challenges, and Future Prospects

Reinforcement learning, a branch of artificial intelligence, is explained through its core concepts, successful case studies such as AlphaGo and AlphaStar, practical application workflows, current challenges, resources, and future outlook, offering a comprehensive guide for researchers and practitioners.

Applications · Artificial Intelligence · Policy Optimization

0 likes · 56 min read
