Tagged articles
7 articles
Machine Heart
May 16, 2026 · Artificial Intelligence

GIPO: Overcoming Utilization Collapse for Efficient Large‑Model Reinforcement Learning

GIPO (Gaussian Importance Sampling Policy Optimization) replaces PPO’s hard clipping with a smooth Gaussian‑weighted trust region, achieving log‑space symmetry and bias‑variance balance that mitigates policy lag and utilization collapse, and demonstrates superior stability and sample efficiency on GridWorld, LIBERO, MetaWorld, and 7‑billion‑parameter VLA experiments.

Bias-Variance Tradeoff · GIPO · Large-Scale Training
17 min read
Machine Heart
Apr 2, 2026 · Artificial Intelligence

Dual Alignment Theory Redefines Cross-Domain Offline RL Transfer

The paper revisits cross-domain offline reinforcement learning, showing that aligning both dynamics and value of source data is essential for effective policy transfer, and introduces the DVDF framework that jointly filters source samples, achieving consistent performance gains across multiple robotic control benchmarks.

DVDF · Policy Optimization · cross-domain transfer
13 min read
Baobao Algorithm Notes
Jan 24, 2026 · Artificial Intelligence

What Advances Do GRPO, DAPO, GSPO, and SAPO Bring Over PPO?

After DPO, the typical research trajectory moves through GRPO, DAPO, GSPO, and SAPO, each introducing new optimization objectives, sampling strategies, and reward‑shaping techniques that aim to reduce memory usage, improve gradient stability, and enhance the efficiency of large‑model reinforcement learning.

DAPO · GRPO · GSPO
6 min read
Data Party THU
Nov 24, 2025 · Artificial Intelligence

Model-Free vs Model-Based RL: Core Concepts and Large-Model Applications

This article explains the fundamental architecture of reinforcement learning, contrasting model‑free and model‑based approaches, detailing environment models, planning, data augmentation, expert iteration, and embedding planning, and then examines how large language models use policy‑based methods such as PPO, DPO, and GRPO for RLHF.

Model-Based · Model-free · Planning
13 min read
Baobao Algorithm Notes
Jun 13, 2025 · Artificial Intelligence

How GVPO Improves LLM Fine‑Tuning: Stable, Sample‑Rich Policy Optimization

The article introduces GVPO, a Group Variance Policy Optimization method that uniquely achieves KL‑constrained reward maximization, supports diverse sampling distributions, and resolves instability and inefficiency issues found in GRPO and traditional policy‑gradient approaches for large language model post‑training.

GVPO · KL constraint · Policy Optimization
9 min read
Tencent Technical Engineering
Feb 24, 2025 · Artificial Intelligence

Understanding GRPO: Group Relative Policy Optimization in Reinforcement Learning and Large Language Models

The article reviews reinforcement-learning fundamentals and the progression from policy-gradient to PPO, then introduces Group Relative Policy Optimization (GRPO)—a critic-free method that normalizes rewards across multiple sampled outputs to compute group-relative advantages—and shows how DeepSeek-R1 leverages GRPO with rule-based rewards to achieve strong reasoning performance.

GRPO · PPO · Policy Optimization
16 min read
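
The group-relative advantage mentioned in the GRPO summary above can be illustrated with a minimal sketch. It assumes one scalar reward per sampled output and normalizes each reward by the mean and standard deviation of its own group; the function name, the `eps` stabilizer, and the use of the population standard deviation are illustrative assumptions, not details taken from the article.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize the rewards of one group of sampled outputs.

    Each advantage is (reward - group mean) / (group std + eps).
    `eps` is an assumed stabilizer that avoids division by zero when
    every output in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rule-based rewards for four outputs sampled from one prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Because the baseline is the group's own mean reward, no separate critic network is needed to estimate it, which is the sense in which the summary calls the method critic-free.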
DataFunSummit
Mar 25, 2021 · Artificial Intelligence

An Overview of Reinforcement Learning: Concepts, Applications, Challenges, and Future Prospects

Reinforcement learning, a branch of artificial intelligence, is explained through its core concepts, successful case studies such as AlphaGo and AlphaStar, practical application workflows, current challenges, resources, and future outlook, offering a comprehensive guide for researchers and practitioners.

Applications · Policy Optimization · artificial intelligence
56 min read