Model-Free vs Model-Based RL: Core Concepts and Large-Model Applications
This article explains the fundamental architecture of reinforcement learning and contrasts model-free with model-based approaches, covering environment models, planning, data augmentation, expert iteration, and embedded planning. It then examines how large language models apply policy-based methods such as PPO and GRPO, along with the related preference-optimization method DPO, in reinforcement learning from human feedback (RLHF).
