Baobao Algorithm Notes
Aug 15, 2025 · Artificial Intelligence

Unlocking LLM Performance: Classic Deep RL Tricks Reimagined for Modern Training

This article systematically adapts classic deep reinforcement‑learning techniques (multi‑step returns, TD(λ)/GAE, V‑trace corrections, uncertainty‑aware weighting, safety constraints, distributionally robust optimization, and value‑guided decoding) to large language model training and inference, with concrete formulas, implementation tips, and empirical results.

Deep RL · GAE · LLM
17 min read
Baobao Algorithm Notes
Mar 20, 2025 · Artificial Intelligence

Unlocking Large‑Scale Deep Reinforcement Learning: PPO, GAE, and PPG Deep Dive

This comprehensive guide examines large‑scale deep reinforcement learning, detailing policy‑gradient fundamentals, the mathematics of PPO and GAE, practical implementation tricks, reward and observation normalization, network initialization, and the newer Phasic Policy Gradient method, all supported by code snippets and key research references.

Algorithm Optimization · Deep RL · GAE
19 min read