Alibaba Cloud Developer
Jul 26, 2018 · Artificial Intelligence
How We Won OpenAI’s Retro Contest: Joint PPO and Generalization in Sonic
This article details the technical journey behind Alibaba’s champion solution in OpenAI’s Retro Contest, explaining the reinforcement‑learning challenges of playing Sonic, the joint PPO approach, distributed training optimizations, reward shaping, fine‑tuning with DeepMimic, and the final performance that secured first place.
