Tencent Technical Engineering
Feb 19, 2025 · Artificial Intelligence
Reproduction and Analysis of DeepSeek R1/R1‑zero Reinforcement Learning Experiments
This note surveys four open-source reproductions of the DeepSeek R1/R1-zero reinforcement-learning pipeline and re-implements their training on math and logic datasets using Qwen-based models. The experiments show that combined format and accuracy rewards improve long chain-of-thought reasoning, although training stability and scaling remain challenges. The note closes by outlining future directions for large-scale RL and business deployment.
DeepSeek-R1 · large language model · long chain of thought
39 min read