Tag

open‑source replication

0 views collected around this technical thread.

Tencent Technical Engineering
Tencent Technical Engineering
Feb 19, 2025 · Artificial Intelligence

Reproduction and Analysis of DeepSeek R1/R1‑zero Reinforcement Learning Experiments

This note surveys four open‑source reproductions of DeepSeek R1/R1‑zero reinforcement‑learning pipelines, re‑implements their training on math and logic datasets using Qwen‑based models, shows that format‑plus‑accuracy rewards improve long‑chain reasoning though stability and scaling remain challenges, and outlines future directions for large‑scale RL and business deployment.

DeepSeek-R1large language modellong chain of thought
0 likes · 39 min read
Reproduction and Analysis of DeepSeek R1/R1‑zero Reinforcement Learning Experiments