Tag: reward function


Tencent Technical Engineering
Mar 31, 2025 · Artificial Intelligence

Step-by-Step Guide to Local Training of DeepSeek R1 on Multi‑GPU A100 Systems

This step-by-step tutorial shows how to set up CUDA 12.4, install the required packages, prepare a JSON dataset and a custom reward function, troubleshoot out-of-memory errors, and launch DeepSeek R1 training on an 8-GPU A100 cluster using Accelerate, DeepSpeed ZeRO-3 and vLLM configurations.

A100, CUDA, DeepSeek
9 min read
Tencent Technical Engineering
Feb 19, 2025 · Artificial Intelligence

Reproduction and Analysis of DeepSeek R1/R1-Zero Reinforcement Learning Experiments

This note surveys four open-source reproductions of the DeepSeek R1/R1-Zero reinforcement-learning pipelines and re-implements their training on math and logic datasets using Qwen-based models. The experiments show that combining a format reward with an accuracy reward improves long-chain reasoning, although training stability and scaling remain open challenges. The note closes with future directions for large-scale RL and business deployment.

DeepSeek-R1, large language model, long chain of thought
39 min read
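The "format-plus-accuracy" reward that both articles rely on can be illustrated with a minimal rule-based sketch. It assumes an R1-style template with the reasoning inside `<think>` tags and the final answer inside `<answer>` tags, plus an exact string match against a reference answer. The tag names, the equal weighting of the two terms, and the function names (`format_reward`, `accuracy_reward`, `total_reward`) are illustrative assumptions, not the reward code used in either article.

```python
import re

# Assumed R1-style template: reasoning in <think>...</think>, final answer in <answer>...</answer>.
FORMAT_RE = re.compile(r"^<think>.*?</think>\s*<answer>(.*?)</answer>\s*$", re.DOTALL)


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion follows the assumed think/answer template, else 0.0."""
    return 1.0 if FORMAT_RE.match(completion.strip()) else 0.0


def accuracy_reward(completion: str, ground_truth: str) -> float:
    """Return 1.0 if the extracted answer equals the reference after trimming whitespace."""
    match = FORMAT_RE.match(completion.strip())
    if match is None:
        return 0.0  # No parsable answer, so no accuracy credit.
    return 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0


def total_reward(completion: str, ground_truth: str) -> float:
    """Sum of format and accuracy rewards; equal weighting is an assumption."""
    return format_reward(completion) + accuracy_reward(completion, ground_truth)


if __name__ == "__main__":
    samples = [
        "<think>2 + 2 = 4</think> <answer>4</answer>",  # well-formed and correct
        "<think>guess</think><answer>5</answer>",       # well-formed but wrong
        "4",                                            # correct value, no template
    ]
    for completion in samples:
        print(total_reward(completion, "4"))
```

Run as a script, it prints 2.0, 1.0 and 0.0 for the three sample completions. Exact string matching keeps the sketch simple; math datasets usually need answer normalization (for example, parsing numbers or LaTeX expressions) before comparison.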