Analyzing DeepSeek R1 Inference Projects: Source Code, Cold‑Start, and Scaling Techniques

This article examines DeepSeek R1’s three breakthroughs, its low‑cost optimizations that bypass CUDA, and the resulting impact on the AI ecosystem, then provides a detailed technical review of four of its open‑source reproductions—Open‑R1, Tiny‑Zero, SimpleScaling‑S1, and simpleRL‑reason—covering their architectures, reinforcement‑learning pipelines, and code implementations.

AI2ML AI to Machine Learning

Introduction

DeepSeek positions itself as a low‑cost AI solution, claiming three major breakthroughs: a two‑track training corpus (open‑source text/code/Q&A plus carefully crafted multi‑turn reasoning data), a cold‑start alignment recipe called DS‑R1‑Zero built on GRPO (Group Relative Policy Optimization) reinforcement learning, and aggressive cost reductions despite relying on reinforcement‑learning‑derived data.
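The distinctive feature of GRPO is that it replaces PPO's learned value network with a group baseline: several completions are sampled per prompt, and each reward is normalized against the group's mean and standard deviation. The sketch below illustrates only that advantage computation under those stated assumptions; it is not DeepSeek's implementation.

```python
# Illustrative sketch of GRPO's group-relative advantage computation.
# Assumes one prompt, a group of sampled completions, and scalar rewards.
from statistics import mean, pstdev

def grpo_advantages(rewards, eps=1e-6):
    """Normalize each reward against its group's mean and std,
    so no learned value network (critic) is needed."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled completions for one prompt, scored by a rule-based reward:
advs = grpo_advantages([1.0, 0.0, 0.0, 1.0])
```

Because the baseline is computed per group, advantages sum to roughly zero within each prompt: correct completions are pushed up, incorrect ones pushed down, with no extra model to train.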

Low‑Cost Optimizations and PTX Usage

Although reinforcement learning typically raises training costs, DeepSeek achieves lower expenses through a series of optimizations (see the linked “DS十三优” article, roughly “DeepSeek’s Thirteen Optimizations”). Notably, it bypasses CUDA in favor of hand‑written PTX (NVIDIA’s GPU assembly‑level IR) kernels, arguing that CUDA is not optimal for large‑model training and that dropping down to PTX can yield better performance and reduce reliance on NVIDIA’s ecosystem.

Projected Impacts

Potentially undermines OpenAI’s high‑cost, high‑performance barrier and questions the necessity of its high valuation.

Open‑source availability and low data‑labeling costs enable small companies to adopt the technology, fostering new B2B opportunities.

Hardware vendors may invest in PTX‑level support for large‑model workloads, weakening CUDA’s dominance.

DS‑R1‑Zero Reproductions Overview

Seven projects claim to reproduce DeepSeek’s methodology: open‑r1, simplescaling‑s1, R1‑V, simpleRL, TinyZero, RAGEN, and Logic‑RL. The analysis focuses on four core implementations: Open‑R1, Tiny‑Zero, SimpleScaling‑S1, and simpleRL‑reason.

1. Open‑R1

Open‑R1 replicates the R1 model rather than the cold‑start R1‑Zero. It leverages existing large‑model RL frameworks such as TRL, OpenRLHF, veRL, and Open‑Instruct/Tülu, with TRL (from Hugging Face) and veRL (from ByteDance) being the most widely used.

The repository’s key scripts are grpo.py, reward.py, and sft.py. Each is invoked independently and supports uploading the resulting model directly to the Hugging Face Hub.

The data‑generation step uses generate.py, which distills an online R1 model via the distilabel framework (originating from argilla‑io/distilabel) to produce chain‑of‑thought (CoT) labeled data. Reward design in grpo.py registers three simple rules—accuracy, logical format, and reasoning steps—implemented with regular expressions, leaving room for improvement.
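The three regex‑based reward rules can be sketched as follows. The tag names (`<think>`/`<answer>`) and exact scoring are assumptions for illustration, not Open‑R1's verbatim code:

```python
# Hedged sketch of rule-based rewards in the spirit of Open-R1's grpo.py:
# a format check and an accuracy check, both implemented with regex.
import re

THINK_ANSWER = re.compile(
    r"^<think>.*?</think>\s*<answer>.*?</answer>$", re.DOTALL
)

def format_reward(completion: str) -> float:
    """1.0 if the completion wraps reasoning and answer in the
    expected tags, else 0.0."""
    return 1.0 if THINK_ANSWER.match(completion.strip()) else 0.0

def accuracy_reward(completion: str, gold: str) -> float:
    """1.0 if the text inside <answer> equals the reference answer."""
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return 1.0 if m and m.group(1).strip() == gold.strip() else 0.0
```

A reasoning‑step rule would similarly count pattern matches (e.g. "Step 1", "First, …") inside the `<think>` block; all three rewards are cheap string checks, which is exactly why the article notes there is room for improvement.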

2. Tiny‑Zero

Tiny‑Zero, authored by a Chinese‑American researcher, is a faithful reproduction of R1‑Zero. Its core demonstrates veRL capabilities through three test scripts: test_r1_dataset.py, test_rm_dataset.py, and test_sft_dataset.py.

The project provides a complete example pipeline: data generation, RL training, and SFT. It uses a RewardManager to align datasets with veRL’s scoring mechanisms, and includes a Ray‑based execution framework.

3. SimpleScaling‑S1

SimpleScaling‑S1, from the Fei‑Fei Li group, focuses on inference scaling in limited‑sample regimes. While the training remains standard SFT, the novelty lies in a “budget forcing” interface that controls both training and inference costs.

The system incorporates a tree‑search‑based manager and defines a “Budget” tree structure to allocate resources during inference scaling.
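The budget‑forcing idea can be sketched as a wrapper around a decode loop: if the model stops thinking before a minimum token budget is spent, a continuation cue (e.g. "Wait,") is appended to force more reasoning; the maximum budget caps total spend. The `generate` callable here is a stand‑in for a real decoder, and the details are assumptions for illustration:

```python
# Illustrative sketch of S1-style "budget forcing" over a generic
# generate(prompt, token_budget) -> (text, tokens_used) callable.
def budget_forced_generate(generate, prompt, min_tokens, max_tokens):
    text, used = generate(prompt, max_tokens)
    while used < min_tokens and used < max_tokens:
        # Suppress the stop and append a cue so the model keeps reasoning.
        cont, n = generate(prompt + text + " Wait,", max_tokens - used)
        text = text + " Wait," + cont
        used += n
    return text
```

A quick way to see the control flow is with a fake generator that always returns a short chunk: with `min_tokens=10` and chunks of 4 tokens, the wrapper injects the cue twice before the minimum budget is reached.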

4. simpleRL‑reason

simpleRL‑reason extends OpenRLHF to implement the RL component of DS‑R1, aiming to improve reasoning capabilities. It first defines the logical and textual transformations it wants the model to learn, then introduces a custom scorer for them. The pipeline then runs through OpenRLHF’s data conversion, reward training, and PPO optimization stages.
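The PPO stage inherited from OpenRLHF optimizes the standard clipped surrogate objective. The scalar sketch below shows that objective only; it is a generic formulation, not simpleRL‑reason's code:

```python
# Generic PPO clipped-surrogate loss for a single token/action,
# the objective OpenRLHF's PPO trainer optimizes (minimal sketch).
import math

def ppo_clip_loss(logp_new, logp_old, advantage, clip_eps=0.2):
    """Negative clipped surrogate: ratio is clipped to
    [1 - eps, 1 + eps] so one update cannot move the policy too far."""
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    # Maximize the surrogate => minimize its negation.
    return -min(ratio * advantage, clipped * advantage)
```

With an unchanged policy (`ratio = 1`) the loss is just `-advantage`; when the new policy drifts far from the old one, the clip caps how much credit a positive advantage can claim.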

Conclusion

DeepSeek‑R1, especially the DS‑R1‑Zero cold‑start methodology, expands the practical use of large models. Its influence appears across the four examined projects: Open‑R1 simplifies R1 distillation, Tiny‑Zero provides a concrete cold‑start pipeline, SimpleScaling‑S1 introduces cost‑aware inference scaling, and simpleRL‑reason demonstrates logic‑driven reasoning improvements.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: large language models, open source, DeepSeek, Reinforcement Learning, R1, Inference Scaling, PTX
Written by

AI2ML AI to Machine Learning

Original articles on artificial intelligence and machine learning, deep optimization. Less is more, life is simple! Shi Chunqi
