How DeepSeek Trained Its R1 LLM for About $294K – Inside the R1 Model

The article reports that DeepSeek's R1 large language model, detailed in a peer-reviewed Nature paper, was built for roughly $300 k in total (about $294 k of it for training) using Nvidia H800 chips and pure reinforcement-learning techniques, achieving competitive performance while remaining open-source.

Data Party THU

Overview

DeepSeek's R1 is a large language model (LLM) designed for reasoning tasks. It is thought to be the first major LLM to go through peer review, with the paper describing it published in Nature (source DOI: 10.1038/d41586-025-03015-6).

Cost and Funding

The total development cost was approximately $300,000, of which the training phase accounted for about $294,000 (29.4 × 10⁴ USD). This is far lower than the tens of millions of dollars typically reported for comparable models.
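The figures above are easy to misread because the Chinese source quotes them in 万 (units of 10,000). The short sketch below writes out the conversion and a rough cost ratio. The $30 million comparison point is an assumed placeholder standing in for "tens of millions", not a reported number, and the function names are illustrative only.

```python
WAN = 10_000  # one 万 (wan), the unit used in the Chinese source


def wan_to_usd(amount_in_wan: float) -> int:
    """Convert an amount quoted in units of 10,000 USD to plain USD."""
    return round(amount_in_wan * WAN)


training_usd = wan_to_usd(29.4)  # training cost quoted as 29.4 × 10⁴ USD
headline_usd = wan_to_usd(30)    # rounded headline figure, 30 × 10⁴ USD

# Assumption: a placeholder for "tens of millions of dollars".
ASSUMED_TYPICAL_COST_USD = 30_000_000
ratio = ASSUMED_TYPICAL_COST_USD / headline_usd

print(f"training cost: ${training_usd:,}")
print(f"headline cost: ${headline_usd:,}")
print(f"assumed baseline is {ratio:.0f}x the headline cost")
```

Under that assumed baseline the gap is about two orders of magnitude; a different baseline changes the ratio but not the conversion.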

Hardware and Training Infrastructure

Training was performed on Nvidia H800 GPUs. US export controls barred sales of H800 chips to China in 2023, which limited their availability there.

Training Methodology

R1 uses a pure reinforcement learning (RL) pipeline that does not rely on copying or imitating examples generated by other LLMs. The model learns through automated trial and error: it is rewarded for reaching correct answers rather than taught to follow human-crafted reasoning templates, so it develops its own reasoning strategies.
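One way to picture an automated reward signal is a rule-based correctness check, sketched below. This is an illustration under stated assumptions, not DeepSeek's implementation: the `Answer:` marker, the exact-match comparison, and both function names are hypothetical choices made for the example.

```python
import re
from typing import Optional


def extract_final_answer(response: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker, or None if absent."""
    # Assumption: responses end with a line such as "Answer: 42".
    matches = re.findall(r"Answer:\s*(.+)", response)
    return matches[-1].strip() if matches else None


def correctness_reward(response: str, reference: str) -> float:
    """Give 1.0 for an exact match with the reference answer, else 0.0."""
    answer = extract_final_answer(response)
    if answer is not None and answer == reference.strip():
        return 1.0
    return 0.0


# The reasoning text is never graded, only the final answer.
print(correctness_reward("6 * 7 is 42.\nAnswer: 42", "42"))
print(correctness_reward("I guess 41.\nAnswer: 41", "42"))
```

Because only the final answer is scored, the model is free to reach it by whatever intermediate reasoning works, which is the point of the trial-and-error setup described above.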

Two key algorithmic innovations were introduced:

Group-relative policy optimization (GRPO): the model's attempts at a task are scored against one another using an estimate computed from the group itself, rather than by a separate scoring algorithm.

Self-evaluation: the model generates its own quality estimates for candidate responses, reducing dependence on external data.
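The group-relative idea can be sketched in a few lines. The code below uses the commonly described GRPO normalization (each reward minus the group mean, divided by the group standard deviation). It is not taken from DeepSeek's code, it covers only the advantage estimate and not the policy update, and the function name and sample rewards are hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Score each sampled answer relative to the other answers in its group."""
    baseline = mean(rewards)   # the group mean acts as the baseline
    spread = pstdev(rewards)   # population standard deviation of the group
    # eps avoids division by zero when every answer gets the same reward.
    return [(r - baseline) / (spread + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one prompt (1.0 = correct).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Answers that beat the group average get a positive advantage and the rest get a negative one, so no separately trained scoring model is needed to provide the baseline.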

Performance Evaluation

Independent benchmarks such as the ScienceAgentBench suite show that R1 achieves a strong balance of capability and cost. It does not lead in raw accuracy, but it ranks among the better models once capability is weighed against cost.

Open‑Source Release

The model weights and associated code have been released publicly on Hugging Face, enabling community experimentation. The release includes a detailed cost breakdown, providing a case study for low‑budget LLM development.

Key Technical Details

Model size and architecture: transformer backbone optimized for reasoning tasks (parameter count not stated here).

Training data: the base model was pretrained on publicly available internet text; specific datasets for the RL stage were not disclosed.

Training budget: about $294,000 (29.4 × 10⁴ USD, rounded to $300,000) for the RL stage, on top of an estimated $6 million spent on pre-training the base model.

Hardware configuration: multiple Nvidia H800 GPUs (exact count not stated here).

Reproducibility and Community Impact

The peer‑reviewed publication and open release of weights aim to improve transparency and reproducibility in LLM research. Researchers can replicate the training pipeline, evaluate safety and effectiveness, and explore extensions of the pure‑RL approach.

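Author: Elizabeth Gibney
Translator: Zhao Ruxuan
Length: about 2,500 characters; suggested reading time: 8 minutes.
The first peer-reviewed study shows how the Chinese start-up spent $300,000 to build a large language model that shook the market.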