Essential DeepSeek‑R1 Reading List: Papers Behind the 2025 Hottest LLM
This article compiles a curated reading list of foundational and recent research papers—from the original Transformer to chain‑of‑thought, mixture‑of‑experts, and reinforcement‑learning studies—that together explain the breakthroughs behind DeepSeek‑R1 and guide readers through the technical evolution of modern large language models.
Introduction
The author notes that DeepSeek‑R1 represents a major step in the open‑source LLM ecosystem, matching OpenAI’s o1 on many metrics. Rather than relying on hype‑driven posts, the author assembled a reading list that links to core research papers, allowing readers to study the model’s foundations one paper at a time.
Transformer Foundations
DeepSeek is built on the Transformer architecture. The author recommends starting with the seminal works that introduced the architecture and its scaling trends.
Title: Attention Is All You Need (Transformer) Link: https://arxiv.org/abs/1706.03762
Title: Language Models are Unsupervised Multitask Learners (GPT‑2) Link: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
Title: Language Models are Few‑Shot Learners (GPT‑3) Link: https://arxiv.org/abs/2005.14165
Title: Training Language Models to Follow Instructions with Human Feedback (InstructGPT) Link: https://arxiv.org/abs/2203.02155
Title: The Llama 3 Herd of Models Link: https://arxiv.org/abs/2407.21783
These papers trace the evolution from the original Transformer to large‑scale LLMs and instruction‑tuned models.
Chain‑of‑Thought and Related Reasoning Papers
Both DeepSeek‑R1 and o1 rely on internal “thinking” tokens that enable multi‑step reasoning. The author lists key works on chain‑of‑thought prompting and its extensions.
Title: Chain‑of‑Thought Prompting Elicits Reasoning in Large Language Models Link: https://arxiv.org/abs/2201.11903
Title: Tree of Thoughts: Deliberate Problem Solving with Large Language Models Link: https://arxiv.org/abs/2305.10601
Title: Graph of Thoughts: Solving Elaborate Problems with Large Language Models Link: https://arxiv.org/abs/2308.09687
Title: Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation Link: https://arxiv.org/abs/2311.04254
Title: The Prompt Report: A Systematic Survey of Prompting Techniques Link: https://arxiv.org/abs/2406.06608
The author highlights how these techniques improve performance on arithmetic, commonsense, and symbolic reasoning tasks.
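To make the idea concrete, here is a minimal chain-of-thought prompt in the style of Wei et al. (2022): one worked example whose answer spells out its intermediate steps, followed by the actual question. This is an illustrative sketch of the prompting pattern, not a prompt taken from DeepSeek's training data.

```python
# A few-shot chain-of-thought prompt: the demonstration answer walks
# through intermediate arithmetic steps before stating the result,
# which cues the model to reason step by step on the new question.
COT_PROMPT = """\
Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each.
How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls.
5 + 6 = 11. The answer is 11.

Q: A cafeteria had 23 apples. They used 20 to make lunch and bought
6 more. How many apples do they have?
A:"""
```

Plain few-shot prompting would show only "The answer is 11"; including the reasoning trace is what the chain-of-thought paper found to unlock multi-step problem solving in sufficiently large models.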
Mixture‑of‑Experts (MoE) Papers
DeepSeek‑V3 is described as a powerful MoE model with 671 B total parameters, of which 37 B are activated per token. The author cites early and recent MoE research that underpins this design.
Title: GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding Link: https://arxiv.org/abs/2006.16668
Title: Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity Link: https://arxiv.org/abs/2101.03961
Title: A Review of Sparse Expert Models in Deep Learning Link: https://arxiv.org/abs/2209.01667
Title: Mixtral of Experts Link: https://arxiv.org/abs/2401.04088
Title: Upcycling Large Language Models into Mixture of Experts Link: https://arxiv.org/abs/2410.07524
These works explain how conditional computation and expert routing reduce training cost while preserving or improving model capability.
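The core mechanism these papers share is a learned router that sends each token to only a few experts, so most parameters sit idle on any given forward pass. The following is a rough sketch of top-k gating under simple assumptions (dense NumPy math, no load-balancing loss or capacity limits, which real MoE layers such as those in Switch Transformers and DeepSeek-V3 do add); it is not DeepSeek's actual router.

```python
import numpy as np

def topk_gating(x, W_gate, k=2):
    """Score all experts for one token, keep the top-k.

    x: (d,) token hidden state; W_gate: (d, n_experts) router weights.
    Returns the chosen expert indices and their softmax-normalized weights.
    """
    logits = x @ W_gate                       # (n_experts,) router scores
    top = np.argsort(logits)[-k:][::-1]       # indices of the k highest scores
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                      # softmax over selected experts only
    return top, probs

def moe_forward(x, W_gate, experts, k=2):
    """Conditional computation: run only the selected experts and
    combine their outputs, weighted by the router probabilities."""
    idx, w = topk_gating(x, W_gate, k)
    return sum(w_i * experts[i](x) for i, w_i in zip(idx, w))
```

Because only k of n_experts feed-forward blocks execute per token, compute per token grows with k while total capacity grows with n_experts, which is the cost/capability trade-off these papers exploit.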
Reinforcement‑Learning (RL) Papers
The author notes that RL is crucial for turning a pretrained LLM into a useful chatbot with aligned behavior.
Title: RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback Link: https://arxiv.org/pdf/2309.00267
Title: Self-Rewarding Language Models Link: https://arxiv.org/abs/2401.10020
Title: Thinking LLMs: General Instruction Following with Thought Generation Link: https://arxiv.org/abs/2410.10630
Title: Direct Preference Optimization: Your Language Model is Secretly a Reward Model (DPO) Link: https://arxiv.org/abs/2305.18290
The cited papers discuss replacing human feedback with AI feedback, using LLMs as their own reward models, and preference‑optimization methods that inform the training of DeepSeek‑R1.
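Of these, DPO is the most self-contained mathematically: it turns preference learning into a simple classification-style loss on pairs of responses, with no separate reward model. A sketch of the published per-pair objective (plain Python, single pair; real training batches this over sequence-level log-probabilities):

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one preference pair.

    logp_w / logp_l: policy log-probs of the chosen and rejected responses.
    ref_logp_w / ref_logp_l: the same quantities under the frozen
    reference model. beta scales the implicit KL penalty.
    """
    # Implicit reward margin: how much more the policy prefers the
    # chosen response than the reference model does.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)
```

When the policy matches the reference exactly the margin is zero and the loss is log 2; pushing probability mass toward chosen responses (relative to the reference) drives the loss down, which is the gradient signal DPO training follows.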
DeepSeek Series Papers
The author finally lists the DeepSeek‑specific technical reports that detail the actual model pipelines.
Title: DeepSeek LLM: Scaling Open‑Source Language Models with Longtermism Link: https://arxiv.org/abs/2401.02954
Title: DeepSeek‑V2: A Strong, Economical, and Efficient Mixture‑of‑Experts Language Model Link: https://arxiv.org/abs/2405.04434
Title: DeepSeek‑V3 Technical Report Link: https://arxiv.org/abs/2412.19437v1
Title: DeepSeek‑R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Link: https://arxiv.org/abs/2501.12948
Title: DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models Link: https://arxiv.org/pdf/2402.03300
These reports describe the three‑stage pipeline (pre‑training, supervised fine‑tuning, DPO/RL) and introduce the GRPO algorithm used for mathematical reasoning.
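GRPO's key simplification over PPO is dropping the learned value network: for each prompt it samples a group of responses and scores each one against the group's own reward statistics. A sketch of just that advantage-normalization step, under the assumption of scalar per-response rewards (the full algorithm described in the DeepSeekMath paper adds clipped policy-gradient updates and a KL penalty):

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages as described in GRPO (DeepSeekMath):
    each sampled response's reward is normalized by the mean and
    standard deviation of its group, so no critic model is needed.
    """
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # uniform group: all advantages 0
    return [(r - mu) / sigma for r in rewards]
```

For example, if four sampled answers to a math problem score [1, 0, 0, 1] under a correctness reward, the two correct answers get advantage +1 and the incorrect ones -1, and the policy update reinforces the correct group members relative to their siblings.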
Conclusion
By following the curated list, readers can trace the technical lineage from the original Transformer to the latest reasoning‑enhanced, MoE‑based LLMs, gaining a deep understanding of the research that makes DeepSeek‑R1 competitive with proprietary models like o1.
AI Algorithm Path
A public account focused on deep learning, computer vision, and autonomous driving perception algorithms, covering visual CV, neural networks, pattern recognition, related hardware and software configurations, and open-source projects.