Essential DeepSeek‑R1 Reading List: Papers Behind the 2025 Hottest LLM
This article compiles a curated reading list of foundational and recent research papers—from the original Transformer to chain‑of‑thought, mixture‑of‑experts, and reinforcement‑learning studies—that together explain the breakthroughs behind DeepSeek‑R1 and guide readers through the technical evolution of modern large language models.
Introduction
The author notes that DeepSeek‑R1 represents a major step in the open‑source LLM ecosystem, matching OpenAI’s o1 on many metrics. Rather than relying on hype‑driven posts, the author assembled a reading list that links to core research papers, allowing readers to study the model’s foundations one paper at a time.
Transformer Foundations
DeepSeek is built on the Transformer architecture. The author recommends starting with the seminal works that introduced the architecture and its scaling trends.
Title: Attention Is All You Need (Transformer) Link: https://arxiv.org/abs/1706.03762
Title: Language Models are Unsupervised Multitask Learners (GPT‑2) Link: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
Title: Language Models are Few‑Shot Learners (GPT‑3) Link: https://arxiv.org/abs/2005.14165
Title: Training Language Models to Follow Instructions with Human Feedback (InstructGPT) Link: https://arxiv.org/abs/2203.02155
Title: The Llama 3 Herd of Models Link: https://arxiv.org/abs/2407.21783
These papers trace the evolution from the original Transformer to large‑scale LLMs and instruction‑tuned models.
Chain‑of‑Thought and Related Reasoning Papers
Both DeepSeek‑R1 and o1 rely on internal “thinking” tokens that enable multi‑step reasoning. The author lists key works on chain‑of‑thought prompting and its extensions.
Title: Chain‑of‑Thought Prompting Elicits Reasoning in Large Language Models Link: https://arxiv.org/abs/2201.11903
Title: Tree of Thoughts: Deliberate Problem Solving with Large Language Models Link: https://arxiv.org/abs/2305.10601
Title: Graph of Thoughts: Solving Elaborate Problems with Large Language Models Link: https://arxiv.org/abs/2308.09687
Title: Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation Link: https://arxiv.org/abs/2311.04254
Title: The Prompt Report: A Systematic Survey of Prompting Techniques Link: https://arxiv.org/abs/2406.06608
The author highlights how these techniques improve performance on arithmetic, commonsense, and symbolic reasoning tasks.
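To make the idea concrete, here is a minimal chain-of-thought prompt in the style of Wei et al. (2022): one worked example whose answer spells out its intermediate steps, followed by the actual question. This is an illustrative sketch of the prompting pattern, not a prompt taken from DeepSeek's training data.

```python
# A few-shot chain-of-thought prompt: the demonstration answer walks
# through intermediate arithmetic steps before stating the result,
# which cues the model to reason step by step on the new question.
COT_PROMPT = """\
Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each.
How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls.
5 + 6 = 11. The answer is 11.

Q: A cafeteria had 23 apples. They used 20 to make lunch and bought
6 more. How many apples do they have?
A:"""
```

Plain few-shot prompting would show only "The answer is 11"; including the reasoning trace is what the chain-of-thought paper found to unlock multi-step problem solving in sufficiently large models.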
Mixture‑of‑Experts (MoE) Papers
DeepSeek‑V3 is described as a powerful MoE model with 671 B total parameters, of which 37 B are activated per token. The author cites early and recent MoE research that underpins this design.
Title: GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding Link: https://arxiv.org/abs/2006.16668
Title: Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity Link: https://arxiv.org/abs/2101.03961
Title: A Review of Sparse Expert Models in Deep Learning Link: https://arxiv.org/abs/2209.01667
Title: Mixtral of Experts Link: https://arxiv.org/abs/2401.04088
Title: Upcycling Large Language Models into Mixture of Experts Link: https://arxiv.org/abs/2410.07524
These works explain how conditional computation and expert routing reduce training cost while preserving or improving model capability.
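The core mechanism these papers share is a learned router that sends each token to only a few experts, so most parameters sit idle on any given forward pass. The following is a rough sketch of top-k gating under simple assumptions (dense NumPy math, no load-balancing loss or capacity limits, which real MoE layers such as those in Switch Transformers and DeepSeek-V3 do add); it is not DeepSeek's actual router.

```python
import numpy as np

def topk_gating(x, W_gate, k=2):
    """Score all experts for one token, keep the top-k.

    x: (d,) token hidden state; W_gate: (d, n_experts) router weights.
    Returns the chosen expert indices and their softmax-normalized weights.
    """
    logits = x @ W_gate                       # (n_experts,) router scores
    top = np.argsort(logits)[-k:][::-1]       # indices of the k highest scores
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                      # softmax over selected experts only
    return top, probs

def moe_forward(x, W_gate, experts, k=2):
    """Conditional computation: run only the selected experts and
    combine their outputs, weighted by the router probabilities."""
    idx, w = topk_gating(x, W_gate, k)
    return sum(w_i * experts[i](x) for i, w_i in zip(idx, w))
```

Because only k of n_experts feed-forward blocks execute per token, compute per token grows with k while total capacity grows with n_experts, which is the cost/capability trade-off these papers exploit.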
Reinforcement‑Learning (RL) Papers
The author notes that RL is crucial for turning a pretrained LLM into a useful chatbot with aligned behavior.
Title: RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback Link: https://arxiv.org/pdf/2309.00267
Title: Self-Rewarding Language Models Link: https://arxiv.org/abs/2401.10020
Title: Thinking LLMs: General Instruction Following with Thought Generation Link: https://arxiv.org/abs/2410.10630
Title: Direct Preference Optimization: Your Language Model is Secretly a Reward Model (DPO) Link: https://arxiv.org/abs/2305.18290
The cited papers discuss replacing human feedback with AI feedback, using LLMs as their own reward models, and preference‑optimization methods that inform the training of DeepSeek‑R1.
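Of these, DPO is the most self-contained mathematically: it turns preference learning into a simple classification-style loss on pairs of responses, with no separate reward model. A sketch of the published per-pair objective (plain Python, single pair; real training batches this over sequence-level log-probabilities):

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one preference pair.

    logp_w / logp_l: policy log-probs of the chosen and rejected responses.
    ref_logp_w / ref_logp_l: the same quantities under the frozen
    reference model. beta scales the implicit KL penalty.
    """
    # Implicit reward margin: how much more the policy prefers the
    # chosen response than the reference model does.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)
```

When the policy matches the reference exactly the margin is zero and the loss is log 2; pushing probability mass toward chosen responses (relative to the reference) drives the loss down, which is the gradient signal DPO training follows.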
DeepSeek Series Papers
The author finally lists the DeepSeek‑specific technical reports that detail the actual model pipelines.
Title: DeepSeek LLM: Scaling Open‑Source Language Models with Longtermism Link: https://arxiv.org/abs/2401.02954
Title: DeepSeek‑V2: A Strong, Economical, and Efficient Mixture‑of‑Experts Language Model Link: https://arxiv.org/abs/2405.04434
Title: DeepSeek‑V3 Technical Report Link: https://arxiv.org/abs/2412.19437v1
Title: DeepSeek‑R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Link: https://arxiv.org/abs/2501.12948
Title: DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models Link: https://arxiv.org/pdf/2402.03300
These reports describe the three‑stage pipeline (pre‑training, supervised fine‑tuning, DPO/RL) and introduce the GRPO algorithm used for mathematical reasoning.
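GRPO's key simplification over PPO is dropping the learned value network: for each prompt it samples a group of responses and scores each one against the group's own reward statistics. A sketch of just that advantage-normalization step, under the assumption of scalar per-response rewards (the full algorithm described in the DeepSeekMath paper adds clipped policy-gradient updates and a KL penalty):

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages as described in GRPO (DeepSeekMath):
    each sampled response's reward is normalized by the mean and
    standard deviation of its group, so no critic model is needed.
    """
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # uniform group: all advantages 0
    return [(r - mu) / sigma for r in rewards]
```

For example, if four sampled answers to a math problem score [1, 0, 0, 1] under a correctness reward, the two correct answers get advantage +1 and the incorrect ones -1, and the policy update reinforces the correct group members relative to their siblings.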
Conclusion
By following the curated list, readers can trace the technical lineage from the original Transformer to the latest reasoning‑enhanced, MoE‑based LLMs, gaining a deep understanding of the research that makes DeepSeek‑R1 competitive with proprietary models like o1.
AI Algorithm Path
A public account focused on deep learning, computer vision, and autonomous driving perception algorithms, covering visual CV, neural networks, pattern recognition, related hardware and software configurations, and open-source projects.