Topic: DeepSeek

Collection size: 2 articles
Architect
Feb 25, 2025 · Artificial Intelligence

DeepSeek R1: Multi‑Stage Reinforcement Learning, Reward Modeling, and Distillation for a High‑Performance LLM

DeepSeek R1 builds on the DeepSeek V3 base model using a multi‑stage reinforcement learning pipeline—including GRPO optimization, rule‑based reward modeling, supervised fine‑tuning, language‑consistency rewards, rejection sampling, and distillation—to produce a high‑performing, aligned LLM capable of accurate reasoning.

DeepSeek · LLM Training · Model Distillation
24 min read
DataFunTalk
Feb 16, 2025 · Artificial Intelligence

Understanding Reasoning LLMs: DeepSeek R1 Variants, Inference‑Time Scaling, and Training Strategies

This article explains what reasoning language models are, outlines their strengths and weaknesses, and details the three DeepSeek R1 variants and their training pipelines (pure reinforcement learning, SFT + RL, and distillation), before discussing inference‑time scaling techniques and related research such as Sky‑T1 and TinyZero.

DeepSeek · Model Distillation · Supervised Fine-tuning
16 min read