Tag

inference scaling

0 views collected around this technical thread.

DataFunTalk
DataFunTalk
Feb 16, 2025 · Artificial Intelligence

Understanding Reasoning LLMs: DeepSeek R1 Variants, Inference‑Time Scaling, and Training Strategies

This article explains what reasoning language models are, outlines their strengths and weaknesses, details DeepSeek R1's three variants and their training pipelines—including pure reinforcement learning, SFT + RL, and distillation—while also discussing inference‑time scaling techniques and related research such as Sky‑T1 and TinyZero.

DeepSeekModel DistillationSupervised Fine-tuning
0 likes · 16 min read
Understanding Reasoning LLMs: DeepSeek R1 Variants, Inference‑Time Scaling, and Training Strategies