Tagged articles

Llama3.1

1 article · Page 1 of 1

Jan 8, 2025 · Artificial Intelligence

Inside Llama 3.1, DeepSeek‑V3, TÜLU 3 & Qwen 2.5: A Deep Dive into Post‑Training Techniques

This article compiles and analyzes the post‑training pipelines of Llama 3.1, DeepSeek‑V3, TÜLU 3 and Qwen 2.5, detailing their data compositions, SFT, reward modeling, DPO, GRPO, RLVR methods, hyper‑parameters, and practical tricks for large‑language‑model alignment.

DPO · DeepSeek-V3 · Llama3.1

22 min read
