Baobao Algorithm Notes
Jan 8, 2025 · Artificial Intelligence
Inside Llama 3.1, DeepSeek‑V3, TÜLU 3 & Qwen 2.5: A Deep Dive into Post‑Training Techniques
This article compiles and analyzes the post‑training pipelines of Llama 3.1, DeepSeek‑V3, TÜLU 3 and Qwen 2.5, detailing their data compositions, SFT, reward modeling, DPO, GRPO, RLVR methods, hyper‑parameters, and practical tricks for large‑language‑model alignment.
DPODeepSeek-V3Llama3.1
0 likes · 22 min read
