Tagged articles
2 articles
Tencent Technical Engineering
Feb 21, 2025 · Artificial Intelligence

DeepSeek-R1: Enhancing Reasoning Capabilities in LLMs via Reinforcement Learning

DeepSeek-R1 shows that large-scale reinforcement learning can markedly improve reasoning in LLMs without heavy supervised fine-tuning, particularly when it combines Group Relative Policy Optimization (GRPO) with a rule-based reward scheme. A brief cold-start SFT phase, two-stage alignment, and knowledge distillation further improve performance and efficiency, although challenges such as language mixing remain.

DeepSeek-R1 · GRPO · LLM Reasoning
21 min read
21CTO
Jan 31, 2025 · Artificial Intelligence

How DeepSeek‑R1 Is Redefining Open‑Source AI and Challenging OpenAI’s o1

DeepSeek-R1, an open-source reasoning model released under the MIT license, matches or surpasses OpenAI's o1 on math, coding, and reasoning benchmarks. It comes in several model sizes, runs fast, and is being adopted rapidly worldwide, signaling a shift toward more accessible, high-performance AI.

Benchmark · DeepSeek-R1 · large language model
9 min read