Data Party THU
Oct 20, 2025 · Artificial Intelligence

How Agentic RL Enables a 14B LLM to Outperform Giant Models – Inside rStar2‑Agent

This article analyzes the rStar2‑Agent paper, revealing how Agentic Reinforcement Learning, the GRPO‑RoC algorithm, a high‑throughput code‑execution service, and a three‑stage training recipe let a modest 14‑billion‑parameter model surpass much larger LLMs on challenging math benchmarks.

AI research · Agentic Reinforcement Learning · Artificial Intelligence
18 min read
DataFunTalk
Sep 18, 2025 · Artificial Intelligence

How Tongyi DeepResearch Turns Chatty AI into a Research Powerhouse

Tongyi DeepResearch, an open‑source AI model and framework, achieves state‑of‑the‑art results on multiple Deep Research benchmarks by combining fully open‑source models, frameworks, and data pipelines. It also introduces novel agentic pre‑training, fine‑tuning, and reinforcement‑learning methods that enable complex multi‑step reasoning and real‑world applications.

AI research · Agentic Reinforcement Learning · Open-source
14 min read