Tagged articles
9 articles
Machine Heart
May 14, 2026 · Artificial Intelligence

Breaking Homogeneous Reasoning: I²B‑LPO Guides RLVR from Repeated Sampling to Effective Exploration

I²B‑LPO is an exploration‑enhancement framework for RLVR that branches rollouts at high‑entropy nodes, injects latent variables via pseudo self‑attention, and filters paths with an information‑bottleneck self‑reward. It reports improvements of up to 5.3% in accuracy and 7.4% in diversity on multiple math reasoning benchmarks.

RLVR · Reinforcement Learning · entropy
Lao Guo's Learning Space
Apr 2, 2026 · Artificial Intelligence

Large Model Pretraining and Fine‑Tuning: A 2026 Technical Guide from Scaling Laws to Post‑Training Revolution

This article explains the full lifecycle of large language models in 2026, covering pretraining fundamentals, the limits of classic Scaling Laws, data‑centric advances, fine‑tuning strategies, RLHF, DPO, and the emerging post‑training methods GRPO, DAPO and RLVR, with concrete benchmarks and cost analyses.

DAPO · DPO · Fine-tuning
Design Hub
Dec 20, 2025 · Artificial Intelligence

Must-Read: K's 2025 AI Review – 6 Paradigm Shifts Reshaping Our World

The article reviews six 2025 paradigm shifts in large language models—from the rise of verifiable‑reward reinforcement learning and the emergence of AI "ghosts" to new "Cursor for X" middle layers, local agents like Claude Code, Vibe Coding that lets users program by conversation, and visual interaction driven by Gemini Nano Banana—highlighting their technical impact and design implications.

AI agents · LLM · Local AI
PaperAgent
Dec 20, 2025 · Industry Insights

What 2025 Tells Us About the Future of Large Language Models

The 2025 LLM year‑in‑review highlights paradigm shifts such as RLVR training, uneven “saw‑tooth” intelligence, the rise of Cursor‑style applications, Claude Code agents running locally, Vibe Coding, and the Nano Banana GUI revolution, and concludes that current models exploit only about 10% of their potential.

AI agents · LLM · Nano Banana
Baobao Algorithm Notes
Nov 11, 2025 · Artificial Intelligence

Why Redesign the Training Stack? Inside Olmo‑Thinking’s Open‑Source RL Journey

This article provides a detailed technical analysis of the Olmo‑Thinking project, covering why a new open‑source LLM was built, the challenges of reinforcement learning at scale, data‑mix optimization, architectural bottlenecks such as missing GQA and QK‑Norm, and the post‑training techniques used to improve reasoning and long‑context capabilities.

Open-source models · RLVR · data selection
Data Party THU
Aug 19, 2025 · Artificial Intelligence

Why RL Fine‑Tuning Fails to Extend LLM Reasoning Limits: Entropy Collapse Explained

This article examines how reinforcement learning fine‑tuning influences large language model reasoning, revealing that RL primarily amplifies pre‑trained capabilities, suffers from entropy collapse, and fails to push the model’s reasoning boundary, supported by extensive experiments on scaling laws, entropy analysis, and mitigation techniques.

LLM · RL · RLVR
DataFunTalk
Apr 25, 2025 · Artificial Intelligence

Does Reinforcement Learning Really Expand Reasoning Capacity in Large Language Models? Insights from a Recent Empirical Study

Recent empirical research by Tsinghua’s LeapLab and Shanghai Jiao Tong University finds that fine‑tuning with reinforcement learning with verifiable rewards (RLVR) improves sampling efficiency but does not extend the reasoning abilities of large language models beyond those of their base models, as demonstrated across mathematics, code, and visual reasoning benchmarks.

AI research · Large Language Models · RLVR