Baobao Algorithm Notes

Author of the BaiMian large model, offering technology and industry insights.

291 articles · 2 views
Recent Articles

Dec 25, 2025 · Artificial Intelligence

TeleChat3-105B: China’s First 100B‑Scale MoE Model and Its Technical Breakthroughs

The article analyzes TeleChat3-105B-A4.7-Thinking, the first domestically built 100‑billion‑parameter Mixture‑of‑Experts model, detailing its multi‑dimensional evaluation, three‑stage training pipeline, hardware‑level optimizations, fine‑grained architecture, and its significance for the evolving AI competition landscape.

AI training · Chinese AI · Mixture of Experts
6 min read
Dec 24, 2025 · Artificial Intelligence

GLM-4.7 Review: How the New Model Beats Competitors in Coding and Reasoning

The GLM-4.7 model launches with record‑breaking benchmark scores across coding, reasoning, and real‑world programming tasks, outperforming both open‑source and commercial LLMs while introducing advanced interleaved, retained, and round‑level thinking modes that enhance complex task execution.

AI model comparison · Coding AI · GLM-4.7
9 min read
Dec 22, 2025 · Artificial Intelligence

Which Agentic RL Framework Wins? A Deep Dive into AReal, Seer, Slime & verl

This article analyzes the training‑efficiency challenges of multi‑turn agentic reinforcement learning and compares four recent open‑source frameworks—AReal (Ant), Seer (Moonshot), Slime (Zhipu) and verl (ByteDance)—examining their asynchronous inference designs, rollout‑train separation, long‑context handling, off‑policy mitigation, and system‑level optimizations to guide framework selection.

Agentic RL · Asynchronous Inference · RL Systems
18 min read
Dec 20, 2025 · Artificial Intelligence

How General‑Purpose Agents Are Converging on Claude Code and Deep Agent Designs

The article analyzes the 2025 shift toward a unified "general‑type" agent architecture exemplified by Claude Code and Deep Agent, detailing industry adoption, core technical features, skill‑based extensions, long‑running capabilities, and practical steps for building domain‑specific agents.

AI Architecture · Agent Skills · Claude Code
25 min read
Dec 7, 2025 · Artificial Intelligence

Key Lessons from Scaling Agent RL Training: Stability, Tooling, and Reward Design

Drawing on recent months of extensive agent reinforcement‑learning experiments across search, data‑analysis, and multi‑source scenarios, the author shares twelve practical insights covering stability, environment‑reward‑algorithm priorities, tool‑call reliability, reward hacking pitfalls, evaluation alignment, and scaling tricks for larger models.

PPO EWMA · RL Scaling · Reward Design
7 min read
Dec 7, 2025 · Artificial Intelligence

Can RL Really Boost LLM Reasoning? A Critical Review of Recent Findings

This article critically examines recent RL‑for‑LLM studies, revealing that reinforcement learning improves search efficiency but does not extend the intrinsic reasoning capabilities of base models, and explores the underlying model‑conditioned optimization bias, comparisons with SFT distillation, and the trade‑off with catastrophic forgetting.

Catastrophic Forgetting · LLM · Model Optimization
11 min read
Nov 20, 2025 · Artificial Intelligence

Why Reinforcement Learning Preserves LLM Generality Better Than Supervised Fine‑Tuning

The article analyzes why reinforcement learning (RL) fine‑tuning preserves a large language model's general abilities better than supervised fine‑tuning (SFT), explaining SFT's off‑policy distribution shift and the on‑policy data consistency, KL‑penalty, and trust‑region mechanisms that give RL its anti‑forgetting properties.
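
The KL‑penalty mechanism mentioned above can be written down compactly. As an illustrative sketch (a common formulation of KL‑regularized RL fine‑tuning, not necessarily the exact objective the article derives), the policy is rewarded while being penalized for drifting from the frozen reference model:

```latex
% Maximize expected reward while staying close to the reference
% (pre-fine-tuning) policy \pi_{\mathrm{ref}}; \beta controls the trade-off.
\max_{\theta}\;
\mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_{\theta}(\cdot \mid x)}
\bigl[ r(x, y) \bigr]
\;-\;
\beta\, \mathbb{D}_{\mathrm{KL}}\!\bigl( \pi_{\theta}(\cdot \mid x) \,\bigl\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \bigr)
```

Because the KL term anchors the updated policy to the reference model, large deviations on behaviors the reward does not cover are discouraged, which is one intuition for RL's anti‑forgetting advantage over off‑policy SFT.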

Catastrophic Forgetting · LLM · On-Policy Data
8 min read
Nov 18, 2025 · Industry Insights

Google Gemini 3 Pro Beats GPT‑5.1 on Top AGI Benchmarks – What the Results Reveal

Google's newly launched Gemini 3 and Gemini 3 Pro topped major AGI benchmarks such as Humanity's Last Exam and ARC‑AGI‑2, outperformed GPT‑5.1 on a complex 3‑D gear visualization task, and even generated a functional cloud‑OS prototype, signaling a notable step toward true artificial general intelligence.

AGI benchmarks · AI competition · Google Gemini
5 min read
Nov 13, 2025 · Artificial Intelligence

Introducing UNO‑Bench: The First Unified Omni‑Modal LLM Evaluation Suite

UNO‑Bench, an open‑source benchmark from Meituan’s LongCat team, provides the first high‑quality, low‑redundancy unified evaluation framework for omni‑modal large language models, featuring 1,250 manually annotated cross‑modal samples and 2,480 enhanced single‑modal samples covering 44 fine‑grained tasks and five modality combinations.

AI Scaling Law · benchmark · data pipeline
15 min read