Baobao Algorithm Notes

Author of the BaiMian large model, offering technology and industry insights.

295 articles · 0 likes · 378 views · 0 comments
Recent Articles

Baobao Algorithm Notes
Jan 24, 2026 · Artificial Intelligence

What Advances Do GRPO, DAPO, GSPO, and SAPO Bring Over PPO?

Starting from PPO, the typical research trajectory runs through GRPO, DAPO, GSPO, and SAPO, each of which introduces new optimization objectives, sampling strategies, or reward‑shaping techniques aimed at reducing memory usage, stabilizing gradients, and improving the efficiency of reinforcement learning for large models.

Tags: DAPO, GRPO, GSPO
0 likes · 6 min read

Baobao Algorithm Notes
Jan 19, 2026 · Artificial Intelligence

Unlock Coze Agent Skills: Build No‑Code AI Agents in Minutes

This guide explains what Coze Agent Skills are and how they are structured with YAML and Markdown, then walks through step‑by‑step instructions and real‑world examples for creating, deploying, and using no‑code AI skills that automate tasks such as model scouting, recipe suggestions, and camera settings.

Tags: AI automation, Agent skills, Artificial Intelligence
0 likes · 8 min read

Baobao Algorithm Notes
Jan 14, 2026 · Artificial Intelligence

How GLM-Image Generates High‑Quality Text‑to‑Image on Huawei Ascend Chips

GLM-Image, a Chinese text‑to‑image model trained end‑to‑end on Huawei Ascend 800T A2 NPUs, combines an autoregressive model with a diffusion decoder, supports resolutions up to 2048×2048, and ships with open‑source code, API access, and detailed example prompts that demonstrate its strong layout and typography capabilities.

Tags: GLM-Image, Huawei Ascend, diffusion
0 likes · 12 min read

Baobao Algorithm Notes
Dec 25, 2025 · Artificial Intelligence

TeleChat3-105B: China’s First 100B‑Scale MoE Model and Its Technical Breakthroughs

The article analyzes TeleChat3-105B-A4.7-Thinking, the first domestically built 100‑billion‑parameter Mixture‑of‑Experts model, detailing its multi‑dimensional evaluation, three‑stage training pipeline, hardware‑level optimizations, fine‑grained architecture, and its significance for the evolving AI competition landscape.

Tags: AI training, Chinese AI, Mixture of Experts
0 likes · 6 min read

Baobao Algorithm Notes
Dec 24, 2025 · Artificial Intelligence

GLM-4.7 Review: How the New Model Beats Competitors in Coding and Reasoning

GLM-4.7 launches with record‑breaking benchmark scores across coding, reasoning, and real‑world programming tasks, outperforming both open‑source and commercial LLMs, and introduces interleaved, preserved, and turn‑level thinking modes that improve the execution of complex tasks.

Tags: AI model comparison, Coding AI, GLM-4.7
0 likes · 9 min read

Baobao Algorithm Notes
Dec 22, 2025 · Artificial Intelligence

Which Agentic RL Framework Wins? A Deep Dive into AReaL, Seer, Slime & verl

This article analyzes the training‑efficiency challenges of multi‑turn agentic reinforcement learning and compares four recent open‑source frameworks: AReaL (Ant), Seer (Moonshot), Slime (Zhipu), and verl (ByteDance). It examines their asynchronous inference designs, rollout‑train separation, long‑context handling, off‑policy mitigation, and system‑level optimizations to guide framework selection.

Tags: Asynchronous Inference, RL Systems, Training efficiency
0 likes · 18 min read

Baobao Algorithm Notes
Dec 20, 2025 · Artificial Intelligence

How General‑Purpose Agents Are Converging on Claude Code and Deep Agent Designs

The article analyzes the 2025 shift toward a unified general‑purpose agent architecture, exemplified by Claude Code and Deep Agent, covering industry adoption, core technical features, skill‑based extensions, long‑running capabilities, and practical steps for building domain‑specific agents.

Tags: AI Architecture, Agent skills, Claude Code
0 likes · 25 min read

Baobao Algorithm Notes
Dec 7, 2025 · Artificial Intelligence

Key Lessons from Scaling Agent RL Training: Stability, Tooling, and Reward Design

Drawing on several months of extensive agent reinforcement‑learning experiments across search, data‑analysis, and multi‑source scenarios, the author shares twelve practical lessons covering training stability, the relative priority of environment vs. reward vs. algorithm, tool‑call reliability, reward‑hacking pitfalls, evaluation alignment, and scaling tricks for larger models.

Tags: PPO EWMA, RL scaling, Reinforcement Learning
0 likes · 7 min read

Baobao Algorithm Notes
Dec 7, 2025 · Artificial Intelligence

Can RL Really Boost LLM Reasoning? A Critical Review of Recent Findings

This article critically examines recent RL‑for‑LLM studies, revealing that reinforcement learning improves search efficiency but does not extend the intrinsic reasoning capabilities of base models, and explores the underlying model‑conditioned optimization bias, comparisons with SFT distillation, and the trade‑off with catastrophic forgetting.

Tags: Catastrophic Forgetting, LLM, Reinforcement Learning
0 likes · 11 min read