Baobao Algorithm Notes

Author of the BaiMian large model, offering technology and industry insights.

295 Articles · 0 Likes · 378 Views · 0 Comments
Recent Articles

Baobao Algorithm Notes
Oct 28, 2025 · Artificial Intelligence

Why Entropy Collapse Limits LLM Reinforcement Learning and How to Fix It

The article explains how information entropy, cross‑entropy, and KL‑divergence shape reinforcement learning for large language models, describes the phenomenon of entropy collapse, compares token‑level and policy‑level entropy, and reviews recent methods like Clip‑Cov and KL‑Cov that mitigate this issue.

cross entropy · entropy · policy entropy
0 likes · 11 min read
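For the entropy article above: the three quantities it names are linked by the identity H(p, q) = H(p) + KL(p ‖ q). The Python sketch below checks that identity on toy next-token distributions (natural logs, values invented for illustration) and contrasts a spread-out policy with a collapsed one. The function names are illustrative, not taken from the article, and the sketch does not implement Clip-Cov or KL-Cov.

```python
import math
from typing import Sequence


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy H(p) in nats; zero-probability terms contribute nothing."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(p: Sequence[float], q: Sequence[float]) -> float:
    """Cross-entropy H(p, q): expected -log q under samples drawn from p."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) = sum p * log(p / q); assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy next-token distributions over a 4-token vocabulary (illustrative only).
target = [0.5, 0.25, 0.15, 0.10]
policy = [0.4, 0.30, 0.20, 0.10]

# Identity: cross-entropy = entropy of the target + KL from target to policy.
assert math.isclose(cross_entropy(target, policy),
                    entropy(target) + kl_divergence(target, policy))

# Entropy collapse: as RL sharpens the policy, token-level entropy heads to 0,
# so sampling explores fewer alternative tokens.
uniform = [0.25] * 4
collapsed = [0.97, 0.01, 0.01, 0.01]
print(f"uniform:   {entropy(uniform):.3f} nats")  # log(4), about 1.386
print(f"collapsed: {entropy(collapsed):.3f} nats")
```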
Baobao Algorithm Notes
Sep 28, 2025 · Artificial Intelligence

How Much GPU Memory Do LLMs Really Need? A Deep Dive into Training & Inference

This article breaks down the GPU memory requirements of large language models during training and inference, detailing the contributions of model weights, optimizer states, activations, KV cache, and activation recomputation, and provides concrete formulas, examples, and scaling insights for models like Qwen3 and DeepSeek V3.

GPU memory · KV cache · LLM
0 likes · 18 min read
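For the GPU memory article: a back-of-the-envelope sketch of two of the terms, assuming the common mixed-precision Adam accounting of 16 bytes per parameter (bf16 weights and gradients, fp32 master weights, two fp32 Adam moments) and a standard multi-head or grouped-query KV cache. The 7B model shape is hypothetical rather than a Qwen3 or DeepSeek V3 config; activations are left out because they depend on batch size, sequence length, and recomputation; and caches compressed with MLA (as in DeepSeek V3) need a different formula.

```python
GIB = 2**30


def training_static_bytes(n_params: float) -> float:
    """Weights + gradients + optimizer states under mixed-precision Adam.

    Assumed layout per parameter: bf16 weight (2) + bf16 grad (2)
    + fp32 master weight (4) + fp32 Adam m (4) + fp32 Adam v (4) = 16 bytes.
    Activations are deliberately excluded.
    """
    return n_params * (2 + 2 + 4 + 4 + 4)


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bytes_per_elem: int = 2) -> int:
    """K and V (the leading 2) cached for every layer, KV head, and position."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem


# Hypothetical 7B dense model: 32 layers, 8 KV heads of dim 128, bf16 cache.
print(f"training, static: {training_static_bytes(7e9) / GIB:.1f} GiB")  # 104.3 GiB
print(f"KV cache @32K:    {kv_cache_bytes(32, 8, 128, 32_768, 1) / GIB:.1f} GiB")  # 4.0 GiB
```

Both terms scale linearly: the first with parameter count, the second with layers, KV heads, context length, and batch size, which is why long-context serving is often bounded by the cache rather than the weights.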
Baobao Algorithm Notes
Sep 23, 2025 · Artificial Intelligence

How LongCat-Flash-Thinking Sets a New SOTA in Open‑Source AI Reasoning

LongCat-Flash-Thinking, the latest open‑source model from Meituan, introduces domain‑parallel RL training, the high‑throughput DORA training infrastructure, and a dual‑path reasoning framework. Together these achieve state‑of‑the‑art performance on logical, mathematical, coding, and agentic tasks while maintaining top‑tier speed.

LongCat · RL Training · Tool Use
0 likes · 10 min read
Baobao Algorithm Notes
Sep 22, 2025 · Artificial Intelligence

How to Add Special Tokens to LLMs Without Losing Performance

This guide explains why naïvely adding special tokens during supervised fine‑tuning can destabilize a large language model, and provides step‑by‑step strategies—including tokenizer updates, embedding resizing, smart initialization, and LoRA‑based PEFT—to integrate new tokens while preserving the model's original capabilities.

LLM · LoRA · special tokens
0 likes · 9 min read
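For the special-tokens article: the "smart initialization" step can be sketched with one common heuristic, assumed here rather than taken from the article: each new token's embedding starts as the mean of the embeddings of existing pieces that describe it, with the mean of the whole table as a fallback. Plain lists stand in for the embedding matrix; in a Hugging Face workflow these rows would be written after tokenizer.add_special_tokens(...) and model.resize_token_embeddings(len(tokenizer)). The function names and piece ids are hypothetical.

```python
from typing import Dict, List

Matrix = List[List[float]]


def mean_rows(rows: Matrix) -> List[float]:
    """Column-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def extend_embeddings(table: Matrix, new_tokens: Dict[str, List[int]]) -> Matrix:
    """Return a copy of `table` with one appended row per new token.

    `new_tokens` maps each new token (in the order its id will be assigned)
    to the ids of existing pieces that describe it; an empty list falls back
    to the mean of all existing rows. Existing rows are copied unchanged.
    """
    fallback = mean_rows(table)
    extended = [row[:] for row in table]
    for piece_ids in new_tokens.values():
        if piece_ids:
            extended.append(mean_rows([table[i] for i in piece_ids]))
        else:
            extended.append(fallback[:])
    return extended


# Toy 4-token vocabulary with 2-dimensional embeddings (hypothetical values).
vocab_embeddings = [[0.0, 0.0], [2.0, 4.0], [4.0, 0.0], [2.0, 0.0]]
extended = extend_embeddings(vocab_embeddings, {"<think>": [1, 2], "<tool>": []})
print(extended[-2:])  # [[3.0, 2.0], [2.0, 1.0]]
```

Starting new rows near the existing embedding distribution, instead of at random, keeps their initial logits and hidden states in a familiar range, which is the kind of disruption the article warns against.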
Baobao Algorithm Notes
Sep 10, 2025 · Artificial Intelligence

Qwen3-Next Unveiled: Sparse MoE, Hybrid Attention & Multi‑Token Prediction

A recent Hugging Face pull request reveals Alibaba’s upcoming Qwen3‑Next series, designed for extreme context lengths and parameter efficiency. It combines a high‑sparsity MoE with a 1:50 activation ratio in its MoE layers, a hybrid attention architecture that mixes gated attention with Gated DeltaNet, and Multi‑Token Prediction, and it promises ten‑fold throughput gains for contexts of 32K tokens and beyond.

AI Architecture · Hybrid Attention · Multi‑token prediction
0 likes · 8 min read
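For the Qwen3-Next article: in a top-k routed MoE layer each token runs through only k of the E experts, which is what a low activation ratio such as 1:50 refers to. The sketch below is a generic softmax top-k router with toy scalar experts; the sizes are made up, and Qwen3-Next's actual routing (shared experts, load balancing, normalization details) is not modeled.

```python
import math
from typing import Callable, Dict, List


def top_k_route(router_logits: List[float], k: int) -> Dict[int, float]:
    """Select the k highest-scoring experts and softmax over just those.

    Same result as a softmax over all experts followed by renormalizing the
    top-k weights, because the shared denominator cancels.
    """
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in chosen)  # subtract max for numerical stability
    exps = {i: math.exp(router_logits[i] - m) for i in chosen}
    z = sum(exps.values())
    return {i: e / z for i, e in exps.items()}


def moe_layer(x: float, experts: List[Callable[[float], float]],
              router_logits: List[float], k: int) -> float:
    """Gate-weighted sum of the selected experts; unselected experts never run."""
    weights = top_k_route(router_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())


# Toy layer: 5 experts, 2 active per token (hypothetical sizes).
experts = [lambda x, s=s: s * x for s in range(5)]  # expert s scales its input by s
logits = [0.1, 2.0, -1.0, 1.0, 0.5]
print(top_k_route(logits, k=2).keys())  # dict_keys([1, 3])
print(moe_layer(1.0, experts, logits, k=2))
```

With, say, E = 100 routed experts and k = 2, only 1 in 50 experts runs per token, so compute per token tracks the active experts while capacity tracks the total.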
Baobao Algorithm Notes
Sep 9, 2025 · Artificial Intelligence

Why Do Language Models Hallucinate? Roots, Risks, and a New Evaluation Approach

The article analyzes OpenAI's study on language‑model hallucinations, explaining how statistical limits in pre‑training and flawed binary evaluation incentives cause false answers, and proposes a confidence‑threshold scoring system that rewards honest "I don’t know" responses to improve reliability.

AI safety · Model Alignment · confidence threshold
0 likes · 8 min read
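For the hallucination article: the confidence-threshold idea can be written as a scoring rule. In the scheme OpenAI's paper describes, the prompt states a threshold t; a correct answer earns 1 point, "I don't know" earns 0, and a wrong answer costs t/(1 - t) points. The expected score of answering with confidence p is then p - (1 - p)·t/(1 - t), which is positive exactly when p > t. A minimal Python sketch (function names are illustrative):

```python
def score(outcome: str, t: float) -> float:
    """Points for one response under a confidence threshold t in [0, 1)."""
    if outcome == "correct":
        return 1.0
    if outcome == "abstain":  # an honest "I don't know"
        return 0.0
    if outcome == "wrong":
        return -t / (1 - t)  # penalty grows as the threshold rises
    raise ValueError(f"unknown outcome: {outcome!r}")


def expected_score_if_answering(p: float, t: float) -> float:
    """Expected points when the model answers and is right with probability p."""
    return p * score("correct", t) + (1 - p) * score("wrong", t)


def should_answer(p: float, t: float) -> bool:
    """Answer only if that beats abstaining; algebraically this is p > t."""
    return expected_score_if_answering(p, t) > score("abstain", t)


# Binary grading (t = 0) never penalizes a guess, so bluffing pays;
# with t = 0.75 a 60%-confident model should say "I don't know".
print(should_answer(0.6, 0.0), should_answer(0.6, 0.75))  # True False
```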
Baobao Algorithm Notes
Sep 3, 2025 · Artificial Intelligence

How Atom-Searcher Boosts LLM Reasoning with Atomic Thought Rewards

Atom-Searcher introduces an atomic‑thought reinforcement‑learning framework that decomposes complex reasoning into fine‑grained units, uses a Reasoning Reward Model to assign step‑wise rewards, dynamically balances process and result incentives, and achieves state‑of‑the‑art performance on multiple LLM benchmarks.

Agentic Research · Atomic Thought · LLM
0 likes · 12 min read
Baobao Algorithm Notes
Sep 2, 2025 · Artificial Intelligence

How LongCat‑Flash Achieves Record Speed and Efficiency for a 560B MoE Model

LongCat‑Flash is a 560‑billion‑parameter Mixture‑of‑Experts LLM that combines a dynamic zero‑computation expert design, shortcut‑connected MoE communication, variance‑aligned scaling, and a three‑stage agent‑centric pre‑training pipeline, delivering over 100 tokens per second (TPS) on H800 GPUs at a cost of $0.70 per million tokens.

Artificial Intelligence · LongCat-Flash · Mixture of Experts
0 likes · 23 min read
Baobao Algorithm Notes
Aug 17, 2025 · Artificial Intelligence

Boost 7B LLM Math Reasoning Beyond GPT‑4o with a Simple Pass@k Reward

By replacing the traditional Pass@1 reward with a Pass@k formulation and a lightweight advantage computation, a 7B language model can dramatically improve its performance on math reasoning benchmarks, surpassing GPT‑4o while adding only a few lines of code and minimal training overhead.

Python · RLHF · reward engineering
0 likes · 7 min read
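For the Pass@k article: one way to turn Pass@k into per-response advantages is to score every size-k subset of a rollout group by whether it contains a correct answer, then average the group-normalized advantage over the subsets each response belongs to. With 0/1 rewards this has a closed form, so no subsets are enumerated. The sketch below derives that closed form; it is an assumption that it matches the article's exact advantage computation, and the function name is hypothetical.

```python
import math
from typing import List


def pass_at_k_advantages(rewards: List[int], k: int) -> List[float]:
    """Per-response advantages for a Pass@k objective with 0/1 rewards.

    Each size-k subset of the n rollouts earns 1 if it contains a correct
    response. A response's advantage is the mean, over all subsets containing
    it, of (subset reward - group mean) / group std, computed in closed form.
    """
    n, c = len(rewards), sum(rewards)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of rollouts")
    # Mean subset reward: the unbiased pass@k estimate 1 - C(n-c, k) / C(n, k).
    mean = 1 - math.comb(n - c, k) / math.comb(n, k)
    std = math.sqrt(mean * (1 - mean))  # subset rewards are Bernoulli
    if std == 0:  # all correct or all wrong: no learning signal
        return [0.0] * n
    # A correct response makes every subset containing it succeed.
    adv_pos = (1 - mean) / std
    # An incorrect response's subset fails only if the other k-1 members,
    # drawn from the remaining n-1 rollouts, are all incorrect as well.
    p_fail = math.comb(n - c - 1, k - 1) / math.comb(n - 1, k - 1)
    adv_neg = ((1 - p_fail) - mean) / std
    return [adv_pos if r else adv_neg for r in rewards]


print(pass_at_k_advantages([1, 0, 0, 0], k=2))
```

With k = 1 the formula reduces to the familiar group-normalized advantage (mean c/n, Bernoulli standard deviation), so the change from a Pass@1 reward is confined to these few lines.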