How Qwen3 Achieves Multi-Stage Pretraining, Long-Context, and Thought-Controlled RL
This article details Qwen3's three‑phase pretraining pipeline, its long‑context extensions, a cold‑start long chain‑of‑thought dataset, reinforcement‑learning fine‑tuning with custom rewards, and a two‑stage distillation process that together yield versatile, thought‑controlled language models.
