Author

Baobao Algorithm Notes

Author of the BaiMian large model, offering technology and industry insights.

291 articles

Latest from Baobao Algorithm Notes

Mar 15, 2025 · Industry Insights

Why Some AI Agents Are Gaming the GAIA Benchmark – A Deep Dive

The article reveals how the GAIA agent benchmark’s publicly available validation set enables participants to cheat by submitting scores derived from known answers, exposing unprofessional practices by teams like Manus and OpenAI and urging the community to rely only on hidden test data for fair evaluation.

GAIA benchmark · leaderboard integrity · validation set

0 likes · 4 min read

Mar 13, 2025 · Artificial Intelligence

Why EP Outperforms TP for DeepSeek V3/R1 Inference: Cost, Performance, and Reliability

This article analyzes DeepSeek's inference architecture for the V3/R1 models, which is built on expert parallelism (EP), and compares it with tensor parallelism (TP), detailing how EP reduces GPU memory usage and compute overhead and allows larger batch sizes, while introducing reliability, scalability, and maintainability challenges for large‑scale deployments.

AI infrastructure · Expert Parallelism · GPU memory optimization

0 likes · 18 min read

Mar 10, 2025 · Artificial Intelligence

Why DeepSeek V3’s FP8 Training Beats Traditional Schemes: A Deep Dive

This article provides a detailed technical analysis of FP8 training, comparing NVIDIA's TransformerEngine approach with DeepSeek V3's novel scheme, and examines how block‑wise scaling, high‑precision accumulation, vector length, and correlation affect quantization error and signal‑to‑noise ratio in large‑language‑model training.

DeepSeek · FP8 · LLM

0 likes · 20 min read

Mar 6, 2025 · Artificial Intelligence

Alibaba Unveils QwQ-32B: A 32‑Billion‑Parameter Reasoning Model with Agent Capabilities

Alibaba has open‑sourced QwQ‑32B, a 32.5‑billion‑parameter transformer reasoning model that rivals top models like DeepSeek‑R1 and o1‑mini, integrates agent abilities for tool use and critical thinking, and offers a low barrier to running inference. The article covers its technical specifications and RL‑based training details.

Alibaba · Large Language Model · Transformer

0 likes · 4 min read

Mar 5, 2025 · Artificial Intelligence

Why My 0.5B LLM’s Reasoning Collapsed During RLHF on Logic Puzzles

The author experiments with reinforcement learning from human feedback on a 0.5B Qwen instruct model using Logic‑RL and Open‑R1, discovers that reward mis‑design and curriculum learning cause the model to produce overly short or incorrect reasoning chains on knights‑and‑knaves puzzles, and analyzes the underlying causes.

Artificial Intelligence · Large Language Model · Logic Reasoning

0 likes · 11 min read

Feb 25, 2025 · Artificial Intelligence

FlashMLA vs FlashInfer: DeepSeek Inference Performance Benchmarks Revealed

The author benchmarks DeepSeek's FlashMLA against FlashInfer and several Triton-based implementations, detailing setup challenges, decode‑only bandwidth results, and observations that the official DeepSeek version leads while Triton optimizations show mixed performance across different head sizes.

AI · DeepSeek · FlashMLA

0 likes · 6 min read

Feb 24, 2025 · Artificial Intelligence

How to Build a Breakfast Shop AI Agent with Baidu Wenxin and DeepSeek R1

This article provides a step‑by‑step guide to creating a breakfast‑shop reception AI agent on Baidu's Wenxin Intelligent Agent platform, highlighting its core features, model selection with DeepSeek R1, and practical tips for configuring personas, knowledge bases, and plugins.

AI Agent · Baidu Wenxin · DeepSeek

0 likes · 7 min read

Feb 20, 2025 · Industry Insights

How DeepSeek R1 Is Redefining Large‑Model Engineer Roles and the AI Job Market

The article analyzes DeepSeek R1’s release, showing how rising base‑model thresholds, a shift toward infrastructure‑centric skills, and the rise of retrieval‑augmented generation are rapidly diminishing traditional large‑model algorithm engineer positions while reshaping the broader AI industry landscape.

AGI · AI industry · DeepSeek

0 likes · 6 min read

Feb 19, 2025 · Artificial Intelligence

How X‑R1’s New Open‑Source 0.5B/1.5B/3B Models Enable LoRA and Chinese Inference

The X‑R1 release introduces fully open‑source 0.5B, 1.5B and 3B models with one‑click training scripts, LoRA fine‑tuning support, Chinese inference capabilities, detailed reward‑curve visualizations, and quick‑start instructions for both CUDA and Ascend platforms.

AI research · Chinese inference · LoRA

0 likes · 5 min read

Feb 17, 2025 · Artificial Intelligence

Can TransMLA Turn GQA into a More Powerful MLA? A Deep Dive into DeepSeek Models

This article presents a theoretical and experimental analysis of converting Grouped‑Query Attention (GQA) models to Multi‑head Latent Attention (MLA) using the TransMLA method, demonstrating superior expressiveness and performance on DeepSeek‑based large language models while keeping KV cache costs unchanged.

Attention · DeepSeek · GQA

0 likes · 11 min read
