Why Baidu’s Open‑Source Ernie 4.5 Could Redefine the Global AI Race

Baidu has open‑sourced ten Ernie 4.5 models ranging from 0.3B to 424B parameters. The series combines multimodal MoE pre‑training, an efficient training and inference stack, and modality‑specific post‑training, delivering benchmark results that surpass DeepSeek‑V3 and OpenAI‑o1 on most evaluated tasks, drawing worldwide industry attention and reshaping the competitive landscape.

Overview

Baidu open‑sourced the Ernie (Wenxin) 4.5 series in June, releasing ten models that span dense and Mixture‑of‑Experts (MoE) architectures with parameter counts ranging from 0.3B to 424B.

Technical Highlights

1. Multimodal Heterogeneous MoE Pre‑training

The 4.5 series uses a multimodal MoE backbone that jointly learns text and visual tokens. Key components include:

A heterogeneous expert mixture that separates experts by modality.

Multi‑dimensional rotary positional encodings.

Orthogonal regularisation to balance token streams from different modalities.

These mechanisms improve text generation, image understanding, and multimodal reasoning. A minimal sketch of the modality‑routed expert idea follows.
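Baidu has not published this routing logic as a standalone snippet, so everything below (class names, expert counts, top‑1 routing) is an illustrative sketch of modality‑isolated expert routing, not Baidu's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_expert(d_model, d_ff):
    """One tiny feed-forward expert: d_model -> d_ff -> d_model."""
    return (rng.standard_normal((d_model, d_ff)) * 0.02,
            rng.standard_normal((d_ff, d_model)) * 0.02)

def run_expert(weights, x):
    w1, w2 = weights
    return np.maximum(x @ w1, 0.0) @ w2  # ReLU feed-forward

class HeterogeneousMoE:
    """Top-1 router over shared plus modality-specific experts.

    Text tokens may only reach shared + text experts; vision tokens
    shared + vision experts, mimicking modality-isolated routing.
    """
    def __init__(self, d_model=64, d_ff=256):
        # Flat expert list, each tagged with the pool it belongs to.
        self.tags = ["shared", "shared", "text", "text", "vision", "vision"]
        self.experts = [make_expert(d_model, d_ff) for _ in self.tags]
        self.router = rng.standard_normal((d_model, len(self.tags))) * 0.02

    def __call__(self, x, modality):
        logits = x @ self.router                      # (tokens, experts)
        allowed = np.array([t in ("shared", modality) for t in self.tags])
        logits[:, ~allowed] = -np.inf                 # mask other modality
        choice = logits.argmax(axis=-1)               # top-1 expert per token
        out = np.empty_like(x)
        for i, e in enumerate(choice):
            out[i] = run_expert(self.experts[e], x[i])
        return out

moe = HeterogeneousMoE()
text_tokens = rng.standard_normal((5, 64))
print(moe(text_tokens, "text").shape)  # -> (5, 64)
```

The rotary positional encodings and orthogonal regularisation from the list above are omitted here for brevity.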

2. Scalable and Efficient Training Infrastructure

To accelerate pre‑training, Baidu introduced:

Heterogeneous mixture parallelism combined with multi‑level load balancing.

Expert‑parallel execution within each compute node.

Memory‑friendly pipeline scheduling and fine‑grained recomputation.

FP8 mixed‑precision training for higher throughput (simulated in the sketch below).
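NumPy has no native FP8 dtype, so the sketch below only simulates the two ingredients that make FP8 training work: per‑tensor dynamic‑range scaling into the E4M3 range and reduced mantissa precision, with accumulation kept in FP32. The function names and the crude precision model are assumptions for illustration, not Baidu's kernels:

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def quantize_fp8_sim(x: np.ndarray):
    """Simulate per-tensor FP8 (E4M3) quantisation.

    Real FP8 kernels store 8-bit floats; here we only mimic the
    dynamic-range scaling and the precision loss.
    """
    scale = E4M3_MAX / max(np.abs(x).max(), 1e-12)
    x_scaled = np.clip(x * scale, -E4M3_MAX, E4M3_MAX)
    # Crude precision model: keep ~3 mantissa bits by rounding each
    # value to a step of 2**(exponent - 3).
    exp = np.floor(np.log2(np.abs(x_scaled) + 1e-30))
    step = 2.0 ** (exp - 3)
    x_q = np.round(x_scaled / step) * step
    return x_q.astype(np.float32), scale

def fp8_matmul(a, b):
    """Quantise both operands, multiply, accumulate in FP32, rescale."""
    aq, sa = quantize_fp8_sim(a)
    bq, sb = quantize_fp8_sim(b)
    return (aq @ bq) / (sa * sb)

rng = np.random.default_rng(0)
a, b = rng.standard_normal((64, 128)), rng.standard_normal((128, 32))
err = np.abs(fp8_matmul(a, b) - a @ b).mean()
print(f"mean abs error vs FP32: {err:.4f}")
```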

For inference, the models support near‑lossless 4‑bit and 2‑bit quantisation, expert‑parallel collaborative quantisation, and a decoupled prefill/decode deployment that maximises hardware utilisation across CPUs, GPUs, and other accelerators via the PaddlePaddle framework. A group‑wise 4‑bit sketch follows.
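The article doesn't pin down Baidu's exact low‑bit scheme, but group‑wise symmetric quantisation is the standard way to make 4‑bit weights near lossless, since each small group of weights gets its own scale. A generic sketch under that assumption:

```python
import numpy as np

def quantize_int4_groupwise(w: np.ndarray, group_size: int = 64):
    """Symmetric 4-bit group-wise quantisation of a weight matrix.

    Each group of `group_size` consecutive weights per row shares one
    FP16 scale; small groups are what keep the error "near lossless".
    """
    rows, cols = w.shape
    assert cols % group_size == 0
    g = w.reshape(rows, cols // group_size, group_size)
    absmax = np.maximum(np.abs(g).max(axis=-1, keepdims=True), 1e-8)
    scale = absmax / 7.0                                 # int4 range [-8, 7]
    q = np.clip(np.round(g / scale), -8, 7).astype(np.int8)
    return q, scale.astype(np.float16)

def dequantize(q, scale):
    return (q * scale.astype(np.float32)).reshape(q.shape[0], -1)

rng = np.random.default_rng(0)
w = rng.standard_normal((16, 256)).astype(np.float32)
q, s = quantize_int4_groupwise(w)
err = np.abs(dequantize(q, s) - w).max()
print(f"max abs reconstruction error: {err:.4f}")
```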

3. Modality‑Specific Post‑Training

After the base pre‑training, Baidu applies specialised fine‑tuning:

Large‑language models receive general supervised fine‑tuning (SFT) and Direct Preference Optimisation (DPO); the DPO objective is sketched after this list.

Multimodal models undergo visual‑language alignment, task‑oriented fine‑tuning, and Reinforcement Learning with Verifiable Rewards (RLVR) to boost alignment and performance on vision‑language tasks.
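DPO itself is a published objective (Rafailov et al., 2023): the policy is pushed to widen its log‑probability margin between preferred and rejected responses relative to a frozen reference model. A minimal NumPy rendering with illustrative variable names (Ernie's actual post‑training code is not public in this article):

```python
import numpy as np

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimisation loss.

    Inputs are summed token log-probs of the chosen/rejected responses
    under the policy and under the frozen reference model.
    """
    # Implicit reward margin between chosen and rejected responses.
    margin = ((logp_chosen - ref_logp_chosen)
              - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(beta * margin)), written via log1p for stability.
    return np.log1p(np.exp(-beta * margin))

# Toy example: the policy already prefers the chosen answer slightly,
# so the loss is below log(2) and shrinks as the margin grows.
print(dpo_loss(np.array([-12.0]), np.array([-15.0]),
               np.array([-13.0]), np.array([-14.0])))
```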

Performance Evaluation

On widely used benchmark suites, the Ernie 4.5 models outperform competing systems such as DeepSeek‑V3 and OpenAI‑o1 across a broad range of metrics. Hugging Face engineers reported that the 4.5 series is competitive with Qwen‑3 and DeepSeek‑V3, and AI engineer Rohan Paul observed that the flagship base model beat DeepSeek‑V3‑671B‑A37B‑Base on 22 of 28 evaluated benchmarks.

Repository and Release Information

The full model weights, training scripts, and inference code are hosted on Baidu’s public Git repository (e.g., https://github.com/baidu/ernie-4.5) and are released under a permissive license that allows commercial use and further modification.
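For readers who want to try the smaller checkpoints, a typical Hugging Face Transformers loading pattern looks like the following; the repository ID below is an assumption and should be verified against the official release page:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo ID; check Baidu's release page for the real one.
model_id = "baidu/ERNIE-4.5-0.3B-PT"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Why did Baidu open-source Ernie 4.5?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```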

Tags: AI competition, Baidu, Ernie
Written by Baobao Algorithm Notes, author of the BaiMian large model, offering technology and industry insights.