Wenxin 4.5 Series: Open‑Source Multimodal MoE Models and FastDeploy Guide

The Wenxin 4.5 series introduces ten open‑source models—including large‑scale MoE and dense variants—featuring a novel multimodal heterogeneous architecture, high training efficiency, SOTA benchmark performance, and comprehensive toolkits (ERNIEKit, FastDeploy) for fine‑tuning and multi‑hardware deployment.


Wenxin 4.5 series open-source models have been officially released, comprising ten models: mixture-of-experts (MoE) models with 47B and 3B activated parameters (the largest totaling 424B parameters) and a 0.3B dense model.

The models and code are fully open‑sourced on Hugging Face (https://huggingface.co/baidu), GitHub (https://github.com/PaddlePaddle/ERNIE), and the Paddle AI Studio community.

A novel multimodal heterogeneous MoE architecture enables cross‑modal parameter sharing while preserving dedicated parameter spaces for each modality, improving multimodal understanding and maintaining or boosting text‑task performance.
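The routing idea behind such an architecture can be sketched as follows: a minimal, illustrative top-k router (all names, shapes, and scores here are hypothetical, not the actual ERNIE implementation) in which shared experts are visible to tokens of every modality, while text-only and vision-only experts keep dedicated parameter spaces:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(scores, modality, shared_ids, text_ids, vision_ids, top_k=2):
    """Pick top-k experts for one token. Shared experts serve every modality;
    text/vision experts are reserved for tokens of their own modality."""
    visible = shared_ids + (text_ids if modality == "text" else vision_ids)
    probs = softmax([scores[i] for i in visible])
    ranked = sorted(zip(visible, probs), key=lambda pair: -pair[1])
    return [expert for expert, _ in ranked[:top_k]]
```

A text token can thus land on shared or text experts but never on a vision expert, which is how cross-modal parameter sharing and per-modality parameter spaces coexist.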

All models are trained with the Paddle deep‑learning framework, achieving 47% model FLOPs utilization (MFU) in large‑language‑model pre‑training and SOTA results on a range of text and multimodal benchmarks.
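MFU is simply the ratio of the FLOP/s a training run actually sustains to the hardware's theoretical peak; a sketch of the computation (all numbers plugged in are hypothetical):

```python
def mfu(flops_per_token, tokens_per_sec, num_devices, peak_flops_per_device):
    """Model FLOPs utilization: achieved training FLOP/s over theoretical peak."""
    achieved = flops_per_token * tokens_per_sec   # useful model computation
    peak = num_devices * peak_flops_per_device    # cluster-wide hardware peak
    return achieved / peak
```

A 47% MFU means nearly half of the cluster's peak FLOPs are spent on useful model computation, which is high for large-scale MoE pre-training.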

Technical highlights include a multimodal MoE pre‑training strategy, heterogeneous mixture‑parallel and multi‑level load‑balancing for efficient training, FP8 mixed‑precision, fine‑grained recomputation, and 4‑bit/2‑bit quantization methods for inference.
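To illustrate the weight-quantization idea (a generic symmetric round-to-nearest scheme, not Baidu's actual method), 4-bit group-wise quantization stores one floating-point scale per small group of weights and a signed 4-bit integer per weight:

```python
def quantize_4bit(weights, group_size=4):
    """Symmetric round-to-nearest quantization, one scale per group.
    Integers are clamped to [-7, 7] to keep the grid symmetric."""
    q, scales = [], []
    for g in range(0, len(weights), group_size):
        group = weights[g:g + group_size]
        scale = max(abs(w) for w in group) / 7 or 1.0  # avoid zero scale
        scales.append(scale)
        q.extend(max(-7, min(7, round(w / scale))) for w in group)
    return q, scales

def dequantize_4bit(q, scales, group_size=4):
    """Recover approximate weights from integers and per-group scales."""
    return [qi * scales[i // group_size] for i, qi in enumerate(q)]
```

2-bit schemes follow the same pattern with a much narrower integer grid, so they typically need finer groups or extra handling of outlier weights.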

Post‑training supports modality‑specific fine‑tuning with SFT, DPO, LoRA, and quantization techniques.
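LoRA, for instance, freezes the base weight matrix W and trains only a low-rank update; a minimal pure-Python sketch (shapes and the alpha value are illustrative, not ERNIE's actual configuration):

```python
def matmul(a, b):
    """Plain list-of-lists matrix multiply."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def lora_forward(x, W, A, B, alpha=4.0):
    """y = x @ W + (alpha / r) * x @ A @ B, with W frozen and only the
    low-rank factors A (d_in x r) and B (r x d_out) trained."""
    r = len(B)          # rank = number of rows of B
    scale = alpha / r
    base = matmul(x, W)
    update = matmul(matmul(x, A), B)
    return [[b + scale * u for b, u in zip(brow, urow)]
            for brow, urow in zip(base, update)]
```

Because only A and B receive gradients, fine-tuning touches a tiny fraction of the parameters, which is what makes LoRA cheap enough to combine with quantized base weights.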

Developers can use the ERNIEKit toolkit for end‑to‑end model fine‑tuning and inference, with example commands provided:

# Download model
huggingface-cli download baidu/ERNIE-4.5-0.3B-Paddle --local-dir baidu/ERNIE-4.5-0.3B-Paddle
# One‑line training command
erniekit train examples/configs/ERNIE-4.5-0.3B/sft/run_sft_8k.yaml

FastDeploy offers one‑line deployment across multiple hardware platforms, exposes vLLM- and OpenAI-compatible APIs, and supports low‑bit quantization, context caching, and speculative decoding:

from fastdeploy import LLM, SamplingParams

# Prompt: "Rewrite Li Bai's 'Quiet Night Thoughts' as a modern poem"
prompts = ["把李白的静夜思改写为现代诗"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
llm = LLM(model="baidu/ERNIE-4.5-0.3B-Paddle", max_model_len=32768)
outputs = llm.generate(prompts, sampling_params)
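Speculative decoding, one of the accelerations listed above, can be sketched with toy next-token functions (a real system verifies the whole draft in one batched forward pass and samples rather than matching greedily; this greedy version is only illustrative):

```python
def speculative_decode(target_next, draft_next, prompt, max_new=8, k=4):
    """Greedy speculative decoding: a cheap draft model proposes k tokens,
    the target model verifies them, and the longest agreeing prefix is
    accepted, so several tokens can be committed per target step."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # Draft proposes k tokens autoregressively.
        ctx, proposal = list(out), []
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # Target verifies: its token is always kept; the round ends
        # at the first disagreement with the draft.
        for t in proposal:
            want = target_next(out)
            out.append(want)
            if want != t:
                break
    return out[len(prompt):len(prompt) + max_new]
```

When draft and target mostly agree, each verification round commits several tokens at once, which is where the speedup comes from.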

Additional resources, including detailed technical reports, usage guides, and community projects, are available via the Wenxin large‑model technology blog and the Paddle AI Studio community.

As of April 2025, the Wenxin ecosystem has served over 21.85 million developers and 670,000 enterprises, and plans a series of open‑source courses and events to further promote AI research and application.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: multimodal AI, large language models, MoE, open-source, PaddlePaddle, ERNIEKit, FastDeploy
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
