Wenxin 4.5 Series: Open‑Source Multimodal MoE Models and FastDeploy Guide
The Wenxin 4.5 series introduces ten open‑source models—including large‑scale MoE and dense variants—featuring a novel multimodal heterogeneous architecture, high training efficiency, SOTA benchmark performance, and comprehensive toolkits (ERNIEKit, FastDeploy) for fine‑tuning and multi‑hardware deployment.
The Wenxin 4.5 series open-source models have been officially released, comprising ten models: mixture-of-experts (MoE) models with 47B and 3B activated parameters (the largest totaling 424B parameters) and a 0.3B dense model.
The models and code are fully open‑sourced on Hugging Face ( https://huggingface.co/baidu ), GitHub ( https://github.com/PaddlePaddle/ERNIE ) and the Paddle AI Studio community.
A novel multimodal heterogeneous MoE architecture enables cross‑modal parameter sharing while preserving dedicated parameter spaces for each modality, improving multimodal understanding and maintaining or boosting text‑task performance.
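To make the idea of "shared plus modality-dedicated parameters" concrete, here is a minimal toy sketch (not the actual ERNIE architecture) in which each token is routed over a pool of shared experts plus experts dedicated to its modality, with a separate router per modality. All sizes and names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_shared, n_modal, top_k = 8, 2, 2, 2  # toy dimensions, not ERNIE's

# Shared experts serve every modality; each modality (text, vision)
# also owns a small set of dedicated experts.
shared = [rng.normal(size=(d, d)) for _ in range(n_shared)]
dedicated = {m: [rng.normal(size=(d, d)) for _ in range(n_modal)]
             for m in ("text", "vision")}
router = {m: rng.normal(size=(d, n_shared + n_modal)) for m in ("text", "vision")}

def moe_forward(x, modality):
    """Route a token through the top-k experts visible to its modality."""
    experts = shared + dedicated[modality]   # shared pool + modality-specific pool
    logits = x @ router[modality]            # one router per modality
    top = np.argsort(logits)[-top_k:]        # indices of the k highest-scoring experts
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over winners
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

x = rng.normal(size=d)
y_text = moe_forward(x, "text")
y_vision = moe_forward(x, "vision")
```

The same input produces different outputs per modality because both the router and part of the expert pool are modality-specific, while the shared experts let the modalities exchange common structure.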
All models are trained with the Paddle deep‑learning framework, achieving a 47% model FLOPs utilization (MFU) in large‑language‑model pre‑training, and reaching SOTA results on various text and multimodal benchmarks.
Technical highlights include a multimodal MoE pre‑training strategy, heterogeneous mixture‑parallel and multi‑level load‑balancing for efficient training, FP8 mixed‑precision, fine‑grained recomputation, and 4‑bit/2‑bit quantization methods for inference.
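As a rough illustration of the low-bit inference idea, the sketch below shows symmetric per-row 4-bit weight quantization and dequantization in NumPy. This is a generic textbook scheme, not the specific quantization method used for Wenxin 4.5:

```python
import numpy as np

def quantize_int4(w):
    """Symmetric per-row 4-bit quantization: integer levels in [-8, 7]."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0  # one scale per row
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate float weights from int4 codes and scales."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(1)
w = rng.normal(size=(4, 16)).astype(np.float32)
q, scale = quantize_int4(w)
w_hat = dequantize_int4(q, scale)
err = np.abs(w - w_hat).max()  # bounded by half a quantization step
```

Each weight is stored in 4 bits plus one float scale per row, roughly quartering memory versus FP16 at the cost of a small, bounded rounding error.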
Post‑training supports modality‑specific fine‑tuning with SFT, DPO, LoRA, and quantization techniques.
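Of the listed techniques, LoRA is easy to sketch from first principles: the pretrained weight is frozen and a scaled low-rank update is trained alongside it. The NumPy toy below illustrates the math only and is not ERNIEKit's implementation; all names and sizes are assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
d_in, d_out, r = 16, 16, 4   # r is the low-rank bottleneck
alpha = 8.0                  # LoRA scaling factor

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))               # trainable up-projection, zero-initialized

def lora_forward(x):
    # Base path plus scaled low-rank update: y = W x + (alpha / r) * B (A x)
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
y0 = lora_forward(x)  # with B = 0, LoRA starts as an exact no-op
```

Because B starts at zero, fine-tuning begins from the base model's behavior, and only the small A and B matrices need gradients and storage.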
Developers can use the ERNIEKit toolkit for end‑to‑end model fine‑tuning and inference, with example commands provided:
# Download model
huggingface-cli download baidu/ERNIE-4.5-0.3B-Paddle --local-dir baidu/ERNIE-4.5-0.3B-Paddle
# One‑line training command
erniekit train examples/configs/ERNIE-4.5-0.3B/sft/run_sft_8k.yaml

FastDeploy offers one‑line deployment across multiple hardware platforms, provides vLLM‑ and OpenAI‑compatible APIs, and supports low‑bit quantization, context caching, and speculative decoding:
from fastdeploy import LLM, SamplingParams
prompt = "把李白的静夜思改写为现代诗"
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
llm = LLM(model="baidu/ERNIE-4.5-0.3B-Paddle", max_model_len=32768)
outputs = llm.generate(prompt, sampling_params)

Additional resources, including detailed technical reports, usage guides, and community projects, are available via the Wenxin large‑model technology blog and the Paddle AI Studio community.
As of April 2025, the Wenxin ecosystem has served over 21.85 million developers and 670,000 enterprises, and a series of open‑source courses and events is planned to further promote AI research and application.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.