Tagged articles
2 articles
Alibaba Cloud Big Data AI Platform
Mar 26, 2024 · Artificial Intelligence

MoE LLMs: How Alibaba Cloud & NVIDIA Megatron-Core Accelerate Training

This article reviews the evolution of Mixture-of-Experts (MoE) models, describes how Alibaba Cloud worked with NVIDIA to build a high-performance MoE framework on Megatron-Core, and presents training optimizations, benchmark results, conversion tools, and best-practice guidelines for large-scale LLM development and deployment.

Alibaba Cloud · Megatron-Core · MoE
18 min read
Alibaba Cloud Big Data AI Platform
Jan 29, 2024 · Artificial Intelligence

Unlocking Sparse MoE Large Model Training with Megatron-Core on Alibaba Cloud

This article explains how Alibaba Cloud's PAI platform and NVIDIA's Megatron-Core enable efficient training of sparse Mixture-of-Experts (MoE) large language models, covering algorithm basics, the Megatron-Core MoE framework, weight conversion pipelines, and performance results on Mixtral-8x7B.

Megatron-Core · Mixture of Experts · Model Parallelism
18 min read