Tagged articles
3 articles
Alibaba Cloud Big Data AI Platform
Jan 29, 2024 · Artificial Intelligence

Unlocking Sparse MoE Large Model Training with Megatron-Core on Alibaba Cloud

This article explains how Alibaba Cloud's PAI platform and NVIDIA's Megatron-Core enable efficient training of sparse Mixture-of-Experts (MoE) large language models, covering algorithm basics, the Megatron-Core MoE framework, weight conversion pipelines, and performance results on Mixtral‑8x7B.

Large Language Models · Megatron-Core · Mixture of Experts
0 likes · 18 min read
Alibaba Cloud Big Data AI Platform
Jul 25, 2022 · Artificial Intelligence

Cut LLM Fine‑Tuning Cost to 1.5% Parameters with PST Sparsity

The article introduces Alibaba Cloud’s PST algorithm, a parameter‑efficient sparsity method that combines data‑free and data‑driven importance metrics to achieve low‑rank and structured sparsity, so that large language models can be fine‑tuned by training only 1.5% of their parameters while maintaining comparable accuracy.

AI · PST algorithm · model compression
0 likes · 8 min read
Alimama Tech
May 11, 2022 · Artificial Intelligence

PICASSO: An Industrial-Scale Sparse Training Engine for Wide-and-Deep Recommender Systems

PICASSO, Alibaba’s GPU‑centric sparse training engine for wide‑and‑deep recommender systems, merges identical embedding tables, interleaves data and kernel operations, and caches hot embeddings on GPU, eliminating the parameter server and delivering up to tenfold speedups over TensorFlow‑PS while maintaining model quality.

Alibaba · GPU Optimization · machine learning
0 likes · 14 min read