Tagged articles

sparse training

3 articles · Page 1 of 1

Jan 29, 2024 · Artificial Intelligence

Unlocking Sparse MoE Large Model Training with Megatron-Core on Alibaba Cloud

This article explains how Alibaba Cloud's PAI platform and NVIDIA's Megatron-Core enable efficient training of sparse Mixture-of-Experts (MoE) large language models, covering algorithm basics, the Megatron-Core MoE framework, weight conversion pipelines, and performance results on Mixtral‑8x7B.

Megatron-Core · Mixture of Experts · large language models

18 min read

Alibaba Cloud Big Data AI Platform

Jul 25, 2022 · Artificial Intelligence

Cut LLM Fine‑Tuning Cost to 1.5% Parameters with PST Sparsity

The article introduces Alibaba Cloud’s PST algorithm, a parameter‑efficient sparsity method that combines data‑free and data‑driven importance metrics to achieve low‑rank and structured sparsity, so that large language models can be fine‑tuned by training only 1.5% of their parameters while maintaining comparable accuracy.

AI · Model Compression · PST algorithm

8 min read

Alimama Tech

May 11, 2022 · Artificial Intelligence

PICASSO: An Industrial-Scale Sparse Training Engine for Wide-and-Deep Recommender Systems

PICASSO, Alibaba’s GPU‑centric sparse training engine for wide‑and‑deep recommender systems, merges identical embedding tables, interleaves data and kernel operations, and caches hot embeddings on GPU, eliminating the parameter server and delivering up to tenfold speedups over TensorFlow‑PS while maintaining model quality.

Alibaba · GPU Optimization · machine learning

14 min read
