Cut LLM Fine‑Tuning Cost to 1.5% of Parameters with PST Sparsity

This article introduces Alibaba Cloud’s PST algorithm, a parameter‑efficient sparsity method that combines data‑free and data‑driven importance metrics and exploits the low‑rank and structured properties of the importance matrix, allowing large language models to be fine‑tuned by updating only 1.5% of their parameters while maintaining comparable accuracy.


Background

Recent years have seen the emergence of massive language models ranging from billions to trillions of parameters, demanding huge hardware resources for training and deployment. Reducing these resource requirements is a pressing challenge.

Model compression, and sparsity in particular, removes unimportant weights and converts dense computation into sparse computation, lowering memory usage and speeding up inference while preserving accuracy. This makes it well suited to extremely large models.
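To make the idea concrete, here is a minimal sketch (not taken from the paper) of magnitude‑based sparsification in PyTorch: weights below a threshold are dropped and the matrix is stored in a sparse format. The shapes and the 90% ratio are illustrative assumptions.

```python
import torch

torch.manual_seed(0)
W = torch.randn(1024, 1024)           # dense weight matrix (example size)
sparsity = 0.9                        # fraction of weights to remove (example)

# Keep only the largest-magnitude weights; everything else becomes zero.
threshold = W.abs().flatten().kthvalue(int(sparsity * W.numel())).values
mask = W.abs() > threshold
W_sparse = (W * mask).to_sparse_csr() # dense -> sparse storage and compute

print(f"kept {mask.float().mean():.1%} of weights")
```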

Challenge

Existing sparse training methods fall into two categories. Weight‑based (data‑free) approaches such as magnitude pruning evaluate importance via the L1 norm of each weight, offering efficiency but limited accuracy. Data‑driven methods like movement pruning use weight–gradient products for more accurate importance estimation, but require storing a full‑size importance matrix, increasing memory and computation, especially for large models.

In effect, data‑driven methods introduce extra trainable parameters equal in size to the weights themselves, which motivates the search for a more efficient importance metric.
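The contrast between the two families can be sketched as follows. The function names are illustrative, and the movement‑style accumulation is a simplification of the published method rather than its exact update rule; the point is that the data‑driven score has the same shape as the weight matrix.

```python
import torch

def importance_magnitude(W: torch.Tensor) -> torch.Tensor:
    # Data-free: importance is simply |W|; no extra state beyond the weights.
    return W.abs()

def importance_movement(W: torch.Tensor, grad: torch.Tensor,
                        score: torch.Tensor, lr: float = 1e-2) -> torch.Tensor:
    # Data-driven (movement-pruning style, simplified): accumulate -W * dL/dW
    # over training steps. `score` is a full-size matrix the optimizer must
    # keep -- one extra parameter per weight, the overhead PST removes.
    return score - lr * (W * grad)

W = torch.randn(1024, 1024, requires_grad=True)
loss = (W @ torch.randn(1024, 8)).pow(2).mean()   # toy loss to get gradients
loss.backward()

s_mag = importance_magnitude(W.detach())
s_mov = importance_movement(W.detach(), W.grad, torch.zeros_like(W))
print(s_mag.shape, s_mov.shape)   # both are full 1024 x 1024 matrices
```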

Breakthrough

The proposed PST algorithm combines data‑free and data‑driven importance indicators into a unified formula, reducing the extra parameters required for the data‑driven component. By analyzing the low‑rank and structured nature of the importance matrix, PST represents it with four small matrices: two low‑rank matrices (A, B) and two structured matrices (R, C). Weight updates are similarly factorized into two small matrices (U, V).

Empirical analysis shows that after sparsification, many rows and columns become highly sparse, motivating the structured component of PST.
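A minimal sketch of how such a factorized layer might look is shown below. The names A, B, R, C, U, and V follow the description above, but the rank, the way the terms are combined, and the masking details are assumptions for illustration, not the paper’s exact formulation.

```python
import torch
import torch.nn as nn

class PSTLinearSketch(nn.Module):
    """Sketch of a PST-style layer: a frozen dense weight W plus small
    trainable factors that stand in for the full importance matrix and
    for the weight update."""

    def __init__(self, in_features, out_features, rank=8):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features),
                                   requires_grad=False)   # frozen pretrained W
        # Low-rank factors of the data-driven importance (A, B)
        self.A = nn.Parameter(torch.zeros(out_features, rank))
        self.B = nn.Parameter(torch.zeros(rank, in_features))
        # Structured row / column importance (R, C)
        self.R = nn.Parameter(torch.zeros(out_features, 1))
        self.C = nn.Parameter(torch.zeros(1, in_features))
        # Low-rank factors of the weight update (U, V)
        self.U = nn.Parameter(torch.zeros(out_features, rank))
        self.V = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.sparsity = 0.9

    def importance(self):
        # Data-free term |W| plus the factorized data-driven terms.
        return self.weight.abs() + self.A @ self.B + self.R + self.C

    def forward(self, x):
        W_eff = self.weight + self.U @ self.V          # updated weights
        scores = self.importance()
        k = int(self.sparsity * scores.numel())
        thr = scores.flatten().kthvalue(k).values
        # Hard top-k mask; a real implementation needs a straight-through
        # estimator so gradients reach A, B, R and C through this mask.
        mask = (scores > thr).to(W_eff.dtype)
        return x @ (W_eff * mask).t()

layer = PSTLinearSketch(1024, 1024)
y = layer(torch.randn(4, 1024))                        # sparse forward pass
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(y.shape, f"trainable fraction: {trainable / total:.2%}")
```

Because every trainable factor scales with the matrix width times a small rank rather than with the full width squared, the trainable fraction shrinks as the model grows; the exact percentage depends on the ranks chosen.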

Results

Experiments on NLU tasks (BERT, RoBERTa) and NLG tasks (GPT‑2) compare PST with magnitude pruning and movement pruning. At 90% sparsity, PST attains comparable accuracy on most datasets while updating only 1.5% of the parameters.

PST has been integrated into Alibaba Cloud Machine Learning PAI’s model compression library and the Alicemind platform. On a 100‑billion‑parameter PLUG model, PST achieves a 2.5× speedup and a ten‑fold memory reduction without accuracy loss, accelerating large‑model deployment across industries.

Paper Details

Title: Parameter‑Efficient Sparsity for Large Language Models Fine‑Tuning

Authors: Yuchao Li, Fuli Luo, Chuanqi Tan, Mengdi Wang, Songfang Huang, Shen Li, Junjie Bai

PDF: https://arxiv.org/pdf/2205.11005.pdf

Tags: AI, model compression, sparse training, parameter efficiency, PST algorithm
Written by

Alibaba Cloud Big Data AI Platform

The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
