Tag: embedding compression


Kuaishou Tech
Oct 26, 2023 · Artificial Intelligence

SHARK: Efficient Embedding Compression for Large-Scale Recommendation Models

The paper introduces SHARK, a two‑component framework that uses a fast Taylor‑expanded permutation method to prune embedding tables and a frequency‑aware quantization scheme to apply mixed‑precision to embeddings, achieving up to 70% memory reduction and 30% QPS improvement in industrial short‑video and e‑commerce recommendation systems.
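The frequency-aware idea in the summary can be illustrated with a minimal sketch. It is not the paper's implementation: the summary does not give the bit widths or the frequency threshold, so the choice of fp32 for frequently accessed ("hot") rows, symmetric per-row int8 for the rest, and the names `compress_table`, `lookup`, and `hot_threshold` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Row = List[float]
# (kind, payload, scale): kind is "fp32" or "int8" (both are assumed precisions)
Entry = Tuple[str, list, float]


def quantize_row_int8(row: Row) -> Tuple[List[int], float]:
    """Symmetric per-row int8 quantization; returns (codes, scale)."""
    max_abs = max(abs(v) for v in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in row]
    return codes, scale


def dequantize_row(codes: List[int], scale: float) -> Row:
    """Map int8 codes back to approximate float values."""
    return [c * scale for c in codes]


def compress_table(table: Dict[int, Row],
                   freq: Dict[int, int],
                   hot_threshold: int) -> Dict[int, Entry]:
    """Keep hot rows in full precision and quantize cold rows.

    hot_threshold is a hypothetical cutoff on access counts.
    """
    compressed: Dict[int, Entry] = {}
    for row_id, row in table.items():
        if freq.get(row_id, 0) >= hot_threshold:
            compressed[row_id] = ("fp32", list(row), 1.0)
        else:
            codes, scale = quantize_row_int8(row)
            compressed[row_id] = ("int8", codes, scale)
    return compressed


def lookup(compressed: Dict[int, Entry], row_id: int) -> Row:
    """Return the row as floats, dequantizing if it was stored as int8."""
    kind, payload, scale = compressed[row_id]
    return payload if kind == "fp32" else dequantize_row(payload, scale)


table = {0: [0.5, -1.0], 1: [0.1, 0.2]}
freq = {0: 100, 1: 2}  # row 0 is accessed often, row 1 rarely
compressed = compress_table(table, freq, hot_threshold=10)
```

In this sketch the rounding error of a cold row is bounded by half its per-row scale, while hot rows are returned unchanged.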

efficiency · embedding compression · large-scale AI
Alimama Tech
Jan 19, 2022 · Artificial Intelligence

Advances in Alibaba Search Advertising Estimation: Model Deepening, Interaction, and System Efficiency (2021 Review)

The 2021 review of Alimama's search advertising estimation platform covers three areas. Model deepening includes hash‑based embedding compression, adaptive dynamic parameters, and graph neural networks. Model interaction is handled by a multi‑stage cascade with ranking distillation and oracle bias. System efficiency gains come from HPC training, mixed precision, multi‑hash embeddings, and fp16 quantization, which together deliver roughly a thirty‑fold speed‑up.
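As a generic illustration of the multi-hash embedding idea named in the summary, the sketch below maps each feature ID to one row in each of several small tables and sums those rows. This is a common form of the technique, not the platform's actual implementation; the class name `MultiHashEmbedding`, the use of SHA-256 as the hash, the sum as the combiner, and all sizes are assumptions.

```python
import hashlib
import random
from typing import List


class MultiHashEmbedding:
    """Hypothetical multi-hash embedding: k small tables instead of one huge one."""

    def __init__(self, num_buckets: int, dim: int,
                 num_hashes: int = 2, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.num_buckets = num_buckets
        self.dim = dim
        self.num_hashes = num_hashes
        # One table of shape (num_buckets, dim) per hash function.
        self.tables = [
            [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
             for _ in range(num_buckets)]
            for _ in range(num_hashes)
        ]

    def _bucket(self, feature_id: int, k: int) -> int:
        """Deterministic bucket index for the k-th hash function."""
        digest = hashlib.sha256(f"{k}:{feature_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.num_buckets

    def lookup(self, feature_id: int) -> List[float]:
        """Sum the rows selected by each hash function."""
        vec = [0.0] * self.dim
        for k in range(self.num_hashes):
            row = self.tables[k][self._bucket(feature_id, k)]
            vec = [a + b for a, b in zip(vec, row)]
        return vec


emb = MultiHashEmbedding(num_buckets=16, dim=4, num_hashes=2)
vector = emb.lookup(123456789)
```

Two IDs collide completely only if they share a bucket under every hash function, which is why several small tables can stand in for one table with a row per ID. Memory here is `num_hashes * num_buckets * dim` values regardless of how many distinct IDs exist.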

CVR · Graph Neural Networks · ad tech