Low‑Carbon Model Compression for Alimama Search Advertising CTR: Feature Volume and Embedding Dimension Optimizations
This article details Alimama’s low‑carbon CTR model slimming: binary‑code hash embeddings compress massive feature volumes, while the Adaptively‑Masked Twins‑based Layer adaptively shrinks embedding dimensions. Together they cut storage and compute, lower collision rates, and preserve accuracy for large‑scale search advertising.
As a companion to the article "Alimama Search Advertising CTR Model’s ‘Slimming’ Journey", this text expands on two CIKM 2021 papers and details the continuous thinking and practice behind model slimming. It describes how the team shifted from large, resource‑hungry models to resource‑friendly small models, breaking the compute bottleneck and enabling fast, efficient, agile iteration across many scenarios.
The previous article highlighted that, in the era of diminishing compute returns, systematic algorithmic optimization can replace “big‑model” approaches. Large‑scale models contain redundant information; careful design can achieve low‑carbon slimming.
Earlier work classified CTR model optimization by component structure, noting that in large‑scale sparse feature settings the Embedding Layer dominates parameter scale. The Embedding Table can be viewed as a 2‑D matrix whose rows (Feature Volume) and columns (Embedding Dimension) jointly determine size. Optimizing both dimensions is therefore crucial.
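To make the row‑times‑column framing concrete, here is a minimal sketch of how embedding‑table size scales with feature volume and dimension. The figures below are illustrative, not Alimama's actual configuration:

```python
def embedding_params(feature_volume: int, embedding_dim: int) -> int:
    """Parameter count of an embedding table viewed as a 2-D matrix:
    rows (feature volume) times columns (embedding dimension)."""
    return feature_volume * embedding_dim

# Illustrative scale: 1 billion IDs at dimension 32 gives 32 billion
# parameters, i.e. 128 GB at 4 bytes per float32 (decimal GB).
n_params = embedding_params(1_000_000_000, 32)
print(n_params)            # 32000000000
print(n_params * 4 / 1e9)  # 128.0 (GB, float32)
```

Because both factors multiply, cutting rows (BH) and columns (ATML) compound: halving each quarters the table.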
Feature Volume (Binary Code based Hash Embedding) focuses on the row side of the Embedding Table. ID‑type features (e.g., userid, queryid, adid) have massive cardinalities and require frequent updates. Two expression strategies exist:
Non‑conflict expression: each ID retains a unique embedding (One‑Hot style), which demands high engineering effort and large storage.
Conflict expression: IDs share embeddings, reducing model size. Conflict can be designed manually (based on frequency thresholds) or randomly via hashing tricks.
The team adopted the Hashing Trick, extending it with MultiHash, HybridHash, QR‑trick, etc., to lower collision rates while preserving information.
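A minimal sketch of the two ideas above: the plain hashing trick maps an ID into one shared table, so distinct IDs that hash to the same bucket fully collide; a MultiHash‑style variant looks the ID up in several independently hashed small tables and merges the vectors, so two IDs only fully collide if every hash collides. Function names and the sum‑merge are illustrative assumptions, not the team's exact implementation:

```python
import hashlib

def h(seed: int, key: str, num_buckets: int) -> int:
    """Deterministic hash of a string ID into [0, num_buckets)."""
    digest = hashlib.md5(f"{seed}:{key}".encode()).hexdigest()
    return int(digest, 16) % num_buckets

def hash_embedding(key: str, table: list) -> list:
    # Plain hashing trick: a single modulo hash; IDs landing in the
    # same bucket share one embedding (a hard collision).
    return table[h(0, key, len(table))]

def multi_hash_embedding(key: str, tables: list) -> list:
    # MultiHash sketch: independent hashes (different seeds) into
    # separate small tables; summing the vectors means a full collision
    # requires *all* hashes to collide, which is far less likely.
    vecs = [t[h(i, key, len(t))] for i, t in enumerate(tables)]
    return [sum(components) for components in zip(*vecs)]
```

The trade‑off: each lookup now touches several tables, but total rows (and thus storage) shrink sharply for a comparable collision rate.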
To further improve, they proposed a novel Binary Code based Hash Embedding (BH) that replaces the modulo operation with a binary‑code grouping and merging strategy. BH maintains ID uniqueness, dramatically reduces collisions, and can be tuned for various compression ratios, even supporting on‑device models.
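The grouping‑and‑merging idea can be sketched roughly as follows: write the ID as a binary code, slice the bits into fixed‑size blocks, look each block up in its own small table, and merge the results. Since every ID has a unique binary code, its tuple of block indices is also unique, so no two IDs share all lookups. Block sizes, the sum‑merge, and the 16‑bit ID range below are toy assumptions for illustration, not the paper's exact construction:

```python
import numpy as np

rng = np.random.default_rng(0)

ID_BITS = 16      # assume IDs fit in 16 bits for this toy example
BLOCK_BITS = 4    # split the binary code into 16 / 4 = 4 blocks
DIM = 8           # per-block embedding dimension
N_BLOCKS = ID_BITS // BLOCK_BITS

# One small table per block: 4 tables of 2**4 = 16 rows each (64 rows
# total) stand in for a single table of 2**16 = 65536 rows.
tables = [rng.normal(size=(2 ** BLOCK_BITS, DIM)) for _ in range(N_BLOCKS)]

def bh_embedding(feature_id: int) -> np.ndarray:
    """Binary-code grouping sketch: slice the ID's bits into blocks,
    look each block up in its own small table, and merge (here: sum)."""
    parts = []
    for block in range(N_BLOCKS):
        idx = (feature_id >> (block * BLOCK_BITS)) & (2 ** BLOCK_BITS - 1)
        parts.append(tables[block][idx])
    return np.sum(parts, axis=0)
```

Varying BLOCK_BITS trades table size against the number of lookups, which is how different compression ratios (including on‑device budgets) can be targeted.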
Embedding Dimension: the Adaptively‑Masked Twins‑based Layer (ATML) lets different IDs use different embedding dimensions. A single fixed dimension wastes capacity on low‑frequency IDs and can even hurt their generalization. Prior approaches are rule‑based (coarse, hand‑tuned thresholds) or NAS‑based (costly search). ATML instead adds a mask layer that incorporates feature‑frequency priors, relaxes the discrete dimension choice into a differentiable formulation, and uses a twin‑gate network so that long‑tail (low‑frequency) IDs get their own gating path. Dimension selection thus becomes end‑to‑end trainable, shrinking the model while improving generalization on low‑frequency IDs.
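The masking mechanism can be illustrated with a small forward‑pass sketch: a sigmoid relaxation turns per‑dimension logits into a soft 0/1 mask, and twin gates route high‑ and low‑frequency IDs through separate parameters. The gate shapes, the frequency threshold, and the linear gates are simplifying assumptions, not ATML's published architecture:

```python
import numpy as np

rng = np.random.default_rng(1)
DIM = 8

def relaxed_mask(logits: np.ndarray, temperature: float = 0.2) -> np.ndarray:
    """Differentiable relaxation: per-dimension sigmoid scores that
    approach a hard 0/1 mask as the temperature shrinks."""
    return 1.0 / (1.0 + np.exp(-logits / temperature))

def atml_like_forward(embedding, freq, gate_hi, gate_lo):
    """Twin-gate sketch: one gate branch handles high-frequency IDs,
    the other the long tail, so tail IDs get dedicated parameters.
    The 0.5 routing threshold is an arbitrary illustrative choice."""
    gate = gate_hi if freq >= 0.5 else gate_lo
    logits = gate @ np.array([freq, 1.0])  # frequency prior as gate input
    return embedding * relaxed_mask(logits)

emb = rng.normal(size=DIM)
gate_hi = rng.normal(size=(DIM, 2))
gate_lo = rng.normal(size=(DIM, 2))
out = atml_like_forward(emb, freq=0.9, gate_hi=gate_hi, gate_lo=gate_lo)
```

Because the mask is a smooth function of the gate parameters, gradients flow through the dimension choice, which is what makes the selection end‑to‑end trainable.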
In summary, low‑carbon model slimming is universally valuable. Embedding optimization is challenging due to slow convergence and high experimental cost. The team’s work on accelerating embedding convergence and efficient compression (BH and ATML) demonstrates practical pathways for large‑scale CTR models to achieve high performance with reduced resources.
Detailed papers: "Binary Code based Hash Embedding for Web-scale Applications" (arXiv:2109.02471) and "Learning Effective and Efficient Embedding via an Adaptively-Masked Twins-based Layer" (arXiv:2108.11513), along with a bibliography of related works.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Alimama Tech
Official Alimama tech channel, showcasing all of Alimama's technical innovations.
