Adaptive Masked Twins-based Layer for Efficient Embedding Dimension Selection in Deep Recommendation Models
AMTL inserts an adaptively learned twin‑network mask after the representation layer to prune unnecessary embedding dimensions per feature value, automatically assigning larger sizes to high‑frequency features. It achieves higher CTR accuracy, roughly 60% storage reduction, and seamless hot‑starting across recommendation models.
In deep recommendation models, learning representations for ID-type features is crucial. Traditionally, each feature value is mapped to an embedding vector of a fixed dimension, which is suboptimal for both learning effectiveness and storage cost. Existing solutions based on handcrafted rules or neural architecture search either require extra human knowledge or are difficult to train and do not support hot‑starting of embeddings.
This work proposes a novel and efficient method for selecting appropriate embedding dimensions for each feature value. After each representation layer, an Adaptively‑Masked Twins‑based Layer (AMTL) is inserted to generate a mask that removes unnecessary dimensions from the embedding vector. The mask is learned adaptively, allowing the method to be applied to various models and to support hot‑starting of embeddings.
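To make the masking idea concrete, here is a minimal numpy sketch of how a per‑value dimension mask prunes an embedding vector. The embedding size `D`, the specific `k` values, and the helper `dimension_mask` are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical setup: a full embedding table of dimension D = 8.
# AMTL's output for a feature value is a mask that keeps the first k
# dimensions and zeroes the rest, so that value effectively gets an
# embedding of size k while the stored table keeps a uniform shape.
D = 8
embedding = np.arange(1.0, D + 1.0)  # one feature value's full embedding


def dimension_mask(k: int, d: int = D) -> np.ndarray:
    """Binary mask keeping the first k of d dimensions (illustrative)."""
    mask = np.zeros(d)
    mask[:k] = 1.0
    return mask


# A high-frequency feature value might receive k = 6, a rare one k = 2;
# these particular numbers are assumptions for the example.
masked_frequent = embedding * dimension_mask(6)
masked_rare = embedding * dimension_mask(2)
```

Because the mask only zeroes trailing dimensions, the masked vector can still be fed to any downstream model that expects a fixed‑size input, which is what makes the method model‑agnostic.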
AMTL consists of a twin network architecture with two non‑shared adaptive mask layers (h‑AML for high‑frequency features and l‑AML for low‑frequency features). Feature frequency information is fed into both branches; their outputs are combined by a weighted sum, and a softmax followed by a straight‑through estimator (STE) yields a differentiable approximation of the discrete mask. This design avoids the bias toward high‑frequency features that a single‑branch network would suffer from.
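The twin‑branch forward pass described above can be sketched as follows. This is a forward‑only numpy illustration under assumed shapes: each branch is reduced to a single linear map, `freq_weight` stands in for the frequency‑based combination weight, and the STE's backward behavior is only described in a comment (it requires an autodiff framework).

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()


def aml_branch(weights: np.ndarray, features: np.ndarray) -> np.ndarray:
    # Each AML branch maps the input features to D+1 logits, one per
    # candidate dimension count 0..D. A single linear map stands in for
    # the branch's small network in this sketch.
    return weights @ features


def amtl_forward(features, w_high, w_low, freq_weight, D):
    """Illustrative forward pass of the twins-based layer.

    freq_weight in [0, 1] reflects the feature value's frequency: the
    two non-shared branches (h-AML, l-AML) are combined by a weighted
    sum so low-frequency values are not dominated by high-frequency
    statistics.
    """
    logits = (freq_weight * aml_branch(w_high, features)
              + (1.0 - freq_weight) * aml_branch(w_low, features))
    probs = softmax(logits)        # soft, differentiable distribution
    k = int(np.argmax(probs))      # hard choice of dimension count
    # Straight-through estimator: the forward pass uses the hard mask,
    # while gradients would flow through the softmax probabilities
    # (not shown in this numpy-only sketch).
    mask = np.zeros(D)
    mask[:k] = 1.0
    return mask
```

A usage example: with `D = 8` candidate sizes and a random feature vector, `amtl_forward` returns a binary mask whose leading ones determine the effective embedding dimension for that feature value.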
Extensive experiments on three datasets (MovieLens, IJCAI‑AAC, Taobao) compare AMTL with standard fixed‑dimension embeddings (FBE), rule‑based mixed‑dimension embeddings (MDE), and NAS‑based AutoEmb. Results show that AMTL consistently achieves higher CTR prediction accuracy, reduces embedding storage by about 60%, and significantly improves hot‑start performance because the masked embeddings can be initialized from a pre‑trained model. Additional analyses show that AMTL automatically assigns larger dimensions to high‑frequency features and smaller ones to low‑frequency features; ablation studies validate the benefit of the twin structure and the STE; and the reported modest increase in inference time can be eliminated by storing the masked embeddings directly.
In summary, AMTL provides an effective solution for embedding dimension optimization in recommendation systems, offering better accuracy, lower storage, and seamless hot‑starting, and represents a further step toward model slimming in the AI domain.
Alimama Tech
Official Alimama tech channel, showcasing all of Alimama's technical innovations.