Adaptive Masked Twins-based Layer for Efficient Embedding Dimension Selection in Deep Recommendation Models
AMTL inserts an adaptively learned twin‑network mask after the representation layer to prune unnecessary embedding dimensions per feature value, automatically assigning larger sizes to high‑frequency features. It achieves higher CTR accuracy, roughly 60% storage reduction, and seamless hot‑starting across recommendation models.
In deep recommendation models, learning representations for ID-type features is crucial. Traditionally, each feature value is mapped to an embedding vector of a fixed dimension, which is suboptimal for both learning effectiveness and storage cost. Existing solutions based on handcrafted rules or neural architecture search either require extra human knowledge or are difficult to train and do not support hot‑starting of embeddings.
This work proposes a novel and efficient method for selecting appropriate embedding dimensions for each feature value. After each representation layer, an Adaptively‑Masked Twins‑based Layer (AMTL) is inserted to generate a mask that removes unnecessary dimensions from the embedding vector. The mask is learned adaptively, allowing the method to be applied to various models and to support hot‑starting of embeddings.
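To make the masking idea concrete, here is a minimal numpy sketch of how a per‑value dimension mask prunes an embedding vector. The embedding size `D`, the specific `k` values, and the helper `dimension_mask` are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical setup: a full embedding table of dimension D = 8.
# AMTL's output for a feature value is a mask that keeps the first k
# dimensions and zeroes the rest, so that value effectively gets an
# embedding of size k while the stored table keeps a uniform shape.
D = 8
embedding = np.arange(1.0, D + 1.0)  # one feature value's full embedding


def dimension_mask(k: int, d: int = D) -> np.ndarray:
    """Binary mask keeping the first k of d dimensions (illustrative)."""
    mask = np.zeros(d)
    mask[:k] = 1.0
    return mask


# A high-frequency feature value might receive k = 6, a rare one k = 2;
# these particular numbers are assumptions for the example.
masked_frequent = embedding * dimension_mask(6)
masked_rare = embedding * dimension_mask(2)
```

Because the mask only zeroes trailing dimensions, the masked vector can still be fed to any downstream model that expects a fixed‑size input, which is what makes the method model‑agnostic.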
AMTL consists of a twin network architecture with two non‑shared adaptive mask layers (h‑AML for high‑frequency features and l‑AML for low‑frequency features). Feature frequency information is fed into both branches; their outputs are combined by a weighted sum, and a softmax followed by a straight‑through estimator (STE) yields a differentiable approximation of the discrete mask. This design avoids the bias toward high‑frequency features that a single‑branch network would suffer from.
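The twin‑branch forward pass described above can be sketched as follows. This is a forward‑only numpy illustration under assumed shapes: each branch is reduced to a single linear map, `freq_weight` stands in for the frequency‑based combination weight, and the STE's backward behavior is only described in a comment (it requires an autodiff framework).

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()


def aml_branch(weights: np.ndarray, features: np.ndarray) -> np.ndarray:
    # Each AML branch maps the input features to D+1 logits, one per
    # candidate dimension count 0..D. A single linear map stands in for
    # the branch's small network in this sketch.
    return weights @ features


def amtl_forward(features, w_high, w_low, freq_weight, D):
    """Illustrative forward pass of the twins-based layer.

    freq_weight in [0, 1] reflects the feature value's frequency: the
    two non-shared branches (h-AML, l-AML) are combined by a weighted
    sum so low-frequency values are not dominated by high-frequency
    statistics.
    """
    logits = (freq_weight * aml_branch(w_high, features)
              + (1.0 - freq_weight) * aml_branch(w_low, features))
    probs = softmax(logits)        # soft, differentiable distribution
    k = int(np.argmax(probs))      # hard choice of dimension count
    # Straight-through estimator: the forward pass uses the hard mask,
    # while gradients would flow through the softmax probabilities
    # (not shown in this numpy-only sketch).
    mask = np.zeros(D)
    mask[:k] = 1.0
    return mask
```

A usage example: with `D = 8` candidate sizes and a random feature vector, `amtl_forward` returns a binary mask whose leading ones determine the effective embedding dimension for that feature value.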
Extensive experiments on three datasets (MovieLens, IJCAI‑AAC, Taobao) compare AMTL with standard fixed‑dimension embeddings (FBE), rule‑based mixed‑dimension embeddings (MDE), and NAS‑based AutoEmb. Results show that AMTL consistently achieves higher CTR prediction accuracy, reduces embedding storage by about 60%, and significantly improves hot‑start performance because the masked embeddings can be initialized from a pre‑trained model. Additional analyses show that AMTL automatically assigns larger dimensions to high‑frequency features and smaller ones to low‑frequency features; ablation studies validate the benefit of the twin structure and the STE; and the reported modest increase in inference time can be eliminated by storing the masked embeddings directly.
In summary, AMTL provides an effective solution for embedding dimension optimization in recommendation systems, offering better accuracy, lower storage, and seamless hot‑starting, and represents a further step toward model slimming in the AI domain.
Alimama Tech
Official Alimama tech channel, showcasing all of Alimama's technical innovations.