How Transformers Enable Personalized Outfit Generation for Fashion Recommendation
This article presents a Transformer‑based framework that simultaneously generates visually compatible outfits and personalizes recommendations by leveraging multimodal item embeddings and user behavior, achieving significant gains in compatibility prediction, fill‑in‑the‑blank accuracy, and click‑through rate on Alibaba's iFashion platform.
Introduction
Outfit generation and recommendation are becoming crucial in fashion e‑commerce. A high‑quality outfit must be visually harmonious and logically compatible, while recommendations need to reflect individual user preferences. Existing works treat these two tasks separately, leading to scalability challenges for large platforms such as Taobao.
Dataset
We collected 1.01 million outfits (58.3 k unique items) from Alibaba’s iFashion channel, after filtering out low‑frequency categories and outfits with fewer than four items. Three months of user interaction logs yielded 3.57 million active users and 19.2 million training instances, involving 12.7 k outfits and 4.46 million items. Every item comes with a white‑background image, a title, and a leaf‑category label. This is the largest publicly released fashion‑outfit dataset with user behavior.
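The filtering step can be sketched as below. This is a minimal illustration, not the authors' pipeline: the article does not give the frequency cutoff, so `min_category_count` is a hypothetical parameter, and the sketch assumes that items from rare leaf categories are dropped from an outfit (rather than discarding the whole outfit) before the four‑item minimum is applied.

```python
from collections import Counter


def filter_outfits(outfits, item_category, min_category_count, min_items=4):
    """Hypothetical filtering sketch.

    outfits: list of outfits, each a list of item ids.
    item_category: dict mapping item id -> leaf category.
    min_category_count: assumed cutoff; the article does not state the real one.
    """
    # Count how often each leaf category appears across all outfits.
    counts = Counter(item_category[i] for outfit in outfits for i in outfit)
    frequent = {c for c, n in counts.items() if n >= min_category_count}

    kept = []
    for outfit in outfits:
        # Assumption: drop items from rare categories, keep the rest of the outfit.
        items = [i for i in outfit if item_category[i] in frequent]
        # Keep only outfits that still have at least `min_items` items.
        if len(items) >= min_items:
            kept.append(items)
    return kept
```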
Model
Multimodal Embedding
Each item f is represented by a fused embedding built from three modalities: a CNN‑derived visual vector, a TextCNN‑derived title vector, and a graph embedding from Alibaba’s Behemoth platform. The embeddings are projected by a fully‑connected layer trained with a triplet loss, which pulls items of the same leaf category together and pushes items of different categories apart.
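A minimal sketch of the embedding objective, assuming the three modality vectors are concatenated and projected by a single linear layer, and assuming a standard margin‑based triplet loss on squared Euclidean distance. The helper names and the `margin` value are hypothetical; real CNN, TextCNN, and Behemoth vectors would replace the toy inputs.

```python
def fc_layer(x, weights, bias):
    """Fully connected layer y = W x + b (no activation, for simplicity)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def fuse(visual, text, graph, weights, bias):
    """Assumed fusion: concatenate the three modality vectors, then project."""
    return fc_layer(visual + text + graph, weights, bias)


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss (margin is a hypothetical value).

    `positive` shares the anchor's leaf category; `negative` does not.
    The loss is zero once the negative is at least `margin` farther away.
    """
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)
```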
FOM (Fashion Outfit Model)
To capture item‑item relationships, we adopt a bidirectional Transformer encoder without positional embeddings, so that an outfit is treated as a set rather than a sequence. A masked‑item task replaces one item with a [MASK] token and trains the model to predict it from the remaining items, so that self‑attention learns item compatibility. The training objective maximizes the probability of the correct masked item against three negative samples.
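The masked‑item task can be illustrated with a toy single‑head self‑attention layer. This sketch assumes identity query/key/value projections and dot‑product candidate scoring to keep it short; a real FOM encoder stacks multiple layers with learned projections. Because no positional embeddings are added, permuting the items only permutes the outputs, which is what lets the encoder treat an outfit as a set.

```python
import math


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention(tokens):
    """Single-head scaled dot-product self-attention, identity Q/K/V (assumption).

    No positional embeddings are added, so reordering the input tokens
    reorders the outputs in the same way: the outfit behaves as a set.
    """
    d = len(tokens[0])
    outputs = []
    for q in tokens:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in tokens])
        outputs.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return outputs


def masked_item_loss(outfit, mask_index, mask_embedding, candidates, target=0):
    """Hide one item behind [MASK], encode, and score candidates at that slot.

    candidates[target] is the true item; the rest are sampled negatives
    (three in the article). Returns the negative log-likelihood of the true item.
    """
    tokens = list(outfit)
    tokens[mask_index] = mask_embedding
    hidden = self_attention(tokens)[mask_index]
    probs = softmax([dot(hidden, c) for c in candidates])
    return -math.log(probs[target])
```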
POG (Personalized Outfit Generation)
Building on FOM, POG adds a user encoder‑decoder architecture. The encoder (Per network) consumes the sequence of items the user has clicked, while the decoder (Gen network) generates a compatible outfit item by item, conditioned on both the compatibility learned by FOM and the user’s personalized embedding. Training maximizes the probability of the correct next item against three random negatives (equivalently, it minimizes the negative log‑likelihood of that item).
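The item‑by‑item generation loop and the training objective can be sketched as follows. This is a heavily simplified stand‑in, not the POG architecture: averaging clicked‑item embeddings replaces the Per network, averaging the user vector with already chosen items replaces the Gen network's decoder state, a dot product stands in for the FOM compatibility score, and starting from a `seed_item` is an assumption. All function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def encode_user(clicked_items):
    """Toy stand-in for the Per network: average of clicked-item embeddings."""
    return mean(clicked_items)


def decoder_context(user_vec, generated):
    """Toy stand-in for the Gen network state: user preference plus items so far."""
    return mean([user_vec] + generated)


def generate_outfit(clicked_items, catalog, seed_item, length):
    """Greedy item-by-item generation from an assumed seed item.

    catalog: dict mapping item id -> embedding. Returns chosen item ids in order.
    """
    user_vec = encode_user(clicked_items)
    chosen = [seed_item]
    while len(chosen) < length:
        context = decoder_context(user_vec, [catalog[i] for i in chosen])
        remaining = [i for i in catalog if i not in chosen]
        # Pick the unused item that best matches the current context.
        chosen.append(max(remaining, key=lambda i: dot(context, catalog[i])))
    return chosen


def next_item_nll(context, positive, negatives):
    """Maximizing p(positive) against sampled negatives under a softmax
    is the same as minimizing this negative log-likelihood."""
    scores = [dot(context, positive)] + [dot(context, n) for n in negatives]
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[0]
```

Greedy decoding keeps the sketch short; the article does not say which decoding strategy POG uses.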
Deployment – Dida (滴搭) Platform
POG is deployed in the online Dida (滴搭) system, which supports item selection, outfit generation, collage creation, and personalized recommendation for millions of items. Over 1 million Alibaba operators have used the platform, generating ~6 million outfits daily, viewed by ~5.4 million users.
Experiments
Compatibility Evaluation
We evaluate using FITB (Fill‑In‑The‑Blank) and CP (Compatibility Prediction). The FOM model outperforms the baselines (F‑LSTM, Bi‑LSTM, SetNN) in both ordered and unordered settings, achieving up to 5.98% absolute improvement in FITB and 34.90% in CP over the best baseline.
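Both metrics are simple to compute once a model produces compatibility scores. The sketch below assumes FITB is scored as top‑1 accuracy over the candidate set and CP as ROC AUC over compatible versus incompatible outfits, following the evaluation protocol of Han et al.'s Bi‑LSTM work; the `score` callback is a hypothetical stand‑in for FOM or a baseline.

```python
def fitb_accuracy(questions, score):
    """Fraction of FITB questions where the true candidate scores highest.

    questions: list of (partial_outfit, candidates, answer_index).
    score: hypothetical function (partial_outfit, candidate) -> float,
           where higher means more compatible.
    """
    correct = 0
    for partial, candidates, answer in questions:
        scores = [score(partial, c) for c in candidates]
        predicted = max(range(len(candidates)), key=scores.__getitem__)
        if predicted == answer:
            correct += 1
    return correct / len(questions)


def cp_auc(labels, scores):
    """ROC AUC for compatibility prediction (assumed metric).

    Probability that a random compatible outfit (label 1) outscores a random
    incompatible one (label 0); ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```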
Personalized Recommendation
Online A/B tests compare three strategies: random recommendation (RR), collaborative‑filtering‑based triggers (CF), and POG. Across a 7‑day period, POG consistently yields the highest click‑through rate, surpassing CF by a large margin.
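The CTR comparison behind such an A/B test reduces to simple ratios per bucket and per day. A minimal sketch; the input format (strategy name mapped to a `(clicks, impressions)` pair) is a hypothetical choice, and no real traffic numbers are implied.

```python
def ctr(clicks, impressions):
    """Click-through rate = clicks / impressions (0.0 for an empty bucket)."""
    return clicks / impressions if impressions else 0.0


def relative_lift(treatment_ctr, control_ctr):
    """Relative CTR improvement of a treatment bucket over a control bucket."""
    return (treatment_ctr - control_ctr) / control_ctr


def best_strategy_per_day(daily_logs):
    """daily_logs: one dict per day, mapping strategy -> (clicks, impressions).

    Returns the strategy with the highest CTR on each day.
    """
    return [max(day, key=lambda s: ctr(*day[s])) for day in daily_logs]
```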
Conclusion
By jointly modeling outfit compatibility and user preference with a Transformer‑based encoder‑decoder, POG achieves state‑of‑the‑art performance on both offline metrics and online CTR, and scales to industrial‑size e‑commerce platforms.
References
Jacob Devlin et al., "BERT: Pre‑training of Deep Bidirectional Transformers for Language Understanding", 2018.
Xintong Han et al., "Learning fashion compatibility with bidirectional LSTMs", ACM MM 2017.
Yuncheng Li et al., "Mining fashion outfit composition using an end‑to‑end deep learning approach on set data", IEEE TIP 2017.
Ashish Vaswani et al., "Attention is All You Need", NIPS 2017.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
