How We Enhanced the MIND Multi‑Interest Model for Short‑Video Recommendations
This article analyzes the original MIND recommendation model, identifies limitations in its capsule design and routing, and proposes three practical improvements: data-driven capsule initialization (max-min and Markov-based), a sparse routing scheme, and data and training adjustments. The gains are validated with offline experiments and real-world business metrics.
Business Background
Short-video and feed scenarios generate massive user behavior sequences (exposures, plays, clicks, likes, follows), which are valuable signals for recommendation. Existing sequence models such as YouTube DNN, GRU4Rec, and the original MIND model aim to capture user interests from these sequences; of the three, MIND is the one that explicitly models multiple interests, using capsule networks.
Original MIND Model and Its Evolution
Proposed by Alibaba in 2019, MIND directly models several user interests using capsule routing inspired by CapsuleNet. In 2020, Alibaba and Tsinghua introduced ComiRec, improving the inference stage by merging results from multiple interest‑specific ANN searches to increase recall diversity.
Identified Issues in the Original MIND
Strong coupling between capsule count and user sequence length.
Random initialization of routing logits leads to instability.
High similarity among capsules causes redundant interest capture.
Inconsistent content within each capsule reduces interpretability.
Proposed Improvements
1. Capsule Initialization
We treat capsule initialization as a clustering problem. Using a max‑min strategy similar to k‑means++, we first select a random item as the initial capsule, then iteratively add the item farthest from the current capsule set until the desired number of capsules K (computed from sequence length) is reached.
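The max-min selection described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the production implementation: the function names, the use of squared Euclidean distance, and the seed parameter are assumptions made for the example, and k is passed in directly rather than derived from sequence length.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance (assumed metric for this sketch)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def max_min_init(items, k, seed=0):
    """Return indices of k items to use as initial capsules.

    The first index is random; each later index is the item whose
    distance to its nearest already-chosen item is largest.
    """
    if not items or k <= 0:
        return []
    rng = random.Random(seed)
    k = min(k, len(items))
    chosen = [rng.randrange(len(items))]
    # min_dist[i] = distance from item i to its nearest chosen item
    min_dist = [squared_distance(v, items[chosen[0]]) for v in items]
    while len(chosen) < k:
        nxt = max(range(len(items)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i, v in enumerate(items):
            min_dist[i] = min(min_dist[i], squared_distance(v, items[nxt]))
    return chosen
```

For example, with item embeddings [[0, 0], [0.1, 0], [5, 5], [5.1, 5], [-5, 5]] and k = 3, the selection returns one index from each of the three visible groups, whichever item is drawn first.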
We also propose a Markov‑based method: construct a similarity‑based transition matrix among items, compute the stationary distribution, and select high‑density points as candidate capsules. The final capsules are chosen by prioritizing candidates with many neighbors while respecting a maximum capsule count.
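A rough sketch of the Markov-based idea follows. Several details are assumptions for illustration only: the transition matrix is built from a softmax-like kernel over cosine similarity with a hypothetical temperature, the stationary distribution is approximated by power iteration, and the final selection greedily takes the highest-probability items while skipping any item too similar to one already chosen (sim_threshold is a hypothetical parameter). The article does not specify these choices.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def stationary_distribution(items, temperature=0.5, iters=200):
    """Approximate the stationary distribution by power iteration."""
    n = len(items)
    if n == 0:
        return []
    # P[i][j]: probability of stepping from item i to item j,
    # obtained by row-normalizing a similarity kernel.
    P = []
    for a in items:
        row = [math.exp(cosine(a, b) / temperature) for b in items]
        total = sum(row)
        P.append([w / total for w in row])
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def markov_init(items, max_capsules, sim_threshold=0.8):
    """Pick high-density items as capsules, at most max_capsules."""
    pi = stationary_distribution(items)
    ranked = sorted(range(len(items)), key=lambda i: -pi[i])
    chosen = []
    for i in ranked:
        if len(chosen) == max_capsules:
            break
        # Skip items that sit in the neighborhood of a chosen capsule.
        if all(cosine(items[i], items[j]) < sim_threshold for j in chosen):
            chosen.append(i)
    return chosen
```

Because the kernel is symmetric, an item's stationary probability is proportional to its total similarity to all items, so items with many close neighbors rank first. The number of capsules returned can be smaller than max_capsules when the sequence contains few distinct groups.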
2. Routing Process Modification
Remove the shared bilinear mapping matrix S, as positional information is deemed irrelevant for multi‑interest modeling.
Replace the original squash function with simple L2 normalization of the capsule vectors.
Transform the dense routing logit matrix into a sparse one by keeping only the maximum logit per item and zeroing out the rest; also overwrite the logits at each routing iteration instead of accumulating them across iterations.
These changes emulate the hard‑assignment behavior of k‑means, ensuring each item contributes to a single capsule per routing round.
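The modified routing loop can be sketched as follows. This is an illustrative reading of the three changes, with assumptions stated plainly: logits are plain dot products (no mapping matrix), the single retained entry per item is treated as a weight of 1, and the function and parameter names are invented for the example.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sparse_routing(items, init_capsules, num_iters=3):
    """Hard-assignment routing: each item feeds exactly one capsule per round."""
    capsules = [l2_normalize(c) for c in init_capsules]
    assignment = []
    for _ in range(num_iters):
        # Logits are recomputed from scratch each round (overwrite,
        # not accumulate), and only the arg-max capsule is kept per item.
        assignment = []
        for v in items:
            logits = [dot(v, c) for c in capsules]
            assignment.append(max(range(len(capsules)), key=logits.__getitem__))
        # Each capsule becomes the L2-normalized sum of its assigned items.
        for k in range(len(capsules)):
            members = [v for v, a in zip(items, assignment) if a == k]
            if members:  # an empty capsule keeps its previous vector
                capsules[k] = l2_normalize([sum(col) for col in zip(*members)])
    return capsules, assignment
```

Called with the capsules produced by an initialization step, the function returns the refined capsule vectors together with the capsule index assigned to each item, which also makes the content of each capsule easy to inspect.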
3. Data and Training Adjustments
We initialize the item embedding layer from pretrained embeddings and freeze it on the input (behavior sequence) side by stopping gradient updates, while the label-aware attention side is updated normally. Negative sampling probabilities are set proportional to item frequency (power-law), so popular items are drawn as negatives more often; this suppresses overly popular items and preserves long-tail diversity.
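Frequency-based negative sampling can be sketched as below. The power exponent is a hypothetical knob added for illustration: power = 1.0 gives sampling strictly proportional to frequency as described above, and the article does not state which exponent, if any, was used. The function names and the toy counts are likewise assumptions.

```python
import random


def negative_sampling_probs(item_counts, power=1.0):
    """Map each item to a sampling probability proportional to count ** power."""
    weights = {item: count ** power for item, count in item_counts.items()}
    total = sum(weights.values())
    return {item: w / total for item, w in weights.items()}


def sample_negatives(item_counts, positives, n, power=1.0, seed=0):
    """Draw n negatives (with replacement), never returning a positive item."""
    rng = random.Random(seed)
    probs = negative_sampling_probs(item_counts, power)
    candidates = [item for item in probs if item not in positives]
    if not candidates:
        return []
    weights = [probs[item] for item in candidates]
    return rng.choices(candidates, weights=weights, k=n)


# Toy usage: the frequent item dominates the negative samples.
counts = {"hit_video": 900, "mid_video": 90, "tail_video": 10}
probs = negative_sampling_probs(counts)
negatives = sample_negatives(counts, positives={"mid_video"}, n=5)
```

Sampling popular items as negatives more often pushes their scores down during training, which is the mechanism behind the long-tail effect mentioned above.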
Experimental Evaluation
Capsule Initialization Effect
Using real user sequences, we applied the Markov initialization and visualized the results with t‑SNE. In a concentrated‑interest sequence, only 2‑4 capsules were needed, whereas the original length‑based rule would allocate 7‑8 capsules. In a diverse‑interest sequence, the method captured all major interests without over‑splitting.
Business Impact
Deploying the enhanced model in a short‑video feed increased recall share, positive feedback rate, and, most importantly, recall diversity—covering more content categories, resources, and creators per exposure.
Conclusion
We iterated on Alibaba's multi-interest extraction framework by replacing random capsule initialization with data-driven max-min and Markov strategies, simplifying routing, and freezing the pretrained input embeddings while still training the label-aware attention side. The max-min method is lightweight and suits scenarios where only modest separation between interests is needed, while the Markov approach excels when high coverage and distinctness are required.
