How We Enhanced the MIND Multi‑Interest Model for Short‑Video Recommendations

This article analyzes the original MIND recommendation model, identifies limitations in its capsule design and routing, and proposes three practical improvements: data‑driven capsule initialization (max‑min and Markov‑based), a sparse routing scheme, and data and training adjustments. It then validates the gains with offline experiments and real‑world business metrics.

NewBeeNLP

Business Background

Short‑video and feed scenarios generate massive user behavior sequences (exposures, plays, clicks, likes, follows), which are valuable for recommendation. Existing sequence models such as YouTube DNN, GRU4REC, and the original MIND model aim to capture user interests, with MIND uniquely modeling multiple interests via capsule networks.

Original MIND Model and Its Evolution

Proposed by Alibaba in 2019, MIND directly models several user interests using capsule routing inspired by CapsuleNet. In 2020, Alibaba and Tsinghua introduced ComiRec, improving the inference stage by merging results from multiple interest‑specific ANN searches to increase recall diversity.

Identified Issues in the Original MIND

Strong coupling between capsule count and user sequence length.

Random initialization of routing logits leads to instability.

High similarity among capsules causes redundant interest capture.

Inconsistent content within each capsule reduces interpretability.
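
The first issue comes from the dynamic interest number in the MIND paper, which sets the capsule count for a user to max(1, min(K, log2(sequence length))). A minimal sketch of that rule follows; the function name and the floor rounding are assumptions made for illustration.

```python
import math

def dynamic_interest_count(seq_len, max_k):
    """Length-based capsule count: max(1, min(K, log2(|sequence|))).

    seq_len must be >= 1. Rounding down to an integer is an assumption;
    the formula itself does not say how to round.
    """
    return max(1, min(max_k, int(math.log2(seq_len))))

for n in (1, 16, 128, 256):
    print(n, dynamic_interest_count(n, max_k=10))
```

Under this rule the count depends only on how long the sequence is, not on how many distinct interests it actually contains.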

Proposed Improvements

1. Capsule Initialization

We treat capsule initialization as a clustering problem. Using a max‑min strategy similar to k‑means++, we first select a random item as the initial capsule, then iteratively add the item farthest from the current capsule set until the desired number of capsules K (computed from sequence length) is reached.
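
The max‑min selection described above can be sketched as farthest‑point sampling. This is an illustrative version, not the production code: the function name, the use of Euclidean distance, and the seeded random start are assumptions.

```python
import numpy as np

def max_min_init(items, k, seed=0):
    """Pick k item indices by max-min (farthest-point) selection.

    items: (n, d) array of item embeddings from one user sequence.
    Returns the indices of the items used as initial capsules.
    """
    items = np.asarray(items, dtype=np.float64)
    n = items.shape[0]
    k = min(k, n)
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(n))]  # random first capsule
    # Distance from every item to its nearest chosen capsule so far.
    min_dist = np.linalg.norm(items - items[chosen[0]], axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(min_dist))  # item farthest from the chosen set
        chosen.append(nxt)
        dist_to_new = np.linalg.norm(items - items[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return chosen
```

Because already chosen items have distance zero to the set, the same item is never picked twice while distinct points remain.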

Original MIND architecture

We also propose a Markov‑based method: construct a similarity‑based transition matrix among items, compute the stationary distribution, and select high‑density points as candidate capsules. The final capsules are chosen by prioritizing candidates with many neighbors while respecting a maximum capsule count.

Markov capsule selection
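
One possible reading of the Markov‑based selection is sketched below. The article does not give the exact formulas, so several pieces are assumptions: cosine similarity, the `sim_threshold` cutoff that defines neighbors, treating items with at least average stationary mass as high‑density candidates, and the greedy rule that skips candidates adjacent to an already chosen capsule.

```python
import numpy as np

def markov_capsule_init(items, max_capsules, sim_threshold=0.5, n_iter=100):
    """Select capsule indices from a similarity-based Markov chain.

    items: (n, d) array of nonzero item embeddings.
    """
    items = np.asarray(items, dtype=np.float64)
    n = items.shape[0]
    unit = items / np.linalg.norm(items, axis=1, keepdims=True)
    sim = unit @ unit.T                      # cosine similarity
    adj = np.where(sim >= sim_threshold, sim, 0.0)
    np.fill_diagonal(adj, 1.0)               # self-loops keep rows non-empty
    trans = adj / adj.sum(axis=1, keepdims=True)  # row-stochastic matrix

    # Power iteration from the uniform distribution: pi <- pi @ trans.
    pi = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        pi = pi @ trans

    neighbors = (adj > 0).sum(axis=1) - 1    # neighbor count, excluding self
    # Candidates: items holding at least average stationary mass (assumption).
    candidates = [int(i) for i in np.argsort(-pi) if pi[i] >= 1.0 / n - 1e-9]
    candidates.sort(key=lambda i: -neighbors[i])  # many neighbors first

    chosen = []
    for i in candidates:
        if len(chosen) >= max_capsules:
            break
        # Skip candidates that neighbor an already selected capsule.
        if all(adj[i, j] == 0.0 for j in chosen):
            chosen.append(i)
    return chosen
```

In this sketch the number of capsules follows from the structure of the sequence: a sequence with few dense regions yields few capsules even when `max_capsules` is larger.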

2. Routing Process Modification

Remove the shared bilinear mapping matrix S, as positional information is deemed irrelevant for multi‑interest modeling.

Replace the original squash function with plain L2 normalization of the capsule vectors.

Turn the dense routing‑logit matrix into a sparse one by keeping only the maximum logit for each item and zeroing out the rest; also overwrite the logits at each routing iteration instead of accumulating them across iterations.

These changes emulate the hard‑assignment behavior of k‑means, ensuring each item contributes to a single capsule per routing round.
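
The three routing changes can be combined in a short sketch. It is an illustration under stated assumptions rather than the deployed implementation: item vectors are L2‑normalized up front, the kept maximum logit is used directly as the routing weight, and a capsule that receives no items keeps its previous value. The function names are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm (used here in place of squash)."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def sparse_routing(items, init_capsules, n_rounds=3):
    """items: (n, d); init_capsules: (k, d). Returns (capsules, assignment)."""
    items = l2_normalize(np.asarray(items, dtype=np.float64))
    capsules = l2_normalize(np.asarray(init_capsules, dtype=np.float64))
    rows = np.arange(items.shape[0])
    for _ in range(n_rounds):
        # No bilinear matrix S: logits are plain dot products, recomputed
        # (overwritten) every round instead of accumulated.
        logits = items @ capsules.T                   # (n, k)
        assign = logits.argmax(axis=1)                # one capsule per item
        sparse = np.zeros_like(logits)
        sparse[rows, assign] = logits[rows, assign]   # keep only the max logit
        updated = l2_normalize(sparse.T @ items)      # (k, d)
        # A capsule with no routed items keeps its previous value (assumption).
        empty = np.bincount(assign, minlength=capsules.shape[0]) == 0
        capsules = np.where(empty[:, None], capsules, updated)
    return capsules, assign
```

Each round mirrors a k‑means step: assign every item to its best capsule, then recompute each capsule from the items routed to it.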

3. Data and Training Adjustments

We initialize the item embedding layer from pretrained embeddings and freeze it on the input side (no gradient updates), while the label‑aware attention side is updated normally. Negatives are sampled with probability proportional to item frequency (power‑law), so popular items are drawn as negatives more often, which suppresses overly popular items and preserves long‑tail diversity.
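
A minimal sketch of frequency‑based negative sampling follows. The exponent `alpha` is a hypothetical knob that is not taken from the article; `alpha=1.0` gives probabilities directly proportional to item frequency. The function names and the exclusion of the positive item are also assumptions.

```python
import numpy as np

def negative_sampling_probs(item_counts, alpha=1.0):
    """Sampling probability proportional to count ** alpha."""
    counts = np.asarray(item_counts, dtype=np.float64)
    weights = counts ** alpha
    return weights / weights.sum()

def sample_negatives(item_counts, positive_id, n_neg, alpha=1.0, seed=0):
    """Draw n_neg negative item ids, never returning the positive item."""
    probs = negative_sampling_probs(item_counts, alpha)
    probs[positive_id] = 0.0        # exclude the positive item
    probs /= probs.sum()            # renormalize over the remaining items
    rng = np.random.default_rng(seed)
    return rng.choice(len(probs), size=n_neg, replace=True, p=probs)
```

Popular items receive more negative updates under this scheme, which counteracts their tendency to dominate the retrieved results.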

Experimental Evaluation

Capsule Initialization Effect

Using real user sequences, we applied the Markov initialization and visualized the results with t‑SNE. In a concentrated‑interest sequence, only 2‑4 capsules were needed, whereas the original length‑based rule would allocate 7‑8 capsules. In a diverse‑interest sequence, the method captured all major interests without over‑splitting.

Improved MIND architecture

Business Impact

Deploying the enhanced model in a short‑video feed increased recall share, positive feedback rate, and, most importantly, recall diversity—covering more content categories, resources, and creators per exposure.

Conclusion

We iterated on Alibaba's multi‑interest extraction framework by replacing random capsule initialization with data‑driven max‑min and Markov strategies, simplifying routing, and freezing pretrained input embeddings while still training the label‑aware attention side. The max‑min method is lightweight and suits scenarios with modest interest discrimination, while the Markov approach excels when high coverage and distinctness are required.

