Improving the MIND Multi‑Interest Recommendation Model with Capsule Initialization and Routing Enhancements
This article presents a study of the MIND multi‑interest recommendation model. It covers the original architecture, the shortcomings found in practice, and the proposed enhancements: capsule initialization via max‑min and Markov methods, routing simplifications, and training adjustments. It closes with experimental results and an assessment of business impact.
The article first describes the business background. In short‑video and information‑flow (feed) scenarios, massive user behavior sequences (exposures, plays, clicks, likes, follows) provide valuable signals for recommendation. It then reviews existing sequence models such as YouTube DNN, GRU4Rec, and the original MIND model, highlighting MIND’s capsule‑based multi‑interest extraction.
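MIND (Multi‑Interest Network with Dynamic routing), proposed by Alibaba in 2019, models a user's multiple interests directly: a dynamic‑routing scheme adapted from capsule networks (CapsNet) groups the behavior sequence into several interest capsules. Subsequent work (ComiRec) improved the inference stage by merging the recall results retrieved for each interest.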
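For reference, the original Behavior‑to‑Interest routing can be sketched in NumPy roughly as the MIND paper describes it. This is a minimal, non‑authoritative sketch: the function names, the three routing iterations, and the single `seed` argument are our assumptions, not the article's code. It makes visible the three elements discussed later: the shared bilinear matrix `S`, the `squash` nonlinearity, and the randomly initialized routing logits.

```python
import numpy as np

def squash(z: np.ndarray, eps: float = 1e-9) -> np.ndarray:
    """Capsule squash: keeps each row's direction and maps its norm into [0, 1)."""
    sq = np.sum(z * z, axis=-1, keepdims=True)
    return (sq / (1.0 + sq)) * z / np.sqrt(sq + eps)

def b2i_routing(items: np.ndarray, S: np.ndarray, k: int,
                iters: int = 3, seed: int = 0) -> np.ndarray:
    """Sketch of MIND-style Behavior-to-Interest routing.

    items: (n, d) behavior embeddings c_i
    S:     (d, d) bilinear matrix shared by every (behavior, capsule) pair
    returns (k, d) interest capsules u_j
    """
    assert iters >= 1
    rng = np.random.default_rng(seed)
    mapped = items @ S.T                       # row i is (S c_i)^T, shape (n, d)
    b = rng.normal(size=(k, items.shape[0]))   # random routing logits b_ji
    for _ in range(iters):
        w = np.exp(b - b.max(axis=0, keepdims=True))
        w /= w.sum(axis=0, keepdims=True)      # softmax over capsules, per item
        u = squash(w @ mapped)                 # (k, d) squashed weighted sums
        b = b + u @ mapped.T                   # agreement update u_j^T S c_i
    return u

# Different seeds can yield different capsules for the same behaviors.
rng = np.random.default_rng(1)
X, S = rng.normal(size=(20, 8)), np.eye(8)
print(np.abs(b2i_routing(X, S, 4, seed=0) - b2i_routing(X, S, 4, seed=1)).max())
```

Because every capsule sees the same `S c_i` inputs, only the random logits `b` break the symmetry between capsules, which is one way to read the instability and weak differentiation reported next.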
Re‑implementing MIND on the "墨鱼丸" short‑video platform revealed three issues: the number of capsules is strongly coupled to the sequence length, the capsules are only weakly differentiated from one another, and random initialization makes the extracted capsules unstable.
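The coupling between capsule count and sequence length is easiest to see in the dynamic interest‑number rule from the original MIND paper, K'_u = max(1, min(K, log2|I_u|)). The article does not quote this formula, so linking it to the reported issue is our reading; the integer truncation and the helper name `mind_interest_count` are hypothetical.

```python
import math

def mind_interest_count(seq_len: int, k_max: int) -> int:
    """K'_u = max(1, min(K, log2(|I_u|))); integer truncation is an assumption."""
    if seq_len < 1:
        return 1
    return max(1, min(k_max, int(math.log2(seq_len))))

for n in (1, 3, 16, 1000):
    print(n, mind_interest_count(n, k_max=8))
```

Under this rule a user with 16 behaviors gets 4 capsules whether those 16 items span one interest or ten; the count depends only on length, not on how the behaviors cluster.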
To address these problems, the authors propose two capsule‑initialization strategies:
**Max‑min (a deterministic, farthest‑point variant of k‑means++ seeding)**: iteratively select the item farthest from the current capsule set, so that the initial capsules are spread out.
**Markov‑based method**: construct a transition probability matrix between items, compute the stationary distribution, and select high‑density points as capsule seeds, prioritizing points with many high‑importance neighbors.
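The two seeding strategies can be sketched as follows. Everything here is an illustrative assumption rather than the authors' implementation. For max‑min we assume Euclidean distance and start from item index 0 (for example, the most recent behavior). For the Markov method we assume a top‑k cosine‑similarity neighbor graph with a softmax temperature and a PageRank‑style damping factor of 0.85 so the power iteration converges. We also interpret "high‑density points" as items whose stationary probability is a local maximum among their neighbors.

```python
import numpy as np

def max_min_seeds(items: np.ndarray, k: int, start: int = 0) -> list[int]:
    """Farthest-point (max-min) seeding: each new seed is the item farthest
    from its nearest already-chosen seed."""
    seeds = [start]
    dist = np.linalg.norm(items - items[start], axis=1)   # distance to nearest seed
    while len(seeds) < min(k, len(items)):
        nxt = int(np.argmax(dist))
        if dist[nxt] == 0.0:          # every remaining item duplicates a seed
            break
        seeds.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(items - items[nxt], axis=1))
    return seeds

def markov_seeds(items: np.ndarray, n_neighbors: int = 5, temperature: float = 0.1,
                 damping: float = 0.85, iters: int = 100) -> list[int]:
    """Seeds = items whose stationary probability is a local maximum among
    their neighbours on a top-k cosine-similarity random walk."""
    n = len(items)
    assert n >= 2
    x = items / np.linalg.norm(items, axis=1, keepdims=True)
    sim = x @ x.T
    np.fill_diagonal(sim, -np.inf)                    # no self-transitions
    kn = min(n_neighbors, n - 1)
    nbr = np.argsort(-sim, axis=1)[:, :kn]            # top-k neighbours per item
    rows = np.arange(n)[:, None]
    logits = sim[rows, nbr] / temperature
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    P = np.zeros((n, n))
    P[rows, nbr] = w / w.sum(axis=1, keepdims=True)   # row-stochastic transitions
    pi = np.full(n, 1.0 / n)
    for _ in range(iters):                            # power iteration with damping
        pi = damping * (pi @ P) + (1.0 - damping) / n
    # an item with many high-mass in-neighbours accumulates stationary mass
    seeds = [i for i in range(n) if pi[i] >= pi[nbr[i]].max()]
    return sorted(seeds, key=lambda i: -pi[i])        # densest seeds first
```

Because the Markov seeds are local maxima rather than a fixed top‑K, their number depends on how the behaviors cluster, while max‑min always returns up to `k` seeds.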
Routing modifications include removing the shared bilinear matrix, replacing the squash function with L2 normalization, and sparsifying the routing‑logit matrix by keeping only the maximum value in each column (one column per item) and zeroing the rest. These changes align the routing process with k‑means clustering, in which each item is assigned to exactly one capsule.
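A minimal sketch of the simplified routing, assuming the routing‑logit matrix is laid out as capsules × items so that "max per column" means each item keeps only its best capsule. With no bilinear matrix, L2 normalization in place of squash, and hard assignment, each iteration amounts to one step of spherical k‑means. The function names and the rule of leaving an empty capsule at its previous position are our assumptions.

```python
import numpy as np

def l2_normalize(z: np.ndarray, eps: float = 1e-9) -> np.ndarray:
    """Replaces squash: projects each row onto the unit sphere."""
    return z / (np.linalg.norm(z, axis=-1, keepdims=True) + eps)

def hard_routing(items: np.ndarray, seeds: np.ndarray, iters: int = 3):
    """items: (n, d) behavior embeddings; seeds: (k, d) initial capsules.
    Returns (capsules (k, d), owner (n,) = capsule index of each item)."""
    assert iters >= 1
    n = items.shape[0]
    caps = l2_normalize(seeds)
    for _ in range(iters):
        logits = caps @ items.T                  # (k, n); no bilinear matrix S
        owner = logits.argmax(axis=0)            # keep only the max of each column
        mask = np.zeros_like(logits)
        mask[owner, np.arange(n)] = 1.0          # sparse routing matrix
        new = mask @ items                       # sum of the items each capsule owns
        empty = mask.sum(axis=1) == 0
        new[empty] = caps[empty]                 # empty capsule stays where it was
        caps = l2_normalize(new)
    return caps, owner
```

The `seeds` argument is where an initialization such as max‑min or Markov seeding plugs in, replacing the random logits of the original routing.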
Training adjustments: the pre‑trained item embeddings on the input side are frozen, while the label‑aware attention on the output side still receives gradient updates. A power‑law based negative‑sampling strategy controls how strongly popular and cold items influence training.
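The two training‑side pieces can be sketched as follows (forward pass only; in a real model the input item embeddings would be frozen and gradients would flow only through the output side). `label_aware_attention` follows the MIND paper's form v_u = V_u softmax(pow(V_u^T e_i, p)); clipping negative similarities to zero so that a fractional `p` stays real‑valued is our assumption. `negative_probs` samples negatives with probability proportional to count^alpha, where `alpha` is a hypothetical knob: alpha = 1 follows raw popularity, alpha = 0 is uniform, and values in between damp popular items while giving cold items more chances.

```python
import numpy as np

def label_aware_attention(caps: np.ndarray, target: np.ndarray, p: float = 2.0) -> np.ndarray:
    """Combine interest capsules (k, d) into one user vector for a target item (d,)."""
    # Clip at 0 (assumption) so that pow with a fractional p stays real-valued.
    scores = np.power(np.maximum(caps @ target, 0.0), p)
    w = np.exp(scores - scores.max())
    w /= w.sum()                      # softmax over capsules
    return w @ caps                   # attention-weighted user vector, shape (d,)

def negative_probs(counts, alpha: float) -> np.ndarray:
    """Sampling probability proportional to count**alpha (power-law smoothing)."""
    p = np.asarray(counts, dtype=float) ** alpha
    return p / p.sum()

def sample_negatives(counts, alpha: float, size: int, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    return rng.choice(len(counts), size=size, p=negative_probs(counts, alpha))

counts = [1000, 100, 10, 1]           # hypothetical item exposure counts
print(negative_probs(counts, 1.0))    # popularity-proportional
print(negative_probs(counts, 0.5))    # damped: cold items sampled more often
```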
Experimental results show that Markov initialization adapts the number of capsules to the structure of each user's behavior, achieving better coverage for both concentrated and diverse interest patterns. Business evaluations show significant gains in recall diversity, positive‑feedback rate, and exposure coverage.
The article concludes that the max‑min approach is simple and effective where the requirements on interest discrimination are modest, whereas the Markov method excels when high coverage and clear separation of user interests are critical, at a higher computational cost.
References to the original MIND, ComiRec, and capsule network papers are provided, followed by a recruitment notice for the 360 information‑flow algorithm team.
DataFunTalk
