How DeepMatch Boosts Music Recommendations with Play Rate and Intent Signals

This article examines the DeepMatch retrieval model for Tmall Genie music recommendation, detailing how incorporating user feedback such as play‑rate and query intent signals via multi‑task learning and feedback‑aware self‑attention improves recall accuracy and reduces negative recommendations, while also discussing embedding factorization, loss functions, and distributed training optimizations.

Alibaba Cloud Developer

Background

Traditional recommendation systems consist of two stages: Candidate Generation (matching) and Ranking. In the classic YouTube video recommendation pipeline, the first stage quickly selects a few hundred candidates from the entire catalog, and the second stage scores and re‑orders them for final display.

Figure: YouTube recommendation pipeline
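The division of labor between the two stages can be made concrete with a toy pipeline. In this sketch the scoring functions and scores are hypothetical stand‑ins for a matching model and a ranking model. The point it illustrates is that the ranker only ever sees what the matcher keeps.

```python
from typing import Callable, List


def two_stage_recommend(
    catalog: List[str],
    match_score: Callable[[str], float],
    rank_score: Callable[[str], float],
    num_candidates: int,
    num_results: int,
) -> List[str]:
    """Toy two-stage recommender: cheap matching first, costly ranking second."""
    # Stage 1 (candidate generation / matching): shortlist by a cheap score.
    candidates = sorted(catalog, key=match_score, reverse=True)[:num_candidates]
    # Stage 2 (ranking): re-score only the shortlist with the expensive model.
    ranked = sorted(candidates, key=rank_score, reverse=True)
    return ranked[:num_results]


# Hypothetical scores for a five-song catalog.
match = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1, "e": 0.05}
rank = {"a": 0.2, "b": 0.9, "c": 0.5, "d": 1.0, "e": 1.0}
print(two_stage_recommend(list(match), match.get, rank.get, num_candidates=3, num_results=2))
# → ['b', 'c']
```

Songs d and e would win the ranking stage, but they are never shortlisted. This is why the matching stage has to keep recall high.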

DeepMatch Overview

This article focuses on the matching (recall) stage, which must retain as many relevant items as possible while remaining fast. Recent practice has shown that deep learning models built on user behavior sequences achieve both accuracy and speed when combined with high‑performance approximate nearest‑neighbor (ANN) search. This combination, the DeepMatch solution, outperforms traditional matching methods such as swing, etrec, and SVD. Compared with those methods, the deep models offer three advantages:

They model deeper, non‑linear user‑item relationships.

They can incorporate diverse user and item features.

Behavior‑sequence models capture both short‑ and long‑term interests as they evolve.
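At serving time, matching reduces to finding the items whose vectors have the largest inner product with the user vector. The sketch below does this by brute force over a hypothetical toy catalog. A production system would replace the linear scan with an ANN index such as Faiss (Johnson et al., 2019), trading a little accuracy for speed.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def inner_product(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def retrieve_top_k(
    user_vec: Sequence[float],
    item_vecs: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Exact maximum-inner-product search: score every item, keep the k best."""
    scored = ((item_id, inner_product(user_vec, vec)) for item_id, vec in item_vecs.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical 2-dimensional item vectors.
items = {"s1": [0.9, 0.1], "s2": [0.1, 0.9], "s3": [0.5, 0.5]}
print([item_id for item_id, _ in retrieve_top_k([1.0, 0.0], items, k=2)])
# → ['s1', 's3']
```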

Missing Feedback Signals

In the "I want to listen to a song" scenario on Tmall Genie, two important signals are not used by the baseline model:

Negative feedback (Play Rate)

The play completion rate (a value in [0, 1]) reflects user satisfaction. A low completion rate is a negative signal, for example when the user skips the song. The baseline training data contains only high‑completion (positive) examples.
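The following sketch shows how play events could be turned into positive and negative examples. The article does not state its cut‑offs, so POSITIVE_THRESHOLD and NEGATIVE_THRESHOLD below are hypothetical values. The baseline described above keeps only the first list.

```python
from typing import List, NamedTuple, Sequence, Tuple


class PlayEvent(NamedTuple):
    song_id: str
    play_rate: float  # fraction of the song actually played, in [0, 1]


# Hypothetical cut-offs; the article does not state the thresholds it uses.
POSITIVE_THRESHOLD = 0.8
NEGATIVE_THRESHOLD = 0.2


def split_by_play_rate(
    events: Sequence[PlayEvent],
) -> Tuple[List[PlayEvent], List[PlayEvent]]:
    """Split play events into positive (high completion) and negative (skipped) examples."""
    positives: List[PlayEvent] = []
    negatives: List[PlayEvent] = []
    for event in events:
        if not 0.0 <= event.play_rate <= 1.0:
            raise ValueError(f"play_rate out of range: {event.play_rate}")
        if event.play_rate >= POSITIVE_THRESHOLD:
            positives.append(event)
        elif event.play_rate <= NEGATIVE_THRESHOLD:
            negatives.append(event)
        # Events between the two thresholds are ambiguous and dropped in this sketch.
    return positives, negatives


events = [PlayEvent("a", 0.95), PlayEvent("b", 0.05), PlayEvent("c", 0.5)]
pos, neg = split_by_play_rate(events)
print([e.song_id for e in pos], [e.song_id for e in neg])
# → ['a'] ['b']
```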

Song request intent (Intent Type)

Queries can be explicit, such as requesting a specific song by name, or implicit, such as asking for songs in a certain style. Different intent types contribute differently to the recommendation model, so an intent attention mechanism can capture user preferences more accurately.

Method

The overall architecture is a Self‑Attention (Transformer) encoder that produces an output vector at each step of the user behavior sequence. Each step's output is then scored against the target item vectors by inner product.

Figure: DeepMatch model architecture
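The scoring step can be illustrated as follows. This sketch assumes the encoder has already produced one vector per step; the encoder itself is omitted. All vectors are hypothetical.

```python
from typing import List, Sequence


def per_step_logits(
    step_outputs: Sequence[Sequence[float]],
    target_item_vecs: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Inner product of each step's encoder output with each target item vector.

    Row t holds the scores of all target items for step t of the sequence.
    """
    return [
        [sum(h_i * e_i for h_i, e_i in zip(h, e)) for e in target_item_vecs]
        for h in step_outputs
    ]


encoder_out = [[1.0, 0.0], [0.0, 1.0]]          # hypothetical outputs for a 2-step sequence
targets = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]  # hypothetical vectors for 3 target items
print(per_step_logits(encoder_out, targets))
# → [[2.0, 0.0, 1.0], [0.0, 3.0, 1.0]]
```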

Input Representations

Each item in the history sequence is represented by four embeddings:

Item Embedding: shared between the two training sets (the original set and the new set that carries play‑rate and intent signals).

Position Embedding: a learned embedding for each position, rather than the fixed sinusoidal encoding of the original Transformer.

Play Rate Embedding: the continuous play‑rate value is projected into the same low‑dimensional space as the item embeddings. For the original training set, which has no play‑rate, a constant value of 0.99 is used.

Intent Type Embedding: the categorical intent (e.g., explicit request or style recommendation) is mapped to a fixed low‑dimensional vector. Samples in the original set, which lack intent, are assumed to be explicit requests.

Factorized Embedding Parameterization

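Following ALBERT, the embedding for the large item vocabulary is factorized. Each item ID is first mapped to a low‑dimensional embedding of size E, which is then projected up to the hidden size H. This reduces the embedding parameters from V × H to V × E + E × H, where V is the vocabulary size. The saving is large when E is much smaller than H, and model capacity is preserved.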
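The parameter arithmetic and the two‑step lookup can be checked with a short sketch. The sizes used (one million items, H = 512, E = 64) are hypothetical and not taken from the article.

```python
from typing import List, Optional, Sequence


def embedding_param_count(vocab_size: int, hidden_size: int, factor_size: Optional[int] = None) -> int:
    """Parameters in the item embedding, with or without ALBERT-style factorization."""
    if factor_size is None:
        return vocab_size * hidden_size  # one V x H table
    return vocab_size * factor_size + factor_size * hidden_size  # V x E table + E x H projection


def factorized_lookup(
    item_index: int,
    low_table: Sequence[Sequence[float]],
    projection: Sequence[Sequence[float]],
) -> List[float]:
    """Look up the E-dim embedding, then project it up to H dims (projection is E x H)."""
    low = low_table[item_index]
    hidden_size = len(projection[0])
    return [sum(low[e] * projection[e][h] for e in range(len(low))) for h in range(hidden_size)]


# Hypothetical sizes: 1M items, hidden size 512, factor size 64.
print(embedding_param_count(1_000_000, 512))      # → 512000000
print(embedding_param_count(1_000_000, 512, 64))  # → 64032768
```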

Feedback‑Aware Multi‑Head Self‑Attention

External signals (Play Rate and Intent Type) are injected into the attention computation, enabling the model to weigh items according to user feedback and intent.
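The article does not spell out exactly how the signals enter the attention computation. The sketch below shows one plausible mechanism, offered purely as an assumption. It adds a bias derived from each history item's play rate and intent to the attention logits. It also applies the kind of mask that custom attention needs for padded positions. The names play_rate_weight and intent_bias are hypothetical, and only a single head is shown.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    top = max(logits)  # requires at least one finite (unmasked) logit
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [x / total for x in exps]


def feedback_aware_attention(
    queries: Sequence[Sequence[float]],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
    play_rates: Sequence[float],
    intents: Sequence[str],
    mask: Sequence[bool],
    play_rate_weight: float,
    intent_bias: Dict[str, float],
) -> List[Vector]:
    """Single-head scaled dot-product attention with a hypothetical feedback bias.

    Each key's logit gets play_rate_weight * play_rate + intent_bias[intent], so
    history items can be up- or down-weighted by how they were received.
    Positions with mask == False (e.g. padding) receive zero weight.
    """
    scale = math.sqrt(len(keys[0]))
    outputs: List[Vector] = []
    for q in queries:
        logits = []
        for k, rate, intent, keep in zip(keys, play_rates, intents, mask):
            if not keep:
                logits.append(-math.inf)  # exp(-inf) == 0.0, so this position is ignored
                continue
            score = sum(a * b for a, b in zip(q, k)) / scale
            logits.append(score + play_rate_weight * rate + intent_bias[intent])
        weights = softmax(logits)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs
```

With a zero query, the attention weights depend only on the feedback bias. Setting play_rate_weight to ln 3 then gives a fully played item three times the weight of a skipped one.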

Loss Functions

Two tasks are trained jointly:

Positive Feedback: a Sampled Softmax Loss encourages high scores for items with a high play rate.

Negative Feedback: a Sigmoid Cross‑Entropy Loss pushes down the scores of items with a low play rate.

The total loss is the sum of both components.
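Below is a minimal sketch of the two losses computed on raw logits. It makes three assumptions. First, the positive task scores the true item against a handful of sampled items using a full softmax over that small set, without the log‑Q correction that production sampled‑softmax implementations typically apply. Second, the negative task treats each low‑play‑rate item as a label‑0 example. Third, each task loss is averaged over its own examples before the two are summed.

```python
import math
from typing import Sequence, Tuple


def log_sum_exp(xs: Sequence[float]) -> float:
    top = max(xs)
    return top + math.log(sum(math.exp(x - top) for x in xs))


def sampled_softmax_loss(pos_logit: float, sampled_logits: Sequence[float]) -> float:
    """Softmax cross-entropy of the positive item against a set of sampled items.

    Simplification: no log-Q correction for the sampling distribution.
    """
    return log_sum_exp([pos_logit, *sampled_logits]) - pos_logit


def negative_sigmoid_ce(logit: float) -> float:
    """Sigmoid cross-entropy with label 0: -log(1 - sigmoid(logit)), computed stably."""
    return max(logit, 0.0) + math.log1p(math.exp(-abs(logit)))


def total_loss(
    positives: Sequence[Tuple[float, Sequence[float]]],
    negative_logits: Sequence[float],
) -> float:
    """Sum of the two task losses, each averaged over its own examples (an assumption)."""
    pos = sum(sampled_softmax_loss(p, s) for p, s in positives) / max(len(positives), 1)
    neg = sum(negative_sigmoid_ce(x) for x in negative_logits) / max(len(negative_logits), 1)
    return pos + neg
```

The negative‑feedback term shrinks as a skipped item's logit falls, which is how it pushes those items out of the top of the retrieval list.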

Figure: Positive vs. Negative feedback loss comparison

Experiments

Distributed Training

Training uses TensorFlow's parameter‑server strategy. Key practical tips include keeping embedding partitioning consistent, de‑duplicating sampled items within a batch, and using a flexible masking mechanism for custom attention.
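Sampled negatives within a batch often repeat, and each repeat would otherwise cost its own embedding lookup against the parameter servers. The sketch below shows the de‑duplication idea in plain Python. A hypothetical fetch_row callback stands in for the remote lookup. In TensorFlow the same pattern can be written with tf.unique followed by tf.gather.

```python
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def dedup_ids(ids: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Return (unique_ids, index) with unique_ids[index[i]] == ids[i].

    Unique IDs keep their first-occurrence order, the same contract as tf.unique.
    """
    slot: Dict[int, int] = {}
    unique_ids: List[int] = []
    index: List[int] = []
    for item_id in ids:
        if item_id not in slot:
            slot[item_id] = len(unique_ids)
            unique_ids.append(item_id)
        index.append(slot[item_id])
    return unique_ids, index


def gather_embeddings(ids: Sequence[int], fetch_row: Callable[[int], T]) -> List[T]:
    """Fetch each distinct row once (e.g. from a parameter server), then expand back."""
    unique_ids, index = dedup_ids(ids)
    rows = [fetch_row(item_id) for item_id in unique_ids]
    return [rows[i] for i in index]


calls: List[int] = []


def fake_fetch(item_id: int) -> str:
    """Hypothetical stand-in for a remote embedding lookup; records each call."""
    calls.append(item_id)
    return f"emb{item_id}"


print(gather_embeddings([7, 3, 7, 9, 3], fake_fetch))
print(len(calls))  # → 3 lookups instead of 5
```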

Results

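Offline evaluation uses Recall@N, reported as POS@N for positive (high play‑rate) items and NEG@N for negative (low play‑rate) items. Higher POS@N is better, and lower NEG@N is better. Three configurations were compared:

a. Baseline DeepMatch (trained only on high‑play‑rate positives).

b. Baseline + Play Rate + Intent Type.

c. Configuration b plus Negative Feedback multi‑task learning.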
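The article does not give exact formulas for these metrics. Under the usual definition of Recall@N, POS@N and NEG@N are the same computation applied to different target sets, as this sketch shows. The ranked list and the target sets are hypothetical.

```python
from typing import Collection, Sequence


def recall_at_n(ranked_items: Sequence[str], target_items: Collection[str], n: int) -> float:
    """Fraction of target items that appear in the top-n of the ranked list."""
    targets = set(target_items)
    if not targets:
        raise ValueError("target_items must be non-empty")
    top_n = set(ranked_items[:n])
    return len(targets & top_n) / len(targets)


ranked = ["a", "b", "c", "d"]  # hypothetical retrieval output for one user
positives = {"a", "d"}         # songs the user played through
negatives = {"b"}              # songs the user skipped
print(recall_at_n(ranked, positives, 2))  # POS@2 → 0.5 (higher is better)
print(recall_at_n(ranked, negatives, 2))  # NEG@2 → 1.0 (lower is better)
```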

Figure: Offline experiment results

Adding the Play Rate and Intent Type signals improves POS@N, and adding Negative Feedback multi‑task learning further reduces NEG@N with minimal impact on POS@N. Online A/B tests in the "Guess You Like" scenario showed a 9.2% increase in average playback duration. The approach has since been deployed to other Tmall Genie recommendation scenarios.

References

Covington et al., 2016. Deep neural networks for YouTube recommendations.

Hidasi & Karatzoglou, 2018. Recurrent neural networks with top‑k gains for session‑based recommendations.

Sun et al., 2019. BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer.

Li et al., 2019. Multi‑interest network with dynamic routing for recommendation at Tmall.

Johnson et al., 2019. Billion‑scale similarity search with GPUs.

Paudel et al., 2018. Loss Aversion in Recommender Systems.

Zhao et al., 2018. Recommendations with negative feedback via pairwise deep reinforcement learning.

Vaswani et al., 2017. Attention is all you need.

Song et al., 2019. AutoInt: Automatic feature interaction learning via self‑attentive neural networks.

Devlin et al., 2018. BERT: Pre‑training of deep bidirectional transformers for language understanding.

Lan et al., 2019. ALBERT: A lite BERT for self‑supervised learning of language representations.
