
Solving Cold‑Start in Recommender Systems: The DropoutNet Approach

This article explains why cold‑start is a critical challenge for recommender systems, outlines four practical strategies—generalization, fast data collection, transfer learning, and few‑shot learning—and then details the DropoutNet model, its end‑to‑end training, loss functions, negative‑sampling techniques, and open‑source implementation.

DataFunTalk

Recommendation systems typically rely on collaborative filtering, matrix factorization, or deep learning models that need abundant user‑item interaction data. New users and items suffer from the cold‑start problem because they lack sufficient historical behavior, leading to poor exposure and inaccurate modeling.

Timely recommendation of new items is crucial for many platforms: news portals need fresh exposure, UGC platforms must surface new creator content, and dating apps must attract new users to stay active.

Four practical strategies for cold-start, summarized by the Chinese mnemonic 泛、快、迁、少 (generalize, fast, transfer, few-shot):

1. Generalize (泛): Map new items to broader concepts (e.g., from a specific product to its category, from a short video to its author, or from an article to its topic) and use content‑based recommendation. Multiple upward concepts (brand, style, color, etc.) can be combined, and embeddings derived from content or multimodal signals can supplement missing interaction data.

2. Fast (快): Accelerate the collection of interaction signals for new items, e.g., by updating models in minutes or seconds using real‑time pipelines and contextual‑bandit algorithms.

3. Transfer (迁): Apply transfer learning to leverage data from related domains or regions, fine‑tuning models on the limited data of the new scenario while ensuring domain relevance.

4. Few‑shot (少): Use few‑shot or meta‑learning techniques to train models that can adapt quickly with only a few labeled examples.
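As an illustration of the "fast" strategy, here is a minimal epsilon-greedy bandit for allocating exposure to new items. This is a hypothetical sketch of the idea, not the article's implementation; production systems would use contextual bandits over streaming features, as the text notes.

```python
import random

def choose_item(stats, epsilon=0.1, rng=random):
    """stats: {item_id: (clicks, impressions)}. With probability epsilon,
    explore a random item (giving new items a chance at exposure);
    otherwise exploit the best observed click-through rate."""
    if rng.random() < epsilon:
        return rng.choice(list(stats))
    # max(impressions, 1) avoids division by zero for brand-new items
    return max(stats, key=lambda i: stats[i][0] / max(stats[i][1], 1))
```

The exploration term is what lets a freshly added item accumulate the interaction signals that the rest of the pipeline needs.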

The article focuses on a generalized method: the DropoutNet model, which addresses cold‑start by learning user and item embeddings directly from interaction data in an end‑to‑end fashion, eliminating the need for pre‑computed embeddings.

DropoutNet uses a dual‑tower architecture (user tower and item tower). During training, a portion of the users' and items' preference‑statistics features are randomly dropped (input dropout), while content features remain intact. This forces the model to reconstruct the original user‑item similarity even when some features are missing, similar to a denoising auto‑encoder.
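The input-dropout step can be sketched as follows. This is a minimal NumPy illustration of the idea (the helper name and feature layout are my own, not EasyRec's API): for a random subset of rows in the batch, the preference-statistics block is zeroed out while content features pass through untouched, simulating a cold-start user or item at training time.

```python
import numpy as np

def dropout_preference_features(pref_feats, content_feats, drop_prob=0.5, rng=None):
    """Zero the preference-statistics block for a random subset of batch rows,
    keeping content features intact (hypothetical helper mirroring
    DropoutNet's input dropout)."""
    rng = rng or np.random.default_rng(0)
    keep = rng.random(pref_feats.shape[0]) >= drop_prob  # 1 = keep preferences
    dropped = pref_feats * keep[:, None]                 # zeroed rows mimic cold start
    return np.concatenate([dropped, content_feats], axis=1)
```

Because the similarity target is computed from the full inputs, the model is pushed to produce useful embeddings from content alone whenever the preference block is missing.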

The loss function combines a pointwise binary cross‑entropy with two additional terms: a rank loss that improves AUC and a Support‑Vector‑Guided Softmax loss that introduces a margin and negative‑mining to enhance robustness, especially under few‑sample conditions.

Key equations (shown as images in the original) define the cosine similarity between user and positive item, similarities with negative samples, softmax conversion, and the final negative‑log‑likelihood.
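The equation images did not survive extraction. A plausible reconstruction of the quantities described, in my own notation (with temperature $\gamma$, margin $m$, and $k$ negatives), would be:

```latex
s^{+} = \cos(\mathbf{u}, \mathbf{v}^{+}), \qquad
s^{-}_{j} = \cos(\mathbf{u}, \mathbf{v}^{-}_{j}), \quad j = 1, \dots, k

p = \frac{e^{\gamma (s^{+} - m)}}
         {e^{\gamma (s^{+} - m)} + \sum_{j=1}^{k} e^{\gamma s^{-}_{j}}},
\qquad \mathcal{L} = -\log p
```

Here $\mathbf{u}$ and $\mathbf{v}$ are the user- and item-tower outputs; the softmax converts the cosine scores into a probability for the positive item, and the final loss is its negative log-likelihood.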

Negative sampling is performed online within each mini‑batch by rolling the item embedding matrix to create shifted negative examples, which are then used in the loss computation.
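The rolling trick can be sketched in NumPy (illustrative only; the function name is mine): each cyclic shift of the in-batch item matrix pairs every user with the items of other rows, yielding negatives at essentially zero cost.

```python
import numpy as np

def mine_negatives_by_rolling(item_emb, num_negatives=4):
    """Build in-batch negatives by cyclically shifting the item-embedding
    matrix; row i is paired with items from other rows of the same batch."""
    negs = [np.roll(item_emb, shift=s, axis=0) for s in range(1, num_negatives + 1)]
    return np.stack(negs, axis=1)  # shape: [batch, num_negatives, dim]
```

One caveat of in-batch sampling is that a "negative" can occasionally be a true positive for that user; larger batches make such collisions rarer.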

Implementation snippets (TensorFlow) are provided:

def softmax_loss_with_negative_mining(user_emb, item_emb, labels, num_negative_samples=4, embed_normed=False, weights=1.0, gamma=1.0, margin=0, t=1):
    """Compute the softmax loss based on cosine distance with negative mining."""
    # ... (code omitted for brevity) ...
    return loss

def support_vector_guided_softmax_loss(pos_score, neg_scores, margin=0, t=1, smooth=1.0, threshold=0, weights=1.0):
    """Reference: Support Vector Guided Softmax Loss for Face Recognition."""
    # ... (code omitted for brevity) ...
    return loss
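Since both bodies are omitted above, here is a hedged NumPy sketch of the support-vector-guided idea, following Wang et al. (2018) rather than EasyRec's actual code: negatives that violate the margin (score above the positive's) are treated as support vectors and re-weighted by t > 1, which sharpens the gradient on hard negatives.

```python
import numpy as np

def sv_softmax_loss_sketch(pos_score, neg_scores, margin=0.0, t=1.0, gamma=1.0):
    """Illustrative reconstruction of a Support-Vector-Guided softmax loss
    on cosine scores (not EasyRec's implementation). With t = 1 it reduces
    to plain softmax cross-entropy over [pos - margin, negatives]."""
    pos = pos_score - margin
    hard = neg_scores > pos                      # support vectors: margin violators
    # re-weighting h = e^{gamma (t-1)(s+1)} folds into the logit as t(s+1) - 1
    weighted = np.where(hard, t * (neg_scores + 1.0) - 1.0, neg_scores)
    logits = np.concatenate([[pos], weighted]) * gamma
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])
```

With t = 1 every negative is treated equally; raising t increases the loss whenever a negative outscores the positive, which is exactly the few-sample robustness the article attributes to this term.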

Another snippet shows an in‑batch pairwise ranking loss:

def pairwise_loss(labels, logits):
    # logits[i] - logits[j] for every ordered pair (i, j) in the batch
    pairwise_logits = tf.expand_dims(logits, -1) - tf.expand_dims(logits, 0)
    # keep only pairs where labels[i] > labels[j] (positive ranked above negative)
    pairwise_mask = tf.greater(tf.expand_dims(labels, -1) - tf.expand_dims(labels, 0), 0)
    pairwise_logits = tf.boolean_mask(pairwise_logits, pairwise_mask)
    # each kept difference should be pushed positive, so the pseudo-label is 1
    pairwise_pseudo_labels = tf.ones_like(pairwise_logits)
    loss = tf.losses.sigmoid_cross_entropy(pairwise_pseudo_labels, pairwise_logits)
    # a batch with no valid pairs yields NaN; treat that as zero loss
    loss = tf.where(tf.is_nan(loss), tf.zeros_like(loss), loss)
    return loss
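To see which pairs the masking step keeps, here is an equivalent NumPy trace of the same logic on a toy batch (illustrative only):

```python
import numpy as np

labels = np.array([1.0, 0.0, 1.0])
logits = np.array([2.0, 0.5, 1.0])

# logits[i] - logits[j] for every ordered pair (i, j)
pairwise_logits = logits[:, None] - logits[None, :]
# keep only pairs where labels[i] > labels[j]
pairwise_mask = (labels[:, None] - labels[None, :]) > 0
kept = pairwise_logits[pairwise_mask]            # [2.0 - 0.5, 1.0 - 0.5]
# sigmoid cross-entropy against label 1 is log(1 + e^{-x})
loss = np.mean(np.log1p(np.exp(-kept)))
```

Only the (positive, negative) pairs survive the mask; ties and (negative, positive) orderings contribute nothing, which is what makes this an AUC-style rank loss.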

The DropoutNet model and its variants are open‑sourced in Alibaba's EasyRec framework (github.com/alibaba/EasyRec). Documentation is available at easyrec.readthedocs.io, and the community provides a DingTalk group for discussion.

References include the original DropoutNet NIPS 2017 paper, the Support Vector Guided Softmax loss paper, Facebook’s embedding‑based retrieval work, and classic Learning‑to‑Rank literature.

In summary, the article presents a comprehensive view of cold‑start mitigation strategies, introduces an end‑to‑end DropoutNet solution with novel loss functions and negative‑sampling mechanisms, and offers practical code and open‑source resources for practitioners.

Written by DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
