How NoteLLM Boosts Cold‑Start Recommendation with Generative Contrastive Learning

This article reviews the NoteLLM paper, which leverages Llama 2 to create richer text embeddings and to automatically generate tags and categories for note recommendation. NoteLLM addresses cold-start issues through a multitask prompt design, generative-contrastive learning, and collaborative supervised fine-tuning, and demonstrates strong offline and online gains.

NewBeeNLP

Background

To mitigate cold-start problems in recommendation, the i2i (item-to-item) recall stage often adds a content-based multimodal retrieval path that relies solely on item content, allowing new items to compete fairly with established ones. Existing multimodal i2i methods typically use BERT-derived embeddings, which suffer from limited representational capacity and a mismatch between purely semantic embeddings and downstream click-through objectives.

BERT representation limitations: Larger LLMs can capture longer-tail information and produce embeddings better aligned with recommendation goals.

Insufficient use of tag and category information: Tags and categories convey a note's core idea; treating them merely as content fragments wastes valuable signals.

The paper therefore proposes NoteLLM, a multitask LLM (Llama 2) that generates embeddings tailored for recommendation while simultaneously producing tags and categories.

Method Overview

NoteLLM consists of three parts:

Note compression prompt construction

Generative‑Contrastive Learning (GCL)

Collaborative Supervised Fine‑Tuning (CSFT)

The compression prompt defines the model’s input; GCL introduces collaborative‑filtering signals into contrastive learning; CSFT adds a tag/category generation task that reinforces the embedding quality.

Note Compression Prompt

A unified prompt is built for each note so that the model can output a special token [EMB] whose hidden vector, after a linear projection, becomes the note’s embedding.

Prompt: [BOS]<Instruction> <Input Note> The compression word is:"[EMB]". <Output Guidance> <Output>[EOS]
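Conceptually, the note embedding is just the [EMB] token's final hidden state passed through a linear projection. The sketch below illustrates this in pure Python; the function name, the projection matrix `W`, and the L2 normalization are illustrative assumptions, not the authors' code.

```python
import math

def project_embedding(hidden_state, W):
    """Apply a linear projection W (d_out x d_in) to the [EMB] hidden vector,
    then L2-normalize so cosine similarity reduces to a dot product."""
    projected = [sum(w * h for w, h in zip(row, hidden_state)) for row in W]
    norm = math.sqrt(sum(p * p for p in projected)) or 1.0
    return [p / norm for p in projected]

# Toy example: a 4-dim hidden state projected down to 2 dims.
hidden = [0.5, -1.0, 2.0, 0.25]
W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0]]
note_embedding = project_embedding(hidden, W)
```

In practice the hidden state would come from the decoder's last layer at the [EMB] position, and `W` would be a learned projection trained jointly with the rest of the model.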

Separate prompts are used for category and tag generation. Example category prompt:

<Instruction>: Extract the note information in json format, compress it into one word for recommendation, and generate the category of the note. <Input Note>: {'title': , 'topic': , 'content': }. <Output Guidance>: The category is: <Output>:

Example tag prompt (generates j topics):

<Instruction>: Extract the note information in json format, compress it into one word for recommendation, and generate <j> topics of the note. <Input Note>: {'title': , 'content': }. <Output Guidance>: The <j> topics are: <Output>:

During training, the hidden vector of [EMB] is used as the note’s textual representation, while the same prompt framework also yields tags and categories.
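Filling the templates above is straightforward string assembly. The helper below is a hypothetical sketch of the tag-generation prompt; the function name and argument layout are assumptions, and the paper's actual prompt also embeds the "[EMB]" compression phrase between the note and the output guidance.

```python
def build_tag_prompt(title, content, j):
    """Fill the tag-generation template with a note's fields and topic count j."""
    instruction = (
        "Extract the note information in json format, compress it into one "
        f"word for recommendation, and generate {j} topics of the note."
    )
    note = f"{{'title': {title!r}, 'content': {content!r}}}"
    guidance = f"The {j} topics are:"
    return f'{instruction} <Input Note>: {note} The compression word is:"[EMB]". {guidance}'

prompt = build_tag_prompt("Autumn hiking spots", "Three trails near the city...", 3)
```

The category prompt differs only in its instruction and output guidance, so the same pattern applies.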

Generative‑Contrastive Learning (GCL)

Because LLM pre‑training focuses on semantic understanding rather than click‑through optimization, GCL injects collaborative‑filtering signals into contrastive learning. For each user, the co‑occurrence count of clicking note A then note B within a one‑week window is computed, yielding a co‑occurrence score that down‑weights overly active users.
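A minimal version of this co-occurrence scoring can be sketched as below. Dividing each user's contribution by their total click count is one plausible way to down-weight overly active users; the paper's exact weighting may differ, and all names here are illustrative.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_scores(user_clicks):
    """user_clicks: {user_id: [note ids clicked within the one-week window]}.
    Returns {(note_a, note_b): score} for unordered note pairs."""
    scores = defaultdict(float)
    for user, notes in user_clicks.items():
        weight = 1.0 / len(notes)  # heavy clickers contribute less per pair
        for a, b in combinations(notes, 2):
            scores[tuple(sorted((a, b)))] += weight
    return dict(scores)

scores = cooccurrence_scores({
    "u1": ["n1", "n2"],               # light user: strong signal per pair
    "u2": ["n1", "n2", "n3", "n4"],   # heavy user: diluted signal
})
```

High-scoring pairs then serve as the positives in the contrastive objective below.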

Positive pairs are high‑score note pairs; negatives are sampled within the batch. The loss uses Info‑NCE with cosine similarity:

\mathcal{L}_{GCL} = -\log \frac{\exp(\mathrm{sim}(z_i, z_j)/\tau)}{\sum_{k \neq i}\exp(\mathrm{sim}(z_i, z_k)/\tau)}

The resulting embeddings encode both content semantics and user behavior signals, improving downstream recommendation.
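The InfoNCE loss above can be implemented directly. This is a minimal pure-Python sketch over one anchor with in-batch negatives, not the paper's training code; the positive is included in the denominator alongside the negatives, as is standard.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(anchor, positive, negatives, tau=0.1):
    """-log( exp(sim(a,p)/tau) / sum_k exp(sim(a,k)/tau) )."""
    logits = [cosine(anchor, positive) / tau]
    logits += [cosine(anchor, n) / tau for n in negatives]
    m = max(logits)  # log-sum-exp trick for numerical stability
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_denom)

# Anchor is close to the positive and far from both negatives, so loss is small.
loss = info_nce([1.0, 0.0], [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
```

In a real batch, every note plays the anchor role in turn and the other notes' positives act as its negatives, so the loss is averaged over the batch.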

Collaborative Supervised Fine‑Tuning (CSFT)

CSFT adds a tag/category generation task to the training objective. Two reasons are given:

LLMs can do more than produce a single embedding; generated tags/categories can fill missing metadata.

The task shares the same summarization nature as embedding generation, thus enhancing embedding quality.

During each batch, r notes are selected for tag generation while the rest are used for category generation. The combined loss is:

Loss = \alpha \cdot L_{GCL} + (1-\alpha) \cdot L_{CSFT}

where \alpha balances the two objectives.
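The combined objective is simple to express; the values of alpha and the two losses below are placeholders for illustration.

```python
def combined_loss(l_gcl, l_csft, alpha=0.5):
    """Loss = alpha * L_GCL + (1 - alpha) * L_CSFT."""
    return alpha * l_gcl + (1 - alpha) * l_csft

# A larger alpha prioritizes the contrastive (embedding) objective
# over tag/category generation.
total = combined_loss(l_gcl=0.8, l_csft=1.2, alpha=0.9)
```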

Experiments

Offline Evaluation

NoteLLM is compared against a SentenceBERT baseline and several LLM‑based embedding methods using Recall@k. Results show NoteLLM consistently outperforms all baselines (NoteLLM ≥ fine‑tuned LLM > fine‑tuned BERT > zero‑shot LLM). Performance gains are stable across different exposure levels, with especially high recall for low‑exposure (cold‑start) notes.

Ablation Study

Removing either the CSFT or GCL task degrades performance; the GCL component proves most critical, while tag/category generation provides modest additional benefit.

Online A/B Test

In a one‑week live experiment, NoteLLM increased click‑through rate by 16.20%, comment count by 1.10%, and weekly active publishers (WAP) by 0.41% compared to the SentenceBERT baseline. Daily comments on new notes rose 3.58%, confirming the model’s effectiveness for cold‑start scenarios. NoteLLM has been fully deployed.

Conclusion

NoteLLM demonstrates a practical, easy‑to‑deploy approach for generating recommendation‑ready text embeddings and auxiliary tags/categories using a large language model. The multitask prompt design and incorporation of collaborative‑filtering signals make it a valuable reference for real‑world recommendation systems.

Tags: LLM, Embedding, Recommendation Systems, cold start, multitask learning, Generative Contrastive Learning
Written by

NewBeeNLP

Always insightful, always fun
