
How FilterLLM Turns One LLM Pass into Billion‑User Cold‑Start Recommendations

The article analyzes FilterLLM, which augments a frozen LLM with a vocabulary of one billion learnable user tokens so that a single forward pass predicts the interaction probability distribution over all users. This speeds up cold‑start recommendation substantially while preserving recommendation quality across multiple benchmarks.

Data Party THU

Aug 14, 2025


Innovation Highlights

Eliminates pairwise user‑item inference by predicting the interaction probability distribution for all users in a single LLM forward pass, addressing the computational bottleneck of billion‑scale cold‑start recommendation.

Extends the LLM’s next‑word prediction to a “user‑distribution prediction” task, reducing the number of LLM calls per item from linear in the size of the candidate user set to a single call.

Introduces a learnable “user vocabulary” (V_user) by adding each of the one‑billion user IDs as tokens to the LLM’s embedding matrix, enabling simultaneous modeling of the full user set.

Method

The FilterLLM framework appends a token for every user ID to a pretrained LLM’s vocabulary. During training only the new distribution head and the user tokens are fine‑tuned (via LoRA); the original LLM weights remain frozen. The workflow consists of two stages:

Pre‑training of user/item embeddings: User and item latent vectors are learned on the interaction graph with a Bayesian Personalized Ranking (BPR) loss. These vectors initialize the user tokens in the LLM.

Fine‑tuning with distribution loss: The model is trained with a log‑softmax loss that maximizes the probability of true interacting users, supplemented by a behavior‑guided loss that aligns LLM representations with collaborative‑filtering item vectors.
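The two training objectives named above can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' implementation: the function names (`bpr_loss`, `distribution_loss`) and the list‑based vectors are assumptions made for the example, and the behavior‑guided loss is omitted because the article does not give its form.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def bpr_loss(user_vec, pos_item_vec, neg_item_vec):
    """BPR loss for one (user, positive item, negative item) triple.

    Equals -log(sigmoid(score_pos - score_neg)), computed in a
    numerically stable softplus form.
    """
    diff = dot(user_vec, pos_item_vec) - dot(user_vec, neg_item_vec)
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))

def log_softmax(logits):
    """Log-probabilities over all entries, using the log-sum-exp trick."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def distribution_loss(user_logits, interacting_users):
    """Mean negative log-probability of the users who truly interacted.

    user_logits holds one score per user token (assumed layout).
    """
    logp = log_softmax(user_logits)
    return -sum(logp[u] for u in interacting_users) / len(interacting_users)
```

In this sketch, minimizing `distribution_loss` pushes probability mass toward the users who actually interacted with the item, which is the stated goal of the log‑softmax objective.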

At inference, a cold‑start item’s textual description is fed to the frozen LLM once. The added distribution head outputs a softmax over all user tokens, yielding a probability for each user. The top‑K users are selected as pseudo‑interactions, which are combined with real interactions to update the item’s embedding before it is sent to the downstream recommender.
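The inference step can be illustrated as follows: pick the K most probable users, merge them with any real interactions, and refresh the item embedding. The helper names are hypothetical, and `update_item_embedding` uses a plain mean of user embeddings purely as a placeholder, since the article does not specify how the embedding is actually optimized.

```python
import heapq

def top_k_users(user_probs, k):
    """Indices of the k most probable users, highest probability first."""
    return heapq.nlargest(k, range(len(user_probs)), key=user_probs.__getitem__)

def merge_interactions(real_users, pseudo_users):
    """Union of real and pseudo interactions; real ones first, no duplicates."""
    seen = set()
    merged = []
    for u in list(real_users) + list(pseudo_users):
        if u not in seen:
            seen.add(u)
            merged.append(u)
    return merged

def update_item_embedding(user_ids, user_embeddings):
    """Placeholder update (assumption): mean of the interacting users' embeddings."""
    dim = len(user_embeddings[0])
    return [
        sum(user_embeddings[u][d] for u in user_ids) / len(user_ids)
        for d in range(dim)
    ]

# Toy example with five users and 2-dimensional embeddings.
probs = [0.10, 0.40, 0.05, 0.30, 0.15]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [3.0, 1.0], [0.0, 0.0]]

pseudo = top_k_users(probs, k=2)            # users 1 and 3
users = merge_interactions([3, 0], pseudo)  # real interactions come first
item_vec = update_item_embedding(users, embeddings)
```

At a billion users the selection would run over a sharded or approximate index rather than a Python list; the logic of the step is the same.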

Text‑to‑Judgment vs. Text‑to‑Distribution

Traditional “Text‑to‑Judgment” constructs a separate prompt for each user‑item pair, requiring hundreds of LLM calls to cover candidate users. The proposed “Text‑to‑Distribution” paradigm processes the item description a single time and directly produces a probability distribution over the entire user set, eliminating candidate‑set enumeration.
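The difference in call counts can be made concrete with a toy comparison. The two "LLM" callables below are stand‑in stubs (assumptions for illustration only); the point is how many times each paradigm invokes the model, not what the model returns.

```python
class CallCounter:
    """Wraps a callable and counts how often it is invoked."""

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, *args):
        self.calls += 1
        return self.fn(*args)

def text_to_judgment(item_text, candidate_users, llm_judge):
    """One LLM call per (user, item) pair."""
    return {u: llm_judge(u, item_text) for u in candidate_users}

def text_to_distribution(item_text, llm_distribution):
    """One LLM call per item, returning a score for every user at once."""
    return llm_distribution(item_text)

# Stub models: a constant judgment and a uniform distribution over 300 users.
judge = CallCounter(lambda user, text: 0.5)
distribution = CallCounter(lambda text: [1.0 / 300] * 300)

pairwise_scores = text_to_judgment("a new item description", range(300), judge)
all_user_scores = text_to_distribution("a new item description", distribution)
```

Here `judge.calls` grows with the number of candidate users, while `distribution.calls` stays at one regardless of how many users are scored.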

FilterLLM Architecture

The item text is encoded by a frozen LLM to obtain the final hidden representation. This representation, together with the billion user tokens, is fed to the added distribution head, which computes a softmax over all users. The resulting probabilities are used to sample pseudo‑interactions; together with real interactions they update the cold‑item embedding via an optimization module.
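As a rough sketch of the distribution head, assume it scores each user by the dot product between the item's final hidden representation and that user's token embedding, then normalizes with a softmax. The dot‑product form and the name `distribution_head` are assumptions; the article only states that the head computes a softmax over all users.

```python
import math

def distribution_head(hidden, user_token_embeddings):
    """Softmax over dot products between the item representation
    and each user token embedding (assumed scoring function)."""
    logits = [
        sum(h * e for h, e in zip(hidden, emb))
        for emb in user_token_embeddings
    ]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [x / total for x in exps]

# Toy example: one item representation and three user tokens.
hidden = [1.0, 0.0]
user_tokens = [[2.0, 0.0], [0.0, 2.0], [-1.0, 0.0]]
user_probs = distribution_head(hidden, user_tokens)
```

The user whose token embedding aligns best with the item representation receives the largest probability; in this toy case that is the first user.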

Online Service Flow

In production, each new cold item triggers a single forward pass that generates a full‑user probability distribution in about 236 ms. The system samples the top‑K users, updates only the cold item’s embedding, and pushes the updated embedding to the online recommender. This pipeline achieves a 34.75× inference speedup while keeping the 120 ms embedding‑update latency comparable to prior methods.

Experimental Evaluation

Experiments on CiteULike and ML‑10M compare FilterLLM against twelve strong baselines across three backbone models (MF, NGCF, LightGCN). On MF with CiteULike, FilterLLM achieves Recall@20 = 0.2128 and NDCG@20 = 0.1700, improving over the best baseline, ColdLLM, by about 16% and 21% respectively; cold‑start Recall rises from 0.1195 to 0.1787. Similar gains appear on ML‑10M. With graph backbones (NGCF, LightGCN) the advantage widens: for LightGCN on CiteULike, cold‑start Recall increases from 0.1035 (ColdLLM) to 0.1604, and NDCG from 0.0828 to 0.1221. Warm‑start metrics remain on par or slightly better, indicating no loss in recommendation quality for existing items. These results support the Text‑to‑Distribution paradigm across the tested model families.
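For readers who want the cold‑start improvements as relative percentages, the following snippet recomputes them from the before/after figures quoted above. Only those quoted numbers are used; the dictionary labels are shorthand introduced for this example.

```python
def relative_gain(baseline, new):
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0

# (baseline, FilterLLM) pairs as quoted in the text for CiteULike.
reported = {
    "MF cold-start Recall": (0.1195, 0.1787),
    "LightGCN cold-start Recall": (0.1035, 0.1604),
    "LightGCN NDCG": (0.0828, 0.1221),
}

gains = {
    name: round(relative_gain(baseline, new), 1)
    for name, (baseline, new) in reported.items()
}

for name, gain in gains.items():
    print(f"{name}: +{gain}%")
```

The three gains come out at roughly 49.5%, 55.0%, and 47.5%, so the relative cold‑start improvement is larger than the ~16% and ~21% overall figures.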

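Source: 学姐带你玩AI

About 1,600 words; suggested reading time: 5 minutes.

This article presents FilterLLM, which maps the text description of a cold‑start item, in a single pass, to an interaction probability distribution covering one billion users.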


Written by

Data Party THU

Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.
