How the NYT Revamped Its Recommendation Engine with Collaborative Topic Modeling

This article explains how the New York Times redesigned its "Recommended for You" system by combining content‑based filtering, collaborative filtering, and a collaborative topic‑modeling approach that uses LDA, reader‑signal adjustments, and fast preference calculations to deliver personalized article suggestions.

21CTO

History

News recommendation must handle fresh content that many readers have not yet seen, so the system relies on article metadata such as topics, authors, channels, and keyword tags. The first recommendation system used these tags together with a reader’s 30‑day reading history to find similar articles.

Content‑based filtering works well when tags are informative, but rare tags can receive excessive weight, leading to occasional mismatches (e.g., a reader interested in LGBTQ news being shown wedding articles because of a low‑frequency "wedding" tag).
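A minimal sketch of how a rare tag can dominate a tag-based match. The source says only that rare tags received excessive weight; the inverse-document-frequency formula, the toy corpus, and the function names below are illustrative assumptions, not the NYT implementation.

```python
import math
from collections import Counter

def idf_weights(articles):
    """Weight each tag by log(N / document frequency), so rarer tags score higher."""
    n = len(articles)
    df = Counter(tag for tags in articles.values() for tag in set(tags))
    return {tag: math.log(n / count) for tag, count in df.items()}

def score(history_tags, candidate_tags, weights):
    """Sum the weights of the tags shared by the reading history and a candidate."""
    return sum(weights[t] for t in set(history_tags) & set(candidate_tags))

# Hypothetical corpus: article id -> tags.
articles = {
    "a1": ["politics", "lgbtq"],
    "a2": ["politics", "lgbtq", "weddings"],
    "a3": ["politics", "lgbtq"],
    "a4": ["politics", "courts"],
    "a5": ["style", "weddings"],
    "a6": ["politics", "lgbtq"],
    "a7": ["politics", "courts"],
    "a8": ["style", "food"],
}
weights = idf_weights(articles)

# The reader's history is a single LGBTQ news story that also carries a "weddings" tag.
history = articles["a2"]
wedding_score = score(history, articles["a5"], weights)  # shares only "weddings"
lgbtq_score = score(history, articles["a3"], weights)    # shares "politics" and "lgbtq"
print(wedding_score > lgbtq_score)
```

In this toy corpus the single rare "weddings" tag outweighs two common tags combined, so the style article about weddings outranks the more relevant news story.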

Collaborative filtering mitigates some of these issues by recommending articles read by users with similar reading histories. However, it struggles with newly published articles that no user has read yet.
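The cold-start problem can be illustrated with a small user-based sketch. The Jaccard similarity measure, the scoring rule, and all user and article names are assumptions chosen for brevity; the source does not describe the collaborative filter in this detail.

```python
def jaccard(a, b):
    """Overlap between two sets of read articles, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(target, histories, catalog):
    """Score each unread article by summing the similarity of the users who read it."""
    seen = histories[target]
    scores = {article: 0.0 for article in catalog if article not in seen}
    for user, read in histories.items():
        if user == target:
            continue
        sim = jaccard(seen, read)
        for article in read - seen:
            if article in scores:
                scores[article] += sim
    return scores

# Hypothetical reading histories; "a5_new" was just published and has no readers.
histories = {
    "ann": {"a1", "a2"},
    "bob": {"a1", "a2", "a3"},
    "cat": {"a4"},
}
catalog = ["a1", "a2", "a3", "a4", "a5_new"]
scores = recommend("ann", histories, catalog)
print(scores)
```

Because no one has read "a5_new", its score stays at zero no matter how relevant it is, which is the weakness described above.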

Current Approach

The team combined both techniques into a hybrid algorithm inspired by Collaborative Topic Modeling (CTM). The algorithm consists of four parts:

1. Content modeling.
2. Adjusting the model based on signals from readers.
3. Modeling reader preferences.
4. Generating recommendations from the joint features of content and preferences.

Overview

The first step transforms each article into a set of latent topics using Latent Dirichlet Allocation (LDA). Topics are unobserved themes such as "politics" or "environment" that influence the observable words in an article.

Readers are modeled by their topic preferences, and articles are recommended based on the similarity between article topics and reader preferences.

Example

Assume all articles from the previous month fall into two topics, "politics" and "art". The algorithm tags each article with a probability distribution over these topics: an article about Iraq is marked 100% politics, a film review is marked 100% art, and mixed articles receive blended scores (e.g., 50% politics, 50% art).

These distributions place every article in a shared topic space. A reader who spends 60% of their time on art and 40% on politics occupies a position in that space (shown as a red X in the original diagram) that lies close to articles matching their interests, even if they have not read those articles yet.
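The two-topic example can be worked through in a few lines. This is a sketch only: the source does not say which similarity measure is used, so Euclidean distance is an assumption, and the article names (including "arts_funding_bill" for the mixed article) are hypothetical.

```python
import math

def distance(p, q):
    """Euclidean distance between two points in topic space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Each vector is (politics, art), matching the example above.
articles = {
    "iraq_report": (1.0, 0.0),        # 100% politics
    "film_review": (0.0, 1.0),        # 100% art
    "arts_funding_bill": (0.5, 0.5),  # mixed article
}
reader = (0.4, 0.6)  # 40% politics, 60% art

# Rank articles from closest to farthest from the reader's position.
ranked = sorted(articles, key=lambda name: distance(reader, articles[name]))
print(ranked)
```

The mixed article ranks first, then the film review, then the Iraq report. None of this requires the reader to have seen the articles, only that each article has a topic vector.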

Algorithm Components

1. Modeling Article Text

The algorithm first applies LDA to each article’s text, learning a distribution over topics based on word frequencies. LDA can be run online, allowing real‑time topic inference for newly published articles.
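To give a feel for scoring a newly published article against topics that have already been learned, here is a simplified stand-in. It holds the topic-word probabilities fixed and estimates the article's topic proportions with a few EM iterations. This is not full LDA inference (it has no Dirichlet prior and no variational updates), and the two tiny topics, their probabilities, and the function name are made up for illustration.

```python
from collections import Counter

def infer_topic_mixture(words, topic_word, iterations=50, floor=1e-9):
    """Estimate topic proportions for one document while keeping topics fixed.

    topic_word maps topic -> {word: probability}. Words that no topic knows
    are skipped; `floor` stands in for a word's probability under a topic
    that does not contain it.
    """
    topics = list(topic_word)
    counts = Counter(w for w in words if any(w in topic_word[t] for t in topics))
    if not counts:
        raise ValueError("document has no words known to any topic")
    total = sum(counts.values())
    mixture = {t: 1.0 / len(topics) for t in topics}  # start from a uniform mix
    for _ in range(iterations):
        expected = {t: 0.0 for t in topics}
        for word, n in counts.items():
            # E-step: share each word occurrence among topics.
            joint = {t: mixture[t] * topic_word[t].get(word, floor) for t in topics}
            norm = sum(joint.values())
            for t in topics:
                expected[t] += n * joint[t] / norm
        # M-step: turn expected counts back into proportions.
        mixture = {t: expected[t] / total for t in topics}
    return mixture

# Hypothetical topics, as if learned earlier from the archive.
topic_word = {
    "politics": {"senate": 0.4, "vote": 0.4, "budget": 0.2},
    "art": {"film": 0.4, "gallery": 0.4, "budget": 0.2},
}
mixture = infer_topic_mixture(["senate", "vote", "film", "budget"], topic_word)
print(mixture)
```

Since the topics themselves are not re-estimated, scoring one new article is cheap, which is the property that makes real-time inference on fresh content practical.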

2. Updating the Model with Reading Patterns

Pure LDA looks only at an article's text and ignores how readers actually respond to it, so the team adds an offset derived from readers' click behavior. The result is a hybrid model that blends content topics with collaborative signals. Two methods of calculating the offset were tested: the CTM model and Collaborative Poisson Factorization (CPF). In A/B tests, CTM performed better.
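The source gives no formulas for the offset. For orientation, the published collaborative topic modeling formulation (collaborative topic regression, by Wang and Blei) defines it as follows, where \theta_j is the LDA topic vector of article j, \epsilon_j is the offset learned from reader behavior, u_i is the preference vector of reader i, and r_{ij} indicates whether reader i clicked article j. Whether the NYT system uses exactly this parameterization is an assumption.

```latex
\begin{aligned}
\theta_j &\sim \mathrm{Dirichlet}(\alpha) \\
\epsilon_j &\sim \mathcal{N}\!\left(0,\ \lambda_v^{-1} I_K\right) \\
v_j &= \theta_j + \epsilon_j \\
u_i &\sim \mathcal{N}\!\left(0,\ \lambda_u^{-1} I_K\right) \\
r_{ij} &\sim \mathcal{N}\!\left(u_i^{\top} v_j,\ c_{ij}^{-1}\right)
\end{aligned}
```

An article with no clicks yet keeps v_j close to \theta_j, so it can still be recommended from its text alone. As clicks accumulate, \epsilon_j moves the article toward the readers who actually engage with it. The precision c_{ij} is set higher for observed clicks than for non-clicks, reflecting that a missing click is weaker evidence than a click.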

3. Describing Readers

A simple method averages the topic vectors of all articles a reader has clicked, yielding a point in the topic space. A click does not always signal real interest, however, so a compromise approach treats each click as partial evidence, weighting it by confidence (e.g., 90% liked vs. 10% not liked), which produces a more robust reader vector.
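Both ways of describing a reader can be sketched side by side. The source does not say how confidence values are obtained or combined, so the per-click weights and the weighted-average rule here are one possible reading, and the function names and numbers are hypothetical.

```python
def simple_reader_vector(clicked):
    """Plain average of the topic vectors of clicked articles."""
    k = len(clicked[0])
    return [sum(v[i] for v in clicked) / len(clicked) for i in range(k)]

def weighted_reader_vector(clicked, confidences):
    """Average of clicked topic vectors, each weighted by a confidence in [0, 1]."""
    if len(clicked) != len(confidences):
        raise ValueError("need one confidence per clicked article")
    total = sum(confidences)
    if total == 0:
        raise ValueError("at least one confidence must be positive")
    k = len(clicked[0])
    return [
        sum(c * v[i] for v, c in zip(clicked, confidences)) / total
        for i in range(k)
    ]

# Topic vectors are (politics, art). The third click is assumed to be noise,
# for example an accidental tap, so it gets a low confidence.
clicked = [(0.0, 1.0), (0.2, 0.8), (1.0, 0.0)]
confidences = [0.9, 0.9, 0.1]

simple = simple_reader_vector(clicked)
weighted = weighted_reader_vector(clicked, confidences)
print(simple, weighted)
```

The simple average lands at 40% politics because the stray click counts as much as the others. The weighted version pulls the reader back toward art, where most of the trusted evidence lies.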

This method reduces cold‑start noise and surfaces surprising yet relevant articles.

Optimizations allow the system to compute reader preferences in under a millisecond, enabling real‑time recommendations for all registered users.

Conclusion

By modeling article content and reader preferences as topics and adjusting recommendations with observed reading patterns, the New York Times has rebuilt its recommendation engine using state‑of‑the‑art collaborative topic modeling, achieving significant performance gains over previous algorithms.
