
Overview of Twitter's Open‑Source Recommendation Algorithm

Twitter has open-sourced its core recommendation algorithm. This overview covers its candidate sources, the in-network and out-of-network retrieval methods (graph-based and embedding-based), and the main components that power the home timeline, with pointers to the GitHub repositories and the engineering blog.

Architecture Digest

Jun 2, 2023


On March 31, 2023, Twitter announced the open-source release of its core recommendation algorithm, which selects and orders the tweets shown in users' home timelines. At the time of writing, the GitHub repository (https://github.com/twitter/the-algorithm) had over 24k stars and 4.2k forks.

The algorithm consists of multiple services and models that build and serve the home timeline. The official engineering blog provides further details.
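Twitter's engineering blog breaks this pipeline into three stages: candidate sourcing, ranking each tweet with a machine-learning model, and applying heuristics and filters (for example, removing tweets from blocked accounts, NSFW content, and tweets the user has already seen). The Python sketch below wires those stages together schematically. The `Candidate` type, the stage signatures, and the default `limit` of 50 are hypothetical placeholders, not interfaces from the repository; there, the orchestration role belongs to the home-mixer service.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Candidate:
    tweet_id: str
    author_id: str


# Hypothetical stage signatures: a source yields candidates, the ranker
# returns a score, and a filter returns True to keep a candidate.
Source = Callable[[], Iterable[Candidate]]
Ranker = Callable[[Candidate], float]
Filter = Callable[[Candidate], bool]


def build_timeline(
    sources: List[Source],
    rank: Ranker,
    filters: List[Filter],
    limit: int = 50,
) -> List[Candidate]:
    # Stage 1: candidate sourcing, deduplicated by tweet ID.
    seen = set()
    candidates = []
    for source in sources:
        for c in source():
            if c.tweet_id not in seen:
                seen.add(c.tweet_id)
                candidates.append(c)
    # Stage 2: score every candidate and sort by descending score.
    ranked = sorted(candidates, key=rank, reverse=True)
    # Stage 3: apply heuristics/filters, then keep the top `limit` survivors.
    kept = [c for c in ranked if all(f(c) for f in filters)]
    return kept[:limit]
```

Keeping each stage behind a plain callable makes it easy to swap a cheap scorer for an expensive one, which mirrors the light-ranker/heavy-ranker split listed later in the article.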

Candidate Sources

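Twitter uses several candidate sources to narrow a pool of hundreds of millions of tweets down to about 1,500 candidates per request. The candidates mix in-network tweets (from accounts the user follows) and out-of-network tweets (from accounts the user does not follow), roughly 50% each on average, although the ratio varies from user to user.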
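As a rough illustration of that 50/50 mix, the Python sketch below fills a 1,500-tweet candidate budget half from an in-network list and half from an out-of-network list, backfilling from the other source when one runs short. The function name, the backfill rule, and the `in_network_share` parameter are assumptions for illustration only; this is not Twitter's code, and the real ratio varies per user.

```python
from typing import List, Sequence


def blend_candidates(
    in_network: Sequence[str],
    out_of_network: Sequence[str],
    budget: int = 1500,
    in_network_share: float = 0.5,  # hypothetical knob; ~0.5 on average per the article
) -> List[str]:
    """Take about `in_network_share` of `budget` from each ranked list,
    backfill unused slots from the other source, and drop duplicate IDs."""
    want_in = round(budget * in_network_share)
    want_out = budget - want_in
    take_in = list(in_network[:want_in])
    take_out = list(out_of_network[:want_out])
    # If one source ran short, give its unused slots to the other source.
    spare = budget - len(take_in) - len(take_out)
    if spare > 0:
        take_in += list(in_network[len(take_in):len(take_in) + spare])
        spare = budget - len(take_in) - len(take_out)
        take_out += list(out_of_network[len(take_out):len(take_out) + spare])
    # Deduplicate, preserving order; final ordering is left to the ranking stage.
    seen = set()
    blended = []
    for tweet_id in take_in + take_out:
        if tweet_id not in seen:
            seen.add(tweet_id)
            blended.append(tweet_id)
    return blended
```

For example, with two lists of 1,000 tweet IDs each, the default call returns 750 in-network and 750 out-of-network IDs.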

In‑Network Sources

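The in-network source is the largest candidate source. It ranks tweets from accounts the user follows by relevance using a logistic-regression model. The most important signal in this step is Real Graph, a model that predicts the likelihood of engagement between two users: the higher the Real Graph score between the user and a tweet's author, the more of that author's tweets are included.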
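The Python sketch below illustrates the two ideas in that paragraph: a logistic-regression relevance score, sigmoid(w · x + b), and a per-author quota that grows with the author's Real Graph score. The feature names, the weights, the assumption that Real Graph scores lie in [0, 1], and the quota rule (`ceil(score * max_per_author)`) are all hypothetical; the article does not describe how the two signals are actually combined.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Tweet:
    tweet_id: str
    author_id: str
    features: Dict[str, float]  # hypothetical feature name -> value


def logistic_score(features: Dict[str, float], weights: Dict[str, float], bias: float) -> float:
    """Logistic regression: sigmoid(w . x + b); features without a weight contribute 0."""
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    # Numerically stable sigmoid: avoids overflow in math.exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def rank_in_network(
    tweets: List[Tweet],
    real_graph: Dict[str, float],  # author_id -> assumed engagement likelihood in [0, 1]
    weights: Dict[str, float],
    bias: float = 0.0,
    max_per_author: int = 5,
) -> List[str]:
    """Order followed accounts' tweets by relevance, letting authors with a
    higher Real Graph score contribute more tweets."""
    ranked = sorted(tweets, key=lambda t: logistic_score(t.features, weights, bias), reverse=True)
    per_author: Dict[str, int] = defaultdict(int)
    selected = []
    for t in ranked:
        # Hypothetical quota rule: scale the per-author cap by the Real Graph score.
        quota = math.ceil(real_graph.get(t.author_id, 0.0) * max_per_author)
        if per_author[t.author_id] < quota:
            per_author[t.author_id] += 1
            selected.append(t.tweet_id)
    return selected
```

With this rule, an author whose Real Graph score is 0 contributes nothing, and a score of 1.0 allows up to `max_per_author` tweets.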

Out‑of‑Network Sources

Two approaches are used. The first is social graph traversal, implemented with GraphJet, which answers questions such as “Which tweets have the people I follow interacted with?” The second is embedding-based retrieval, which finds tweets that lie close to the user in a latent space. One such embedding, SimClusters, represents users and tweets as sparse vectors over about 145,000 communities; a user or tweet can belong to several communities.
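Both approaches can be sketched in a few lines of Python, assuming small in-memory dictionaries. `followee_engaged_tweets` does a two-hop walk (user, then followee, then engaged tweet) and counts how many followees touched each tweet; `sparse_cosine` and `nearest_tweets` rank tweets by cosine similarity between sparse community vectors. All names and data shapes here are hypothetical, and this is neither GraphJet's implementation nor how SimClusters vectors are computed; it only shows the shape of each query.

```python
import math
from collections import Counter
from typing import AbstractSet, Dict, List, Set


def followee_engaged_tweets(
    user: str,
    follows: Dict[str, Set[str]],       # user -> accounts they follow
    engagements: Dict[str, Set[str]],   # user -> tweet IDs they engaged with
    seen: AbstractSet[str] = frozenset(),
    top_k: int = 10,
) -> List[str]:
    """Two-hop traversal: tweets engaged with by the user's followees,
    ordered by how many followees engaged with each one."""
    counts: Counter = Counter()
    for followee in follows.get(user, set()):
        for tweet_id in engagements.get(followee, set()):
            if tweet_id not in seen:
                counts[tweet_id] += 1
    return [tweet_id for tweet_id, _ in counts.most_common(top_k)]


def sparse_cosine(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Cosine similarity of two sparse vectors keyed by community ID."""
    dot = sum(w * b[c] for c, w in a.items() if c in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_tweets(
    user_vec: Dict[int, float],
    tweet_vecs: Dict[str, Dict[int, float]],
    top_k: int = 10,
) -> List[str]:
    """Rank tweets by cosine similarity to the user's community vector."""
    ranked = sorted(tweet_vecs, key=lambda tid: sparse_cosine(user_vec, tweet_vecs[tid]), reverse=True)
    return ranked[:top_k]
```

Because the vectors are sparse, the dot product only touches communities the user vector actually contains, which keeps similarity cheap even with a very large number of communities.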

The table below lists the main components included in the repository:

| Type | Component | Description |
|---|---|---|
| Feature | SimClusters | Community detection and sparse embeddings into those communities. |
| Feature | TwHIN | Dense knowledge-graph embeddings for users and tweets. |
| Feature | trust-and-safety-models | Models for detecting NSFW or abusive content. |
| Feature | real-graph | Model that predicts the likelihood of a user interacting with another user. |
| Feature | tweepcred | PageRank-style algorithm for computing user reputation. |
| Feature | recos-injector | Streaming event processor that builds input streams for GraphJet-based services. |
| Feature | graph-feature-service | Serves graph features for a directed pair of users (e.g., how many of user A's followees liked tweets from user B). |
| Candidate Source | search-index | Finds and ranks in-network tweets; about 50% of candidates come from this source. |
| Candidate Source | cr-mixer | Coordination layer that fetches out-of-network tweet candidates from underlying compute services. |
| Candidate Source | user-tweet-entity-graph (UTEG) | Maintains an in-memory user-to-tweet interaction graph and finds candidates by traversing it; built on GraphJet. |
| Candidate Source | follow-recommendation-service | Recommends accounts for users to follow, along with tweets from those accounts. |
| Ranking | light-ranker | Lightweight model used by the search index (Earlybird) to rank tweets. |
| Ranking | heavy-ranker | Neural network that ranks candidate tweets; one of the main signals used to select timeline tweets after candidate sourcing. |
| Mixing & Filtering | home-mixer | Main service that builds and serves the home timeline; built on product-mixer. |
| Mixing & Filtering | visibility-filters | Filters content to support legal compliance, improve product quality, increase user trust, and protect revenue. |
| Mixing & Filtering | timelineranker | Legacy service that provides relevance-scored tweets from the Earlybird search index and the UTEG service. |
| Software Framework | navi | High-performance machine-learning model serving, written in Rust. |
| Software Framework | product-mixer | Framework for building content feeds. |
| Software Framework | twml | Legacy machine-learning framework built on TensorFlow v1. |
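The tweepcred entry describes a PageRank-style reputation score. The Python sketch below is textbook PageRank computed by power iteration over a follow graph, where each follow edge passes reputation from follower to followee. The graph shape, the damping factor of 0.85 (the conventional default), and the fixed iteration count are assumptions; tweepcred's actual inputs and any reputation adjustments it applies are not described in this article.

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]], damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Textbook PageRank by power iteration.

    `graph` maps each user to the users they follow; rank flows along
    follow edges, so heavily followed users end up with higher scores.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        dangling = 0.0
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                dangling += rank[node]
        # Users who follow nobody spread their rank uniformly, so the total stays 1.
        for node in nodes:
            new_rank[node] += damping * dangling / n
        rank = new_rank
    return rank
```

A production system would iterate until the change between rounds falls below a tolerance rather than for a fixed count, and would run on a distributed graph rather than a Python dictionary.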

For more details, refer to the GitHub repositories (the-algorithm and the-algorithm-ml) and Twitter's engineering blog post announcing the release.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

