Overview of Twitter's Open‑Source Recommendation Algorithm
Twitter has open-sourced the core of its recommendation algorithm. This overview covers its candidate sources (in-network and out-of-network), the graph-based and embedding-based retrieval methods, and the main components that power the home timeline, with pointers to the GitHub repositories and the engineering blog.
On March 31, 2023, Twitter announced the open-source release of its core recommendation algorithm, which decides which tweets appear in users' home timelines and in what order. At the time of writing, the GitHub repository (https://github.com/twitter/the-algorithm) had attracted over 24k stars and 4.2k forks.
The algorithm consists of multiple services and models that build and serve the home timeline. The official engineering blog provides further details.
Candidate Sources
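Twitter uses several candidate sources to retrieve about 1,500 candidate tweets per request from a pool of hundreds of millions. On average, the mix is roughly 50% in-network tweets (from accounts the user follows) and 50% out-of-network tweets (from accounts the user does not follow).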
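The source gives only the budget (about 1,500 tweets) and the rough 50/50 split. As a minimal sketch, assuming each source already returns a ranked list, one way to fill that budget is shown below. The function `mix_candidates`, its parameters, and the backfill policy are hypothetical; the real mixing service need not enforce a hard quota at all.

```python
from typing import List


def mix_candidates(in_network: List[str], out_of_network: List[str],
                   budget: int = 1500, in_network_share: float = 0.5) -> List[str]:
    """Fill a fixed candidate budget from two ranked sources (hypothetical policy).

    Takes the top items from each source according to the target share,
    then backfills from whichever source still has items if one runs short.
    """
    target_in = int(budget * in_network_share)
    target_out = budget - target_in
    # Slices create new lists, so the callers' lists are never mutated.
    picked_in = in_network[:target_in]
    picked_out = out_of_network[:target_out]

    # Backfill: if one source came up short, take extra items from the other.
    shortfall = budget - len(picked_in) - len(picked_out)
    if shortfall > 0:
        extra_in = in_network[len(picked_in):len(picked_in) + shortfall]
        picked_in += extra_in
        shortfall -= len(extra_in)
        picked_out += out_of_network[len(picked_out):len(picked_out) + shortfall]
    return picked_in + picked_out
```

Backfilling keeps the budget full when one source runs dry, at the cost of drifting away from the 50/50 target for that request.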
In‑Network Sources
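The in-network source is the largest candidate source. It aims to deliver the most relevant, recent tweets from accounts the user follows, ranking them with a logistic regression model. The most important input to that ranking is Real Graph, a model that predicts the likelihood of engagement between two users (here, the viewer and the tweet's author).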
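A logistic regression model turns a weighted sum of features into a probability via the sigmoid function. The sketch below shows that scoring step for in-network candidates, assuming a Real Graph score is available as one input feature. The feature names (`real_graph_score`, `age_hours`, `has_media`), the weights, and the bias are hypothetical and are not taken from the released light ranker.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Candidate:
    tweet_id: int
    features: Dict[str, float]  # hypothetical feature name -> value


def logistic_score(features: Dict[str, float], weights: Dict[str, float],
                   bias: float) -> float:
    """Return sigmoid(bias + sum of weight * feature); missing features count as 0."""
    z = bias + sum(w * features.get(name, 0.0) for name, w in weights.items())
    # Numerically stable sigmoid: avoids overflow in exp() for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def rank_in_network(candidates: List[Candidate], weights: Dict[str, float],
                    bias: float) -> List[Tuple[int, float]]:
    """Score every in-network candidate and sort by descending probability."""
    scored = [(c.tweet_id, logistic_score(c.features, weights, bias))
              for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical weights: a strong Real Graph score dominates; older tweets score lower.
WEIGHTS = {"real_graph_score": 3.0, "age_hours": -0.05, "has_media": 0.4}

if __name__ == "__main__":
    cands = [
        Candidate(1, {"real_graph_score": 0.9, "age_hours": 2.0, "has_media": 0.0}),
        Candidate(2, {"real_graph_score": 0.1, "age_hours": 1.0, "has_media": 1.0}),
    ]
    for tweet_id, p in rank_in_network(cands, WEIGHTS, bias=-1.0):
        print(tweet_id, round(p, 3))
```

Because the sigmoid is monotonic, sorting by probability gives the same order as sorting by the raw weighted sum; the probability form matters when scores are compared or thresholded across sources.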
Out‑of‑Network Sources
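Two approaches are used. The first is social graph traversal, served by user-tweet-entity-graph on the GraphJet framework, which answers questions such as "Which tweets have the people I follow recently engaged with?" and "Who likes tweets similar to the ones I like, and what else have they recently liked?" The second is an embedding-based method that retrieves tweets close to the user's interests in a latent space. One such embedding, SimClusters, discovers about 145,000 communities anchored by clusters of influential users and represents both users and tweets as sparse embeddings over those communities.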
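Both retrieval styles can be illustrated on toy data. The sketch below assumes a follow graph and per-user engagement sets held in plain dictionaries (standing in for the in-memory graph that user-tweet-entity-graph maintains) and SimClusters-style sparse embeddings keyed by community id. All function names are hypothetical, and cosine similarity with a brute-force scan is a stand-in chosen for clarity; the production similarity measure and indexing may differ.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple


def followee_engaged_tweets(user: str, follows: Dict[str, Set[str]],
                            engagements: Dict[str, Set[int]],
                            k: int) -> List[Tuple[int, int]]:
    """Graph traversal: 'which tweets have my followees engaged with?'

    Walks user -> followees -> engaged tweets, counts how many followees
    touched each tweet, and returns the k most-engaged (tweet_id, count) pairs.
    """
    counts = Counter()
    for followee in follows.get(user, set()):
        for tweet_id in engagements.get(followee, set()):
            counts[tweet_id] += 1
    return counts.most_common(k)


def sparse_cosine(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Cosine similarity between two sparse vectors keyed by community id."""
    dot = sum(v * b[c] for c, v in a.items() if c in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_tweets(user_emb: Dict[int, float],
                   tweet_embs: Dict[int, Dict[int, float]],
                   k: int) -> List[Tuple[int, float]]:
    """Embedding retrieval: brute-force top-k tweets by cosine similarity."""
    scored = [(tid, sparse_cosine(user_emb, emb)) for tid, emb in tweet_embs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    follows = {"alice": {"bob", "carol"}}
    engagements = {"bob": {1, 2}, "carol": {2, 3}}
    print(followee_engaged_tweets("alice", follows, engagements, k=1))

    user_emb = {7: 1.0, 42: 0.5}  # community id -> weight
    tweet_embs = {100: {7: 0.9}, 200: {99: 1.0}}
    print(nearest_tweets(user_emb, tweet_embs, k=1))
```

Sparse embeddings keep the dot product cheap because only shared communities contribute; at real scale the brute-force scan in `nearest_tweets` would be replaced by an index rather than scoring every tweet per request.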
The table below lists the main components included in the repository:
Type | Component | Description
--- | --- | ---
Feature | SimClusters | Community detection and sparse embeddings into those communities.
Feature | TwHIN | Dense knowledge-graph embeddings for users and tweets.
Feature | trust-and-safety-models | Models for detecting NSFW or abusive content.
Feature | real-graph | Predicts the likelihood of a user interacting with another user.
Feature | tweepcred | PageRank-style algorithm for computing user reputation.
Feature | recos-injector | Streaming event processor that builds input streams for GraphJet-based services.
Feature | graph-feature-service | Serves graph features for a directed pair of users.
Candidate Source | search-index | Finds and ranks in-network tweets; about 50% of tweets come from this source.
Candidate Source | cr-mixer | Coordination layer that fetches out-of-network tweet candidates from underlying compute services.
Candidate Source | user-tweet-entity-graph (UTEG) | Maintains an in-memory user-to-tweet interaction graph and finds candidates by traversing it; built on GraphJet.
Candidate Source | follow-recommendation-service | Recommends accounts to follow, and tweets from those accounts.
Ranking | light-ranker | Lightweight model used by the search index (Earlybird) to rank tweets.
Ranking | heavy-ranker | Neural network that ranks candidate tweets; one of the main signals used to select timeline tweets after candidate sourcing.
Mixing & Filtering | home-mixer | Main service that builds and serves the home timeline; built on product-mixer.
Mixing & Filtering | visibility-filters | Filters content to support legal compliance, improve product quality, and protect revenue.
Mixing & Filtering | timelineranker | Legacy service that provides relevance-scored tweets from the Earlybird search index and UTEG.
Software Framework | navi | High-performance ML model serving, written in Rust.
Software Framework | product-mixer | Framework for building content feeds.
Software Framework | twml | Legacy ML framework built on TensorFlow v1.
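tweepcred is described only as a PageRank-style reputation algorithm. For reference, the sketch below is textbook PageRank computed by power iteration on a toy follow graph, where an edge u -> v means u follows v and so passes reputation to v. It does not reproduce any tweepcred-specific adjustments, and the function name, damping factor, and iteration count are conventional illustrative choices, not values from the repository.

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank over a follow graph (edge u -> v: u follows v).

    Nodes with no outgoing edges ("dangling" nodes) spread their rank
    uniformly, so the ranks always sum to 1.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)
    if n == 0:
        return {}

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        # Teleport term plus the redistributed rank of dangling nodes.
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for u, targets in graph.items():
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    follows = {"a": ["c"], "b": ["c"], "c": ["a"]}
    for user, score in sorted(pagerank(follows).items(), key=lambda kv: -kv[1]):
        print(user, round(score, 3))
```

In this toy graph, "c" is followed by two accounts and ends up with the highest score, while "b", whom nobody follows, keeps only the teleport share.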
For more details, refer to the GitHub repositories (the-algorithm and the-algorithm-ml) and the engineering blog post mentioned above.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.