Inside Twitter’s Open‑Source Recommendation Engine: Architecture & Key Components
This article examines the open‑source Twitter recommendation algorithm released by Elon Musk, detailing its main services, machine‑learning models, data sources, programming languages, and the GitHub repositories that host the core components such as SimClusters, TwHIN, rankers, and the Rust‑based navi framework.
1. Introduction
Elon Musk promised on Twitter to open‑source the core recommendation algorithm, and the code was published on March 31, 2023. The release covers the algorithm that recommends tweets in users' timelines and is split across two new repositories: the-algorithm and the-algorithm-ml.
2. Algorithm Architecture
Twitter's recommendation algorithm is a collection of services and jobs that build and serve the home timeline. The diagram below shows the main connections between these services and jobs.
The main components included in this repository are:
SimClusters : community detection and sparse embeddings into these communities.
TwHIN : dense knowledge‑graph embeddings for users and tweets.
trust-and-safety-models : models for detecting NSFW or abusive content.
real-graph : model predicting the likelihood of interaction between Twitter users.
tweepcred : PageRank‑like algorithm for calculating user reputation.
recos-injector : stream event processor that builds input streams for GraphJet‑based services.
graph-feature-service : provides graph features for a directed pair of users (e.g., how many of the accounts user A follows liked tweets from user B).
search-index : finds and ranks in‑network tweets; about 50% of tweets come from this candidate source.
cr-mixer : coordination layer that fetches out‑of‑network tweet candidates from the underlying compute services.
user-tweet-entity-graph : maintains an in‑memory user‑to‑tweet interaction graph and finds candidates by traversing it, built on the GraphJet framework.
follow-recommendation-service : suggests accounts for users to follow and tweets from those accounts.
light-ranker : lightweight ranking model used by search to rank tweets.
heavy-ranker : neural network that ranks candidate tweets, one of the main signals for timeline selection.
home-mixer : primary service that builds and serves the home timeline.
visibility-filters : filters Twitter content to support legal compliance, improve product quality, increase user trust, and protect revenue, using hard filtering, visible product treatments, and coarse‑grained downranking.
timelineranker : legacy service that provides relevance‑scored tweets from the Earlybird search index and the UTEG (user-tweet-entity-graph) service.
navi : high‑performance machine‑learning model serving, written in Rust.
product-mixer : software framework for building feeds of content.
twml : legacy machine‑learning framework built on TensorFlow v1.
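To make the division of labour among these components more concrete, here is a minimal Python sketch of how candidate sourcing, ranking, and filtering might fit together. It is a toy model under stated assumptions, not Twitter's implementation: every function and field name (in_network_source, out_of_network_source, toy_ranker, toy_visibility_filter, build_timeline, engagement, flagged) is hypothetical, the data is hard‑coded, and the order of the ranking and filtering steps is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    tweet_id: int
    source: str        # hypothetical label for the producing candidate source
    engagement: float  # toy feature; the real models use far richer inputs
    flagged: bool      # toy stand-in for a trust-and-safety signal


def in_network_source(user_id: int) -> List[Candidate]:
    """Toy stand-in for search-index: tweets from accounts the user follows."""
    return [
        Candidate(1, "in_network", 0.9, False),
        Candidate(2, "in_network", 0.2, False),
    ]


def out_of_network_source(user_id: int) -> List[Candidate]:
    """Toy stand-in for cr-mixer: tweets from outside the user's network."""
    return [
        Candidate(3, "out_of_network", 0.7, True),
        Candidate(4, "out_of_network", 0.5, False),
    ]


def toy_ranker(candidate: Candidate) -> float:
    """Toy stand-in for heavy-ranker: returns one feature as the score."""
    return candidate.engagement


def toy_visibility_filter(candidate: Candidate) -> bool:
    """Toy stand-in for visibility-filters: one hard filter on a flag."""
    return not candidate.flagged


def build_timeline(
    user_id: int,
    sources: List[Callable[[int], List[Candidate]]],
    ranker: Callable[[Candidate], float],
    keep: Callable[[Candidate], bool],
    limit: int,
) -> List[Candidate]:
    # 1. Gather candidates from every source.
    candidates = [c for source in sources for c in source(user_id)]
    # 2. Order by score, highest first.
    ranked = sorted(candidates, key=ranker, reverse=True)
    # 3. Drop candidates the filter rejects, then truncate.
    return [c for c in ranked if keep(c)][:limit]


timeline = build_timeline(
    user_id=42,
    sources=[in_network_source, out_of_network_source],
    ranker=toy_ranker,
    keep=toy_visibility_filter,
    limit=3,
)
print([c.tweet_id for c in timeline])  # [1, 4, 2]
```

In this sketch build_timeline plays the role the list assigns to home-mixer: it knows nothing about how candidates are found or scored and only composes the pieces it is handed. Passing sources, ranker, and filter as plain callables keeps each stage replaceable, which loosely mirrors the way the real system splits these responsibilities across separate services.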
3. Programming Languages
The codebase includes the following programming languages:
🥇 Scala – a JVM language
🥈 Java – essential
🥉 Starlark – a Python dialect used for build configuration (it is the language of Bazel BUILD files)
Python, C++, and Rust also appear in the codebase.
4. GitHub Repositories
Algorithm main repository: https://github.com/twitter/the-algorithm/
ML model repository: https://github.com/twitter/the-algorithm-ml
Java Architecture Diary