How X’s Open‑Source “For You” Recommendation Engine Works
X (formerly Twitter) has open‑sourced its “For You” recommendation algorithm, revealing a Grok‑based Transformer that blends on‑platform and off‑platform content, eliminates hand‑engineered features, and scores posts through a multi‑stage pipeline of candidate sourcing, hydration, filtering, scoring, and selection.
Algorithm Architecture
The "For You" system combines two sources of posts—on‑platform content from accounts you follow (Thunder) and off‑platform content retrieved by machine‑learning search (Phoenix Retrieval)—and feeds them into a Grok‑based Transformer called Phoenix that predicts interaction probabilities such as likes, replies, retweets, and clicks. The final score is a weighted combination of these predictions.
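The weighted combination can be sketched as a dot product of predicted interaction probabilities and per‑action weights. A minimal sketch; the action set is taken from the text, but the weight values below are illustrative placeholders, not the production values:

```rust
// Hypothetical sketch of the final-score fusion: each candidate post gets
// one predicted probability per action, and the final score is the
// weighted sum. Weights here are made up for illustration.
fn weighted_score(probs_and_weights: &[(f64, f64)]) -> f64 {
    // Each entry is (predicted probability of an action, weight for it).
    probs_and_weights.iter().map(|(p, w)| p * w).sum()
}

fn main() {
    // Assumed predictions for one candidate: like, reply, retweet, click.
    let predictions = [
        (0.30, 1.0),  // P(like)    * weight
        (0.05, 13.5), // P(reply)   * weight
        (0.10, 1.0),  // P(retweet) * weight
        (0.40, 0.5),  // P(click)   * weight
    ];
    println!("final score = {:.3}", weighted_score(&predictions));
}
```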
Pipeline Stages
Query Hydration: Fetch the user's recent interaction records and metadata such as the follow list.
Candidate Sourcing: Retrieve candidate posts—Thunder provides the latest posts from followed accounts (on‑platform), while Phoenix Retrieval recalls posts from the entire corpus via ML (off‑platform).
Candidate Hydration: Enrich each candidate with core fields (text, media), author information (username, verification status), video length, subscription status, and more.
Pre‑Scoring Filters: Remove duplicates, expired posts, and the user's own posts; posts from blocked or muted accounts; posts containing muted keywords; posts already seen or recently shown; and paid content that does not meet subscription criteria.
Scoring:
Phoenix Scorer – Transformer predicts interaction probabilities.
Weighted Scorer – Weighted fusion of the predicted probabilities.
Author Diversity Scorer – Down‑weights posts from the same author to ensure diversity.
OON (out‑of‑network) Scorer – Additional adjustment for off‑platform content.
Selection: Rank posts by final score and take the top K.
Post‑Selection Processing: A final verification pass removes deleted, spam, violent, or otherwise undesirable content.
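The stages above can be condensed into a minimal end‑to‑end sketch. The types, the single filter, and the scores are simplified stand‑ins invented for illustration, not the real pipeline code:

```rust
// Stand-in candidate record; real candidates carry many hydrated fields.
struct Candidate { id: u64, seen: bool, score: f64 }

// Pre-Scoring Filters: drop already-seen posts (a stand-in for the full
// set of duplicate / blocked / muted / subscription filters).
fn pre_score_filter(cands: Vec<Candidate>) -> Vec<Candidate> {
    cands.into_iter().filter(|c| !c.seen).collect()
}

// Scoring: pretend the model produced these interaction probabilities.
fn score(cands: &mut [Candidate]) {
    for c in cands.iter_mut() {
        c.score = 1.0 / (1.0 + c.id as f64);
    }
}

// Selection: rank by final score and keep the top K.
fn select_top_k(mut cands: Vec<Candidate>, k: usize) -> Vec<Candidate> {
    cands.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap());
    cands.truncate(k);
    cands
}

fn main() {
    let sourced = vec![
        Candidate { id: 1, seen: false, score: 0.0 },
        Candidate { id: 2, seen: true, score: 0.0 },
        Candidate { id: 3, seen: false, score: 0.0 },
    ];
    let mut candidates = pre_score_filter(sourced);
    score(&mut candidates);
    let feed = select_top_k(candidates, 2);
    println!("feed: {:?}", feed.iter().map(|c| c.id).collect::<Vec<_>>());
    // prints "feed: [1, 3]"
}
```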
Key Design Decisions
Zero Manual Features: Relevance is learned entirely by the Grok‑based Transformer from user behavior sequences, dramatically simplifying the data pipeline and online serving.
Isolation During Scoring: During Transformer inference, candidate posts do not attend to each other; each interacts only with the user context, making every score independent, cacheable, and reproducible.
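Because each score depends only on the (user context, post) pair, it can be memoized. A hypothetical sketch with a deterministic stand‑in for the model call; the cache keying and the `model_score` function are invented for the example:

```rust
use std::collections::HashMap;

// Candidates don't attend to one another, so a score is a pure function
// of (user context, post) and can be cached and reused.
struct ScoreCache {
    map: HashMap<(u64, u64), f64>,
    hits: u32,
}

impl ScoreCache {
    fn new() -> Self {
        Self { map: HashMap::new(), hits: 0 }
    }

    fn score(&mut self, user_ctx: u64, post_id: u64) -> f64 {
        if let Some(&s) = self.map.get(&(user_ctx, post_id)) {
            self.hits += 1;
            return s; // cached: no need to rerun the model
        }
        let s = model_score(user_ctx, post_id);
        self.map.insert((user_ctx, post_id), s);
        s
    }
}

// Deterministic stand-in for the Transformer forward pass.
fn model_score(user_ctx: u64, post_id: u64) -> f64 {
    ((user_ctx ^ post_id) % 100) as f64 / 100.0
}

fn main() {
    let mut cache = ScoreCache::new();
    let a = cache.score(42, 7);
    let b = cache.score(42, 7); // same (user context, post) -> cache hit
    assert_eq!(a, b);
    println!("cache hits = {}", cache.hits); // prints "cache hits = 1"
}
```

If candidates attended to one another, the score of one post would change whenever the candidate set changed, and this kind of caching would be impossible.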
Hash‑Based Embedding: Both retrieval and ranking use multiple hash functions for embedding lookup, reducing memory usage and computational cost.
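A rough illustration of multi‑hash lookup: several hash functions each select a row from one small shared table and the rows are summed, so a fixed‑size table serves an unbounded ID space. The table size, dimensionality, and hash mix below are invented for the example, not the real scheme:

```rust
const TABLE_ROWS: usize = 8; // shared table height (tiny, for illustration)
const DIM: usize = 4;        // embedding dimension

// Stand-in for a seeded hash family.
fn bucket(id: u64, seed: u64) -> usize {
    let x = id
        .wrapping_mul(0x9E37_79B9_7F4A_7C15)
        .wrapping_add(seed.wrapping_mul(0xD1B5_4A32_D192_ED03));
    (x % TABLE_ROWS as u64) as usize
}

// Deterministic dummy weights standing in for learned parameters.
fn make_table() -> [[f64; DIM]; TABLE_ROWS] {
    let mut t = [[0.0; DIM]; TABLE_ROWS];
    for r in 0..TABLE_ROWS {
        for d in 0..DIM {
            t[r][d] = (r * DIM + d) as f64;
        }
    }
    t
}

// Look up an ID by summing the rows chosen by each hash function.
fn embed(id: u64, table: &[[f64; DIM]; TABLE_ROWS], num_hashes: u64) -> [f64; DIM] {
    let mut out = [0.0; DIM];
    for seed in 0..num_hashes {
        let row = &table[bucket(id, seed)];
        for d in 0..DIM {
            out[d] += row[d];
        }
    }
    out
}

fn main() {
    let table = make_table();
    println!("embedding for id 123: {:?}", embed(123, &table, 2));
}
```

The memory win is that the table has 8 rows regardless of how many distinct IDs exist; combining multiple hashed rows keeps collisions from forcing unrelated IDs onto identical vectors.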
Multi‑Action Prediction: The model outputs probabilities for several actions (like, reply, retweet, click) rather than a single relevance score, providing a finer‑grained view of user intent.
Composable Pipeline Architecture: The candidate-pipeline crate provides framework‑level support, decoupling pipeline execution and monitoring from business logic; independent stages can run in parallel, errors degrade gracefully, and new sources, fields, filters, or scorers plug in as components.
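One common way to express such a plug‑in design in Rust is with small stage traits registered into a runner. The trait and type names below are made up for illustration and are not the actual candidate-pipeline API:

```rust
// Hypothetical composable pipeline: filters and scorers implement small
// traits and are registered as boxed trait objects, so new stages plug in
// without touching the execution logic.
struct Post { id: u64, score: f64, seen: bool }

trait Filter {
    fn keep(&self, p: &Post) -> bool;
}

trait Scorer {
    fn score(&self, p: &mut Post);
}

struct SeenFilter; // example plug-in stage
impl Filter for SeenFilter {
    fn keep(&self, p: &Post) -> bool { !p.seen }
}

struct BaseScorer; // example plug-in stage with a dummy scoring rule
impl Scorer for BaseScorer {
    fn score(&self, p: &mut Post) { p.score += p.id as f64; }
}

struct Pipeline {
    filters: Vec<Box<dyn Filter>>,
    scorers: Vec<Box<dyn Scorer>>,
}

impl Pipeline {
    fn run(&self, posts: Vec<Post>) -> Vec<Post> {
        // Keep only posts that pass every registered filter...
        let mut kept: Vec<Post> = posts
            .into_iter()
            .filter(|p| self.filters.iter().all(|f| f.keep(p)))
            .collect();
        // ...then apply every registered scorer in order.
        for p in kept.iter_mut() {
            for s in &self.scorers {
                s.score(p);
            }
        }
        kept
    }
}

fn main() {
    let pipeline = Pipeline {
        filters: vec![Box::new(SeenFilter)],
        scorers: vec![Box::new(BaseScorer)],
    };
    let out = pipeline.run(vec![
        Post { id: 1, score: 0.0, seen: false },
        Post { id: 2, score: 0.0, seen: true },
    ]);
    println!("kept {} post(s)", out.len()); // prints "kept 1 post(s)"
}
```

Adding a new filter or scorer means implementing one trait and pushing one more boxed stage into the pipeline; the `run` loop never changes.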
Source: https://github.com/xai-org/x-algorithm
