Inside X’s Open‑Source ‘For You’ Algorithm: How AI Drives Your Attention
This article dissects X's newly open-sourced "For You" feed algorithm: its Rust and Python implementation, the Home Mixer pipeline, candidate sourcing, Grok-based scoring, and extensive filtering, showing how machine learning predicts user interactions and shapes the content you see.
Background: Why X Open‑sourced the Algorithm
X's "For You" feed mixes in-network posts with out-of-network content and ranks them using AI. The company released the entire production system on GitHub (https://github.com/xai-org/x-algorithm) under an Apache 2.0 license; the repository is 62.9% Rust and 37.1% Python and is refreshed roughly every four weeks. The stated goal is transparency: developers can study a large-scale recommender and contribute code, replacing the long-standing perception of the feed as a black box.
How the Algorithm Works Step‑by‑Step
The pipeline follows a Candidate Pipeline framework orchestrated by Home Mixer. Each stage performs a single function, and independent stages can run in parallel, improving efficiency.
1. Query Hydration: Load the user's recent interactions, follow list, and preferences; the code uses a hydrator to fetch serialized data such as recent likes and replies.
2. Candidate Sources: Pull posts from the Thunder store (friends' posts) and from Phoenix (global candidates).
3. Hydration: Enrich candidates with metadata (post content, author info, media assets) to ensure completeness.
4. Filtering: Remove duplicates, stale posts, and blocked content using a series of filter classes.
5. Scoring: The Grok transformer predicts interaction probabilities for each candidate and produces a weighted total score.
6. Selection: Rank candidates by total score and select the top K for the feed.
7. Post-Selection Filtering: Apply final checks to drop spam or low-quality items.
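The stages above can be sketched as a minimal pipeline. All stage names and the `stages` dictionary here are illustrative stand-ins, not the actual Home Mixer API:

```python
# Minimal sketch of the Candidate Pipeline flow described above.
# Stage names are illustrative, not the actual Home Mixer interfaces.

def build_feed(user_id, stages, k=50):
    """Run candidates through hydrate -> filter -> score -> select."""
    query = stages["hydrate_query"](user_id)                 # Query Hydration
    candidates = []
    for source in stages["sources"]:                         # Candidate Sources
        candidates.extend(source(query))
    candidates = [stages["hydrate"](c) for c in candidates]  # Hydration
    for keep in stages["filters"]:                           # Filtering
        candidates = [c for c in candidates if keep(query, c)]
    scored = [(stages["score"](query, c), c) for c in candidates]  # Scoring
    scored.sort(key=lambda sc: sc[0], reverse=True)          # Selection
    top_k = [c for _, c in scored[:k]]
    return [c for c in top_k if stages["post_filter"](c)]    # Post-Selection
```

Because each stage is a plain callable, stages can be swapped or tested in isolation, which mirrors the single-responsibility design the article describes.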
Key Components: Thunder and Phoenix
Thunder is an in‑memory store located in the thunder/ directory. It consumes Kafka streams of post creation and deletion events, partitions data by user, and serves recent friend posts in milliseconds without hitting a database.
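The idea behind Thunder can be illustrated with a toy in-memory store. The real implementation is Rust and consumes Kafka streams; this Python sketch, with hypothetical class and event names, only shows the shape of the design:

```python
from collections import defaultdict, deque

class RecentPostStore:
    """Toy sketch of a Thunder-style store: posts partitioned by author,
    kept in memory, and updated by create/delete events (in production
    these arrive via Kafka). Class and field names are illustrative."""

    def __init__(self, max_per_user=100):
        self.by_author = defaultdict(deque)
        self.max_per_user = max_per_user

    def apply_event(self, event):
        # event: {"type": "create"|"delete", "author": ..., "post_id": ...}
        posts = self.by_author[event["author"]]
        if event["type"] == "create":
            posts.appendleft(event["post_id"])   # newest first
            while len(posts) > self.max_per_user:
                posts.pop()                      # evict the oldest
        elif event["type"] == "delete":
            try:
                posts.remove(event["post_id"])
            except ValueError:
                pass  # delete event for a post we never stored

    def recent_from(self, authors, limit=50):
        """Serve recent posts from followed authors without a DB hit."""
        out = []
        for a in authors:
            out.extend(self.by_author.get(a, []))
        return out[:limit]
```

Partitioning by author means a read for one user's follow list touches only a handful of small in-memory deques, which is why millisecond serving is feasible.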
Phoenix resides in the phoenix/ directory and handles both retrieval and ranking. Retrieval uses a two‑tower model: a user tower encodes the user’s history and features, while a candidate tower encodes all posts; the dot‑product similarity selects the top candidates. Ranking employs a Grok‑1‑derived transformer that takes user context and candidate posts, masks attention to keep candidates independent, and outputs probabilities for actions such as like, reply, and share.
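The retrieval half of Phoenix can be sketched as follows. The encoder here is a stand-in for the learned user tower, and the candidate embeddings are assumed to be precomputed by the candidate tower; only the dot-product selection step is real to the description above:

```python
import numpy as np

def two_tower_retrieve(user_history, post_embeddings, encode_user, top_k=3):
    """Sketch of two-tower retrieval: the user tower encodes the user's
    history into a vector, candidate embeddings come from the candidate
    tower, and dot-product similarity picks the top candidates."""
    u = encode_user(user_history)      # user tower output (stand-in)
    scores = post_embeddings @ u       # dot-product similarity per post
    top = np.argsort(-scores)[:top_k]  # indices of the highest scores
    return top, scores[top]
```

The appeal of the two-tower design is that candidate embeddings can be indexed offline, so serving reduces to one user-tower forward pass plus a nearest-neighbor lookup.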
Scoring and Filtering Logic
The scoring stage runs the Phoenix scorer transformer, which emits probabilities for multiple actions. Positive actions (e.g., Favorite, Reply, Repost, Click) add to the score, while negative actions (e.g., Not Interested, Block Author) subtract. A weighted scorer aggregates these using weights learned from data.
| Action | Type | Signal |
| --- | --- | --- |
| Favorite | positive | like probability |
| Reply | positive | reply probability |
| Repost | positive | repost probability |
| Quote | positive | quote probability |
| Click | positive | click probability |
| Profile Click | positive | author-page click probability |
| Video View | positive | video-view probability |
| Photo Expand | positive | photo-expand probability |
| Share | positive | share probability |
| Dwell | positive | dwell-time probability |
| Follow Author | positive | follow-author probability |
| Not Interested | negative | negative signal |
| Block Author | negative | negative signal |
| Mute Author | negative | negative signal |
| Report | negative | negative signal |
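The weighted aggregation reduces to a dot product between per-action probabilities and their weights. The weight values below are illustrative placeholders, not the learned production values:

```python
# Illustrative weights only; the production values are learned from data.
# Negative actions carry negative weights, so they subtract from the score.
WEIGHTS = {
    "favorite": 1.0, "reply": 2.0, "repost": 1.5, "click": 0.3,
    "not_interested": -5.0, "block_author": -10.0, "report": -10.0,
}

def weighted_score(action_probs):
    """Combine per-action probabilities into a single ranking score."""
    return sum(WEIGHTS.get(a, 0.0) * p for a, p in action_probs.items())
```

Tuning these weights is how the platform trades off engagement signals against explicit negative feedback.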
Filtering uses several dedicated filters, each implemented as a class:
DropDuplicates – removes duplicate IDs
AgeFilter – discards old posts to keep the feed fresh
SelfpostFilter – excludes posts authored by the user
MutedKeyword – respects user‑specified muted keywords
AuthorSocialgraph – blocks authors on the user’s blacklist
Additional post‑filtering stages such as VF Filter and DedupConversation further guard against spam and thread‑collapse issues.
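A filter chain of this kind can be sketched in a few lines. The two classes below are hypothetical Python mirrors of the filters listed above, not the repository's actual implementations:

```python
# Hypothetical stand-ins for the filter classes listed above; the real
# implementations live in the repository's filter modules.
class DropDuplicates:
    def apply(self, user, posts):
        seen, out = set(), []
        for p in posts:
            if p["id"] not in seen:   # keep only the first occurrence
                seen.add(p["id"])
                out.append(p)
        return out

class SelfpostFilter:
    def apply(self, user, posts):
        # Exclude posts authored by the viewing user themselves.
        return [p for p in posts if p["author"] != user]

def run_filters(user, posts, filters):
    """Pass the candidate list through each filter in order."""
    for f in filters:
        posts = f.apply(user, posts)
    return posts
```

Because every filter shares one interface, new rules can be appended to the chain without touching the rest of the pipeline.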
Implications of the Open Source Release
For end users, transparency means the community can audit the ranking logic and work to address bias in what the feed surfaces. For developers, the pipeline can be reused to build new applications, and the Rust + ML stack serves as a practical example of high-performance recommender engineering. Long-term, the open-source release promotes AI transparency and positions the Grok-based recommender as a reference implementation. Risks include potential abuse for manipulation, but the authors argue the benefits outweigh the drawbacks, especially with a four-week update cadence that keeps the published code current.
Conclusion
By examining the source code, the “mystery” behind X’s feed disappears: a modular pipeline, a transformer that learns user behavior, and a series of strict filters together produce the personalized experience. Readers are encouraged to explore the GitHub repository, experiment with modifications, and anticipate even smarter recommendations in the future.
ShiZhen AI
Tech blogger with over 10 years of experience at leading tech firms; AI efficiency and delivery expert focused on AI productivity. Covers tech gadgets, AI-driven efficiency, and the AI leisure community. 🛰 szzdzhp001