Inside X’s New For‑You Recommendation Pipeline: What Creators Must Know

The May 15 open‑source release of X’s For‑You recommendation system reveals a full pipeline—from query hydration and candidate sourcing to multi‑stage scoring—showing that the platform predicts a range of user actions, emphasizes content‑level signals, and offers creators concrete guidance to improve visibility.

Old Zhang's AI Learning

On May 15, X (formerly Twitter) open‑sourced the latest version of its For‑You recommendation system, providing a detailed look at the end‑to‑end pipeline that powers content discovery.

Key signals

The biggest takeaway is that the system no longer optimizes for the title with the highest click‑through rate. Instead, it predicts a suite of actions a user may take after viewing a post, such as opening it, dwelling on it, liking, replying, sharing, following the author, or marking it as not interested.

What was updated (seven points)

Added phoenix/run_pipeline.py, merging run_ranker.py and run_retrieval.py into a single entry point that runs retrieval → ranking.

Provided a pre‑trained mini‑Phoenix artifact (≈3 GB) via Git LFS for demo inference without training.

Introduced grox/, a content‑understanding service with classifiers, embeddings, and a task‑execution engine for spam detection, post categorization, policy checks, etc.

Added home-mixer/ads/, an ad‑mixing system that inserts ads while respecting brand‑safety constraints.

Home Mixer now includes a query hydrator that enriches user context (followed topics, starter packs, impression bloom filter, IP, mutual follows, served history).

Added a candidate hydrator that adds context to candidate items (interaction counts, brand‑safety signals, language, media, quote expansion, mutual‑follow Jaccard, topic‑filter results).

Expanded candidate sources (ads, who‑to‑follow, Phoenix MoE, Phoenix topics, prompts, cached posts) and updated Thunder and Phoenix sources.

For‑You as a production line

The README describes the For‑You feed as a pipeline composed of Home Mixer, Thunder, Phoenix, and the Candidate Pipeline. The simplified flow is:

User request
 → Query Hydration (adds user context)
 → Candidate Sources (finds candidates)
 → Candidate Hydration (adds item context)
 → Filters (removes unsuitable items)
 → Phoenix Scorer (predicts multiple action probabilities)
 → Ranking Scorer (aggregates scores)
 → Selector (picks Top K)
 → Post‑Selection Filters (brand safety, deduplication)
 → Returns For‑You Feed
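The flow above can be sketched as a chain of small stages. This is only an illustration of the stage order, not the repository's code: the `Candidate` class, the `run_pipeline` signature, and every toy source, hydrator, filter, and scorer below are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    post_id: int
    author_id: int
    features: dict = field(default_factory=dict)
    score: float = 0.0


def run_pipeline(query, query_hydrators, sources, candidate_hydrators,
                 filters, scorers, k, post_filters):
    for hydrate in query_hydrators:            # Query Hydration
        query = hydrate(query)
    candidates = [c for src in sources for c in src(query)]  # Candidate Sources
    for hydrate in candidate_hydrators:        # Candidate Hydration
        for c in candidates:
            hydrate(query, c)
    for keep in filters:                       # Filters
        candidates = [c for c in candidates if keep(query, c)]
    for scorer in scorers:                     # Scorers, applied in order
        for c in candidates:
            c.score = scorer(query, c)
    top = sorted(candidates, key=lambda c: c.score, reverse=True)[:k]  # Selector
    for post_filter in post_filters:           # Post-Selection Filters
        top = post_filter(top)
    return top


def dedupe(cands):
    """Keep only the first occurrence of each post id."""
    seen, out = set(), []
    for c in cands:
        if c.post_id not in seen:
            seen.add(c.post_id)
            out.append(c)
    return out


def add_likes(query, c):
    # Toy hydrator: fakes an interaction count from the post id.
    c.features["likes"] = c.post_id % 100


feed = run_pipeline(
    query={"user_id": 1},
    query_hydrators=[lambda q: {**q, "muted_authors": {30}}],
    sources=[
        lambda q: [Candidate(101, 10), Candidate(102, 20)],
        lambda q: [Candidate(103, 30), Candidate(101, 10)],  # 101 appears twice
    ],
    candidate_hydrators=[add_likes],
    filters=[lambda q, c: c.author_id not in q["muted_authors"]],
    scorers=[lambda q, c: float(c.features["likes"])],
    k=3,
    post_filters=[dedupe],
)
print([c.post_id for c in feed])  # [102, 101]
```

Post 103 has the highest toy score but never reaches the scorer because its author is muted, which mirrors the point that filtering and candidate selection happen before any ranking.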

This structure shows creators that a post first enters a candidate pool before any scoring occurs; the system must first decide which users the content is relevant to.

Phoenix: retrieval then ranking

Phoenix operates in two stages:

Retrieval: a two‑tower model that encodes users and candidates into embeddings and selects top‑K by dot‑product similarity.

Ranking: a finer‑grained model that predicts the probability of many different actions; these probabilities are then combined into a single weighted score.
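The retrieval stage can be sketched as a dot‑product top‑K lookup. This is a minimal illustration that assumes the embeddings have already been produced by the two towers; the vectors, post ids, and the `retrieve_top_k` helper are made up for the example and do not come from the release.

```python
import heapq


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def retrieve_top_k(user_embedding, candidate_embeddings, k):
    """Return the k post ids whose embeddings score highest against the user."""
    return heapq.nlargest(
        k,
        candidate_embeddings,
        key=lambda post_id: dot(user_embedding, candidate_embeddings[post_id]),
    )


# Hand-written stand-ins for the outputs of the user tower and candidate tower.
user = [1.0, 0.0, 0.5]
posts = {
    "a": [0.9, 0.1, 0.0],
    "b": [0.0, 1.0, 0.0],
    "c": [0.5, 0.0, 1.0],
    "d": [-1.0, 0.0, 0.0],
}
top = retrieve_top_k(user, posts, k=2)
print(top)  # ['c', 'a']
```

A production system would typically replace the exhaustive scan with an approximate nearest‑neighbor index, but the selection criterion is the same similarity score.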

The ranking scorer combines scores for actions such as favorite, reply, retweet, click, profile click, photo expand, video view, share, dwell, quote, follow author (positive) and not‑interested, block, mute, report, no dwell (negative):

Final Score = Σ(weight_i × P(action_i))

Positive actions:
favorite / reply / retweet / click / profile click
photo expand / video view / share / dwell / quote
follow author

Negative actions:
not interested / block author / mute author / report / not dwelled
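A small worked example of the weighted sum follows. The weight values are invented purely for illustration, since the article does not state the real ones; the action names follow the lists above, and `final_score` is a hypothetical helper.

```python
# Hypothetical weights: positive actions add to the score, negative ones subtract.
WEIGHTS = {
    "favorite": 1.0,
    "reply": 2.0,
    "share": 1.5,
    "dwell": 0.5,
    "follow_author": 3.0,
    "not_interested": -4.0,
    "report": -8.0,
}


def final_score(probs, weights=WEIGHTS):
    """Final Score = sum of weight_i * P(action_i); missing actions count as 0."""
    return sum(w * probs.get(action, 0.0) for action, w in weights.items())


# Predicted action probabilities for two made-up posts.
useful_post = {"favorite": 0.10, "reply": 0.02, "dwell": 0.40, "not_interested": 0.01}
clickbait_post = {"favorite": 0.12, "dwell": 0.05, "not_interested": 0.08}

print(final_score(useful_post) > final_score(clickbait_post))  # True
```

Under these assumed weights the click‑bait post loses despite a higher like probability, because its "not interested" probability carries a large negative weight.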

This demonstrates that the platform cares about a broad set of behaviors after a post is shown, not just likes.

Candidate isolation

During transformer inference each candidate’s score is computed independently (no cross‑candidate attention), ensuring stable, cacheable, and reproducible scoring.
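One common way to get this kind of isolation is an attention mask in which every candidate can see the shared user context and itself, but no other candidate. The sketch below builds such a mask; it is an assumption about the mechanism rather than a description of the released code, and it simplifies each candidate to a single token.

```python
def isolation_mask(n_context, n_candidates):
    """Return mask[q][k] == True where position q may attend to position k.

    Positions 0..n_context-1 are user-context tokens; the remaining
    positions are candidates, one token each (a simplification).
    """
    n = n_context + n_candidates
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        for k in range(n):
            is_context_key = k < n_context   # everyone may read the user context
            is_self = q == k                 # a candidate may read itself
            mask[q][k] = is_context_key or is_self
    return mask


mask = isolation_mask(n_context=2, n_candidates=3)
for row in mask:
    print("".join("1" if allowed else "0" for allowed in row))
# 11000
# 11000
# 11100
# 11010
# 11001
```

Because no candidate row has a 1 in another candidate's column, a post's score cannot change depending on which other posts happen to be in the same batch, which is what makes the result cacheable.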

Grox: multimodal content understanding

Grox provides several modules:

banger_initial_screen.py: initial quality scoring.

spam.py: spam comment detection.

safety_ptos.py: policy and safety classification.

reply_ranking.py: reply‑quality scoring.

multimodal_post_embedder_v5.py: encodes text and images together.

PlanMaster: runs multiple content‑understanding plans in parallel.

Key output fields such as quality_score, tags, taxonomy_categories, and slop_score illustrate that the system evaluates content quality, topic, tagging clarity, visual relevance, AI‑generated feel, safety risks, and age‑appropriateness.

First image as algorithm entry

The MultimodalPostEmbedderV5 renders a post’s text and images before embedding, meaning the lead image must help the system understand the topic, persuade users to click, and establish credibility—not merely serve as decoration.

Ad mixing and brand safety

The new home-mixer/ads/ module inserts ads while checking surrounding content for brand‑safety risks. It defines verdicts such as Safe, LowRisk, and MediumRisk, and separates risky posts from ads.
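To make the idea concrete, here is a toy ad mixer that only fills an ad slot when the posts on both sides are safe enough. The verdict names come from the article; the placement rule, the `every` and `max_risk` parameters, and the `mix_ads` function are assumptions for this sketch, not the module's actual logic.

```python
from enum import IntEnum


class Verdict(IntEnum):
    SAFE = 0
    LOW_RISK = 1
    MEDIUM_RISK = 2


def mix_ads(posts, ads, every=2, max_risk=Verdict.LOW_RISK):
    """Insert an ad after every `every` posts, skipping risky neighborhoods.

    posts: list of (post_id, Verdict) pairs in feed order.
    ads:   list of ad ids, consumed in order.
    """
    feed, ad_iter = [], iter(ads)
    for i, (post_id, verdict) in enumerate(posts):
        feed.append(post_id)
        if (i + 1) % every != 0:
            continue  # no ad slot after this post
        prev_ok = verdict <= max_risk
        next_ok = i + 1 >= len(posts) or posts[i + 1][1] <= max_risk
        if prev_ok and next_ok:
            ad = next(ad_iter, None)
            if ad is not None:
                feed.append(ad)
    return feed


posts = [
    ("p1", Verdict.SAFE), ("p2", Verdict.SAFE), ("p3", Verdict.MEDIUM_RISK),
    ("p4", Verdict.SAFE), ("p5", Verdict.LOW_RISK), ("p6", Verdict.SAFE),
    ("p7", Verdict.SAFE),
]
mixed = mix_ads(posts, ads=["ad1", "ad2"])
print(mixed)  # ['p1', 'p2', 'p3', 'p4', 'ad1', 'p5', 'p6', 'ad2', 'p7']
```

The slot after p2 is skipped because the next post, p3, is MediumRisk, so the first ad lands after p4 instead.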

Hydrators: massive context augmentation

Query hydrator enriches user‑side signals (behavior sequence, follows, mutes, topics, starter packs, IP, mutual follows, served history, impression bloom filter). Candidate hydrator adds item‑side signals (interaction counts, author info, language, media presence, video length, quote expansion, brand‑safety flags, mutual‑follow Jaccard, topic‑filter results).
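Two of the signals named above are easy to illustrate. The sketch below assumes that "mutual‑follow Jaccard" compares the viewer's and the author's follow sets, and that the impression bloom filter records post ids the user has already seen; both readings are guesses from the names, and the `jaccard` function and `BloomFilter` class are written for this example only.

```python
import hashlib


def jaccard(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class BloomFilter:
    """Compact set membership: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


viewer_follows = {1, 2, 3, 4}
author_follows = {3, 4, 5}
overlap = jaccard(viewer_follows, author_follows)
print(overlap)  # 0.4

seen = BloomFilter()
seen.add(101)       # post 101 was already shown to this user
print(101 in seen)  # True
```

A bloom filter suits impression history because it answers "has this user possibly seen this post?" in constant space per user, at the cost of occasionally reporting an unseen post as seen.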

Candidate sources

The candidate sources now include Phoenix Source, Phoenix Topics Source, Phoenix MoE Source, Thunder Source, Ads Source, Who‑to‑Follow Source, Prompts Source, and Cached Posts Source; the Phoenix and Thunder sources were updated rather than newly added. For creators, Phoenix Topics and Phoenix MoE are especially important because they affect how the system matches content to user interests.

Practical checklist for visual creators

Recall clarity: ensure title, body, cover image, tags, and account positioning consistently convey the same topic.

First‑screen appeal: craft a concise headline that explains why the post is worth clicking; use structural or comparison images instead of generic scenery.

Stay value: provide dense information in the body so readers keep reading.

Interaction triggers: embed clear viewpoints, debate edges, and relatable scenarios to encourage comments.

Share & save reasons: include methodologies, checklists, flowcharts, case studies, or anti‑pitfall guides.

Follow conversion: signal a consistent long‑term value proposition at the end to motivate follows.

Negative‑feedback control: avoid click‑bait, exaggerated claims, low‑quality AI‑generated text, repetitive spam, or controversial provocations that increase “not interested” rates.

Conclusion

The May 15 update exposes the core components of a modern content recommendation system: retrieval before ranking, multi‑action scoring, multimodal content understanding, brand‑safety‑aware ad mixing, extensive context hydration, and diverse candidate sources. For visual creators, the real lesson is to adopt a recommendation‑system mindset—treat each post as a set of signals for both the platform and the audience, ensuring clear, consistent, and high‑quality metadata to improve discoverability.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
