This document helps AI agents work effectively in the X For You Feed Algorithm repository.
This repository contains the core recommendation system powering the "For You" feed on X (Twitter). It combines:
- In-network content from accounts you follow (Thunder)
- Out-of-network content discovered through ML-based retrieval (Phoenix)
Candidates from both sources are ranked by a Grok-based transformer model that predicts engagement probabilities.
The repository is organized as follows:

```
x-algorithm/
├── home-mixer/                   # Rust - Orchestration layer (main service)
│   ├── candidate_pipeline/       # Pipeline implementation for the home feed
│   ├── filters/                  # Pre/post-selection filters
│   ├── scorers/                  # ML scorers and weighted combination
│   ├── candidate_hydrators/      # Enrich candidates with additional data
│   ├── query_hydrators/          # Fetch user context
│   ├── sources/                  # Thunder and Phoenix candidate sources
│   ├── selectors/                # Top-K selection logic
│   ├── side_effects/             # Async operations (caching, logging)
│   └── server.rs                 # gRPC server (ScoredPostsService)
├── thunder/                      # Rust - In-memory post store + Kafka ingestion
│   ├── posts/                    # Post storage implementation
│   ├── kafka/                    # Real-time tweet event listeners
│   └── thunder_service.rs        # gRPC server (InNetworkPostsService)
├── candidate-pipeline/           # Rust - Generic pipeline framework (traits)
│   ├── filter.rs                 # Filter trait
│   ├── hydrator.rs               # Hydrator trait
│   ├── scorer.rs                 # Scorer trait
│   ├── selector.rs               # Selector trait
│   ├── source.rs                 # Source trait
│   └── candidate_pipeline.rs     # Pipeline execution framework
└── phoenix/                      # Python - ML ranking and retrieval models
    ├── grok.py                   # Transformer architecture (ported from Grok-1)
    ├── recsys_model.py           # Ranking model with candidate isolation
    ├── recsys_retrieval_model.py # Two-tower retrieval model
    ├── run_ranker.py             # Run ranking inference
    ├── run_retrieval.py          # Run retrieval inference
    └── test_*.py                 # Unit tests
```
Requirements: Python >= 3.11, uv package manager
```bash
# Run ranking model
cd phoenix/
uv run run_ranker.py

# Run retrieval model
uv run run_retrieval.py

# Run tests
uv run pytest test_recsys_model.py test_recsys_retrieval_model.py
```

Dependencies (from pyproject.toml):
- `dm-haiku>=0.0.13` - Neural network library for JAX
- `jax==0.8.1` - NumPy on accelerators
- `numpy>=1.26.4`
- `pytest` (dev dependency)
Linting: Ruff is configured but not enforced. Settings in pyproject.toml:
- Line length: 100
- Indent width: 4
Note: This repository contains Rust source files but is missing Cargo.toml files. The code cannot be built without adding proper Cargo configuration. The crate names referenced in imports suggest:
- `xai_candidate_pipeline` → `candidate-pipeline/`
- `xai_home_mixer_proto` → external protobuf definitions
- `xai_thunder_proto` → external protobuf definitions
If adding Cargo.toml files, use a workspace structure with the three crates.
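A minimal root manifest along those lines might look like this (hypothetical; the crate directory names are taken from the tree above, and the `xai_*_proto` crates would still need to be supplied separately):

```toml
# Hypothetical root Cargo.toml — a workspace over the three local crates.
[workspace]
resolver = "2"
members = ["candidate-pipeline", "home-mixer", "thunder"]
```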
The CandidatePipeline trait defines a staged execution model:
Query → QueryHydrators → Sources → Hydrators → Filters → Scorers → Selector → PostSelectionFilters → SideEffects → Result
| Component | Purpose | Execution |
|---|---|---|
| `QueryHydrator` | Fetch user context (engagement history, following list) | Parallel |
| `Source` | Retrieve candidates from data stores | Parallel |
| `Hydrator` | Enrich candidates with additional data | Parallel |
| `Filter` | Remove ineligible candidates | Sequential |
| `Scorer` | Compute scores (must preserve candidate count/order) | Sequential |
| `Selector` | Sort and select top-K | Single |
| `SideEffect` | Async operations (caching, logging) | Parallel, fire-and-forget |
Key constraint: Scorers must return the same number of candidates in the same order. Dropping candidates is not allowed in scorers—use filters instead.
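Under these constraints, the staged flow can be sketched in Python (a toy illustration; component names and call signatures are hypothetical, and the real framework is the async Rust `CandidatePipeline` trait):

```python
# Toy sketch of the staged execution model; not the real Rust framework.
def run_pipeline(query, query_hydrators, sources, hydrators,
                 filters, scorers, selector):
    for qh in query_hydrators:          # fetch user context (parallel in Rust)
        query = qh(query)
    candidates = [c for src in sources for c in src(query)]
    for h in hydrators:                 # enrich candidates (parallel in Rust)
        candidates = h(query, candidates)
    for f in filters:                   # sequential; filters may drop candidates
        candidates = f(query, candidates)
    for s in scorers:                   # sequential; must preserve count/order
        before = len(candidates)
        candidates = s(query, candidates)
        assert len(candidates) == before, "scorers must not drop candidates"
    return selector(query, candidates)  # sort and take top-K
```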
The ranking model (recsys_model.py) uses a transformer with candidate isolation:
- Candidates can attend to user context and history
- Candidates cannot attend to other candidates (only self-attention)
- This ensures scores are independent and cacheable
The attention mask is created by make_recsys_attn_mask() in grok.py.
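A toy version of such a mask (NumPy, not the actual `make_recsys_attn_mask()`; the token layout, with user/history tokens before candidate tokens, and the lack of causal structure over history are assumptions made for illustration):

```python
import numpy as np

def candidate_isolation_mask(n_history: int, n_candidates: int) -> np.ndarray:
    # Illustrative mask: rows are query positions, columns are key positions,
    # True means "may attend". Candidates see history and themselves only.
    n = n_history + n_candidates
    mask = np.zeros((n, n), dtype=bool)
    mask[:, :n_history] = True           # every token attends to user/history
    idx = np.arange(n_history, n)
    mask[idx, idx] = True                # each candidate attends to itself...
    return mask                          # ...but never to other candidates
```

Because no candidate position can attend to another candidate, each score depends only on the user context and that candidate, which is what makes scores batch-independent.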
Predicted actions (multi-task output):
- Positive: favorite, reply, repost, quote, click, profile_click, video_view, photo_expand, share, dwell, follow_author
- Negative: not_interested, block_author, mute_author, report
WeightedScorer in home-mixer/scorers/weighted_scorer.rs combines predictions:
Final Score = Σ (weight_i × P(action_i))
Weights are defined in the params module (excluded from open source for security).
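A sketch of the combination with made-up weights (the real weights live in the excluded `params` module; the action names follow the lists above):

```python
# Hypothetical weights for illustration only — not the production values.
WEIGHTS = {"favorite": 0.5, "reply": 1.0, "report": -2.0}

def weighted_score(probs: dict[str, float]) -> float:
    # Final Score = sum(weight_i * P(action_i)); negative actions carry
    # negative weights, so predicted bad outcomes push the score down.
    return sum(WEIGHTS[a] * p for a, p in probs.items() if a in WEIGHTS)
```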
Rust conventions (home-mixer, thunder, candidate-pipeline):

- **Async traits**: All pipeline components use `#[tonic::async_trait]` or `#[async_trait]`
- **Trait implementation pattern**:

  ```rust
  #[async_trait]
  impl Filter<ScoredPostsQuery, PostCandidate> for MyFilter {
      async fn filter(
          &self,
          query: &ScoredPostsQuery,
          candidates: Vec<PostCandidate>,
      ) -> Result<FilterResult<PostCandidate>, String> {
          // ...
      }
  }
  ```

- **Statistics macro**: `#[xai_stats_macro::receive_stats]` decorates methods for metrics
- **Arc for shared state**: Services use `Arc<T>` for thread-safe shared access
- **Error handling**: Pipeline components return `Result<T, String>` (not custom error types)
- **Naming conventions**:
  - Filters end with `Filter` (e.g., `MutedKeywordFilter`)
  - Scorers end with `Scorer` (e.g., `WeightedScorer`)
  - Hydrators end with `Hydrator` (e.g., `CoreDataCandidateHydrator`)

Python conventions (phoenix):

- **Dataclasses for configuration**: `@dataclass` for model configs
- **Haiku modules**: All neural network components inherit from `hk.Module`
- **NamedTuple for outputs**: Use `NamedTuple` for structured outputs (e.g., `RecsysModelOutput`)
- **Type hints**: All function signatures include type hints
- **JAX dtypes**: Use `jnp.bfloat16` for the forward pass, `jnp.float32` for initialization
- **Hash-based embeddings**: Multiple hash functions per entity type (users, items, authors)
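The multi-hash idea can be sketched as follows (table sizes and the hash function are illustrative, not the repository's implementation):

```python
import numpy as np

def multi_hash_embed(entity_id: int, tables: list[np.ndarray]) -> np.ndarray:
    # Illustrative multi-hash lookup: each hash function indexes its own
    # embedding table, and the resulting vectors are summed. Using several
    # independent hashes reduces the impact of collisions in any one table.
    out = np.zeros(tables[0].shape[1])
    for seed, table in enumerate(tables):
        h = hash((entity_id, seed)) % table.shape[0]
        out += table[h]
    return out
```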
| File | Purpose |
|---|---|
| `candidate-pipeline/candidate_pipeline.rs` | Pipeline execution framework - read this first for Rust |
| `home-mixer/server.rs` | Main gRPC endpoint |
| `home-mixer/scorers/weighted_scorer.rs` | Final score computation |
| `phoenix/grok.py` | Transformer architecture |
| `phoenix/recsys_model.py` | Recommendation model with candidate isolation |
| `thunder/thunder_service.rs` | In-network post retrieval service |
```bash
cd phoenix/
uv run pytest test_recsys_model.py test_recsys_retrieval_model.py -v
```

Tests verify:
- Attention mask structure (candidate isolation)
- Model output shapes
- Edge cases (single candidate, all candidates)
No tests are visible in the open-source release. If adding tests, follow Rust conventions:
```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[tokio::test]
    async fn test_filter_logic() {
        // ...
    }
}
```

Adding a new filter:

1. Create a file in `home-mixer/filters/` (e.g., `my_new_filter.rs`)
2. Implement the `Filter<ScoredPostsQuery, PostCandidate>` trait
3. Add it to `mod.rs` in the filters directory
4. Register it in the pipeline configuration
Adding a new scorer:

1. Create a file in `home-mixer/scorers/` (e.g., `my_scorer.rs`)
2. Implement the `Scorer<ScoredPostsQuery, PostCandidate>` trait
3. Important: return candidates in the same order and count
4. Use `update()` to set only your scorer's fields
Modifying the Phoenix model:

1. Understand the attention mask in `make_recsys_attn_mask()`
2. Run existing tests before making changes: `uv run pytest -v`
3. Add tests for new functionality
4. Test with representative batch sizes
- **Scorer order matters**: Scorers run sequentially; `PhoenixScorer` must run before `WeightedScorer`.
- **Candidate isolation is critical**: The attention mask ensures candidates don't influence each other's scores. Breaking this would make scores batch-dependent.
- **Hash 0 is reserved**: In embedding lookups, hash value 0 indicates padding/invalid. Check `(hash[:, 0] != 0)` for validity masks.
- **Filter backup on error**: The pipeline framework restores candidates from backup if a filter fails.
- **gRPC compression**: The Thunder service uses Zstd compression for requests and responses.
- **Excluded modules**: `clients`, `params`, and `util` in home-mixer are excluded from the open-source release for security. You'll see `pub mod clients; // Excluded` comments.
- **Proto definitions external**: The `xai_*_proto` crates are not included; you'd need to define the protobuf schemas yourself.
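The hash-0 padding convention can be illustrated with NumPy (the `[batch, num_hashes]` shape is assumed for illustration):

```python
import numpy as np

# hashes: [batch, num_hashes]; a hash value of 0 marks padding/invalid rows.
hashes = np.array([[7, 3],
                   [0, 0],   # padded entity: hash 0 is reserved as invalid
                   [5, 9]])
valid = hashes[:, 0] != 0    # per-row validity mask, as in (hash[:, 0] != 0)
```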
Internal dependencies not included in this repository:
- Protobuf definitions for gRPC services
- Kafka topics and schemas for Thunder ingestion
- Strato client for fetching user following lists
- Phoenix ML model weights
- Various internal X infrastructure clients
License: Apache License 2.0