This document helps AI agents work effectively in the X For You Feed Algorithm repository.
This repository contains the core recommendation system powering the "For You" feed on X (Twitter). It combines:
- In-network content from accounts you follow (Thunder)
- Out-of-network content discovered through ML-based retrieval (Phoenix)
Candidates from both sources are ranked by a Grok-based transformer model that predicts engagement probabilities.
The repository is organized as follows:

```
x-algorithm/
├── home-mixer/                   # Rust - Orchestration layer (main service)
│   ├── candidate_pipeline/       # Pipeline implementation for the home feed
│   ├── filters/                  # Pre/post-selection filters
│   ├── scorers/                  # ML scorers and weighted combination
│   ├── candidate_hydrators/      # Enrich candidates with additional data
│   ├── query_hydrators/          # Fetch user context
│   ├── sources/                  # Thunder and Phoenix candidate sources
│   ├── selectors/                # Top-K selection logic
│   ├── side_effects/             # Async operations (caching, logging)
│   └── server.rs                 # gRPC server (ScoredPostsService)
├── thunder/                      # Rust - In-memory post store + Kafka ingestion
│   ├── posts/                    # Post storage implementation
│   ├── kafka/                    # Real-time tweet event listeners
│   └── thunder_service.rs        # gRPC server (InNetworkPostsService)
├── candidate-pipeline/           # Rust - Generic pipeline framework (traits)
│   ├── filter.rs                 # Filter trait
│   ├── hydrator.rs               # Hydrator trait
│   ├── scorer.rs                 # Scorer trait
│   ├── selector.rs               # Selector trait
│   ├── source.rs                 # Source trait
│   └── candidate_pipeline.rs     # Pipeline execution framework
└── phoenix/                      # Python - ML ranking and retrieval models
    ├── grok.py                   # Transformer architecture (ported from Grok-1)
    ├── recsys_model.py           # Ranking model with candidate isolation
    ├── recsys_retrieval_model.py # Two-tower retrieval model
    ├── run_ranker.py             # Run ranking inference
    ├── run_retrieval.py          # Run retrieval inference
    └── test_*.py                 # Unit tests
```
Requirements: Python >= 3.11, uv package manager
```bash
# Run ranking model
cd phoenix/
uv run run_ranker.py

# Run retrieval model
uv run run_retrieval.py

# Run tests
uv run pytest test_recsys_model.py test_recsys_retrieval_model.py
```

Dependencies (from pyproject.toml):
- `dm-haiku>=0.0.13` - Neural network library for JAX
- `jax==0.8.1` - NumPy on accelerators
- `numpy>=1.26.4`
- `pytest` (dev dependency)
Linting: Ruff is configured but not enforced. Settings in pyproject.toml:
- Line length: 100
- Indent width: 4
Note: This repository contains Rust source files but is missing Cargo.toml files. The code cannot be built without adding proper Cargo configuration. The crate names referenced in imports suggest:
- `xai_candidate_pipeline` → `candidate-pipeline/`
- `xai_home_mixer_proto` → external protobuf definitions
- `xai_thunder_proto` → external protobuf definitions
If adding Cargo.toml files, use a workspace structure with the three crates.
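A minimal root manifest along those lines might look like this (hypothetical; the crate directory names are taken from the tree above, and the `xai_*_proto` crates would still need to be supplied separately):

```toml
# Hypothetical root Cargo.toml — a workspace over the three local crates.
[workspace]
resolver = "2"
members = ["candidate-pipeline", "home-mixer", "thunder"]
```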
The CandidatePipeline trait defines a staged execution model:
Query → QueryHydrators → Sources → Hydrators → Filters → Scorers → Selector → PostSelectionFilters → SideEffects → Result
| Component | Purpose | Execution |
|---|---|---|
| `QueryHydrator` | Fetch user context (engagement history, following list) | Parallel |
| `Source` | Retrieve candidates from data stores | Parallel |
| `Hydrator` | Enrich candidates with additional data | Parallel |
| `Filter` | Remove ineligible candidates | Sequential |
| `Scorer` | Compute scores (must preserve candidate count/order) | Sequential |
| `Selector` | Sort and select top-K | Single |
| `SideEffect` | Async operations (caching, logging) | Parallel, fire-and-forget |
Key constraint: Scorers must return the same number of candidates in the same order. Dropping candidates is not allowed in scorers—use filters instead.
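Under these constraints, the staged flow can be sketched in Python (a toy illustration; component names and call signatures are hypothetical, and the real framework is the async Rust `CandidatePipeline` trait):

```python
# Toy sketch of the staged execution model; not the real Rust framework.
def run_pipeline(query, query_hydrators, sources, hydrators,
                 filters, scorers, selector):
    for qh in query_hydrators:          # fetch user context (parallel in Rust)
        query = qh(query)
    candidates = [c for src in sources for c in src(query)]
    for h in hydrators:                 # enrich candidates (parallel in Rust)
        candidates = h(query, candidates)
    for f in filters:                   # sequential; filters may drop candidates
        candidates = f(query, candidates)
    for s in scorers:                   # sequential; must preserve count/order
        before = len(candidates)
        candidates = s(query, candidates)
        assert len(candidates) == before, "scorers must not drop candidates"
    return selector(query, candidates)  # sort and take top-K
```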
The ranking model (recsys_model.py) uses a transformer with candidate isolation:
- Candidates can attend to user context and history
- Candidates cannot attend to other candidates (only self-attention)
- This ensures scores are independent and cacheable
The attention mask is created by make_recsys_attn_mask() in grok.py.
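A toy version of such a mask (NumPy, not the actual `make_recsys_attn_mask()`; the token layout, with user/history tokens before candidate tokens, and the lack of causal structure over history are assumptions made for illustration):

```python
import numpy as np

def candidate_isolation_mask(n_history: int, n_candidates: int) -> np.ndarray:
    # Illustrative mask: rows are query positions, columns are key positions,
    # True means "may attend". Candidates see history and themselves only.
    n = n_history + n_candidates
    mask = np.zeros((n, n), dtype=bool)
    mask[:, :n_history] = True           # every token attends to user/history
    idx = np.arange(n_history, n)
    mask[idx, idx] = True                # each candidate attends to itself...
    return mask                          # ...but never to other candidates
```

Because no candidate position can attend to another candidate, each score depends only on the user context and that candidate, which is what makes scores batch-independent.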
Predicted actions (multi-task output):
- Positive: favorite, reply, repost, quote, click, profile_click, video_view, photo_expand, share, dwell, follow_author
- Negative: not_interested, block_author, mute_author, report
WeightedScorer in home-mixer/scorers/weighted_scorer.rs combines predictions:
Final Score = Σ (weight_i × P(action_i))
Weights are defined in the params module (excluded from open source for security).
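A sketch of the combination with made-up weights (the real weights live in the excluded `params` module; the action names follow the lists above):

```python
# Hypothetical weights for illustration only — not the production values.
WEIGHTS = {"favorite": 0.5, "reply": 1.0, "report": -2.0}

def weighted_score(probs: dict[str, float]) -> float:
    # Final Score = sum(weight_i * P(action_i)); negative actions carry
    # negative weights, so predicted bad outcomes push the score down.
    return sum(WEIGHTS[a] * p for a, p in probs.items() if a in WEIGHTS)
```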
Rust conventions (home-mixer, thunder, candidate-pipeline):

- **Async traits**: All pipeline components use `#[tonic::async_trait]` or `#[async_trait]`
- **Trait implementation pattern**:

  ```rust
  #[async_trait]
  impl Filter<ScoredPostsQuery, PostCandidate> for MyFilter {
      async fn filter(
          &self,
          query: &ScoredPostsQuery,
          candidates: Vec<PostCandidate>,
      ) -> Result<FilterResult<PostCandidate>, String> {
          // ...
      }
  }
  ```

- **Statistics macro**: `#[xai_stats_macro::receive_stats]` decorates methods for metrics
- **Arc for shared state**: Services use `Arc<T>` for thread-safe shared access
- **Error handling**: Pipeline components return `Result<T, String>` (not custom error types)
- **Naming conventions**:
  - Filters end with `Filter` (e.g., `MutedKeywordFilter`)
  - Scorers end with `Scorer` (e.g., `WeightedScorer`)
  - Hydrators end with `Hydrator` (e.g., `CoreDataCandidateHydrator`)

Python conventions (phoenix):

- **Dataclasses for configuration**: `@dataclass` for model configs
- **Haiku modules**: All neural network components inherit from `hk.Module`
- **NamedTuple for outputs**: Use `NamedTuple` for structured outputs (e.g., `RecsysModelOutput`)
- **Type hints**: All function signatures include type hints
- **JAX dtypes**: Use `jnp.bfloat16` for the forward pass, `jnp.float32` for initialization
- **Hash-based embeddings**: Multiple hash functions per entity type (users, items, authors)
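The multi-hash idea can be sketched as follows (table sizes and the hash function are illustrative, not the repository's implementation):

```python
import numpy as np

def multi_hash_embed(entity_id: int, tables: list[np.ndarray]) -> np.ndarray:
    # Illustrative multi-hash lookup: each hash function indexes its own
    # embedding table, and the resulting vectors are summed. Using several
    # independent hashes reduces the impact of collisions in any one table.
    out = np.zeros(tables[0].shape[1])
    for seed, table in enumerate(tables):
        h = hash((entity_id, seed)) % table.shape[0]
        out += table[h]
    return out
```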
| File | Purpose |
|---|---|
| `candidate-pipeline/candidate_pipeline.rs` | Pipeline execution framework - read this first for Rust |
| `home-mixer/server.rs` | Main gRPC endpoint |
| `home-mixer/scorers/weighted_scorer.rs` | Final score computation |
| `phoenix/grok.py` | Transformer architecture |
| `phoenix/recsys_model.py` | Recommendation model with candidate isolation |
| `thunder/thunder_service.rs` | In-network post retrieval service |
```bash
cd phoenix/
uv run pytest test_recsys_model.py test_recsys_retrieval_model.py -v
```

Tests verify:
- Attention mask structure (candidate isolation)
- Model output shapes
- Edge cases (single candidate, all candidates)
No tests are visible in the open-source release. If adding tests, follow Rust conventions:
```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[tokio::test]
    async fn test_filter_logic() {
        // ...
    }
}
```

Adding a new filter:

1. Create a file in `home-mixer/filters/` (e.g., `my_new_filter.rs`)
2. Implement the `Filter<ScoredPostsQuery, PostCandidate>` trait
3. Add it to `mod.rs` in the filters directory
4. Register it in the pipeline configuration
Adding a new scorer:

1. Create a file in `home-mixer/scorers/` (e.g., `my_scorer.rs`)
2. Implement the `Scorer<ScoredPostsQuery, PostCandidate>` trait
3. Important: return candidates in the same order and count
4. Use `update()` to set only your scorer's fields
Modifying the Phoenix model:

1. Understand the attention mask in `make_recsys_attn_mask()`
2. Run existing tests before making changes: `uv run pytest -v`
3. Add tests for new functionality
4. Test with representative batch sizes
- **Scorer order matters**: Scorers run sequentially; `PhoenixScorer` must run before `WeightedScorer`.
- **Candidate isolation is critical**: The attention mask ensures candidates don't influence each other's scores. Breaking this would make scores batch-dependent.
- **Hash 0 is reserved**: In embedding lookups, hash value 0 indicates padding/invalid. Check `(hash[:, 0] != 0)` for validity masks.
- **Filter backup on error**: The pipeline framework restores candidates from backup if a filter fails.
- **gRPC compression**: The Thunder service uses Zstd compression for requests and responses.
- **Excluded modules**: `clients`, `params`, and `util` in home-mixer are excluded from the open-source release for security. You'll see `pub mod clients; // Excluded` comments.
- **Proto definitions external**: The `xai_*_proto` crates are not included; you'd need to define the protobuf schemas yourself.
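The hash-0 padding convention can be illustrated with NumPy (the `[batch, num_hashes]` shape is assumed for illustration):

```python
import numpy as np

# hashes: [batch, num_hashes]; a hash value of 0 marks padding/invalid rows.
hashes = np.array([[7, 3],
                   [0, 0],   # padded entity: hash 0 is reserved as invalid
                   [5, 9]])
valid = hashes[:, 0] != 0    # per-row validity mask, as in (hash[:, 0] != 0)
```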
Internal dependencies not included in this repository:
- Protobuf definitions for gRPC services
- Kafka topics and schemas for Thunder ingestion
- Strato client for fetching user following lists
- Phoenix ML model weights
- Various internal X infrastructure clients
License: Apache License 2.0