Functional and Business Requirements
- Goal: Improve individual user engagement (view, like, comment) on suggested posts, boosting metrics such as Daily Active Users (DAU) and session count.
- Scope: Focus on non-friend content (from creators, not just connections). The aim is to predict and increase personalized engagement.
- ML Objective: Aligns with business needs but optimizes a correlated surrogate metric (like engagement probability) at the user level, not global DAU directly.
Non-Functional Requirements
- Scalability & Availability: Must handle hundreds of millions of daily active users globally; downtime or slow model serving is unacceptable.
- MLOps & Tooling:
- Analytics: Debuggability, monitoring, and tracking.
- Alerts: Automated notifications for feature coverage drop, anomalous engagement rates, etc.
- Reliability: System should gracefully handle failures and allow rapid diagnosis.
Core ML Pipeline Stages and Architecture
Candidate Generation
- Purpose: Select a manageable set (e.g., 1,000 out of billions) of potentially relevant posts for a given user.
- Method: Use approximate nearest neighbor (ANN) algorithms to find items whose embeddings (vector summaries) are most similar to the user’s embedding.
- Inputs: User and post embeddings, usually generated via deep learning or matrix factorization.
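The retrieval step above can be sketched in a few lines. This is a minimal brute-force stand-in for a real ANN index (production systems use structures like HNSW or IVF in a vector database); the function name and toy dimensions are illustrative:

```python
import numpy as np

def top_k_candidates(user_emb, post_embs, k=3):
    """Return indices of the k posts whose embeddings score highest
    (by dot product) against the user embedding. A real system swaps
    this exact scan for an approximate nearest neighbor (ANN) index."""
    scores = post_embs @ user_emb      # similarity of every post to the user
    return np.argsort(-scores)[:k]     # highest-scoring posts first

# Toy data: one user, five posts, 4-dim embeddings (hypothetical values)
rng = np.random.default_rng(0)
user = rng.normal(size=4)
posts = rng.normal(size=(5, 4))
print(top_k_candidates(user, posts, k=3))
```

The exact scan is O(number of posts); ANN indexes trade a small amount of recall for sublinear lookup, which is what makes retrieval over billions of posts feasible.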
Ranking
- Process: Score each candidate based on predicted likelihood of user engagement (view, like, comment).
- Model Choices:
- Collaborative Filtering: Matrix factorization; finds low-dimensional representations by decomposing user-item interaction matrix.
- Two-Tower Network:
- Independent neural networks (towers) for users and posts.
- Each tower produces embeddings.
- Dot product plus sigmoid gives engagement probability.
- Trained with binary cross-entropy loss on actual engagement labels (positive and negative).
- Modern practice generally prefers two-tower for scalability, flexibility, and rapid embedding calculation.
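The two-tower scoring path can be sketched as follows. This is a minimal numpy illustration in which each "tower" is a single linear layer standing in for a deep network; class and parameter names are hypothetical, and training (binary cross-entropy on engagement labels) is omitted for brevity:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TwoTowerSketch:
    """Minimal two-tower scorer: each tower maps raw features to a
    shared embedding space; dot product + sigmoid gives P(engage)."""
    def __init__(self, user_dim, post_dim, emb_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W_user = rng.normal(scale=0.1, size=(user_dim, emb_dim))
        self.W_post = rng.normal(scale=0.1, size=(post_dim, emb_dim))

    def user_tower(self, u):   # user features -> user embedding
        return u @ self.W_user

    def post_tower(self, p):   # post features -> post embedding
        return p @ self.W_post

    def engagement_prob(self, u, p):
        # dot product of the two embeddings, squashed to a probability
        return sigmoid(self.user_tower(u) @ self.post_tower(p))
```

Because the towers are independent, post embeddings can be precomputed offline and indexed for ANN retrieval, while the user embedding is computed once per request; this separation is the main reason the architecture scales.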
Post-processing
- Goal: Adjust ranked list to incorporate fairness, diversity, and content freshness.
- Operation: Rearranges (reranks) candidates; does NOT create new ones. Example: Boosting diverse creators or promoting underrepresented content types.
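One simple reranking policy consistent with the above is capping how many slots a single creator can occupy. A plain-Python sketch (the function name, cap value, and tuple layout are illustrative, not a specific production rule):

```python
def rerank_for_diversity(candidates, max_per_creator=2):
    """Reorder (never add) candidates so no single creator takes more
    than `max_per_creator` slots; overflow items drop to the end.
    `candidates` is a list of (post_id, creator_id) in ranked order."""
    counts = {}
    kept, demoted = [], []
    for post_id, creator in candidates:
        if counts.get(creator, 0) < max_per_creator:
            kept.append((post_id, creator))
            counts[creator] = counts.get(creator, 0) + 1
        else:
            demoted.append((post_id, creator))
    return kept + demoted
```

Note the output is a permutation of the input: post-processing only reorders what ranking produced.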
Feature Engineering and Data Collection
Types of Features:
- Viewer features: Profile data, historical engagement stats, social graph info.
- Post features: Video/audio/text embeddings (from pre-trained models), creator stats (followers, activity).
- Interaction history: Aggregated (e.g., total likes in last 7 days) and delayed features (lagged actions, e.g., likes given 2 weeks ago).
- Labels: Binary engagement signals (1 for positive, 0 for negative, where negative is no engagement after some time threshold).
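The aggregated and delayed features above can be made concrete with a small sketch. Assuming like events arrive as timestamps (function names and window sizes are illustrative):

```python
from datetime import datetime, timedelta

def likes_in_window(like_times, now, days=7):
    """Aggregated feature: count of likes in the trailing `days` window."""
    cutoff = now - timedelta(days=days)
    return sum(1 for t in like_times if cutoff <= t <= now)

def likes_lagged(like_times, now, lag_days=14, days=7):
    """Delayed (lagged) feature: the same count, but for a window
    ending `lag_days` ago instead of ending now."""
    window_end = now - timedelta(days=lag_days)
    return likes_in_window(like_times, window_end, days=days)
```

In production these aggregates are typically precomputed in a feature store rather than scanned per request, but the window semantics are the same.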
Embeddings:
- Learned via pre-trained models for content (text, audio, video).
- Learned via model architecture for users/posts in collaborative filtering or two-tower.
Training and Evaluation
- Dataset Preparation:
- Needs enough positive and negative samples (user saw the post and did, or did not, engage).
- Negative samples come from posts that were shown but not interacted with; class balance is key (e.g., downsampling negatives).
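Balancing the classes by downsampling negatives can be sketched as follows (plain Python; the function name and `ratio` parameter are illustrative):

```python
import random

def downsample_negatives(examples, ratio=1.0, seed=0):
    """Keep all positives and sample negatives so that
    #negatives ~= ratio * #positives. `examples` is a list of
    (features, label) pairs: label 1 = engaged, 0 = shown, no engagement."""
    rng = random.Random(seed)
    positives = [e for e in examples if e[1] == 1]
    negatives = [e for e in examples if e[1] == 0]
    k = min(len(negatives), int(ratio * len(positives)))
    return positives + rng.sample(negatives, k)
```

If negatives are downsampled, predicted probabilities are biased upward and should be recalibrated before being used as absolute engagement probabilities.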
- Metrics:
- AUC (Area Under the ROC Curve): Measures how well the model separates engaged from non-engaged items; an AUC above 0.5 beats random chance.
- A/B Testing: Deploy two versions (old vs new algorithm) to separate groups, compare lift in engagement/business metrics.
- Safeguard Metrics: Also track if models increase undesirable behaviors (increased reporting/blocking, spam, etc.).
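AUC has a useful probabilistic reading: it is the probability that a random positive outscores a random negative (ties count half). A direct pairwise sketch of that definition (O(P·N), fine for illustration; production code uses a rank-based implementation):

```python
def auc(labels, scores):
    """AUC = P(random positive outscores random negative), ties = 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```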
Candidate Generation Details
- Uses embeddings from the two-tower model for efficient ANN search.
- For new users (cold start), present trending or popular items as initial candidates.
Deployment Practicalities & System Engineering
- Serving Pipeline: Real-time embedding computation for users; rapid ANN retrieval from a vector database.
- Continuous Learning: Online learning—model adapts to non-stationary user behavior and new posts/content.
- Distributed Computation: Scaling out model serving and feature computation to many servers for reliability and speed.
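The online-learning idea above amounts to updating model weights as engagement events stream in, rather than retraining in batch. A minimal sketch using one SGD step of logistic regression (the model form and learning rate are illustrative, not the system's actual update rule):

```python
import numpy as np

def sgd_update(w, x, y, lr=0.1):
    """One online step: nudge weights toward the observed engagement
    label y (1 = engaged, 0 = not) for feature vector x. The update is
    the gradient of binary cross-entropy for a logistic model."""
    p = 1.0 / (1.0 + np.exp(-(w @ x)))   # current predicted P(engage)
    return w - lr * (p - y) * x
```

Each incoming (features, label) event applies one such step, which is what lets the model track non-stationary behavior without a full retrain.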
Challenges Discussed
- Cold Start Problem: For users/content with little history, default to popular/trending posts until personal data is available.
- Class Imbalance: Most items not engaged with; careful negative sampling and balanced training is essential.
- Trade-offs: The two-tower model has a higher performance ceiling; collaborative filtering is easier at small scale or with limited compute.
- Offline vs Online Metrics: Use both; offline AUC/validation, online A/B test on real users.
Final Evaluation and Recommendations
- Promotion Criteria: If A/B test shows higher engagement AND safeguard metrics (spam, blocks) remain healthy, model is promoted to production.
- Engineering: Address system scale (multiple servers, quick failover), model drift, and data monitoring.
Interviewer Feedback
- Strengths: Solution thorough and systematic, covers all pipeline stages, highlights real-world challenges and metrics.
- Suggestions: Could accelerate initial design steps, spend more time on distributed engineering, serving systems, and streaming/online learning aspects.