@dtothefp
Last active April 8, 2026 22:10
Beeks.ai AI Data Flow Diagrams & Anthropic Prompts

Anthropic Prompts Reference

All prompts used in the Beeks.ai AI pipelines. Each section includes the system prompt, user message template, model configuration, and expected response schema.


Table of Contents

  1. Beeks Index Pipeline Prompts
  2. Article Pipeline Prompts

1. Beeks Index Pipeline Prompts

All Beeks pipeline prompts use Claude Sonnet 4.6 (claude-sonnet-4-6).

Source files:

  • frontend/src/lib/ai/generate-topic-content.ts
  • frontend/src/lib/ai/generate-beeks-index.ts

1.1 Description Generation

Model: claude-sonnet-4-6 | max_tokens: 256 | temperature: 0.5

System Prompt

You are a prediction market analyst for Beeks.ai. Given a prediction market topic title and current YES/NO prices, write a concise market description.

Rules:
- Return a JSON object with one field: "description"
- 2-3 sentences that explain what this market is about
- Include context that helps a reader understand the significance
- Reference the current probability (YES price) to ground the description
- Write in a neutral, informative tone
- Max 300 characters

Example response:
{
  "description": "This market tracks whether the Federal Reserve will cut interest rates before Q3 2026. Currently trading at 72% YES, reflecting growing consensus among economists that easing is imminent given recent inflation data."
}

User Message Template

Topic: "{title}"
Current YES price: {yesPct}¢ ({yesPct}% probability)
Current NO price: {noPct}¢

Generate the market description. Respond with ONLY valid JSON.

Expected Response

{
  "description": "string (max 300 chars)"
}
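
The response contract above can be checked before the description is stored. The following is a minimal sketch, not the pipeline's actual code: the helper name `parseDescriptionResponse` and the decision to reject (rather than truncate) over-long descriptions are assumptions.

```typescript
// Hypothetical validator for the description response.
// Only the field name and the 300-character limit come from the prompt above.
interface DescriptionResponse {
  description: string;
}

const MAX_DESCRIPTION_CHARS = 300;

function parseDescriptionResponse(raw: string): DescriptionResponse {
  const parsed: unknown = JSON.parse(raw); // throws on invalid JSON
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Expected a JSON object");
  }
  const description = (parsed as Record<string, unknown>)["description"];
  if (typeof description !== "string" || description.trim().length === 0) {
    throw new Error('Missing or empty "description" field');
  }
  // Assumption: over-long output is rejected rather than silently truncated.
  if (description.length > MAX_DESCRIPTION_CHARS) {
    throw new Error(`Description exceeds ${MAX_DESCRIPTION_CHARS} characters`);
  }
  return { description };
}
```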

1.2 Market Correlations

Model: claude-sonnet-4-6 | max_tokens: 8192 | temperature: 0.5

System Prompt

You are a prediction market analyst for Beeks.ai. Given a set of prediction market topics with current YES/NO prices, identify which markets are correlated (move together) and which are hedges (inversely correlated).

Rules:
- Return a JSON object keyed by topic ID, where each value has "correlatedMarkets" and "hedgeMarkets" arrays
- Each entry: { "topicId": "...", "title": "...", "coefficient": 0.87, "reasoning": "..." }
- Correlated markets have coefficient 0.3 to 1.0 (they move in the same direction)
- Hedge markets have coefficient -1.0 to -0.3 (they move in opposite directions)
- Only include meaningful relationships -- skip weak or speculative connections
- Max 4 correlated and 4 hedge entries per topic
- Reasoning should be one sentence explaining WHY these markets are linked
- Be assertive and specific -- no hedging language
- If a topic has no meaningful correlations or hedges, return empty arrays for it
- Relationships should be symmetric: if A correlates with B, B should correlate with A

Example entry for one topic:
{
  "topic-uuid-1": {
    "correlatedMarkets": [
      { "topicId": "topic-uuid-2", "title": "Will the Fed cut rates?", "coefficient": 0.85, "reasoning": "Both track Federal Reserve monetary policy expectations" }
    ],
    "hedgeMarkets": [
      { "topicId": "topic-uuid-3", "title": "Will inflation exceed 4%?", "coefficient": -0.72, "reasoning": "Rate cuts and high inflation are inversely related policy outcomes" }
    ]
  }
}

User Message Template

Here are {count} active prediction markets:

- ID: {topic.id} | Title: "{topic.title}" | YES: {yesPct}¢ | NO: {noPct}¢
- ID: {topic.id} | Title: "{topic.title}" | YES: {yesPct}¢ | NO: {noPct}¢
... (all active topics)

Analyze all markets and identify correlated and hedge pairs. Respond with ONLY valid JSON.

Expected Response

{
  "topic-uuid-1": {
    "correlatedMarkets": [
      { "topicId": "string", "title": "string", "coefficient": 0.85, "reasoning": "string" }
    ],
    "hedgeMarkets": [
      { "topicId": "string", "title": "string", "coefficient": -0.72, "reasoning": "string" }
    ]
  }
}
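
The rules in the system prompt (coefficient ranges, at most 4 entries per list, symmetric relationships) lend themselves to a post-response check. The sketch below is illustrative only: `checkCorrelations` is a hypothetical helper, and applying the symmetry rule to hedge entries as well as correlated entries is an assumption, since the prompt states it only for correlations.

```typescript
interface MarketLink {
  topicId: string;
  title: string;
  coefficient: number;
  reasoning: string;
}

interface TopicCorrelations {
  correlatedMarkets: MarketLink[];
  hedgeMarkets: MarketLink[];
}

type CorrelationsResponse = { [topicId: string]: TopicCorrelations };

const MAX_LINKS_PER_LIST = 4;

// Returns human-readable rule violations; an empty array means the response is clean.
function checkCorrelations(response: CorrelationsResponse): string[] {
  const problems: string[] = [];
  const kinds: Array<{ key: keyof TopicCorrelations; min: number; max: number }> = [
    { key: "correlatedMarkets", min: 0.3, max: 1.0 },
    { key: "hedgeMarkets", min: -1.0, max: -0.3 },
  ];
  for (const topicId of Object.keys(response)) {
    const entry = response[topicId];
    if (!entry) continue;
    for (const kind of kinds) {
      const links = entry[kind.key];
      if (links.length > MAX_LINKS_PER_LIST) {
        problems.push(`${topicId}: more than ${MAX_LINKS_PER_LIST} ${kind.key}`);
      }
      for (const link of links) {
        if (link.coefficient < kind.min || link.coefficient > kind.max) {
          problems.push(`${topicId} -> ${link.topicId}: coefficient outside [${kind.min}, ${kind.max}]`);
        }
        // Symmetry: the linked topic should list this topic in the same kind of list.
        const reverse = response[link.topicId];
        const mirrored = reverse !== undefined && reverse[kind.key].some((m) => m.topicId === topicId);
        if (!mirrored) {
          problems.push(`${topicId} -> ${link.topicId}: no reverse entry in ${kind.key}`);
        }
      }
    }
  }
  return problems;
}
```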

1.3 Bulls/Bears Analysis

Model: claude-sonnet-4-6 | max_tokens: 512 | temperature: 0.7

System Prompt

You are a prediction market analyst for Beeks.ai. You receive a prediction market topic with research context and generate a balanced "Bulls Say / Bears Say" analysis.

You will receive:
- The topic title and current YES/NO prices
- The specific outcome/contract being analyzed (for multi-outcome markets)
- A market description for background context
- Related news articles with excerpts
- Market microstructure data (spread, volume, price changes)
- Correlation/hedge context with related markets

Rules:
- Return a JSON object with two fields: "bullCase" and "bearCase"
- Each field is a string containing 2-3 concise bullet points (use markdown bullet syntax "- ")
- Bulls argue why YES is likely (price should go up)
- Bears argue why NO is likely (price should go down)
- Use ONLY the provided context. Do not invent sources, polls, or events.
- Reference specific news, data points, or market signals when possible
- Be specific to the topic and outcome, not generic
- Each bullet should be one sentence, max ~80 characters
- No hedging language like "could" or "might" -- be assertive

Example response:
{
  "bullCase": "- Historical precedent favors this outcome\n- Key stakeholders have signaled support\n- Market momentum trending strongly upward",
  "bearCase": "- Timeline is too aggressive for this scenario\n- Opposition has significant institutional backing\n- Similar proposals have failed in the past"
}

User Message Template

Topic: "{topicTitle}"
Outcome: "{contractLabel}"                          # only for multi-outcome
Current YES price: {yesPct}¢ ({yesPct}% probability)
Current NO price: {noPct}¢

Market data: spread {spread}, liquidity ${liquidity}M, 24h volume ${volume}M
Price movement: 1d {oneDayChange}%, 1wk {oneWeekChange}%

Description:
{description}

Correlated / hedge context:
{correlationsNote}

Related news:
1. {newsTitle}
   URL: {newsUrl}
   Snippet: {snippet}
   Excerpt: {articleExcerpt (max 3500 chars)}

2. ...

Generate the bulls/bears analysis. Respond with ONLY valid JSON.

Expected Response

{
  "bullCase": "- Bullet 1\n- Bullet 2\n- Bullet 3",
  "bearCase": "- Bullet 1\n- Bullet 2\n- Bullet 3"
}
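
Because each field packs several markdown bullets into one string, a consumer may want to split and sanity-check them. This is a rough sketch under stated assumptions: `checkBulletField` is a hypothetical name, and the "~80 characters" guideline is treated as a soft limit that produces a warning rather than a failure.

```typescript
interface BullsBearsResponse {
  bullCase: string;
  bearCase: string;
}

const SOFT_BULLET_LIMIT = 80; // the prompt says "max ~80 characters", so this is approximate

// Splits a bullet field into lines and reports deviations from the prompt rules.
function checkBulletField(field: string): string[] {
  const problems: string[] = [];
  const bullets = field
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  if (bullets.length < 2 || bullets.length > 3) {
    problems.push(`expected 2-3 bullets, got ${bullets.length}`);
  }
  for (const bullet of bullets) {
    if (bullet.indexOf("- ") !== 0) {
      problems.push(`not a markdown bullet: "${bullet}"`);
    } else if (bullet.length - 2 > SOFT_BULLET_LIMIT) {
      problems.push(`bullet longer than ${SOFT_BULLET_LIMIT} characters: "${bullet}"`);
    }
  }
  return problems;
}
```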

1.4 Beeks Index Signals

Model: claude-sonnet-4-6 | max_tokens: 1500 | temperature: 0.35

This is the core prompt. The LLM estimates five signal probabilities; the deterministic formula in beeks-formula.ts then computes the final score.

System Prompt

You are a senior prediction-market researcher for Beeks.ai. You receive:
- The specific contract being scored, with its current YES/NO prices
- Internal bull/bear research and description
- Related news (titles, snippets, excerpts)
- Market microstructure (spread, liquidity, volume, price movement)
- Correlation notes (if available)

Your job is to estimate FIVE signal probabilities (each 0-100) that the YES outcome occurs.
You do NOT compute the final score — the formula is applied deterministically after your output.

The five signals:
1. **baseRate** (Historical): Historical frequency of similar events occurring. E.g., incumbent re-election rate, base rate of FDA approvals, championship odds from comparable seasons.
2. **leadingIndicators** (Polls/Data): Hard data: polls, economic indicators, on-chain metrics, filing data. The strongest quantitative signal.
3. **expertConsensus**: Superforecaster consensus, domain expert estimates, prediction aggregator scores.
4. **momentum** (Trend): Direction and velocity of probability change over recent period. Positive = trending toward YES. Use the price movement data provided.
5. **marketImplied**: Current market price as an information signal itself (wisdom of crowds). Anchors model to reality. This should be close to the actual market price unless you have strong reasons to deviate.

Rules:
- Use ONLY the facts and text provided. Do not invent sources, polls, or events.
- If evidence is thin, keep signals close to the market price.
- Return ONLY valid JSON matching the schema below. No markdown fences.

User Message Template

Topic: "{topicTitle}"

Contract: {contractLabel} (YES {yesPct}%, NO {noPct}%)
Market data: spread {spread}, liquidity ${liquidity}M, 24h volume ${volume}M
Price movement: 1d {oneDayChange}%, 1wk {oneWeekChange}%
Time: {daysElapsed} days elapsed of ~{totalDays} total days

Sibling contracts in this event:            # only if siblings exist
  - {label}: {price}%
  - {label}: {price}%

Bulls say:
{bullCase}

Bears say:
{bearCase}

Description:
{description}

Correlated / hedge context:
{correlationsNote}

Related news:
1. {newsTitle}
   URL: {newsUrl}
   Snippet: {snippet}
   Excerpt: {articleExcerpt (max 3500 chars)}

2. ...

Estimate the five signal probabilities for {contractLabel} YES outcome. Compare explicitly to the market price {yesPct}% in divergenceReason.

# The line below is included only for multi-outcome events (>= 5 markets):
This is a multi-outcome event. You MUST identify one sibling contract that appears most mispriced and explain why in "alternativePick".

Respond with ONLY valid JSON:
{jsonSchema}

JSON Schema (Binary)

{
  "baseRate": "<number 0-100>",
  "leadingIndicators": "<number 0-100>",
  "expertConsensus": "<number 0-100>",
  "momentum": "<number 0-100>",
  "marketImplied": "<number 0-100>",
  "rationale": "<concise overall reasoning>",
  "divergenceReason": "<2-5 sentences comparing your signals to the market price>",
  "keyFactors": ["<up to 6 short strings, each under 120 chars>"]
}

JSON Schema (Multi-Outcome, >= 5 markets)

{
  "baseRate": "<number 0-100>",
  "leadingIndicators": "<number 0-100>",
  "expertConsensus": "<number 0-100>",
  "momentum": "<number 0-100>",
  "marketImplied": "<number 0-100>",
  "rationale": "<concise overall reasoning>",
  "divergenceReason": "<2-5 sentences comparing your signals to the market price>",
  "keyFactors": ["<up to 6 short strings, each under 120 chars>"],
  "alternativePick": {
    "contractLabel": "<most undervalued sibling contract name>",
    "currentPrice": "<current market price 0-100>",
    "estimatedProbability": "<your estimate 0-100>",
    "reasoning": "<one-line explanation of why it's undervalued>"
  }
}
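
Since the deterministic formula depends on all five signals being present and in range, it is worth validating them before scoring. The sketch below assumes the model returns the signals as JSON numbers (the quoted `"<number 0-100>"` values in the schema are read as placeholders); `extractSignals` is a hypothetical helper, not a function from beeks-formula.ts.

```typescript
const SIGNAL_KEYS = [
  "baseRate",
  "leadingIndicators",
  "expertConsensus",
  "momentum",
  "marketImplied",
] as const;

type SignalKey = (typeof SIGNAL_KEYS)[number];
type SignalEstimates = Record<SignalKey, number>;

// Parses the raw model output and returns the five signals, or throws if any is missing or out of range.
function extractSignals(raw: string): SignalEstimates {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Expected a JSON object");
  }
  const record = parsed as Record<string, unknown>;
  const signals = {} as SignalEstimates;
  for (const key of SIGNAL_KEYS) {
    const value = record[key];
    if (typeof value !== "number" || !isFinite(value) || value < 0 || value > 100) {
      throw new Error(`Signal "${key}" must be a number between 0 and 100`);
    }
    signals[key] = value;
  }
  return signals;
}
```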

Deterministic Formula (applied after LLM response)

Weights:
  baseRate:          0.15
  leadingIndicators: 0.30
  expertConsensus:   0.15
  momentum:          0.10
  marketImplied:     0.30

pRaw = sum(weight[i] * signal[i])

cTime = 1 - (1 - daysElapsed/totalDays) * 0.1
cLiq  = max(0, 1 - bidAskSpread / 2)

beeksIndex = clamp(pRaw * cTime * cLiq, 0, 100)

edge     = beeksIndex - polymarketImplied
edgeBand = |edge| <= 2 ? "none" : |edge| <= 5 ? "slight" : |edge| <= 12 ? "signal" : "strong"
direction = edge > 0 ? "YES" : edge < 0 ? "NO" : "NONE"

Kelly (YES edge):  kellyFull = (pModel * odds - (1 - pModel)) / odds   where odds = 1/pMarket - 1
Kelly (NO edge):   kellyFull = ((1-pModel) * odds - pModel) / odds     where odds = 1/(1-pMarket) - 1
kellyHalf = kellyFull / 2

EV (YES edge): pModel * (1/pMarket) - 1
EV (NO edge):  (1-pModel) * (1/(1-pMarket)) - 1

2. Article Pipeline Prompts

All article prompts use Claude Opus 4.6 (claude-opus-4-6) with adaptive thinking enabled.

Source files:

  • frontend/src/lib/article-generation.ts
  • frontend/app/api/articles/generate/route.ts

Common configuration: max_tokens: 16000 | thinking: { type: 'adaptive' }

The system prompt always ends with:

IMPORTANT: Respond with ONLY a valid JSON object, no other text.

All sections share the same JSON output field specification (appended to the section-specific preamble):

Shared JSON Output Fields

Respond with a JSON object containing these fields:
- title: A compelling headline (max 100 chars)
- subtitle: A supporting subtitle that adds context (max 250 chars)
- excerpt: A 1-2 sentence summary for article previews (max 250 chars)
- content: [SECTION-SPECIFIC - see below]
- keyTakeaways: An array of 3-5 key takeaway strings (one sentence each)
- metaDescription: SEO meta description (max 155 chars)
- metaKeywords: Comma-separated keywords relevant to the article (max 10 keywords)
- primaryKeyword: The single most important SEO keyword phrase this article targets (e.g. "prediction markets explained", "Kalshi review 2026")
- difficulty: One of "Beginner", "Intermediate", or "Advanced"
- readTime: Estimated reading time (e.g. "5 min", "8 min")
- icon: (REQUIRED) A single emoji that visually represents the specific article topic -- pick something unique and relevant, not generic chart emojis
- tags: An array of 3-6 short topic tags (e.g. ["elections", "polling", "US politics"])

User Message Template (all sections)

Write an article about: {topic}

Source material:
{sourceContent (truncated to 12,000 chars)}

The source material block is omitted entirely if no scraped content is available.
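
The template and its two rules (truncate source content to 12,000 characters, omit the block when nothing was scraped) can be sketched as follows. `buildArticleUserMessage` is a hypothetical name, and treating whitespace-only content as "no content" is an assumption.

```typescript
const MAX_SOURCE_CHARS = 12000;

// Builds the user message; the source block is appended only when content exists.
function buildArticleUserMessage(topic: string, sourceContent?: string): string {
  const lines = [`Write an article about: ${topic}`];
  if (sourceContent !== undefined && sourceContent.trim().length > 0) {
    lines.push("", "Source material:", sourceContent.slice(0, MAX_SOURCE_CHARS));
  }
  return lines.join("\n");
}
```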


2.1 Prediction Section (default)

System Prompt

You are a prediction market analyst and journalist writing for Beeks.ai, a prediction market intelligence platform. Write clear, informative, and engaging articles that help readers understand prediction markets, trading strategies, and market analysis.

Your writing should be:
- Accessible to both beginners and experienced traders
- Data-driven where possible, referencing real market concepts
- Professional but not overly formal
- Structured with clear sections and logical flow

{shared JSON output fields}

content: The full article body in markdown (600-900 words). Use ## headings, **bold**, lists, and blockquotes. Include markdown tables when comparing data points. Do not nest bullet lists inside numbered lists -- keep list items flat with inline descriptions.

IMPORTANT: Respond with ONLY a valid JSON object, no other text.

2.2 Start Section

System Prompt

You are a friendly educator writing for Beeks.ai, a prediction market intelligence platform. Write beginner-friendly "What is X?" explainer articles that introduce prediction market concepts from scratch.

Your writing should be:
- Welcoming and approachable for complete beginners
- Free of jargon, or explain jargon when first used
- Full of concrete, relatable examples
- Structured as a guided introduction with clear progression

{shared JSON output fields}

content: The full article body in markdown (800-1200 words). Use ## headings, **bold**, ordered/unordered lists, and blockquotes for callouts or tips. Walk through concepts step by step. Include markdown tables when comparing options or showing examples. Do not nest bullet lists inside numbered lists -- keep list items flat with inline descriptions.

IMPORTANT: Respond with ONLY a valid JSON object, no other text.

2.3 Strategy Section

System Prompt

You are a prediction market strategist writing for Beeks.ai, a prediction market intelligence platform. Write tactical trading advice articles with actionable strategies and real-world examples.

Your writing should be:
- Focused on actionable, practical techniques traders can apply immediately
- Include specific examples of when and how to use each strategy
- Reference real market dynamics (liquidity, spreads, timing)
- Balanced between theory and practical application

{shared JSON output fields}

content: The full article body in markdown (1000-1500 words). Use ## headings, **bold**, ordered/unordered lists, and blockquotes for tips or warnings. Include markdown tables for comparisons (e.g. strategy trade-offs, risk/reward). Use concrete numerical examples. Do not nest bullet lists inside numbered lists -- keep list items flat with inline descriptions.

IMPORTANT: Respond with ONLY a valid JSON object, no other text.

2.4 Guides Section

System Prompt

You are a prediction market researcher writing for Beeks.ai, a prediction market intelligence platform. Write comprehensive market-category deep dives that compare platforms, explain market mechanics, and provide thorough analysis.

Your writing should be:
- Thorough and well-researched, covering the topic comprehensively
- Include platform comparisons where relevant (Polymarket, Kalshi, etc.)
- Explain market structures, resolution criteria, and unique characteristics
- Useful as a reference guide readers can return to

{shared JSON output fields}

content: The full article body in markdown (1500-2000 words). Use ## headings, **bold**, ordered/unordered lists, and blockquotes for key insights. Include markdown tables for platform comparisons, fee structures, or feature matrices. Be comprehensive -- this is reference material. Do not nest bullet lists inside numbered lists -- keep list items flat with inline descriptions.

IMPORTANT: Respond with ONLY a valid JSON object, no other text.

Article Generation Pipeline

Article generation is a separate pipeline from the Beeks Index. It produces long-form markdown editorial content (600-2000 words depending on section type) stored in the articles table. Articles are optionally linked to a prediction topic via topic_id but do not consume Beeks Index data or ai_generated_content rows as input.

Pipeline Overview

flowchart TD
    subgraph adminUI [Admin Trigger]
        GenerateTab["GenerateTab\n(topic + section)"]
        DraftsTab["DraftsTab\n(regenerate with sourceId)"]
        TopicsTab["TopicsTab\n'Write article' button\n(preselects topic + topicId)"]
    end

    API["POST /api/articles/generate\n{topic?, url?, sourceId?, topicId?, section}"]

    subgraph validation [Input Validation]
        Zod["Zod schema: at least one of\ntopic, url, or sourceId required\nsection: prediction|start|strategy|guides"]
    end

    subgraph sourceMaterial [Source Material Resolution]
        direction TB

        PathA["Path A: topic only\n(no url, no sourceId)"]
        PathB["Path B: url provided"]
        PathC["Path C: sourceId provided"]

        subgraph googleSearch [Apify Google Search]
            SearchQuery["Query: '{topic} prediction markets'\nGoogle Search Scraper\nmaxPagesPerQuery: 1"]
            SearchResults["Up to 3 organic URLs"]
        end

        subgraph webCrawl [Apify Web Crawl]
            ParallelCrawl["Parallel crawl of URLs\nWebsite Content Crawler\ncrawlerType: cheerio\nmaxCrawlPages: 3 per URL"]
            CombinedContent["Concatenate markdown/text\nfrom all crawled pages"]
        end

        LoadSource[("scraped_sources\n(load existing by ID)")]
        SaveSource[("scraped_sources\n(insert new row)")]

        PathA --> SearchQuery --> SearchResults --> ParallelCrawl --> CombinedContent --> SaveSource
        PathB --> ParallelCrawl
        PathC --> LoadSource
    end

    subgraph anthropic [Claude Opus Generation]
        SystemPrompt["System prompt\n(section-specific role +\nJSON output schema +\n'Respond with ONLY valid JSON')"]
        UserMessage["User message:\n'Write an article about: {topic}'\n+ optional source material\n(truncated to 12,000 chars)"]
        ClaudeOpus["Claude Opus 4.6\nmax_tokens: 16,000\nthinking: adaptive"]
        JSONResponse["JSON response:\ntitle, subtitle, excerpt,\ncontent (markdown), keyTakeaways,\nmetaDescription, metaKeywords,\nprimaryKeyword, difficulty,\nreadTime, icon, tags"]
    end

    subgraph titleResolution [Title Resolution]
        TitleCheck{{"topicId set AND\nsection = 'prediction'?"}}
        UseTopicTitle["article.title = topics.title\nsuggested_title = generated title"]
        UseProvidedTopic["article.title = user topic\nsuggested_title = generated title"]
        UseGenerated["article.title = generated title"]

        TitleCheck -->|"Yes"| UseTopicTitle
        TitleCheck -->|"No, but topic provided"| UseProvidedTopic
        TitleCheck -->|"No topic"| UseGenerated
    end

    subgraph persist [Persist to Database]
        SlugGen["Generate slug from title\n+ collision suffix if needed"]
        ArticlesTable[("articles\nstatus = 'draft'\ngenerated_by = 'anthropic'")]
        TopicsFK["topics.id (optional FK)\nvia topicId param"]
        SourceFK["scraped_sources.id (optional FK)\nvia sourceId"]
    end

    GenerateTab --> API
    DraftsTab --> API
    TopicsTab --> GenerateTab

    API --> Zod --> sourceMaterial
    SaveSource --> SystemPrompt
    LoadSource --> SystemPrompt
    sourceMaterial -->|"no scrape results"| SystemPrompt

    SystemPrompt --> ClaudeOpus
    UserMessage --> ClaudeOpus
    ClaudeOpus --> JSONResponse --> TitleCheck

    UseTopicTitle --> SlugGen
    UseProvidedTopic --> SlugGen
    UseGenerated --> SlugGen
    SlugGen --> ArticlesTable
    TopicsFK -.-> ArticlesTable
    SourceFK -.-> ArticlesTable

Section Types

Each section type has a different system prompt persona and content length target:

| Section | Persona | Word Count | Use Case |
| --- | --- | --- | --- |
| prediction (default) | Analyst/journalist | 600-900 | Market-specific prediction articles |
| start | Friendly educator | 800-1200 | Beginner "What is X?" explainers |
| strategy | Market strategist | 1000-1500 | Tactical trading advice |
| guides | Market researcher | 1500-2000 | Comprehensive deep dives |
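
The flowchart describes request validation with a Zod schema: at least one of topic, url, or sourceId, plus one of the four section values. As a dependency-free stand-in for that schema, the same rules might look like the sketch below. The function and type names are hypothetical, and falling back to "prediction" when section is missing is an assumption based on the "(default)" label.

```typescript
const ARTICLE_SECTIONS = ["prediction", "start", "strategy", "guides"] as const;
type ArticleSection = (typeof ARTICLE_SECTIONS)[number];

interface GenerateRequestBody {
  topic?: string;
  url?: string;
  sourceId?: string;
  topicId?: string;
  section?: string;
}

interface ValidGenerateRequest extends GenerateRequestBody {
  section: ArticleSection;
}

function isArticleSection(value: string): value is ArticleSection {
  return (ARTICLE_SECTIONS as readonly string[]).indexOf(value) !== -1;
}

function validateGenerateRequest(body: GenerateRequestBody): ValidGenerateRequest {
  const hasInput = [body.topic, body.url, body.sourceId].some(
    (value) => typeof value === "string" && value.trim().length > 0,
  );
  if (!hasInput) {
    throw new Error("At least one of topic, url, or sourceId is required");
  }
  // Assumption: a missing section falls back to the default section type.
  const section = body.section === undefined ? "prediction" : body.section;
  if (!isArticleSection(section)) {
    throw new Error(`Unknown section: ${section}`);
  }
  return { ...body, section };
}
```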

Database Schema

| Table | Key Columns | Role |
| --- | --- | --- |
| articles | id, title, suggested_title, slug, content, section, status, topic_id (FK), source_id (FK), generated_by | Published/draft editorial articles |
| scraped_sources | id, url, content, content_length, crawler_type, pages_crawled | Raw web crawl content used as source material |
| topics | id, title | Optional link for prediction-section articles (title override) |
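
The title override and slug generation from the flowchart's Title Resolution and Persist steps can be sketched as below. All names are hypothetical. Two details are assumptions because the source does not specify them: `suggestedTitle` is `null` when the generated title is used directly, and the collision suffix takes the form `-2`, `-3`, and so on.

```typescript
interface TitleInputs {
  generatedTitle: string;
  section: string;
  topic?: string;      // topic text supplied by the user
  topicId?: string;    // optional link to a topics row
  topicTitle?: string; // topics.title, loaded when topicId is set
}

interface ResolvedTitle {
  title: string;
  suggestedTitle: string | null;
}

function resolveTitle(input: TitleInputs): ResolvedTitle {
  if (input.topicId && input.section === "prediction" && input.topicTitle) {
    return { title: input.topicTitle, suggestedTitle: input.generatedTitle };
  }
  if (input.topic) {
    return { title: input.topic, suggestedTitle: input.generatedTitle };
  }
  return { title: input.generatedTitle, suggestedTitle: null };
}

function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Appends a numeric suffix until the slug is not in the list of existing slugs.
function uniqueSlug(title: string, existingSlugs: string[]): string {
  const base = slugify(title);
  let candidate = base;
  let counter = 2;
  while (existingSlugs.indexOf(candidate) !== -1) {
    candidate = `${base}-${counter}`;
    counter += 1;
  }
  return candidate;
}
```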

Key Differences from Beeks Index Pipeline

| Aspect | Beeks Index | Articles |
| --- | --- | --- |
| Model | Claude Sonnet 4.6 | Claude Opus 4.6 (with adaptive thinking) |
| Input data | DB context (bulls/bears, description, correlations, news) + live Polymarket data | Web-scraped source material (Apify) |
| Output | Structured signals JSON -> deterministic formula -> score | Long-form markdown article + metadata JSON |
| Storage | ai_generated_content rows | articles + scraped_sources rows |
| Relationship | Tightly coupled to Polymarket events | Loosely linked via optional topic_id FK |
| Trigger | Per content type (sequential pipeline) | Single API call produces complete article |

How the Beeks Index Is Calculated

The Beeks Index is Beeks.ai's proprietary prediction score. It answers the question: "Does our AI model agree with the market price, and if not, where is the edge?"

The score is produced in two stages: an AI estimation stage (LLM) followed by a deterministic formula stage (pure math). The LLM never sees or computes the final number -- it only estimates input signals. The formula is fixed and transparent.


Stage 1: AI Signal Estimation (Claude Sonnet 4.6)

The LLM receives a rich context bundle for a specific prediction market contract:

| Input | Source |
| --- | --- |
| Contract name and YES/NO prices | Live Polymarket API |
| Bulls say / Bears say analysis | Previously generated AI content |
| Market description | Previously generated AI content |
| Related news with excerpts | Apify Google Search + article scraping |
| Market microstructure | Live Polymarket (spread, liquidity, volume, 1d/1wk price change) |
| Correlation/hedge context | Previously generated AI content |
| Sibling contracts and prices | Live Polymarket (for multi-outcome events) |
| Time context | Days elapsed vs total days for the event |

From this context, the LLM estimates five signal probabilities (each 0-100) that the YES outcome will occur:

| Signal | Weight | What It Represents |
| --- | --- | --- |
| baseRate | 15% | Historical frequency of similar events (e.g., incumbent re-election rate, FDA approval base rate) |
| leadingIndicators | 30% | Hard quantitative data: polls, economic indicators, on-chain metrics, filing data |
| expertConsensus | 15% | Superforecaster consensus, domain expert estimates, prediction aggregator scores |
| momentum | 10% | Direction and velocity of recent probability change (derived from price movement data) |
| marketImplied | 30% | The current market price itself as a wisdom-of-crowds signal |

The LLM also returns:

  • rationale: Concise overall reasoning
  • divergenceReason: 2-5 sentences comparing its signals to the market price
  • keyFactors: Up to 6 short bullet points of key drivers
  • alternativePick (multi-outcome only): The most undervalued sibling contract with reasoning

Stage 2: Deterministic Formula

The formula is applied in code (beeks-formula.ts) with no randomness or AI involvement.

Step 1: Weighted Composite (pRaw)

pRaw = 0.15 * baseRate
     + 0.30 * leadingIndicators
     + 0.15 * expertConsensus
     + 0.10 * momentum
     + 0.30 * marketImplied

This produces a raw probability estimate on a 0-100 scale. The weights are normalized (they sum to 1.0) and can theoretically be overridden, but defaults are used in practice.

Step 2: Time Adjustment (cTime)

timeRatio = daysElapsed / totalDays
cTime = 1 - (1 - timeRatio) * 0.1

This applies a small confidence discount when the event is far from expiry. Near expiry (timeRatio approaches 1), cTime approaches 1.0 (full confidence). Far from expiry, cTime drops to approximately 0.9 (10% discount). The uncertainty factor of 0.1 is deliberately conservative.

Example: An event 30 days in with 90 total days has timeRatio = 0.33, so cTime = 1 - 0.67 * 0.1 = 0.933.

Step 3: Liquidity Adjustment (cLiq)

cLiq = max(0, 1 - bidAskSpread / 2)

Wider bid-ask spreads reduce confidence. A tight spread of 0.02 gives cLiq = 0.99. A very wide spread of 0.20 gives cLiq = 0.90.

Step 4: Final Score

beeksIndex = clamp(pRaw * cTime * cLiq, 0, 100)

The raw probability is multiplied by both adjustment factors and clamped to the 0-100 range.

Full example:

| Input | Value |
| --- | --- |
| baseRate | 65 |
| leadingIndicators | 70 |
| expertConsensus | 68 |
| momentum | 72 |
| marketImplied | 67 |

pRaw = 0.15(65) + 0.30(70) + 0.15(68) + 0.10(72) + 0.30(67)
     = 9.75 + 21.0 + 10.2 + 7.2 + 20.1
     = 68.25

cTime = 1 - (1 - 0.33) * 0.1 = 0.933  (30 of 90 days elapsed)
cLiq  = 1 - 0.04 / 2 = 0.98            (spread of 0.04)

beeksIndex = 68.25 * 0.933 * 0.98 ≈ 62.4
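
The four steps can be combined into one function. This is a minimal sketch of the formula as documented here, not the contents of beeks-formula.ts: the name `computeIndexSketch`, the return shape, the clamping of timeRatio to [0, 1], and the error on a non-positive totalDays are all assumptions.

```typescript
interface IndexSignals {
  baseRate: number;
  leadingIndicators: number;
  expertConsensus: number;
  momentum: number;
  marketImplied: number;
}

// Default weights from Step 1; they sum to 1.0.
const DEFAULT_WEIGHTS: IndexSignals = {
  baseRate: 0.15,
  leadingIndicators: 0.3,
  expertConsensus: 0.15,
  momentum: 0.1,
  marketImplied: 0.3,
};

function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

function computeIndexSketch(
  signals: IndexSignals,
  daysElapsed: number,
  totalDays: number,
  bidAskSpread: number,
): { pRaw: number; cTime: number; cLiq: number; beeksIndex: number } {
  if (totalDays <= 0) {
    throw new Error("totalDays must be positive"); // assumption: not specified in the source
  }
  // Step 1: weighted composite on a 0-100 scale.
  const pRaw =
    DEFAULT_WEIGHTS.baseRate * signals.baseRate +
    DEFAULT_WEIGHTS.leadingIndicators * signals.leadingIndicators +
    DEFAULT_WEIGHTS.expertConsensus * signals.expertConsensus +
    DEFAULT_WEIGHTS.momentum * signals.momentum +
    DEFAULT_WEIGHTS.marketImplied * signals.marketImplied;
  // Step 2: time adjustment (timeRatio clamped as a defensive assumption).
  const timeRatio = clamp(daysElapsed / totalDays, 0, 1);
  const cTime = 1 - (1 - timeRatio) * 0.1;
  // Step 3: liquidity adjustment.
  const cLiq = Math.max(0, 1 - bidAskSpread / 2);
  // Step 4: final score.
  return { pRaw, cTime, cLiq, beeksIndex: clamp(pRaw * cTime * cLiq, 0, 100) };
}
```

Called with the worked example's inputs (signals 65, 70, 68, 72, 67; 30 of 90 days; spread 0.04), the sketch yields pRaw = 68.25 and a beeksIndex of about 62.43 when cTime is left unrounded.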

Stage 3: Derived Metrics

Once the Beeks Index score is computed, several trading-relevant metrics are derived:

Edge

edge = beeksIndex - polymarketImplied

If the Beeks Index is 62 and the market says 67, the edge is -5 (the model thinks the market is slightly overpriced on YES).

Edge Band

| Absolute Edge | Band | Interpretation |
| --- | --- | --- |
| 0 to 2 | none | No meaningful disagreement with market |
| Above 2, up to 5 | slight | Minor divergence, low confidence |
| Above 5, up to 12 | signal | Notable divergence worth investigating |
| Above 12 | strong | Major disagreement -- potential mispricing |

Direction

| Edge | Direction | Meaning |
| --- | --- | --- |
| Positive | YES | Model thinks YES is more likely than market implies |
| Negative | NO | Model thinks NO is more likely than market implies |
| Zero | NONE | Model agrees with market |
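
Edge, band, and direction follow directly from the thresholds given in the formula (absolute edge <= 2, <= 5, <= 12, otherwise strong). The sketch below uses a hypothetical function name, `classifyEdge`, and takes both inputs on the 0-100 scale.

```typescript
type EdgeBand = "none" | "slight" | "signal" | "strong";
type EdgeDirection = "YES" | "NO" | "NONE";

function classifyEdge(
  beeksIndex: number,
  marketImplied: number,
): { edge: number; band: EdgeBand; direction: EdgeDirection } {
  const edge = beeksIndex - marketImplied;
  const absEdge = Math.abs(edge);
  // Thresholds are inclusive upper bounds, so fractional edges such as 2.5 fall into "slight".
  const band: EdgeBand =
    absEdge <= 2 ? "none" : absEdge <= 5 ? "slight" : absEdge <= 12 ? "signal" : "strong";
  const direction: EdgeDirection = edge > 0 ? "YES" : edge < 0 ? "NO" : "NONE";
  return { edge, band, direction };
}
```

For the example above (index 62, market 67) this gives an edge of -5, band "slight", and direction "NO".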

Kelly Criterion (Position Sizing)

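The Kelly criterion calculates the theoretically optimal bet size given the model's edge. The formulas require pModel and pMarket as probabilities on a 0-1 scale; with 0-100 values the odds term would be negative.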

When edge favors YES (beeksIndex > marketPrice):

odds = (1 / pMarket) - 1
kellyFull = (pModel * odds - (1 - pModel)) / odds

When edge favors NO (beeksIndex < marketPrice):

odds = (1 / (1 - pMarket)) - 1
kellyFull = ((1 - pModel) * odds - pModel) / odds

kellyHalf = kellyFull / 2 (half-Kelly is a common conservative choice in practice, giving up some expected growth for lower variance).
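
Both Kelly branches share one structure once the winning probability and the share price are picked for the favored side. The sketch below assumes pModel and pMarket are probabilities on a 0-1 scale (for example beeksIndex / 100, which is an assumption about how the code maps the score). `kellySketch` and the "NONE" result for equal probabilities are hypothetical.

```typescript
type KellySide = "YES" | "NO" | "NONE";

interface KellyResult {
  side: KellySide;
  kellyFull: number;
  kellyHalf: number;
}

function kellySketch(pModel: number, pMarket: number): KellyResult {
  if (!(pMarket > 0 && pMarket < 1)) {
    throw new Error("pMarket must be strictly between 0 and 1");
  }
  if (pModel === pMarket) {
    return { side: "NONE", kellyFull: 0, kellyHalf: 0 }; // no edge, no bet
  }
  const favorsYes = pModel > pMarket;
  // Probability of winning and price paid for the side being bought.
  const pWin = favorsYes ? pModel : 1 - pModel;
  const price = favorsYes ? pMarket : 1 - pMarket;
  const odds = 1 / price - 1; // net payout per unit staked
  const kellyFull = (pWin * odds - (1 - pWin)) / odds;
  return { side: favorsYes ? "YES" : "NO", kellyFull, kellyHalf: kellyFull / 2 };
}
```

With pModel = 0.70 and pMarket = 0.60 the full Kelly fraction works out to 0.25 of bankroll, so half-Kelly is 0.125.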

Expected Value

When edge favors YES:

EV = pModel * (1 / pMarket) - 1

When edge favors NO:

EV = (1 - pModel) * (1 / (1 - pMarket)) - 1

A positive EV means the bet has a positive expected return according to the model.
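
Expected value per unit staked can be sketched the same way. As with the Kelly formulas, pModel and pMarket are assumed to be probabilities on a 0-1 scale, and `expectedValueSketch` is a hypothetical name. Equal probabilities go through the YES branch and give an EV of zero.

```typescript
function expectedValueSketch(pModel: number, pMarket: number): number {
  if (!(pMarket > 0 && pMarket < 1)) {
    throw new Error("pMarket must be strictly between 0 and 1");
  }
  if (pModel >= pMarket) {
    // Edge favors YES: each YES share costs pMarket and pays 1 with probability pModel.
    return pModel * (1 / pMarket) - 1;
  }
  // Edge favors NO: each NO share costs (1 - pMarket) and pays 1 with probability (1 - pModel).
  return (1 - pModel) * (1 / (1 - pMarket)) - 1;
}
```

For pModel = 0.70 against pMarket = 0.60, the sketch returns about 0.167, that is, roughly a 16.7% expected return on the stake if the model's probability is right.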


Multi-Outcome Markets

For events with multiple outcomes (e.g., "Who will win the 2026 World Cup?"), the pipeline runs the full Stage 1 + Stage 2 process for each of the top 3 outcomes (selected by highest YES price). Results are wrapped in an envelope:

{
  "multiOutcome": true,
  "outcomes": [
    { "marketId": "abc", "label": "Team A", "data": { "beeksIndex": 42.5, "edge": 3.2, ... } },
    { "marketId": "def", "label": "Team B", "data": { "beeksIndex": 28.1, "edge": -1.8, ... } },
    { "marketId": "ghi", "label": "Team C", "data": { "beeksIndex": 18.7, "edge": 5.1, ... } }
  ]
}

For events with >= 5 markets, the LLM is also required to produce an alternativePick -- the sibling contract it considers most undervalued, with reasoning.
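
The selection and wrapping logic for multi-outcome events might look like the following sketch. The scoring step is passed in as a callback so the example stays independent of the LLM call. All names (`topOutcomes`, `requiresAlternativePick`, `buildEnvelope`) and the input shape are hypothetical.

```typescript
interface OutcomeMarket {
  marketId: string;
  label: string;
  yesPrice: number; // current YES price, 0-100
}

interface OutcomeEnvelope<T> {
  multiOutcome: true;
  outcomes: Array<{ marketId: string; label: string; data: T }>;
}

// Picks the highest-priced outcomes without mutating the input array.
function topOutcomes(markets: OutcomeMarket[], limit: number = 3): OutcomeMarket[] {
  return markets
    .slice()
    .sort((a, b) => b.yesPrice - a.yesPrice)
    .slice(0, limit);
}

// The alternativePick requirement applies to events with at least 5 markets.
function requiresAlternativePick(markets: OutcomeMarket[]): boolean {
  return markets.length >= 5;
}

function buildEnvelope<T>(
  markets: OutcomeMarket[],
  score: (market: OutcomeMarket) => T,
): OutcomeEnvelope<T> {
  return {
    multiOutcome: true,
    outcomes: topOutcomes(markets).map((market) => ({
      marketId: market.marketId,
      label: market.label,
      data: score(market),
    })),
  };
}
```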


What the Beeks Index Is NOT

  • Not a price prediction. It's a probability estimate (0-100) that can be compared to the market price.
  • Not self-updating. It's generated on-demand by an admin. The score is a snapshot that goes stale as market conditions change.
  • Not a trading recommendation. The edge, Kelly, and EV metrics are informational. They reflect model output, not financial advice.
  • Not a black box. The five signals, weights, and formula are fully transparent. The only opaque part is how the LLM estimates the five input signals from the provided context.

Beeks Index Prediction Pipeline

The Beeks Index is Beeks.ai's proprietary prediction score. It combines AI-estimated signal probabilities with a deterministic formula to produce a final score that can diverge from the Polymarket implied probability, identifying potential edge.

Pipeline Overview

The pipeline has prerequisite steps (news, description, correlations) that feed into the core generation steps (bulls/bears, then Beeks Index). An admin triggers each step manually via the admin dashboard.

flowchart TD
    subgraph trigger [Admin Trigger]
        AdminUI["Admin Dashboard"]
        AdminAPI["POST /api/admin/topics/{id}/ai-content\n{contentType, yesPrice, noPrice}"]
        NewsAPI["POST /api/admin/topics/{id}/news"]
    end

    subgraph prerequisites [Prerequisite Steps]
        direction TB

        subgraph newsIngestion [1. News Ingestion]
            ApifySearch["Apify Google Search Scraper\nquery: '{title} news recent'"]
            TopicNewsTable[("topic_news\n(per-topic news rows)")]
            ExcerptFetch["Guarded Article Fetch\n(up to 4 URLs)"]
            ApifySearch --> TopicNewsTable
            TopicNewsTable --> ExcerptFetch
            ExcerptFetch -->|"article_excerpt"| TopicNewsTable
        end

        subgraph descGen [2. Description Generation]
            DescPrompt["Claude Sonnet 4.6\nDESCRIPTION_SYSTEM prompt\n(title + YES/NO prices)"]
            DescStore[("ai_generated_content\ncontent_type = 'description'")]
            DescPrompt --> DescStore
        end

        subgraph corrGen [3. Correlations Generation]
            AllTopics["Fetch all active topics\nwith current prices"]
            CorrPrompt["Claude Sonnet 4.6\nCORRELATIONS_SYSTEM prompt\n(all topics as input)"]
            CorrStore[("ai_generated_content\ncontent_type = 'market_correlations'")]
            AllTopics --> CorrPrompt --> CorrStore
        end
    end

    subgraph bullsBears [4. Bulls/Bears Generation]
        direction TB
        PolyFetch1["Fetch live Polymarket event\n(Gamma API)"]
        BBContextLoad["Load context from DB:\n- description\n- correlations JSON\n- topic_news rows"]
        BBMarketData["Attach market microstructure:\nspread, liquidity, volume,\n1d/1wk price change"]

        subgraph bbBranch [Binary vs Multi-Outcome]
            BBBinary["Single market:\nOne Claude Sonnet call\n-> {bullCase, bearCase}"]
            BBMulti["Multi-outcome (>1 top market):\nTop 3 markets by YES price\nSequential loop:"]
            BBLoop["For each market:\n1. Load context + market data\n2. Claude Sonnet call\n3. Upsert per-outcome row"]
            BBDefault["Copy featured outcome\nto default row\n(polymarket_market_id = '')"]
        end

        BBStore[("ai_generated_content\ncontent_type = 'bulls_bears'\n(per-outcome + default rows)")]

        PolyFetch1 --> BBContextLoad --> BBMarketData
        BBMarketData -->|"<=1 top market"| BBBinary --> BBStore
        BBMarketData -->|">1 top market"| BBMulti --> BBLoop --> BBDefault --> BBStore
    end

    subgraph beeksIndex [5. Beeks Index Generation]
        direction TB
        PolyFetch2["Fetch live Polymarket event\n(Gamma API)"]
        BIContextLoad["Load full context bundle:\n- bulls/bears from DB\n- description from DB\n- correlations from DB\n- topic_news from DB\n- live market microstructure\n- sibling contracts"]

        subgraph biBranch [Binary vs Multi-Outcome]
            BIBinary["Single market:\nOne Claude Sonnet call\n-> 5 signal probabilities\n-> Deterministic formula"]
            BIMulti["Multi-outcome (>1 top market):\nTop 3 markets by YES price\nSequential loop:"]
            BILoop["For each market:\n1. Load full context bundle\n2. Claude Sonnet call (5 signals)\n3. computeBeeksIndex()\n4. Append to outcomes[]"]
            BIEnvelope["Wrap in envelope:\n{multiOutcome: true,\n outcomes: [{marketId, label, data}]}"]
        end

        subgraph formula [Deterministic Formula]
            Signals["LLM Output: 5 Signals (0-100)\nbaseRate, leadingIndicators,\nexpertConsensus, momentum,\nmarketImplied"]
            WeightedSum["pRaw = weighted sum\n0.15*base + 0.30*leading +\n0.15*expert + 0.10*momentum +\n0.30*market"]
            TimeFactor["cTime = 1 - (1 - timeRatio) * 0.1\ntimeRatio = daysElapsed / totalDays"]
            LiqFactor["cLiq = max(0, 1 - spread/2)"]
            FinalScore["beeksIndex = pRaw * cTime * cLiq\n(clamped 0-100)"]
            Derived["edge = beeksIndex - polymarketImplied\nedgeBand: none|slight|signal|strong\ndirection: YES|NO|NONE\nkellyFull, kellyHalf, expectedValue"]
            Signals --> WeightedSum --> FinalScore
            TimeFactor --> FinalScore
            LiqFactor --> FinalScore
            FinalScore --> Derived
        end

        BIStore[("ai_generated_content\ncontent_type = 'beeks_index'\n(single row, polymarket_market_id = '')")]
        SleepFlag["topics.show_sleeper_pick = true"]

        PolyFetch2 --> BIContextLoad
        BIContextLoad -->|"<=1 top market"| BIBinary --> Signals
        BIContextLoad -->|">1 top market"| BIMulti --> BILoop --> Signals
        Derived --> BIStore
        BILoop --> BIEnvelope --> BIStore
        BIStore --> SleepFlag
    end

    AdminUI --> AdminAPI
    AdminUI --> NewsAPI
    NewsAPI --> ApifySearch
    AdminAPI -->|"contentType = 'description'"| DescPrompt
    AdminAPI -->|"contentType = 'bulls_bears'"| PolyFetch1
    AdminAPI -->|"contentType = 'beeks_index'"| PolyFetch2

    TopicNewsTable -.->|"news context"| BBContextLoad
    TopicNewsTable -.->|"news context"| BIContextLoad
    DescStore -.->|"description"| BBContextLoad
    DescStore -.->|"description"| BIContextLoad
    CorrStore -.->|"correlations"| BBContextLoad
    CorrStore -.->|"correlations"| BIContextLoad
    BBStore -.->|"bull/bear cases"| BIContextLoad

Intended Pipeline Order

The admin should run steps in this order for best results (each step feeds context to later ones):

1. News ingestion (Apify search + article excerpts)
2. Description (Claude Sonnet - uses title + prices only)
3. Correlations (Claude Sonnet - uses all topics)
4. Bulls/Bears (Claude Sonnet - uses description + correlations + news + market data)
5. Beeks Index (Claude Sonnet + formula - uses ALL of the above)
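
The deterministic formula from the diagram can be sketched as below. This is a minimal illustration, not the project's `computeBeeksIndex()`: the function name `sketchBeeksIndex`, the input shape, the units of `spread` (assumed to be 0-1 price units), and the clamping of `timeRatio` are all assumptions. Edge bands and Kelly sizing are omitted because their thresholds are not stated here.

```typescript
// Hypothetical input shape; only the five signal names come from the diagram.
interface Signals {
  baseRate: number; // each signal is on a 0-100 scale
  leadingIndicators: number;
  expertConsensus: number;
  momentum: number;
  marketImplied: number;
}

interface FormulaInput {
  signals: Signals;
  daysElapsed: number; // assumed: days since the market opened
  totalDays: number; // assumed: total market lifetime in days
  spread: number; // assumed: bid/ask spread in 0-1 price units
  polymarketImplied: number; // market-implied probability, 0-100
}

const clamp = (x: number, lo: number, hi: number): number =>
  Math.min(hi, Math.max(lo, x));

function sketchBeeksIndex(input: FormulaInput): { beeksIndex: number; edge: number } {
  const s = input.signals;

  // Weighted sum of the five signals (weights add up to 1.0).
  const pRaw =
    0.15 * s.baseRate +
    0.3 * s.leadingIndicators +
    0.15 * s.expertConsensus +
    0.1 * s.momentum +
    0.3 * s.marketImplied;

  // Time factor: 0.9 at market open, rising to 1.0 at resolution.
  const timeRatio =
    input.totalDays > 0 ? clamp(input.daysElapsed / input.totalDays, 0, 1) : 1;
  const cTime = 1 - (1 - timeRatio) * 0.1;

  // Liquidity factor: wider spreads reduce the score.
  const cLiq = Math.max(0, 1 - input.spread / 2);

  const beeksIndex = clamp(pRaw * cTime * cLiq, 0, 100);
  return { beeksIndex, edge: beeksIndex - input.polymarketImplied };
}
```

Because both factors are at most 1, the time and liquidity terms can only discount the weighted sum, never raise it.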

Database Schema

| Table | Key Columns | Role |
| --- | --- | --- |
| `topics` | `id`, `title`, `polymarket_event_id`, `polymarket_market_id`, `show_sleeper_pick` | Canonical prediction topic linked to Polymarket |
| `topic_news` | `topic_id` (FK), `title`, `url`, `snippet`, `article_excerpt` | News research corpus per topic |
| `ai_generated_content` | `topic_id` (FK), `content_type` (enum), `polymarket_market_id`, `content` (JSON text) | All AI-generated content. Unique on (`topic_id`, `content_type`, `polymarket_market_id`) |

ai_generated_content content_type values

| content_type | polymarket_market_id | Content |
| --- | --- | --- |
| `description` | `''` (empty) | Short market description string |
| `market_correlations` | `''` (empty) | JSON of correlated/hedge markets |
| `bulls_bears` | `''` (default) or specific market ID | JSON `{bullCase, bearCase}` |
| `beeks_index` | `''` (empty) | Single V2 JSON or multi-outcome envelope |
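
To make the uniqueness rule concrete, here is a small in-memory sketch of how the `(topic_id, content_type, polymarket_market_id)` key lets a default row and per-outcome rows coexist. The type and function names (`AiGeneratedContentRow`, `rowKey`, `upsertRow`) are hypothetical; the real pipeline upserts into the database rather than a `Map`.

```typescript
// The four content_type values listed above.
type ContentType =
  | "description"
  | "market_correlations"
  | "bulls_bears"
  | "beeks_index";

// Hypothetical camelCase mirror of an ai_generated_content row.
interface AiGeneratedContentRow {
  topicId: string;
  contentType: ContentType;
  polymarketMarketId: string; // '' marks the default row
  content: string; // JSON text
}

// Composite key mirroring the unique constraint.
function rowKey(
  row: Pick<AiGeneratedContentRow, "topicId" | "contentType" | "polymarketMarketId">
): string {
  return JSON.stringify([row.topicId, row.contentType, row.polymarketMarketId]);
}

// Upsert: a row with the same composite key replaces the previous one.
function upsertRow(
  store: Map<string, AiGeneratedContentRow>,
  row: AiGeneratedContentRow
): void {
  store.set(rowKey(row), row);
}
```

Writing the default `bulls_bears` row twice leaves one row, while a row with a specific market ID is stored alongside it.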

Multi-Outcome vs Binary Markets

| Aspect | Binary (<=1 top market) | Multi-Outcome (>1 top market) |
| --- | --- | --- |
| Bulls/Bears | 1 Claude call, 1 DB row | Up to 3 Claude calls (sequential), per-outcome rows + default row |
| Beeks Index | 1 Claude call + formula, single V2 JSON | Up to 3 Claude calls + formulas (sequential), wrapped in `{multiOutcome: true, outcomes: [...]}` |
| Top N selection | N/A | `selectTopNMarkets(markets, 3)` -- top 3 by YES price |
| `isMultiOutcome` flag (for LLM) | `false` | `true` when event has >= 5 markets (triggers `alternativePick` requirement) |
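
The branching described above might look roughly like the following. Only the name `selectTopNMarkets` and the "top 3 by YES price" rule come from this document; the `MarketLike` shape, the `Sketch` suffix, and `planGeneration` are hypothetical, and the real selection may apply extra filtering that is not shown here.

```typescript
// Hypothetical minimal market shape.
interface MarketLike {
  id: string;
  yesPrice: number;
}

// Sort a copy by YES price, highest first, and keep the first n.
function selectTopNMarketsSketch<T extends MarketLike>(
  markets: readonly T[],
  n: number
): T[] {
  return markets
    .slice()
    .sort((a, b) => b.yesPrice - a.yesPrice)
    .slice(0, n);
}

interface GenerationPlan {
  mode: "binary" | "multi";
  marketIds: string[]; // one Claude call per ID, run sequentially
  isMultiOutcome: boolean; // flag passed to the LLM
}

function planGeneration(markets: readonly MarketLike[]): GenerationPlan {
  const top = selectTopNMarketsSketch(markets, 3);
  return {
    mode: top.length > 1 ? "multi" : "binary",
    marketIds: top.map((m) => m.id),
    isMultiOutcome: markets.length >= 5,
  };
}
```

Note that the two thresholds differ: an event with two to four markets takes the multi-outcome path but still passes `isMultiOutcome = false` to the LLM.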

Models and Parameters

| Step | Model | max_tokens | temperature |
| --- | --- | --- | --- |
| Description | `claude-sonnet-4-6` | 256 | 0.5 |
| Correlations | `claude-sonnet-4-6` | 8192 | 0.5 |
| Bulls/Bears | `claude-sonnet-4-6` | 512 | 0.7 |
| Beeks Index | `claude-sonnet-4-6` | 1500 | 0.35 |

Market Correlations Pipeline

The correlations pipeline identifies which prediction markets move together (correlated) and which move in opposite directions (hedges). It runs across all active topics at once in a single LLM call, then fans out the results to per-topic rows in the database. Correlations are consumed downstream by the Bulls/Bears and Beeks Index generators as additional context.
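
As a rough sketch of how the single-call input could be assembled, the helper below formats every topic as one line in the `- ID: ... | Title: ... | YES: N¢ | NO: N¢` layout and enforces the two-topic minimum. The `TopicInput` fields match the pipeline diagram; the function name, the quoting of titles, and the assumption that prices arrive as 0-1 fractions are illustrative only.

```typescript
// Fields as named in the pipeline diagram.
interface TopicInput {
  id: string;
  title: string;
  yesPrice: number; // assumed: fraction between 0 and 1
  noPrice: number; // assumed: fraction between 0 and 1
}

// Convert a 0-1 price to whole cents.
const toCents = (price: number): number => Math.round(price * 100);

function formatTopicsForPrompt(topics: readonly TopicInput[]): string {
  // The route rejects requests with fewer than two active topics (HTTP 400).
  if (topics.length < 2) {
    throw new Error("Need at least 2 active topics");
  }
  return topics
    .map(
      (t) =>
        `- ID: ${t.id} | Title: "${t.title}" | YES: ${toCents(t.yesPrice)}¢ | NO: ${toCents(t.noPrice)}¢`
    )
    .join("\n");
}
```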

Pipeline Overview

flowchart TD
    subgraph trigger [Admin Trigger]
        AdminUI["Admin Dashboard\n(Correlations tab)"]
        API["POST /api/admin/topics/correlations/generate\n(no request body needed)"]
    end

    subgraph fetchPhase [1. Fetch All Active Topics + Live Prices]
        QueryDB["Query topics table\nWHERE status = 'active'\nAND deleted_at IS NULL"]
        MinCheck{{"activeTopics.length >= 2?"}}
        Reject400["400: Need at least 2\nactive topics"]

        subgraph polyFetch [Parallel Polymarket Fetch]
            PromiseAll["Promise.all(\n  topics.map(t =>\n    fetchGammaEvent(t.eventId)\n  )\n)"]
            Transform["For each event:\ntransformGammaEventToDynamic()\nselectFeaturedMarket()"]
            BuildInputs["Build TopicInput[]\n{id, title, yesPrice, noPrice}\n+ priceMap for snapshot"]
        end
    end

    subgraph manualCheck [2. Check for Manual Edits]
        QueryEdited["Query ai_generated_content\nWHERE content_type = 'market_correlations'\nAND manually_edited IS NOT NULL"]
        EditedSet["Build Set of topic IDs\nwith manual edits\n(these will be skipped)"]
    end

    subgraph llmPhase [3. Single LLM Call]
        FormatInput["Format all topics as list:\n- ID: uuid | Title: '...' | YES: N¢ | NO: N¢"]
        ClaudeSonnet["Claude Sonnet 4.6\nmax_tokens: 8192\ntemperature: 0.5\nCORRELATIONS_SYSTEM prompt"]
        ParseJSON["Parse JSON response\nFilter to valid topic IDs only"]
    end

    subgraph storePhase [4. Fan Out Results to DB]
        LoopTopics["For each TopicInput:"]
        SkipCheck{{"Manually edited?"}}
        SkipTopic["Skip\n(preserve manual edit)"]
        UpsertRow["Upsert ai_generated_content\ncontent_type = 'market_correlations'\ncontent = JSON string\nsnapshot prices from live data"]
    end

    subgraph output [Response]
        Response["200: {generated: N, skipped: N}"]
    end

    AdminUI --> API
    API --> QueryDB --> MinCheck
    MinCheck -->|"< 2 topics"| Reject400
    MinCheck -->|">= 2 topics"| PromiseAll
    PromiseAll --> Transform --> BuildInputs

    BuildInputs --> QueryEdited --> EditedSet

    EditedSet --> FormatInput --> ClaudeSonnet --> ParseJSON

    ParseJSON --> LoopTopics --> SkipCheck
    SkipCheck -->|"Yes"| SkipTopic
    SkipCheck -->|"No"| UpsertRow
    SkipTopic --> Response
    UpsertRow --> Response

Detailed Data Flow

flowchart LR
    subgraph inputs [Inputs to LLM]
        T1["Topic A\nYES: 72¢"]
        T2["Topic B\nYES: 45¢"]
        T3["Topic C\nYES: 88¢"]
        TN["Topic N\nYES: ..."]
    end

    subgraph llm [Claude Sonnet 4.6]
        Analysis["Analyze all pairwise\nrelationships between\nmarkets"]
    end

    subgraph perTopic [Output Per Topic]
        direction TB
        CorrA["Topic A correlations:\ncorrelatedMarkets: [\n  {topicId: B, coefficient: 0.85, reasoning: '...'}\n]\nhedgeMarkets: [\n  {topicId: C, coefficient: -0.72, reasoning: '...'}\n]"]
        CorrB["Topic B correlations:\ncorrelatedMarkets: [...]\nhedgeMarkets: [...]"]
        CorrC["Topic C correlations:\ncorrelatedMarkets: [...]\nhedgeMarkets: [...]"]
    end

    subgraph db [Database Storage]
        Row1[("ai_generated_content\ntopic_id = A\ncontent_type = 'market_correlations'")]
        Row2[("ai_generated_content\ntopic_id = B\ncontent_type = 'market_correlations'")]
        Row3[("ai_generated_content\ntopic_id = C\ncontent_type = 'market_correlations'")]
    end

    subgraph downstream [Consumed By]
        BB["Bulls/Bears Generator\n(correlationsNote in context)"]
        BI["Beeks Index Generator\n(correlationsNote in context)"]
    end

    T1 & T2 & T3 & TN --> Analysis
    Analysis --> CorrA & CorrB & CorrC
    CorrA --> Row1
    CorrB --> Row2
    CorrC --> Row3
    Row1 & Row2 & Row3 -.->|"truncated to 6000 chars"| BB
    Row1 & Row2 & Row3 -.->|"truncated to 6000 chars"| BI

Key Behaviors

Symmetric Relationships

The prompt instructs the LLM to make correlations symmetric: if Topic A correlates with Topic B, then Topic B should also list Topic A as correlated. This is enforced by prompt instruction, not code.
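
Since symmetry relies on the prompt alone, a check like the one below could be used to spot one-sided relationships in a parsed response. This is not part of the pipeline described here; the function name and the assumption that the parsed output is keyed by topic ID are hypothetical.

```typescript
interface CorrelationRef {
  topicId: string;
  coefficient: number;
}

interface TopicCorrelations {
  correlatedMarkets: CorrelationRef[];
  hedgeMarkets: CorrelationRef[];
}

// Returns [from, to] pairs where `from` lists `to` but `to` does not list `from`.
function findAsymmetricPairs(
  byTopic: Record<string, TopicCorrelations>
): Array<[string, string]> {
  const missing: Array<[string, string]> = [];
  for (const topicId of Object.keys(byTopic)) {
    const corr = byTopic[topicId];
    if (corr === undefined) continue;
    const listed = corr.correlatedMarkets.concat(corr.hedgeMarkets);
    for (const entry of listed) {
      const other = byTopic[entry.topicId];
      const reciprocal =
        other !== undefined &&
        other.correlatedMarkets
          .concat(other.hedgeMarkets)
          .some((e) => e.topicId === topicId);
      if (!reciprocal) missing.push([topicId, entry.topicId]);
    }
  }
  return missing;
}
```

An empty result means every listed relationship has a matching entry on the other side.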

Manual Edit Protection

Before writing results, the pipeline checks for rows where manually_edited IS NOT NULL. Topics with such rows are skipped so that their hand-tuned correlations are preserved. The response reports how many topics were skipped.
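
A minimal sketch of the skip logic, assuming the manually edited topic IDs have already been collected into a `Set`. The function name and the `upsert` callback are hypothetical stand-ins for the route's database write.

```typescript
interface FanOutResult {
  generated: number;
  skipped: number;
}

function fanOutCorrelations(
  topicIds: readonly string[],
  manuallyEdited: ReadonlySet<string>,
  upsert: (topicId: string) => void // hypothetical stand-in for the DB upsert
): FanOutResult {
  let generated = 0;
  let skipped = 0;
  for (const topicId of topicIds) {
    if (manuallyEdited.has(topicId)) {
      // Preserve the hand-tuned row; do not overwrite it.
      skipped += 1;
      continue;
    }
    upsert(topicId);
    generated += 1;
  }
  return { generated, skipped };
}
```

The returned counts correspond to the `{generated: N, skipped: N}` response body.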

Result Filtering

After parsing the LLM response, the code filters out any topic IDs that aren't in the input set. This guards against hallucinated IDs.
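
One way such a filter could work is sketched below. It assumes the parsed response is an object keyed by topic ID and drops unknown IDs both at the top level and inside each entry list; whether the real code filters at both levels is not stated here, and the names are hypothetical.

```typescript
interface CorrelationRef {
  topicId: string;
  coefficient: number;
}

interface TopicCorrelations {
  correlatedMarkets: CorrelationRef[];
  hedgeMarkets: CorrelationRef[];
}

function filterToKnownTopics(
  parsed: Record<string, TopicCorrelations>,
  validIds: ReadonlySet<string>
): Record<string, TopicCorrelations> {
  const out: Record<string, TopicCorrelations> = {};
  for (const topicId of Object.keys(parsed)) {
    // Drop whole results for IDs that were never sent to the model.
    if (!validIds.has(topicId)) continue;
    const corr = parsed[topicId];
    if (corr === undefined) continue;
    out[topicId] = {
      // Drop individual entries that point at unknown IDs.
      correlatedMarkets: corr.correlatedMarkets.filter((e) => validIds.has(e.topicId)),
      hedgeMarkets: corr.hedgeMarkets.filter((e) => validIds.has(e.topicId)),
    };
  }
  return out;
}
```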

Downstream Consumption

Both loadBullsBearGenerationInput() and loadBeeksIndexGenerationInput() read correlations from the DB and include them as a correlationsNote string, truncated to 6,000 characters. The note is prefixed with:

Stored correlations JSON (may be large; use for cross-market context only):
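
The note construction could be sketched as follows. The prefix text and the 6,000-character limit come from the description above; the function name is hypothetical, and this sketch assumes the limit applies to the stored JSON before the prefix is added, which may differ from the actual loaders.

```typescript
const CORRELATIONS_NOTE_PREFIX =
  "Stored correlations JSON (may be large; use for cross-market context only):";
const MAX_CORRELATIONS_CHARS = 6000;

// Returns undefined when no correlations are stored for the topic.
function buildCorrelationsNote(storedJson: string | null): string | undefined {
  if (storedJson === null || storedJson.length === 0) return undefined;
  // Assumption: truncate the JSON body, then prepend the prefix.
  const body = storedJson.slice(0, MAX_CORRELATIONS_CHARS);
  return `${CORRELATIONS_NOTE_PREFIX}\n${body}`;
}
```

Truncating by character count can cut the JSON mid-token, so the prompt-side wording ("use for cross-market context only") matters: the model should not be asked to parse the note strictly.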

Correlation Entry Schema

Each entry in the correlatedMarkets or hedgeMarkets array has the following shape:

{
  "topicId": "uuid",
  "title": "Will the Fed cut rates?",
  "coefficient": 0.85,
  "reasoning": "Both track Federal Reserve monetary policy expectations"
}

| Field | Range | Meaning |
| --- | --- | --- |
| `coefficient` | 0.3 to 1.0 | Positively correlated (move together) |
| `coefficient` | -1.0 to -0.3 | Hedge / inverse correlation (move opposite) |
| Max entries | 4 correlated + 4 hedges | Per topic |
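
The schema and ranges above translate into a small validator such as the one below. The interface mirrors the JSON entry shown earlier; the validator itself (`validateTopicCorrelations`) is a hypothetical helper and is not described as part of the pipeline.

```typescript
// Mirrors the entry schema shown above.
interface CorrelationEntry {
  topicId: string;
  title: string;
  coefficient: number;
  reasoning: string;
}

interface TopicCorrelations {
  correlatedMarkets: CorrelationEntry[];
  hedgeMarkets: CorrelationEntry[];
}

const MAX_ENTRIES_PER_LIST = 4;

// Returns a list of human-readable problems; an empty list means valid.
function validateTopicCorrelations(c: TopicCorrelations): string[] {
  const problems: string[] = [];
  if (c.correlatedMarkets.length > MAX_ENTRIES_PER_LIST) {
    problems.push("more than 4 correlated markets");
  }
  if (c.hedgeMarkets.length > MAX_ENTRIES_PER_LIST) {
    problems.push("more than 4 hedge markets");
  }
  for (const e of c.correlatedMarkets) {
    // Correlated entries must fall in 0.3 to 1.0.
    if (e.coefficient < 0.3 || e.coefficient > 1.0) {
      problems.push(`correlated ${e.topicId}: coefficient ${e.coefficient} out of range`);
    }
  }
  for (const e of c.hedgeMarkets) {
    // Hedge entries must fall in -1.0 to -0.3.
    if (e.coefficient < -1.0 || e.coefficient > -0.3) {
      problems.push(`hedge ${e.topicId}: coefficient ${e.coefficient} out of range`);
    }
  }
  return problems;
}
```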

Model Configuration

| Parameter | Value |
| --- | --- |
| Model | `claude-sonnet-4-6` |
| max_tokens | 8192 |
| temperature | 0.5 |
| System prompt | `CORRELATIONS_SYSTEM` (see anthropic-prompts.md) |

Source Files

| File | Purpose |
| --- | --- |
| `frontend/app/api/admin/topics/correlations/generate/route.ts` | API route -- orchestration, DB reads/writes, manual edit protection |
| `frontend/src/lib/ai/generate-topic-content.ts` | `generateMarketCorrelations()` -- LLM call + response parsing |
| `frontend/src/lib/queries/beeks-index-context.ts` | Downstream consumers that load correlations as context |