Skip to content

Instantly share code, notes, and snippets.

@sshh12
Created March 1, 2026 00:14
Show Gist options
  • Select an option

  • Save sshh12/e352c053627ccbe1636781f73d6d715b to your computer and use it in GitHub Desktop.

Select an option

Save sshh12/e352c053627ccbe1636781f73d6d715b to your computer and use it in GitHub Desktop.
Claude for Chrome Extension Internals (v1.0.56)

Claude for Chrome Extension Internals (v1.0.56)

A deep dive into how Anthropic's Claude for Chrome extension works under the hood, based on reading the extension's source code.

Architecture Overview

The extension is a Chrome Manifest V3 extension built with React, using the Anthropic JavaScript SDK directly in the browser. It opens as a side panel alongside the active tab and acts as an AI agent that can see and interact with web pages.

The Anthropic JS SDK is instantiated in the browser with dangerouslyAllowBrowser: true, authenticating via either OAuth PKCE (default, scopes: user:profile user:inference) or a manual API key (feature-gated for internal use).

The Agentic Loop

The extension operates in two distinct modes:

Standard Mode

When the user types a prompt, the side panel runs an agentic tool-use loop:

  1. Build system prompt: Loaded from server config (chrome_ext_system_prompt). Appends platform info (Mac/Windows), current tab context (URLs, titles, tab IDs), and any domain-specific skills.
  2. Resolve available tools: Based on page type (system pages only get navigate, normal pages get all tools). Builds the full tool definitions array.
  3. Stream response: Calls claude-sonnet-4-5-20250929 via beta.messages.stream() with betas oauth-2025-04-20, max_tokens: 10000.
  4. Tool execution loop: If the response contains tool_use blocks:
    • Check permissions via the PermissionManager.
    • Execute the tool.
    • Feed tool_result messages back to the API.
    • Inject updated tab context as <system-reminder> if tabs changed.
    • Loop continues until Claude returns a response with no tool calls.
  5. Cleanup: Detaches the CDP debugger from all tabs.

If the user cancels mid-loop (Stop Claude button), all pending tools receive error tool_result responses.

User prompt
  |
  v
Build system prompt (server config + platform + tab context + domain skills)
  |
  v
Resolve available tools for current page type
  |
  v
Stream claude-sonnet-4-5 response (max_tokens: 10000)
  |
  v
tool_use block in response?
  |-- yes --> checkPermission(domain, toolUseId)
  |             |-- not granted, needs prompt --> show permission dialog --> wait
  |             |-- granted --> execute tool
  |             |-- denied --> return error tool_result
  |           |
  |           v
  |         tool_result --> inject tab context update --> feed back to API --> loop
  |
  |-- no --> done, detach debugger

Quick Mode

A fast, low-latency mode that bypasses the standard tool-use protocol entirely. Instead of sending tool definitions to the API, it uses a compact command language with single-letter commands:

  • Sends tools: [] (no API tools) with stop_sequences: ["\n<<END>>"]
  • Supports an effort parameter ("high", "medium", "low", "none") via the effort-2025-11-24 beta
  • When the model name includes [fast], adds speed: "fast" and the fast-mode-2026-02-01 beta

The model outputs compact commands that the extension parses and executes directly:

Command Action Args
C x y Left click coordinates
RC x y Right click coordinates
DC x y Double click coordinates
TC x y Triple click coordinates
H x y Hover coordinates
T text Type text multi-line text
K keys Press keys space-separated key names
S dir amt x y Scroll direction, amount, coordinates
D x1 y1 x2 y2 Drag start to end coordinates
Z x1 y1 x2 y2 Zoom (screenshot region) bounding box
N url Navigate URL or "back" / "forward"
J code Execute JavaScript multi-line code
W Wait for page settle none
ST tabId Switch tab tab ID
NT url New tab URL
LT List tabs none
PL json Present plan JSON with domains/approach

After executing each batch of commands, the extension takes a screenshot and feeds it back as a user message. The loop continues until the response contains no commands.

Tools (21 Total)

computer

Full mouse, keyboard, and screenshot control via Chrome DevTools Protocol (CDP).

This uses Anthropic's built-in computer_20250124 tool type, so the model natively understands its parameters. The display_width_px and display_height_px are set dynamically from the actual viewport dimensions.

{
  "name": "computer",
  "type": "computer_20250124",
  "display_width_px": "<dynamic>",
  "display_height_px": "<dynamic>",
  "display_number": 1
}
Param Type Description
action enum "left_click", "right_click", "double_click", "triple_click", "hover", "type", "key", "screenshot", "wait", "scroll", "scroll_to", "left_click_drag", "zoom"
coordinate [x, y] Pixel coordinates for click/hover actions
text string Text to type (for type) or key combo (for key, e.g. "cmd+a")
duration number (0-30) Seconds to wait (for wait action)
scroll_direction enum "up", "down", "left", "right"
scroll_amount number (1-10) Scroll ticks, default 3
start_coordinate [x, y] Start position for left_click_drag
region [x0, y0, x1, y1] Bounding box for zoom action
repeat number (1-100) Repeat count for key action
ref string Element ref ID from read_page/find. Required for scroll_to, optional alternative to coordinates for clicks.
modifiers string Modifier keys for clicks: "ctrl", "shift", "alt", "cmd", combinable with +
tabId number Required. Tab ID to execute on.

Implementation: All automation is done via Chrome DevTools Protocol (CDP v1.3), attached via chrome.debugger.attach(tabId, "1.3").

  • Click: Input.dispatchMouseEvent with mouseMoved (100ms delay), then mousePressed + mouseReleased. Multi-click loops with incrementing clickCount.
  • Type: Character by character via Input.insertText. Uppercase/special chars apply shift modifier.
  • Key: Parses space-separated key names, supports + chords. Maps modifiers to CDP bitmask (alt=1, ctrl=2, meta=4, shift=8).
  • Screenshot: Page.captureScreenshot (PNG). Downscales for Retina. Applies a resize algorithm optimized for token cost (pxPerToken: 28, maxTargetPx: 1568). Stores scaling context so subsequent coordinate actions map correctly.
  • Scroll: Input.dispatchMouseEvent with mouseWheel, delta = scrollAmount * 100 pixels. Auto-takes a follow-up screenshot.
  • scroll_to: Scrolls an element into view by its ref ID (from read_page/find).
  • hover: Dispatches mouseMoved without clicking.
  • zoom: Captures a cropped screenshot of a specific region.
  • Drag: Sequence of mouseMoved -> mousePressed -> mouseMoved -> mouseReleased.

Before screenshots and clicks, sends HIDE_FOR_TOOL_USE to temporarily hide the Claude overlay, then SHOW_AFTER_TOOL_USE after.

Permission mapping:

Action Permission Type
screenshot, scroll, scroll_to, zoom READ_PAGE_CONTENT
left_click, right_click, double_click, triple_click, hover, left_click_drag CLICK
type, key TYPE

navigate

Navigate to a URL, or go forward/back in browser history.

{
  "name": "navigate",
  "input_schema": {
    "properties": {
      "url": {
        "type": "string",
        "description": "URL to navigate to. Use 'forward'/'back' for history navigation."
      },
      "tabId": { "type": "number" }
    },
    "required": ["url", "tabId"]
  }
}
  • Queries api.anthropic.com/api/web/domain_info/browser_extension?domain=... to classify the target domain. category1/category2 domains are blocked.
  • "back"/"forward" use chrome.tabs.goBack() / chrome.tabs.goForward().
  • URLs without a protocol prefix get https:// prepended.
  • Execution: chrome.tabs.update(tabId, { url }).

read_page

Get an accessibility tree representation of elements on the page.

{
  "name": "read_page",
  "description": "Get an accessibility tree representation of elements on the page. By default returns all elements including non-visible ones. Output is limited to 50000 characters.",
  "input_schema": {
    "properties": {
      "filter": {
        "type": "string",
        "enum": ["interactive", "all"],
        "description": "Filter: 'interactive' for buttons/links/inputs only, 'all' for all elements"
      },
      "tabId": { "type": "number" },
      "depth": {
        "type": "number",
        "description": "Max tree depth (default: 15)"
      },
      "ref_id": {
        "type": "string",
        "description": "Reference ID of a parent element to scope the read to (e.g. 'ref_5')"
      },
      "max_chars": {
        "type": "number",
        "description": "Max characters for output (default: 50000)"
      }
    },
    "required": ["tabId"]
  }
}

Injects window.__generateAccessibilityTree(filter, depth, maxChars, refId) via chrome.scripting.executeScript. The generator:

  • Walks the DOM recursively up to depth levels (default 15).
  • Maps elements to ARIA roles (e.g. <a> -> link, <button> -> button, <input type="text"> -> textbox, <select> -> combobox).
  • Extracts accessible names from: aria-label, placeholder, title, alt, associated <label>, text content.
  • Each element gets a persistent ref ID (ref_1, ref_2, ...) stored in window.__claudeElementMap using WeakRef. These refs are reused by form_input, find, computer, upload_image, and file_upload.
  • When ref_id is provided, scopes the tree walk to that element and its descendants (useful for large pages).
  • When output exceeds max_chars, returns an error suggesting a smaller depth or a ref_id to narrow scope.
  • For <select> elements, also lists all <option> children with their text, selected state, and value.

Output format (indented tree):

role "accessible name" [ref_1] href="/bar" type="text" placeholder="Search..."
  option "Option A" (selected) value="a"
  option "Option B" value="b"

form_input

Set values in form elements using element reference IDs from read_page.

{
  "name": "form_input",
  "input_schema": {
    "properties": {
      "ref": {
        "type": "string",
        "description": "Element reference ID from read_page (e.g. 'ref_1')"
      },
      "value": {
        "type": ["string", "boolean", "number"],
        "description": "Value to set. Boolean for checkboxes, option value/text for selects."
      },
      "tabId": { "type": "number" }
    },
    "required": ["ref", "value", "tabId"]
  }
}
  • Looks up the DOM element via window.__claudeElementMap[ref].deref().
  • Scrolls element into view, then sets value based on element type (<select>, checkbox, radio, date/time, range, number, text/textarea).
  • Dispatches both change and input events with { bubbles: true } to trigger framework reactivity.
  • Verifies the page domain hasn't changed mid-action (prevents cross-domain attacks).

find

Find elements on the page using natural language. Uses a nested LLM call internally.

{
  "name": "find",
  "description": "Find elements on the page using natural language. Can search by purpose (e.g. 'search bar') or text content. Returns up to 20 matching elements with references and coordinates.",
  "input_schema": {
    "properties": {
      "query": {
        "type": "string",
        "description": "Natural language description of what to find"
      },
      "tabId": { "type": "number" }
    },
    "required": ["query", "tabId"]
  }
}

Two-stage LLM-in-the-loop architecture:

  1. Injects __generateAccessibilityTree("all") to get the full a11y tree (not viewport-filtered).
  2. Sends the tree to claude-sonnet-4-5 (max_tokens: 800) asking it to match elements semantically.
  3. Inner model returns structured results: FOUND: N / SHOWING: M, then pipe-delimited rows of ref | role | name | type | x,y | reason.
  4. Parses and returns up to 20 elements. If more than 20 match, suggests refining the query.

So: Claude (outer) asks find("login button") -> extension gets full a11y tree -> Claude (inner) semantically matches -> returns refs to Claude (outer).


get_page_text

Extract raw text content from the page, prioritizing article content.

{
  "name": "get_page_text",
  "input_schema": {
    "properties": {
      "tabId": { "type": "number" },
      "max_chars": {
        "type": "number",
        "description": "Max characters for output (default: 50000)"
      }
    },
    "required": ["tabId"]
  }
}

Searches for content using a priority-ordered selector list: article, main, [class*="articleBody"], [class*="article-body"], [class*="post-content"], [class*="entry-content"], [class*="content-body"], [role="main"], .content, #content. Picks the element with the longest textContent if multiple match.

Returns:

Title: {document.title}
URL: {window.location.href}
Source element: <{tagName}>
---
{cleaned text}

javascript_tool

Execute JavaScript code in the context of the current page.

{
  "name": "javascript_tool",
  "description": "Execute JavaScript code in the context of the current page. The code runs in the page's context and can interact with the DOM, window object, and page variables.",
  "input_schema": {
    "properties": {
      "action": {
        "type": "string",
        "description": "Must be set to 'javascript_exec'"
      },
      "text": {
        "type": "string",
        "description": "JavaScript code to execute. Result of last expression returned automatically."
      },
      "tabId": { "type": "number" }
    },
    "required": ["action", "text", "tabId"]
  }
}

tabs_context

Get context information about all tabs in the current tab group.

{
  "name": "tabs_context",
  "input_schema": { "properties": {}, "required": [] }
}

Returns tab IDs, URLs, and titles for all tabs in the session's Chrome Tab Group.


tabs_create

Create a new empty tab in the current tab group.

{
  "name": "tabs_create",
  "input_schema": { "properties": {}, "required": [] }
}

read_console_messages

Read browser console messages from a specific tab.

{
  "name": "read_console_messages",
  "description": "Read browser console messages (console.log, console.error, console.warn, etc.) from a specific tab. Returns console messages from the current domain only.",
  "input_schema": {
    "properties": {
      "tabId": { "type": "number" },
      "onlyErrors": { "type": "boolean", "description": "If true, only return error messages" },
      "clear": { "type": "boolean", "description": "If true, clear after reading" },
      "pattern": { "type": "string", "description": "Regex pattern to filter messages" },
      "limit": { "type": "number", "description": "Max messages to return (default: 100)" }
    },
    "required": ["tabId"]
  }
}

read_network_requests

Read HTTP network requests from a specific tab.

{
  "name": "read_network_requests",
  "description": "Read HTTP network requests (XHR, Fetch, documents, images, etc.) from a specific tab.",
  "input_schema": {
    "properties": {
      "tabId": { "type": "number" },
      "urlPattern": { "type": "string", "description": "URL pattern to filter requests" },
      "clear": { "type": "boolean" },
      "limit": { "type": "number", "description": "Max requests (default: 100)" }
    },
    "required": ["tabId"]
  }
}

upload_image

Upload a previously captured screenshot or user-uploaded image to a file input or drag-and-drop target.

{
  "name": "upload_image",
  "input_schema": {
    "properties": {
      "imageId": { "type": "string", "description": "ID of captured screenshot or user-uploaded image" },
      "ref": { "type": "string", "description": "Element reference ID for file inputs" },
      "coordinate": { "type": "array", "description": "[x, y] for drag & drop targets" },
      "tabId": { "type": "number" },
      "filename": { "type": "string", "description": "Optional filename (default: 'image.png')" }
    },
    "required": ["imageId", "tabId"]
  }
}

file_upload

Upload files from the local filesystem to a file input element.

{
  "name": "file_upload",
  "description": "Upload one or multiple files from the local filesystem to a file input element on the page. Do not click on file upload buttons -- clicking opens a native file picker dialog that you cannot see.",
  "input_schema": {
    "properties": {
      "paths": { "type": "array", "items": { "type": "string" }, "description": "Absolute file paths" },
      "ref": { "type": "string", "description": "Element reference ID of the file input" },
      "tabId": { "type": "number" }
    },
    "required": ["paths", "ref", "tabId"]
  }
}

resize_window

Resize the current browser window.

{
  "name": "resize_window",
  "input_schema": {
    "properties": {
      "width": { "type": "number" },
      "height": { "type": "number" },
      "tabId": { "type": "number" }
    },
    "required": ["width", "height", "tabId"]
  }
}

gif_creator

Record browser actions and export as animated GIF.

{
  "name": "gif_creator",
  "description": "Manage GIF recording and export for browser automation sessions. Control when to start/stop recording browser actions, then export as an animated GIF with visual overlays.",
  "input_schema": {
    "properties": {
      "action": {
        "type": "string",
        "enum": ["start_recording", "stop_recording", "export", "clear"]
      },
      "tabId": { "type": "number" },
      "coordinate": { "type": "array", "description": "[x, y] for drag & drop upload after export" },
      "download": { "type": "boolean" },
      "filename": { "type": "string" },
      "options": {
        "type": "object",
        "properties": {
          "showClickIndicators": { "type": "boolean" },
          "showDragPaths": { "type": "boolean" },
          "showActionLabels": { "type": "boolean" },
          "showProgressBar": { "type": "boolean" },
          "showWatermark": { "type": "boolean" },
          "quality": { "type": "number", "description": "1-30, lower = better quality" }
        }
      }
    },
    "required": ["action", "tabId"]
  }
}

update_plan

Present a plan to the user for approval before taking actions. Used in "follow a plan" permission mode.

{
  "name": "update_plan",
  "input_schema": {
    "properties": {
      "domains": {
        "type": "array",
        "items": { "type": "string" },
        "description": "List of domains to visit (approved when user accepts)"
      },
      "approach": {
        "type": "array",
        "items": { "type": "string" },
        "description": "High-level description of actions. 3-7 items."
      }
    },
    "required": ["domains", "approach"]
  }
}

shortcuts_list

List all available shortcuts and workflows.

{
  "name": "shortcuts_list",
  "description": "List all available shortcuts and workflows. Returns shortcuts with their commands, descriptions, and whether they are workflows. Use shortcuts_execute to run one.",
  "input_schema": { "properties": {}, "required": [] }
}

shortcuts_execute

Execute a shortcut or workflow.

{
  "name": "shortcuts_execute",
  "description": "Execute a shortcut or workflow by running it in a new sidepanel window using the current tab.",
  "input_schema": {
    "properties": {
      "shortcutId": { "type": "string" },
      "command": { "type": "string", "description": "Command name (e.g. 'debug', 'summarize'). No leading slash." }
    },
    "required": []
  }
}

turn_answer_start

Internal control tool. Called immediately before the model's text response each turn.

{
  "name": "turn_answer_start",
  "description": "Call this immediately before your text response to the user for this turn. Required every turn.",
  "input_schema": { "properties": {}, "required": [] }
}

MCP Tools: tabs_context_mcp and tabs_create_mcp

When connected to Claude Desktop or Claude Code via native messaging, MCP tools operate in a separate "MCP tab group" distinct from the sidepanel's tab group.

  • tabs_context_mcp: Get context for the MCP tab group. Has a createIfEmpty boolean parameter to auto-create a group.
  • tabs_create_mcp: Create a new tab in the MCP tab group.

Tool Availability by Page Type

Page Type Available Tools
Normal web pages All tools
chrome://, chrome-extension://, about:blank navigate only
chromewebstore.google.com navigate only (no script injection)

Permission Model

Every tool execution goes through a PermissionManager singleton.

Permission Modes

Mode Behavior
"ask" (default) Prompt user before each action on a new domain
"follow_a_plan" Claude must submit a plan via update_plan first; once approved, listed domains are pre-authorized
"skip_all_permission_checks" Auto-approve everything

Permission Types

Type Triggered By
NAVIGATE navigate tool
READ_PAGE_CONTENT read_page, computer (screenshot, scroll, scroll_to, zoom)
CLICK computer (left_click, right_click, double_click, triple_click, hover, drag)
TYPE computer (type, key)
UPLOAD_IMAGE upload_image
PLAN_APPROVAL update_plan tool
REMOTE_MCP Remote MCP tool execution
DOMAIN_TRANSITION Navigating from one domain to another

Permission Durations

  • once: Single use, tied to a specific toolUseId. Auto-revoked after use.
  • always: Persistent for that domain until manually revoked.

Domain Safety Classification

Queries api.anthropic.com/api/web/domain_info/browser_extension?domain=... to classify domains. Results cached for 5 minutes.

Category Behavior
category0 No restrictions
category1 / category2 Fully blocked (page replaced with blocked.html)
category3 Force-prompt mode: always asks, no "always allow" option
category_org_blocked Blocked by the user's organization

Security: URL Verification

Before any mutating action (form_input, click, type, key, drag), verifies that the tab's current URL domain still matches the original. Prevents attacks where the page navigates mid-action.

Tab Group Management

The extension manages Chrome Tab Groups to organize tabs per conversation session:

  • Each session has a main tab (where the user opened the side panel) and optional secondary tabs (created by the agent via tabs_create).
  • Tab groups are tracked in chrome.storage.local via a TabGroupManager.
  • Tab context (all tab IDs, URLs, titles in the group) is injected as <system-reminder> in tool results when tabs change.
  • Visual indicators on the tab group: loading animation (animated dots) while the agent works, checkmark when done.
  • The tabs_context tool returns all tabs in the group so Claude knows which tabId to target.

MCP Integration

The extension connects to external MCP servers via two channels:

Native Messaging (Claude Desktop / Claude Code)

Connects via chrome.runtime.connectNative() to native messaging hosts:

  • com.anthropic.claude_browser_extension (Claude Desktop)
  • com.anthropic.claude_code_browser_extension (Claude Code)

Protocol:

  • ping/pong for connection handshake
  • get_status for status checks
  • tool_request with { method: "execute_tool", params: { tool, args, tabGroupId, tabId, client_id } }
  • tool_response with { result: { content } } or { error: { content } }
  • mcp_connected/mcp_disconnected lifecycle events

When connected, Claude Desktop/Code can invoke browser tools (navigate, click, screenshot, etc.) on the user's browser tabs.

Remote MCP Servers

Managed via a Zustand store (mcpServersStore). Remote MCP tools are fetched via the standard MCP tools/list protocol, executed via tools/call, and gated behind the REMOTE_MCP permission type.

Workflow Recording

The extension can record user browser actions and convert them into reusable "shortcuts":

  • Uses rrweb for session recording (DOM snapshots, mouse events)
  • Captures screenshots with click position overlays
  • Supports speech narration: users can speak while recording, and the narration is transcribed and used as the primary intent signal
  • Workflow steps are described by claude-haiku-4-5 (small model for low-latency step descriptions)
  • Workflow summary generated by Claude with screenshot analysis
  • Output: a reusable saved prompt with <inputs> for parameterization
  • Saved prompts can be scheduled for recurring execution via chrome.alarms (daily, weekly, monthly, annually)

Message Compaction

For long sessions, the extension manages message size:

  • Threshold: ~25MB
  • When exceeded: extracts base64 images from older messages and replaces them with placeholders
  • Tracks tokensSaved for display in the UI
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment