Claude for Chrome Extension Internals (v1.0.56)

A deep dive into how Anthropic's Claude for Chrome extension works under the hood, based on reading the extension's source code.

Architecture Overview

The extension is a Chrome Manifest V3 extension built with React, using the Anthropic JavaScript SDK directly in the browser. It opens as a side panel alongside the active tab and acts as an AI agent that can see and interact with web pages.

The Anthropic JS SDK is instantiated in the browser with dangerouslyAllowBrowser: true, authenticating via either OAuth PKCE (default, scopes: user:profile user:inference) or a manual API key (feature-gated for internal use).

The Agentic Loop

The extension operates in two distinct modes:

Standard Mode

When the user types a prompt, the side panel runs an agentic tool-use loop:

Build system prompt: Loaded from server config (chrome_ext_system_prompt). Appends platform info (Mac/Windows), current tab context (URLs, titles, tab IDs), and any domain-specific skills.
Resolve available tools: Based on page type (system pages only get navigate, normal pages get all tools). Builds the full tool definitions array.
Stream response: Calls claude-sonnet-4-5-20250929 via beta.messages.stream() with betas oauth-2025-04-20, max_tokens: 10000.
Tool execution loop: If the response contains tool_use blocks:
- Check permissions via the PermissionManager.
- Execute the tool.
- Feed tool_result messages back to the API.
- Inject updated tab context as <system-reminder> if tabs changed.
- Loop continues until Claude returns a response with no tool calls.
Cleanup: Detaches the CDP debugger from all tabs.

If the user cancels mid-loop (Stop Claude button), all pending tools receive error tool_result responses.

User prompt
  |
  v
Build system prompt (server config + platform + tab context + domain skills)
  |
  v
Resolve available tools for current page type
  |
  v
Stream claude-sonnet-4-5 response (max_tokens: 10000)
  |
  v
tool_use block in response?
  |-- yes --> checkPermission(domain, toolUseId)
  |             |-- not granted, needs prompt --> show permission dialog --> wait
  |             |-- granted --> execute tool
  |             |-- denied --> return error tool_result
  |           |
  |           v
  |         tool_result --> inject tab context update --> feed back to API --> loop
  |
  |-- no --> done, detach debugger

Quick Mode

A fast, low-latency mode that bypasses the standard tool-use protocol entirely. Instead of sending tool definitions to the API, it uses a compact command language with single-letter commands:

Sends tools: [] (no API tools) with stop_sequences: ["\n<<END>>"]
Supports an effort parameter ("high", "medium", "low", "none") via the effort-2025-11-24 beta
When the model name includes [fast], adds speed: "fast" and the fast-mode-2026-02-01 beta

The model outputs compact commands that the extension parses and executes directly:

Command	Action	Args
`C x y`	Left click	coordinates
`RC x y`	Right click	coordinates
`DC x y`	Double click	coordinates
`TC x y`	Triple click	coordinates
`H x y`	Hover	coordinates
`T text`	Type text	multi-line text
`K keys`	Press keys	space-separated key names
`S dir amt x y`	Scroll	direction, amount, coordinates
`D x1 y1 x2 y2`	Drag	start to end coordinates
`Z x1 y1 x2 y2`	Zoom (screenshot region)	bounding box
`N url`	Navigate	URL or `"back"` / `"forward"`
`J code`	Execute JavaScript	multi-line code
`W`	Wait for page settle	none
`ST tabId`	Switch tab	tab ID
`NT url`	New tab	URL
`LT`	List tabs	none
`PL json`	Present plan	JSON with domains/approach

After executing each batch of commands, the extension takes a screenshot and feeds it back as a user message. The loop continues until the response contains no commands.

Tools (21 Total)

`computer`

Full mouse, keyboard, and screenshot control via Chrome DevTools Protocol (CDP).

This uses Anthropic's built-in computer_20250124 tool type, so the model natively understands its parameters. The display_width_px and display_height_px are set dynamically from the actual viewport dimensions.

{
  "name": "computer",
  "type": "computer_20250124",
  "display_width_px": "<dynamic>",
  "display_height_px": "<dynamic>",
  "display_number": 1
}

Param	Type	Description
`action`	`enum`	`"left_click"`, `"right_click"`, `"double_click"`, `"triple_click"`, `"hover"`, `"type"`, `"key"`, `"screenshot"`, `"wait"`, `"scroll"`, `"scroll_to"`, `"left_click_drag"`, `"zoom"`
`coordinate`	`[x, y]`	Pixel coordinates for click/hover actions
`text`	`string`	Text to type (for `type`) or key combo (for `key`, e.g. `"cmd+a"`)
`duration`	`number` (0-30)	Seconds to wait (for `wait` action)
`scroll_direction`	`enum`	`"up"`, `"down"`, `"left"`, `"right"`
`scroll_amount`	`number` (1-10)	Scroll ticks, default 3
`start_coordinate`	`[x, y]`	Start position for `left_click_drag`
`region`	`[x0, y0, x1, y1]`	Bounding box for `zoom` action
`repeat`	`number` (1-100)	Repeat count for `key` action
`ref`	`string`	Element ref ID from `read_page`/`find`. Required for `scroll_to`, optional alternative to coordinates for clicks.
`modifiers`	`string`	Modifier keys for clicks: `"ctrl"`, `"shift"`, `"alt"`, `"cmd"`, combinable with `+`
`tabId`	`number`	Required. Tab ID to execute on.

Implementation: All automation is done via Chrome DevTools Protocol (CDP v1.3), attached via chrome.debugger.attach(tabId, "1.3").

Click: Input.dispatchMouseEvent with mouseMoved (100ms delay), then mousePressed + mouseReleased. Multi-click loops with incrementing clickCount.
Type: Character by character via Input.insertText. Uppercase/special chars apply shift modifier.
Key: Parses space-separated key names, supports + chords. Maps modifiers to CDP bitmask (alt=1, ctrl=2, meta=4, shift=8).
Screenshot: Page.captureScreenshot (PNG). Downscales for Retina. Applies a resize algorithm optimized for token cost (pxPerToken: 28, maxTargetPx: 1568). Stores scaling context so subsequent coordinate actions map correctly.
Scroll: Input.dispatchMouseEvent with mouseWheel, delta = scrollAmount * 100 pixels. Auto-takes a follow-up screenshot.
scroll_to: Scrolls an element into view by its ref ID (from read_page/find).
hover: Dispatches mouseMoved without clicking.
zoom: Captures a cropped screenshot of a specific region.
Drag: Sequence of mouseMoved -> mousePressed -> mouseMoved -> mouseReleased.

Before screenshots and clicks, sends HIDE_FOR_TOOL_USE to temporarily hide the Claude overlay, then SHOW_AFTER_TOOL_USE after.

Permission mapping:

Action	Permission Type
`screenshot`, `scroll`, `scroll_to`, `zoom`	`READ_PAGE_CONTENT`
`left_click`, `right_click`, `double_click`, `triple_click`, `hover`, `left_click_drag`	`CLICK`
`type`, `key`	`TYPE`

`navigate`

Navigate to a URL, or go forward/back in browser history.

{
  "name": "navigate",
  "input_schema": {
    "properties": {
      "url": {
        "type": "string",
        "description": "URL to navigate to. Use 'forward'/'back' for history navigation."
      },
      "tabId": { "type": "number" }
    },
    "required": ["url", "tabId"]
  }
}

Queries api.anthropic.com/api/web/domain_info/browser_extension?domain=... to classify the target domain. category1/category2 domains are blocked.
"back"/"forward" use chrome.tabs.goBack() / chrome.tabs.goForward().
URLs without a protocol prefix get https:// prepended.
Execution: chrome.tabs.update(tabId, { url }).

`read_page`

Get an accessibility tree representation of elements on the page.

{
  "name": "read_page",
  "description": "Get an accessibility tree representation of elements on the page. By default returns all elements including non-visible ones. Output is limited to 50000 characters.",
  "input_schema": {
    "properties": {
      "filter": {
        "type": "string",
        "enum": ["interactive", "all"],
        "description": "Filter: 'interactive' for buttons/links/inputs only, 'all' for all elements"
      },
      "tabId": { "type": "number" },
      "depth": {
        "type": "number",
        "description": "Max tree depth (default: 15)"
      },
      "ref_id": {
        "type": "string",
        "description": "Reference ID of a parent element to scope the read to (e.g. 'ref_5')"
      },
      "max_chars": {
        "type": "number",
        "description": "Max characters for output (default: 50000)"
      }
    },
    "required": ["tabId"]
  }
}

Injects window.__generateAccessibilityTree(filter, depth, maxChars, refId) via chrome.scripting.executeScript. The generator:

Walks the DOM recursively up to depth levels (default 15).
Maps elements to ARIA roles (e.g. <a> -> link, <button> -> button, <input type="text"> -> textbox, <select> -> combobox).
Extracts accessible names from: aria-label, placeholder, title, alt, associated <label>, text content.
Each element gets a persistent ref ID (ref_1, ref_2, ...) stored in window.__claudeElementMap using WeakRef. These refs are reused by form_input, find, computer, upload_image, and file_upload.
When ref_id is provided, scopes the tree walk to that element and its descendants (useful for large pages).
When output exceeds max_chars, returns an error suggesting a smaller depth or a ref_id to narrow scope.
For <select> elements, also lists all <option> children with their text, selected state, and value.

Output format (indented tree):

role "accessible name" [ref_1] href="/bar" type="text" placeholder="Search..."
  option "Option A" (selected) value="a"
  option "Option B" value="b"

`form_input`

Set values in form elements using element reference IDs from read_page.

{
  "name": "form_input",
  "input_schema": {
    "properties": {
      "ref": {
        "type": "string",
        "description": "Element reference ID from read_page (e.g. 'ref_1')"
      },
      "value": {
        "type": ["string", "boolean", "number"],
        "description": "Value to set. Boolean for checkboxes, option value/text for selects."
      },
      "tabId": { "type": "number" }
    },
    "required": ["ref", "value", "tabId"]
  }
}

Looks up the DOM element via window.__claudeElementMap[ref].deref().
Scrolls element into view, then sets value based on element type (<select>, checkbox, radio, date/time, range, number, text/textarea).
Dispatches both change and input events with { bubbles: true } to trigger framework reactivity.
Verifies the page domain hasn't changed mid-action (prevents cross-domain attacks).

`find`

Find elements on the page using natural language. Uses a nested LLM call internally.

{
  "name": "find",
  "description": "Find elements on the page using natural language. Can search by purpose (e.g. 'search bar') or text content. Returns up to 20 matching elements with references and coordinates.",
  "input_schema": {
    "properties": {
      "query": {
        "type": "string",
        "description": "Natural language description of what to find"
      },
      "tabId": { "type": "number" }
    },
    "required": ["query", "tabId"]
  }
}

Two-stage LLM-in-the-loop architecture:

Injects __generateAccessibilityTree("all") to get the full a11y tree (not viewport-filtered).
Sends the tree to claude-sonnet-4-5 (max_tokens: 800) asking it to match elements semantically.
Inner model returns structured results: FOUND: N / SHOWING: M, then pipe-delimited rows of ref | role | name | type | x,y | reason.
Parses and returns up to 20 elements. If more than 20 match, suggests refining the query.

So: Claude (outer) asks find("login button") -> extension gets full a11y tree -> Claude (inner) semantically matches -> returns refs to Claude (outer).

`get_page_text`

Extract raw text content from the page, prioritizing article content.

{
  "name": "get_page_text",
  "input_schema": {
    "properties": {
      "tabId": { "type": "number" },
      "max_chars": {
        "type": "number",
        "description": "Max characters for output (default: 50000)"
      }
    },
    "required": ["tabId"]
  }
}

Searches for content using a priority-ordered selector list: article, main, [class*="articleBody"], [class*="article-body"], [class*="post-content"], [class*="entry-content"], [class*="content-body"], [role="main"], .content, #content. Picks the element with the longest textContent if multiple match.

Returns:

Title: {document.title}
URL: {window.location.href}
Source element: <{tagName}>
---
{cleaned text}

`javascript_tool`

Execute JavaScript code in the context of the current page.

{
  "name": "javascript_tool",
  "description": "Execute JavaScript code in the context of the current page. The code runs in the page's context and can interact with the DOM, window object, and page variables.",
  "input_schema": {
    "properties": {
      "action": {
        "type": "string",
        "description": "Must be set to 'javascript_exec'"
      },
      "text": {
        "type": "string",
        "description": "JavaScript code to execute. Result of last expression returned automatically."
      },
      "tabId": { "type": "number" }
    },
    "required": ["action", "text", "tabId"]
  }
}

`tabs_context`

Get context information about all tabs in the current tab group.

{
  "name": "tabs_context",
  "input_schema": { "properties": {}, "required": [] }
}

Returns tab IDs, URLs, and titles for all tabs in the session's Chrome Tab Group.

`tabs_create`

Create a new empty tab in the current tab group.

{
  "name": "tabs_create",
  "input_schema": { "properties": {}, "required": [] }
}

`read_console_messages`

Read browser console messages from a specific tab.

{
  "name": "read_console_messages",
  "description": "Read browser console messages (console.log, console.error, console.warn, etc.) from a specific tab. Returns console messages from the current domain only.",
  "input_schema": {
    "properties": {
      "tabId": { "type": "number" },
      "onlyErrors": { "type": "boolean", "description": "If true, only return error messages" },
      "clear": { "type": "boolean", "description": "If true, clear after reading" },
      "pattern": { "type": "string", "description": "Regex pattern to filter messages" },
      "limit": { "type": "number", "description": "Max messages to return (default: 100)" }
    },
    "required": ["tabId"]
  }
}

`read_network_requests`

Read HTTP network requests from a specific tab.

{
  "name": "read_network_requests",
  "description": "Read HTTP network requests (XHR, Fetch, documents, images, etc.) from a specific tab.",
  "input_schema": {
    "properties": {
      "tabId": { "type": "number" },
      "urlPattern": { "type": "string", "description": "URL pattern to filter requests" },
      "clear": { "type": "boolean" },
      "limit": { "type": "number", "description": "Max requests (default: 100)" }
    },
    "required": ["tabId"]
  }
}

`upload_image`

Upload a previously captured screenshot or user-uploaded image to a file input or drag-and-drop target.

{
  "name": "upload_image",
  "input_schema": {
    "properties": {
      "imageId": { "type": "string", "description": "ID of captured screenshot or user-uploaded image" },
      "ref": { "type": "string", "description": "Element reference ID for file inputs" },
      "coordinate": { "type": "array", "description": "[x, y] for drag & drop targets" },
      "tabId": { "type": "number" },
      "filename": { "type": "string", "description": "Optional filename (default: 'image.png')" }
    },
    "required": ["imageId", "tabId"]
  }
}

`file_upload`

Upload files from the local filesystem to a file input element.

{
  "name": "file_upload",
  "description": "Upload one or multiple files from the local filesystem to a file input element on the page. Do not click on file upload buttons -- clicking opens a native file picker dialog that you cannot see.",
  "input_schema": {
    "properties": {
      "paths": { "type": "array", "items": { "type": "string" }, "description": "Absolute file paths" },
      "ref": { "type": "string", "description": "Element reference ID of the file input" },
      "tabId": { "type": "number" }
    },
    "required": ["paths", "ref", "tabId"]
  }
}

`resize_window`

Resize the current browser window.

{
  "name": "resize_window",
  "input_schema": {
    "properties": {
      "width": { "type": "number" },
      "height": { "type": "number" },
      "tabId": { "type": "number" }
    },
    "required": ["width", "height", "tabId"]
  }
}

`gif_creator`

Record browser actions and export as animated GIF.

{
  "name": "gif_creator",
  "description": "Manage GIF recording and export for browser automation sessions. Control when to start/stop recording browser actions, then export as an animated GIF with visual overlays.",
  "input_schema": {
    "properties": {
      "action": {
        "type": "string",
        "enum": ["start_recording", "stop_recording", "export", "clear"]
      },
      "tabId": { "type": "number" },
      "coordinate": { "type": "array", "description": "[x, y] for drag & drop upload after export" },
      "download": { "type": "boolean" },
      "filename": { "type": "string" },
      "options": {
        "type": "object",
        "properties": {
          "showClickIndicators": { "type": "boolean" },
          "showDragPaths": { "type": "boolean" },
          "showActionLabels": { "type": "boolean" },
          "showProgressBar": { "type": "boolean" },
          "showWatermark": { "type": "boolean" },
          "quality": { "type": "number", "description": "1-30, lower = better quality" }
        }
      }
    },
    "required": ["action", "tabId"]
  }
}

`update_plan`

Present a plan to the user for approval before taking actions. Used in "follow a plan" permission mode.

{
  "name": "update_plan",
  "input_schema": {
    "properties": {
      "domains": {
        "type": "array",
        "items": { "type": "string" },
        "description": "List of domains to visit (approved when user accepts)"
      },
      "approach": {
        "type": "array",
        "items": { "type": "string" },
        "description": "High-level description of actions. 3-7 items."
      }
    },
    "required": ["domains", "approach"]
  }
}

`shortcuts_list`

List all available shortcuts and workflows.

{
  "name": "shortcuts_list",
  "description": "List all available shortcuts and workflows. Returns shortcuts with their commands, descriptions, and whether they are workflows. Use shortcuts_execute to run one.",
  "input_schema": { "properties": {}, "required": [] }
}

`shortcuts_execute`

Execute a shortcut or workflow.

{
  "name": "shortcuts_execute",
  "description": "Execute a shortcut or workflow by running it in a new sidepanel window using the current tab.",
  "input_schema": {
    "properties": {
      "shortcutId": { "type": "string" },
      "command": { "type": "string", "description": "Command name (e.g. 'debug', 'summarize'). No leading slash." }
    },
    "required": []
  }
}

`turn_answer_start`

Internal control tool. Called immediately before the model's text response each turn.

{
  "name": "turn_answer_start",
  "description": "Call this immediately before your text response to the user for this turn. Required every turn.",
  "input_schema": { "properties": {}, "required": [] }
}

MCP Tools: `tabs_context_mcp` and `tabs_create_mcp`

When connected to Claude Desktop or Claude Code via native messaging, MCP tools operate in a separate "MCP tab group" distinct from the sidepanel's tab group.

tabs_context_mcp: Get context for the MCP tab group. Has a createIfEmpty boolean parameter to auto-create a group.
tabs_create_mcp: Create a new tab in the MCP tab group.

Tool Availability by Page Type

Page Type	Available Tools
Normal web pages	All tools
`chrome://`, `chrome-extension://`, `about:blank`	`navigate` only
`chromewebstore.google.com`	`navigate` only (no script injection)

Permission Model

Every tool execution goes through a PermissionManager singleton.

Permission Modes

Mode	Behavior
`"ask"` (default)	Prompt user before each action on a new domain
`"follow_a_plan"`	Claude must submit a plan via `update_plan` first; once approved, listed domains are pre-authorized
`"skip_all_permission_checks"`	Auto-approve everything

Permission Types

Type	Triggered By
`NAVIGATE`	`navigate` tool
`READ_PAGE_CONTENT`	`read_page`, `computer` (screenshot, scroll, scroll_to, zoom)
`CLICK`	`computer` (left_click, right_click, double_click, triple_click, hover, drag)
`TYPE`	`computer` (type, key)
`UPLOAD_IMAGE`	`upload_image`
`PLAN_APPROVAL`	`update_plan` tool
`REMOTE_MCP`	Remote MCP tool execution
`DOMAIN_TRANSITION`	Navigating from one domain to another

Permission Durations

once: Single use, tied to a specific toolUseId. Auto-revoked after use.
always: Persistent for that domain until manually revoked.

Domain Safety Classification

Queries api.anthropic.com/api/web/domain_info/browser_extension?domain=... to classify domains. Results cached for 5 minutes.

Category	Behavior
`category0`	No restrictions
`category1` / `category2`	Fully blocked (page replaced with `blocked.html`)
`category3`	Force-prompt mode: always asks, no "always allow" option
`category_org_blocked`	Blocked by the user's organization

Security: URL Verification

Before any mutating action (form_input, click, type, key, drag), verifies that the tab's current URL domain still matches the original. Prevents attacks where the page navigates mid-action.

Tab Group Management

The extension manages Chrome Tab Groups to organize tabs per conversation session:

Each session has a main tab (where the user opened the side panel) and optional secondary tabs (created by the agent via tabs_create).
Tab groups are tracked in chrome.storage.local via a TabGroupManager.
Tab context (all tab IDs, URLs, titles in the group) is injected as <system-reminder> in tool results when tabs change.
Visual indicators on the tab group: loading animation (animated dots) while the agent works, checkmark when done.
The tabs_context tool returns all tabs in the group so Claude knows which tabId to target.

MCP Integration

The extension connects to external MCP servers via two channels:

Native Messaging (Claude Desktop / Claude Code)

Connects via chrome.runtime.connectNative() to native messaging hosts:

com.anthropic.claude_browser_extension (Claude Desktop)
com.anthropic.claude_code_browser_extension (Claude Code)

Protocol:

ping/pong for connection handshake
get_status for status checks
tool_request with { method: "execute_tool", params: { tool, args, tabGroupId, tabId, client_id } }
tool_response with { result: { content } } or { error: { content } }
mcp_connected/mcp_disconnected lifecycle events

When connected, Claude Desktop/Code can invoke browser tools (navigate, click, screenshot, etc.) on the user's browser tabs.

Remote MCP Servers

Managed via a Zustand store (mcpServersStore). Remote MCP tools are fetched via the standard MCP tools/list protocol, executed via tools/call, and gated behind the REMOTE_MCP permission type.

Workflow Recording

The extension can record user browser actions and convert them into reusable "shortcuts":

Uses rrweb for session recording (DOM snapshots, mouse events)
Captures screenshots with click position overlays
Supports speech narration: users can speak while recording, and the narration is transcribed and used as the primary intent signal
Workflow steps are described by claude-haiku-4-5 (small model for low-latency step descriptions)
Workflow summary generated by Claude with screenshot analysis
Output: a reusable saved prompt with <inputs> for parameterization
Saved prompts can be scheduled for recurring execution via chrome.alarms (daily, weekly, monthly, annually)

Message Compaction

For long sessions, the extension manages message size:

Threshold: ~25MB
When exceeded: extracts base64 images from older messages and replaces them with placeholders
Tracks tokensSaved for display in the UI

sshh12/claude-chrome-extension-internals.md

Claude for Chrome Extension Internals (v1.0.56)

Architecture Overview

The Agentic Loop

Standard Mode

Quick Mode

Tools (21 Total)

computer

navigate

read_page

form_input

find

get_page_text

javascript_tool

tabs_context

tabs_create

read_console_messages

read_network_requests

upload_image

file_upload

resize_window

gif_creator

update_plan

shortcuts_list

shortcuts_execute

turn_answer_start

MCP Tools: tabs_context_mcp and tabs_create_mcp