A deep dive into how Anthropic's Claude for Chrome extension works under the hood, based on reading the extension's source code.
The extension is a Chrome Manifest V3 extension built with React, using the Anthropic JavaScript SDK directly in the browser. It opens as a side panel alongside the active tab and acts as an AI agent that can see and interact with web pages.
The Anthropic JS SDK is instantiated in the browser with dangerouslyAllowBrowser: true, authenticating via either OAuth PKCE (default, scopes: user:profile user:inference) or a manual API key (feature-gated for internal use).
The extension operates in two distinct modes:
When the user types a prompt, the side panel runs an agentic tool-use loop:
- Build system prompt: Loaded from server config (`chrome_ext_system_prompt`). Appends platform info (Mac/Windows), current tab context (URLs, titles, tab IDs), and any domain-specific skills.
- Resolve available tools: Based on page type (system pages only get `navigate`, normal pages get all tools). Builds the full tool definitions array.
- Stream response: Calls `claude-sonnet-4-5-20250929` via `beta.messages.stream()` with betas `oauth-2025-04-20`, `max_tokens: 10000`.
- Tool execution loop: If the response contains `tool_use` blocks:
  - Check permissions via the PermissionManager.
  - Execute the tool.
  - Feed `tool_result` messages back to the API.
  - Inject updated tab context as `<system-reminder>` if tabs changed.
  - Loop continues until Claude returns a response with no tool calls.
- Cleanup: Detaches the CDP debugger from all tabs.
If the user cancels mid-loop (Stop Claude button), all pending tools receive error tool_result responses.
User prompt
|
v
Build system prompt (server config + platform + tab context + domain skills)
|
v
Resolve available tools for current page type
|
v
Stream claude-sonnet-4-5 response (max_tokens: 10000)
|
v
tool_use block in response?
|-- yes --> checkPermission(domain, toolUseId)
| |-- not granted, needs prompt --> show permission dialog --> wait
| |-- granted --> execute tool
| |-- denied --> return error tool_result
| |
| v
| tool_result --> inject tab context update --> feed back to API --> loop
|
|-- no --> done, detach debugger
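
The flow above can be sketched as a small loop with injected dependencies. This is a minimal sketch, not the extension's code: the type names, `callModel`, `checkPermission`, `executeTool`, `cleanup`, and the `maxTurns` guard are all hypothetical stand-ins for the SDK call, the PermissionManager, and the debugger teardown.

```typescript
// Hypothetical shapes; the real extension uses the Anthropic SDK's message types.
type ToolUse = { id: string; name: string; input: unknown };
type ToolResult = { tool_use_id: string; content: string; is_error: boolean };
type ModelTurn = { text: string; toolUses: ToolUse[] };

interface LoopDeps {
  callModel(history: unknown[]): Promise<ModelTurn>; // stands in for the streaming API call
  checkPermission(tool: ToolUse): Promise<boolean>;  // stands in for the PermissionManager
  executeTool(tool: ToolUse): Promise<string>;       // runs one tool and returns its output
  cleanup(): Promise<void>;                          // e.g. detach the debugger
}

async function runAgentLoop(prompt: string, deps: LoopDeps, maxTurns = 50): Promise<string> {
  const history: unknown[] = [{ role: "user", content: prompt }];
  try {
    for (let turn = 0; turn < maxTurns; turn++) {
      const reply = await deps.callModel(history);
      history.push({ role: "assistant", content: reply });
      // A reply without tool calls ends the loop.
      if (reply.toolUses.length === 0) return reply.text;

      const results: ToolResult[] = [];
      for (const tool of reply.toolUses) {
        if (!(await deps.checkPermission(tool))) {
          results.push({ tool_use_id: tool.id, content: "Permission denied", is_error: true });
          continue;
        }
        try {
          const output = await deps.executeTool(tool);
          results.push({ tool_use_id: tool.id, content: output, is_error: false });
        } catch (err) {
          results.push({ tool_use_id: tool.id, content: String(err), is_error: true });
        }
      }
      // Tool results go back to the model as the next user turn.
      history.push({ role: "user", content: results });
    }
    return "Stopped: turn limit reached";
  } finally {
    // Cleanup runs whether the loop finished, hit the limit, or threw.
    await deps.cleanup();
  }
}
```

Putting cleanup in `finally` is one way to guarantee the detach step runs on every exit path; the write-up does not say how the extension structures this.
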
A fast, low-latency mode that bypasses the standard tool-use protocol entirely. Instead of sending tool definitions to the API, it uses a compact command language with one- or two-letter commands:
- Sends `tools: []` (no API tools) with `stop_sequences: ["\n<<END>>"]`
- Supports an `effort` parameter (`"high"`, `"medium"`, `"low"`, `"none"`) via the `effort-2025-11-24` beta
- When the model name includes `[fast]`, adds `speed: "fast"` and the `fast-mode-2026-02-01` beta
The model outputs compact commands that the extension parses and executes directly:
| Command | Action | Args |
|---|---|---|
| `C x y` | Left click | coordinates |
| `RC x y` | Right click | coordinates |
| `DC x y` | Double click | coordinates |
| `TC x y` | Triple click | coordinates |
| `H x y` | Hover | coordinates |
| `T text` | Type text | multi-line text |
| `K keys` | Press keys | space-separated key names |
| `S dir amt x y` | Scroll | direction, amount, coordinates |
| `D x1 y1 x2 y2` | Drag | start to end coordinates |
| `Z x1 y1 x2 y2` | Zoom (screenshot region) | bounding box |
| `N url` | Navigate | URL or "back" / "forward" |
| `J code` | Execute JavaScript | multi-line code |
| `W` | Wait for page settle | none |
| `ST tabId` | Switch tab | tab ID |
| `NT url` | New tab | URL |
| `LT` | List tabs | none |
| `PL json` | Present plan | JSON with domains/approach |
After executing each batch of commands, the extension takes a screenshot and feeds it back as a user message. The loop continues until the response contains no commands.
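
To make the command table concrete, here is a minimal parser sketch. It assumes one command per line; how the real extension frames multi-line `T` and `J` arguments is not described here, so that part is left out. `ARG_SPEC`, `parseCommand`, and `parseBatch` are hypothetical names.

```typescript
interface Command {
  op: string;        // command letters, e.g. "C" or "ST"
  numbers: number[]; // numeric arguments (coordinates, amounts, tab IDs)
  text: string;      // free-text argument, or the scroll direction for "S"
}

// Per command: a count of numeric arguments, "text" when the rest of the line
// is a single string, or "scroll" for the mixed `S dir amt x y` form.
const ARG_SPEC: Record<string, number | "text" | "scroll"> = {
  C: 2, RC: 2, DC: 2, TC: 2, H: 2, D: 4, Z: 4, ST: 1, W: 0, LT: 0,
  T: "text", K: "text", N: "text", J: "text", NT: "text", PL: "text",
  S: "scroll",
};

function toNumbers(parts: string[], count: number): number[] {
  if (parts.length !== count) {
    throw new Error("expected " + count + " numeric arguments, got " + parts.length);
  }
  return parts.map((p) => {
    const n = Number(p);
    if (!isFinite(n)) throw new Error("not a number: " + p);
    return n;
  });
}

function parseCommand(line: string): Command {
  const trimmed = line.trim();
  const space = trimmed.indexOf(" ");
  const op = space === -1 ? trimmed : trimmed.slice(0, space);
  const rest = space === -1 ? "" : trimmed.slice(space + 1);
  const known = Object.prototype.hasOwnProperty.call(ARG_SPEC, op);
  const spec = known ? ARG_SPEC[op] : undefined;
  if (spec === undefined) throw new Error("unknown command: " + op);
  if (spec === "text") return { op, numbers: [], text: rest };
  const parts = rest.split(/\s+/).filter((p) => p.length > 0);
  if (spec === "scroll") {
    // First token is the direction; the remaining three are amount, x, y.
    return { op, numbers: toNumbers(parts.slice(1), 3), text: parts[0] ?? "" };
  }
  return { op, numbers: toNumbers(parts, spec), text: "" };
}

// Splits a model response into commands, skipping blank lines.
function parseBatch(response: string): Command[] {
  return response
    .split("\n")
    .filter((l) => l.trim().length > 0)
    .map(parseCommand);
}
```

A response that yields an empty batch would correspond to the "no commands" exit condition of the loop.
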
Full mouse, keyboard, and screenshot control via Chrome DevTools Protocol (CDP).
This uses Anthropic's built-in computer_20250124 tool type, so the model natively understands its parameters. The display_width_px and display_height_px are set dynamically from the actual viewport dimensions.
{
"name": "computer",
"type": "computer_20250124",
"display_width_px": "<dynamic>",
"display_height_px": "<dynamic>",
"display_number": 1
}
| Param | Type | Description |
|---|---|---|
| `action` | enum | `"left_click"`, `"right_click"`, `"double_click"`, `"triple_click"`, `"hover"`, `"type"`, `"key"`, `"screenshot"`, `"wait"`, `"scroll"`, `"scroll_to"`, `"left_click_drag"`, `"zoom"` |
| `coordinate` | `[x, y]` | Pixel coordinates for click/hover actions |
| `text` | string | Text to type (for `type`) or key combo (for `key`, e.g. `"cmd+a"`) |
| `duration` | number (0-30) | Seconds to wait (for `wait` action) |
| `scroll_direction` | enum | `"up"`, `"down"`, `"left"`, `"right"` |
| `scroll_amount` | number (1-10) | Scroll ticks, default 3 |
| `start_coordinate` | `[x, y]` | Start position for `left_click_drag` |
| `region` | `[x0, y0, x1, y1]` | Bounding box for `zoom` action |
| `repeat` | number (1-100) | Repeat count for `key` action |
| `ref` | string | Element ref ID from `read_page`/`find`. Required for `scroll_to`, optional alternative to coordinates for clicks. |
| `modifiers` | string | Modifier keys for clicks: `"ctrl"`, `"shift"`, `"alt"`, `"cmd"`, combinable with `+` |
| `tabId` | number | Required. Tab ID to execute on. |
Implementation: All automation is done via Chrome DevTools Protocol (CDP v1.3), attached via `chrome.debugger.attach({ tabId }, "1.3")`.
- Click: `Input.dispatchMouseEvent` with `mouseMoved` (100ms delay), then `mousePressed` + `mouseReleased`. Multi-click loops with incrementing `clickCount`.
- Type: Character by character via `Input.insertText`. Uppercase/special chars apply shift modifier.
- Key: Parses space-separated key names, supports `+` chords. Maps modifiers to CDP bitmask (alt=1, ctrl=2, meta=4, shift=8).
- Screenshot: `Page.captureScreenshot` (PNG). Downscales for Retina. Applies a resize algorithm optimized for token cost (`pxPerToken: 28`, `maxTargetPx: 1568`). Stores scaling context so subsequent coordinate actions map correctly.
- Scroll: `Input.dispatchMouseEvent` with `mouseWheel`, delta = `scrollAmount * 100` pixels. Auto-takes a follow-up screenshot.
- scroll_to: Scrolls an element into view by its ref ID (from `read_page`/`find`).
- hover: Dispatches `mouseMoved` without clicking.
- zoom: Captures a cropped screenshot of a specific region.
- Drag: Sequence of `mouseMoved` -> `mousePressed` -> `mouseMoved` -> `mouseReleased`.
Before screenshots and clicks, sends HIDE_FOR_TOOL_USE to temporarily hide the Claude overlay, then SHOW_AFTER_TOOL_USE after.
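
The key-chord handling described above can be illustrated with a short sketch. The bit values are the ones listed (alt=1, ctrl=2, meta=4, shift=8); treating `cmd` as an alias for meta is an assumption based on the `"cmd+a"` example, and `modifierBit` and `parseKeys` are hypothetical names.

```typescript
interface KeyPress {
  key: string;       // the non-modifier key, e.g. "a" or "Enter"
  modifiers: number; // CDP modifier bitmask
}

// Maps a modifier name to its CDP bit. The "cmd" alias is an assumption.
function modifierBit(name: string): number {
  switch (name.toLowerCase()) {
    case "alt": return 1;
    case "ctrl": return 2;
    case "meta":
    case "cmd": return 4;
    case "shift": return 8;
    default: throw new Error("unknown modifier: " + name);
  }
}

// Parses input such as "cmd+shift+a Enter": chords are separated by spaces,
// and within a chord the last "+"-separated token is the key itself.
function parseKeys(input: string): KeyPress[] {
  return input
    .split(/\s+/)
    .filter((chord) => chord.length > 0)
    .map((chord) => {
      const parts = chord.split("+");
      const key = parts[parts.length - 1];
      let modifiers = 0;
      for (const name of parts.slice(0, -1)) {
        modifiers |= modifierBit(name);
      }
      return { key, modifiers };
    });
}
```

This sketch does not handle a literal `+` key inside a chord; a real parser would need a rule for that case.
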
Permission mapping:
| Action | Permission Type |
|---|---|
| `screenshot`, `scroll`, `scroll_to`, `zoom` | `READ_PAGE_CONTENT` |
| `left_click`, `right_click`, `double_click`, `triple_click`, `hover`, `left_click_drag` | `CLICK` |
| `type`, `key` | `TYPE` |
Navigate to a URL, or go forward/back in browser history.
{
"name": "navigate",
"input_schema": {
"properties": {
"url": {
"type": "string",
"description": "URL to navigate to. Use 'forward'/'back' for history navigation."
},
"tabId": { "type": "number" }
},
"required": ["url", "tabId"]
}
}
- Queries `api.anthropic.com/api/web/domain_info/browser_extension?domain=...` to classify the target domain. `category1`/`category2` domains are blocked.
- `"back"`/`"forward"` use `chrome.tabs.goBack()`/`chrome.tabs.goForward()`.
- URLs without a protocol prefix get `https://` prepended.
- Execution: `chrome.tabs.update(tabId, { url })`.
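
The input handling above (history keywords versus URLs, and the `https://` default) might look roughly like this. It is a sketch under one assumption: a "protocol prefix" is detected by looking for `scheme://` at the start of the string, which may differ from the extension's actual check. `resolveNavigation` and `NavigationTarget` are hypothetical names.

```typescript
type NavigationTarget =
  | { kind: "history"; direction: "back" | "forward" }
  | { kind: "url"; url: string };

function resolveNavigation(input: string): NavigationTarget {
  const value = input.trim();
  if (value === "back") return { kind: "history", direction: "back" };
  if (value === "forward") return { kind: "history", direction: "forward" };
  // Assumption: anything starting with "<scheme>://" already has a protocol.
  const hasProtocol = /^[a-z][a-z0-9+.-]*:\/\//i.test(value);
  return { kind: "url", url: hasProtocol ? value : "https://" + value };
}
```

The caller would then dispatch on `kind`, using the history APIs for the first variant and a tab update for the second. Under this check a scheme without slashes, such as `about:blank`, would be treated as lacking a protocol.
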
Get an accessibility tree representation of elements on the page.
{
"name": "read_page",
"description": "Get an accessibility tree representation of elements on the page. By default returns all elements including non-visible ones. Output is limited to 50000 characters.",
"input_schema": {
"properties": {
"filter": {
"type": "string",
"enum": ["interactive", "all"],
"description": "Filter: 'interactive' for buttons/links/inputs only, 'all' for all elements"
},
"tabId": { "type": "number" },
"depth": {
"type": "number",
"description": "Max tree depth (default: 15)"
},
"ref_id": {
"type": "string",
"description": "Reference ID of a parent element to scope the read to (e.g. 'ref_5')"
},
"max_chars": {
"type": "number",
"description": "Max characters for output (default: 50000)"
}
},
"required": ["tabId"]
}
}
Injects `window.__generateAccessibilityTree(filter, depth, maxChars, refId)` via `chrome.scripting.executeScript`. The generator:
- Walks the DOM recursively up to `depth` levels (default 15).
- Maps elements to ARIA roles (e.g. `<a>` -> `link`, `<button>` -> `button`, `<input type="text">` -> `textbox`, `<select>` -> `combobox`).
- Extracts accessible names from: `aria-label`, `placeholder`, `title`, `alt`, associated `<label>`, text content.
- Each element gets a persistent ref ID (`ref_1`, `ref_2`, ...) stored in `window.__claudeElementMap` using `WeakRef`. These refs are reused by `form_input`, `find`, `computer`, `upload_image`, and `file_upload`.
- When `ref_id` is provided, scopes the tree walk to that element and its descendants (useful for large pages).
- When output exceeds `max_chars`, returns an error suggesting a smaller `depth` or a `ref_id` to narrow scope.
- For `<select>` elements, also lists all `<option>` children with their text, selected state, and value.
Output format (indented tree):
role "accessible name" [ref_1] href="/bar" type="text" placeholder="Search..."
option "Option A" (selected) value="a"
option "Option B" value="b"
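
As an illustration of the output format, the sketch below renders an already-built tree into indented lines and applies the `max_chars` limit. The `A11yNode` shape, the one-space indent per level, and the function names are assumptions; the real generator walks the live DOM.

```typescript
interface A11yNode {
  role: string;
  name: string;
  ref?: string;                   // e.g. "ref_1"
  selected?: boolean;             // for <option> children
  attrs?: Record<string, string>; // e.g. href, type, placeholder, value
  children?: A11yNode[];
}

// Renders one line per node, indenting children by one space per level.
function formatTree(node: A11yNode, depth = 0, maxDepth = 15): string[] {
  if (depth > maxDepth) return [];
  let indent = "";
  for (let i = 0; i < depth; i++) indent += " ";
  let line = indent + node.role + ' "' + node.name + '"';
  if (node.ref) line += " [" + node.ref + "]";
  if (node.selected) line += " (selected)";
  const attrs = node.attrs ?? {};
  for (const key of Object.keys(attrs)) {
    line += " " + key + '="' + attrs[key] + '"';
  }
  const lines = [line];
  for (const child of node.children ?? []) {
    lines.push(...formatTree(child, depth + 1, maxDepth));
  }
  return lines;
}

// Joins the lines and fails when the result exceeds the character budget.
function renderTree(root: A11yNode, maxDepth = 15, maxChars = 50000): string {
  const text = formatTree(root, 0, maxDepth).join("\n");
  if (text.length > maxChars) {
    throw new Error(
      "Output exceeds " + maxChars + " characters; try a smaller depth or a ref_id"
    );
  }
  return text;
}
```
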
Set values in form elements using element reference IDs from read_page.
{
"name": "form_input",
"input_schema": {
"properties": {
"ref": {
"type": "string",
"description": "Element reference ID from read_page (e.g. 'ref_1')"
},
"value": {
"type": ["string", "boolean", "number"],
"description": "Value to set. Boolean for checkboxes, option value/text for selects."
},
"tabId": { "type": "number" }
},
"required": ["ref", "value", "tabId"]
}
}
- Looks up the DOM element via `window.__claudeElementMap[ref].deref()`.
- Scrolls element into view, then sets value based on element type (`<select>`, checkbox, radio, date/time, range, number, text/textarea).
- Dispatches both `change` and `input` events with `{ bubbles: true }` to trigger framework reactivity.
- Verifies the page domain hasn't changed mid-action (prevents cross-domain attacks).
Find elements on the page using natural language. Uses a nested LLM call internally.
{
"name": "find",
"description": "Find elements on the page using natural language. Can search by purpose (e.g. 'search bar') or text content. Returns up to 20 matching elements with references and coordinates.",
"input_schema": {
"properties": {
"query": {
"type": "string",
"description": "Natural language description of what to find"
},
"tabId": { "type": "number" }
},
"required": ["query", "tabId"]
}
}
Two-stage LLM-in-the-loop architecture:
- Injects `__generateAccessibilityTree("all")` to get the full a11y tree (not viewport-filtered).
- Sends the tree to `claude-sonnet-4-5` (`max_tokens: 800`) asking it to match elements semantically.
- Inner model returns structured results: `FOUND: N` / `SHOWING: M`, then pipe-delimited rows of `ref | role | name | type | x,y | reason`.
- Parses and returns up to 20 elements. If more than 20 match, suggests refining the query.
So: Claude (outer) asks find("login button") -> extension gets full a11y tree -> Claude (inner) semantically matches -> returns refs to Claude (outer).
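
Parsing the inner model's reply could look like the sketch below. It assumes the `FOUND:` and `SHOWING:` counts appear somewhere in the text and that each result is one six-column, pipe-delimited line; the exact layout the extension expects is not shown here, and `parseFindResponse` is a hypothetical name.

```typescript
interface FoundElement {
  ref: string;
  role: string;
  name: string;
  type: string;
  x: number;
  y: number;
  reason: string;
}

interface FindResult {
  found: number;
  showing: number;
  elements: FoundElement[];
}

// Reads "LABEL: <digits>" from the text, defaulting to 0 when absent.
function readCount(text: string, label: string): number {
  const match = new RegExp(label + ":\\s*(\\d+)").exec(text);
  return match ? Number(match[1]) : 0;
}

function parseFindResponse(text: string, limit = 20): FindResult {
  const elements: FoundElement[] = [];
  for (const raw of text.split("\n")) {
    const cells = raw.split("|").map((c) => c.trim());
    if (cells.length !== 6) continue; // not a result row
    const coords = cells[4].split(",").map((c) => Number(c));
    if (coords.length !== 2 || coords.some((n) => !isFinite(n))) continue;
    if (elements.length >= limit) break;
    elements.push({
      ref: cells[0], role: cells[1], name: cells[2], type: cells[3],
      x: coords[0], y: coords[1], reason: cells[5],
    });
  }
  return {
    found: readCount(text, "FOUND"),
    showing: readCount(text, "SHOWING"),
    elements,
  };
}
```

One limitation of a plain split: an accessible name that itself contains `|` would be dropped as a malformed row.
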
Extract raw text content from the page, prioritizing article content.
{
"name": "get_page_text",
"input_schema": {
"properties": {
"tabId": { "type": "number" },
"max_chars": {
"type": "number",
"description": "Max characters for output (default: 50000)"
}
},
"required": ["tabId"]
}
}
Searches for content using a priority-ordered selector list: `article`, `main`, `[class*="articleBody"]`, `[class*="article-body"]`, `[class*="post-content"]`, `[class*="entry-content"]`, `[class*="content-body"]`, `[role="main"]`, `.content`, `#content`. Picks the element with the longest `textContent` if multiple match.
Returns:
Title: {document.title}
URL: {window.location.href}
Source element: <{tagName}>
---
{cleaned text}
Execute JavaScript code in the context of the current page.
{
"name": "javascript_tool",
"description": "Execute JavaScript code in the context of the current page. The code runs in the page's context and can interact with the DOM, window object, and page variables.",
"input_schema": {
"properties": {
"action": {
"type": "string",
"description": "Must be set to 'javascript_exec'"
},
"text": {
"type": "string",
"description": "JavaScript code to execute. Result of last expression returned automatically."
},
"tabId": { "type": "number" }
},
"required": ["action", "text", "tabId"]
}
}
Get context information about all tabs in the current tab group.
{
"name": "tabs_context",
"input_schema": { "properties": {}, "required": [] }
}
Returns tab IDs, URLs, and titles for all tabs in the session's Chrome Tab Group.
Create a new empty tab in the current tab group.
{
"name": "tabs_create",
"input_schema": { "properties": {}, "required": [] }
}
Read browser console messages from a specific tab.
{
"name": "read_console_messages",
"description": "Read browser console messages (console.log, console.error, console.warn, etc.) from a specific tab. Returns console messages from the current domain only.",
"input_schema": {
"properties": {
"tabId": { "type": "number" },
"onlyErrors": { "type": "boolean", "description": "If true, only return error messages" },
"clear": { "type": "boolean", "description": "If true, clear after reading" },
"pattern": { "type": "string", "description": "Regex pattern to filter messages" },
"limit": { "type": "number", "description": "Max messages to return (default: 100)" }
},
"required": ["tabId"]
}
}
Read HTTP network requests from a specific tab.
{
"name": "read_network_requests",
"description": "Read HTTP network requests (XHR, Fetch, documents, images, etc.) from a specific tab.",
"input_schema": {
"properties": {
"tabId": { "type": "number" },
"urlPattern": { "type": "string", "description": "URL pattern to filter requests" },
"clear": { "type": "boolean" },
"limit": { "type": "number", "description": "Max requests (default: 100)" }
},
"required": ["tabId"]
}
}
Upload a previously captured screenshot or user-uploaded image to a file input or drag-and-drop target.
{
"name": "upload_image",
"input_schema": {
"properties": {
"imageId": { "type": "string", "description": "ID of captured screenshot or user-uploaded image" },
"ref": { "type": "string", "description": "Element reference ID for file inputs" },
"coordinate": { "type": "array", "description": "[x, y] for drag & drop targets" },
"tabId": { "type": "number" },
"filename": { "type": "string", "description": "Optional filename (default: 'image.png')" }
},
"required": ["imageId", "tabId"]
}
}
Upload files from the local filesystem to a file input element.
{
"name": "file_upload",
"description": "Upload one or multiple files from the local filesystem to a file input element on the page. Do not click on file upload buttons -- clicking opens a native file picker dialog that you cannot see.",
"input_schema": {
"properties": {
"paths": { "type": "array", "items": { "type": "string" }, "description": "Absolute file paths" },
"ref": { "type": "string", "description": "Element reference ID of the file input" },
"tabId": { "type": "number" }
},
"required": ["paths", "ref", "tabId"]
}
}
Resize the current browser window.
{
"name": "resize_window",
"input_schema": {
"properties": {
"width": { "type": "number" },
"height": { "type": "number" },
"tabId": { "type": "number" }
},
"required": ["width", "height", "tabId"]
}
}
Record browser actions and export as animated GIF.
{
"name": "gif_creator",
"description": "Manage GIF recording and export for browser automation sessions. Control when to start/stop recording browser actions, then export as an animated GIF with visual overlays.",
"input_schema": {
"properties": {
"action": {
"type": "string",
"enum": ["start_recording", "stop_recording", "export", "clear"]
},
"tabId": { "type": "number" },
"coordinate": { "type": "array", "description": "[x, y] for drag & drop upload after export" },
"download": { "type": "boolean" },
"filename": { "type": "string" },
"options": {
"type": "object",
"properties": {
"showClickIndicators": { "type": "boolean" },
"showDragPaths": { "type": "boolean" },
"showActionLabels": { "type": "boolean" },
"showProgressBar": { "type": "boolean" },
"showWatermark": { "type": "boolean" },
"quality": { "type": "number", "description": "1-30, lower = better quality" }
}
}
},
"required": ["action", "tabId"]
}
}
Present a plan to the user for approval before taking actions. Used in "follow a plan" permission mode.
{
"name": "update_plan",
"input_schema": {
"properties": {
"domains": {
"type": "array",
"items": { "type": "string" },
"description": "List of domains to visit (approved when user accepts)"
},
"approach": {
"type": "array",
"items": { "type": "string" },
"description": "High-level description of actions. 3-7 items."
}
},
"required": ["domains", "approach"]
}
}
List all available shortcuts and workflows.
{
"name": "shortcuts_list",
"description": "List all available shortcuts and workflows. Returns shortcuts with their commands, descriptions, and whether they are workflows. Use shortcuts_execute to run one.",
"input_schema": { "properties": {}, "required": [] }
}
Execute a shortcut or workflow.
{
"name": "shortcuts_execute",
"description": "Execute a shortcut or workflow by running it in a new sidepanel window using the current tab.",
"input_schema": {
"properties": {
"shortcutId": { "type": "string" },
"command": { "type": "string", "description": "Command name (e.g. 'debug', 'summarize'). No leading slash." }
},
"required": []
}
}
Internal control tool. Called immediately before the model's text response each turn.
{
"name": "turn_answer_start",
"description": "Call this immediately before your text response to the user for this turn. Required every turn.",
"input_schema": { "properties": {}, "required": [] }
}
When connected to Claude Desktop or Claude Code via native messaging, MCP tools operate in a separate "MCP tab group" distinct from the sidepanel's tab group.
- `tabs_context_mcp`: Get context for the MCP tab group. Has a `createIfEmpty` boolean parameter to auto-create a group.
- `tabs_create_mcp`: Create a new tab in the MCP tab group.
| Page Type | Available Tools |
|---|---|
| Normal web pages | All tools |
| `chrome://`, `chrome-extension://`, `about:blank` | `navigate` only |
| `chromewebstore.google.com` | `navigate` only (no script injection) |
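
The table above amounts to a small filter over the tool list. The sketch below shows one way to express it; `isRestrictedPage` and `resolveTools` are hypothetical names, and the URL matching rules are an approximation of the page types listed.

```typescript
// True for pages where only navigation is possible.
function isRestrictedPage(url: string): boolean {
  if (url === "about:blank") return true;
  if (url.indexOf("chrome://") === 0) return true;
  if (url.indexOf("chrome-extension://") === 0) return true;
  // The Chrome Web Store does not allow script injection.
  return /^https?:\/\/chromewebstore\.google\.com(\/|$)/.test(url);
}

// Returns the tool names that may be offered for the given page.
function resolveTools(url: string, allTools: string[]): string[] {
  return isRestrictedPage(url)
    ? allTools.filter((name) => name === "navigate")
    : allTools.slice();
}
```
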
Every tool execution goes through a PermissionManager singleton.
| Mode | Behavior |
|---|---|
| `"ask"` (default) | Prompt user before each action on a new domain |
| `"follow_a_plan"` | Claude must submit a plan via `update_plan` first; once approved, listed domains are pre-authorized |
| `"skip_all_permission_checks"` | Auto-approve everything |
| Type | Triggered By |
|---|---|
| `NAVIGATE` | `navigate` tool |
| `READ_PAGE_CONTENT` | `read_page`, `computer` (screenshot, scroll, scroll_to, zoom) |
| `CLICK` | `computer` (left_click, right_click, double_click, triple_click, hover, drag) |
| `TYPE` | `computer` (type, key) |
| `UPLOAD_IMAGE` | `upload_image` |
| `PLAN_APPROVAL` | `update_plan` tool |
| `REMOTE_MCP` | Remote MCP tool execution |
| `DOMAIN_TRANSITION` | Navigating from one domain to another |
- `once`: Single use, tied to a specific `toolUseId`. Auto-revoked after use.
- `always`: Persistent for that domain until manually revoked.
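
The two grant durations can be modeled with a small store. This is a sketch, not the PermissionManager itself: `PermissionStore` is a hypothetical class, and keying `always` grants by domain plus permission type is an assumption, since the description only says they persist for the domain.

```typescript
type PermissionType = "NAVIGATE" | "READ_PAGE_CONTENT" | "CLICK" | "TYPE";
type GrantDuration = "once" | "always";

class PermissionStore {
  private always: Record<string, boolean> = {}; // key: domain + "|" + type
  private once: Record<string, boolean> = {};   // key: toolUseId

  grant(domain: string, type: PermissionType, duration: GrantDuration, toolUseId?: string): void {
    if (duration === "always") {
      this.always[domain + "|" + type] = true;
    } else if (toolUseId !== undefined) {
      this.once[toolUseId] = true;
    }
  }

  // Returns true if the action may proceed. A matching one-time grant is
  // consumed, so a second check with the same toolUseId fails.
  check(domain: string, type: PermissionType, toolUseId: string): boolean {
    if (this.always[domain + "|" + type] === true) return true;
    if (this.once[toolUseId] === true) {
      delete this.once[toolUseId];
      return true;
    }
    return false;
  }

  // Manual revocation of a persistent grant.
  revoke(domain: string, type: PermissionType): void {
    delete this.always[domain + "|" + type];
  }
}
```
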
Queries api.anthropic.com/api/web/domain_info/browser_extension?domain=... to classify domains. Results cached for 5 minutes.
| Category | Behavior |
|---|---|
| `category0` | No restrictions |
| `category1` / `category2` | Fully blocked (page replaced with `blocked.html`) |
| `category3` | Force-prompt mode: always asks, no "always allow" option |
| `category_org_blocked` | Blocked by the user's organization |
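
A sketch of the category handling with the 5-minute cache follows. To keep it self-contained and testable, the lookup is a synchronous injected function rather than a network call, and the clock is injectable; `CategoryCache`, `behaviorFor`, and the `Behavior` labels are hypothetical names.

```typescript
type Category =
  | "category0" | "category1" | "category2" | "category3" | "category_org_blocked";
type Behavior = "allow" | "block" | "force_prompt";

function behaviorFor(category: Category): Behavior {
  switch (category) {
    case "category0": return "allow";
    case "category3": return "force_prompt";
    default: return "block"; // category1, category2, category_org_blocked
  }
}

const CACHE_TTL_MS = 5 * 60 * 1000;

interface CacheEntry { category: Category; expiresAt: number }

class CategoryCache {
  private entries: Record<string, CacheEntry> = {};
  private lookup: (domain: string) => Category;
  private now: () => number;

  constructor(lookup: (domain: string) => Category, now: () => number = Date.now) {
    this.lookup = lookup;
    this.now = now;
  }

  // Returns the cached category while it is fresh, otherwise looks it up again.
  get(domain: string): Category {
    const hit = this.entries[domain];
    if (hit !== undefined && hit.expiresAt > this.now()) return hit.category;
    const category = this.lookup(domain);
    this.entries[domain] = { category, expiresAt: this.now() + CACHE_TTL_MS };
    return category;
  }
}
```
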
Before any mutating action (form_input, click, type, key, drag), verifies that the tab's current URL domain still matches the original. Prevents attacks where the page navigates mid-action.
The extension manages Chrome Tab Groups to organize tabs per conversation session:
- Each session has a main tab (where the user opened the side panel) and optional secondary tabs (created by the agent via `tabs_create`).
- Tab groups are tracked in `chrome.storage.local` via a `TabGroupManager`.
- Tab context (all tab IDs, URLs, titles in the group) is injected as `<system-reminder>` in tool results when tabs change.
- Visual indicators on the tab group: loading animation (animated dots) while the agent works, checkmark when done.
- The `tabs_context` tool returns all tabs in the group so Claude knows which `tabId` to target.
The extension connects to external MCP servers via two channels:
Connects via chrome.runtime.connectNative() to native messaging hosts:
- `com.anthropic.claude_browser_extension` (Claude Desktop)
- `com.anthropic.claude_code_browser_extension` (Claude Code)

Protocol:
- `ping`/`pong` for connection handshake
- `get_status` for status checks
- `tool_request` with `{ method: "execute_tool", params: { tool, args, tabGroupId, tabId, client_id } }`
- `tool_response` with `{ result: { content } }` or `{ error: { content } }`
- `mcp_connected`/`mcp_disconnected` lifecycle events
When connected, Claude Desktop/Code can invoke browser tools (navigate, click, screenshot, etc.) on the user's browser tabs.
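
The message kinds listed in the protocol can be written as a discriminated union with a single dispatcher. Several details here are assumptions: the `type` field as discriminator, the shape of the status reply, which side answers `ping`, and a synchronous `runTool` callback (real tool execution would be asynchronous).

```typescript
type NativeMessage =
  | { type: "ping" }
  | { type: "get_status" }
  | {
      type: "tool_request";
      method: "execute_tool";
      params: { tool: string; args: unknown; tabId?: number };
    };

type NativeReply =
  | { type: "pong" }
  | { type: "status"; connected: boolean } // hypothetical reply shape
  | { type: "tool_response"; result: { content: string } }
  | { type: "tool_response"; error: { content: string } };

function handleMessage(
  msg: NativeMessage,
  runTool: (tool: string, args: unknown) => string
): NativeReply {
  switch (msg.type) {
    case "ping":
      return { type: "pong" };
    case "get_status":
      return { type: "status", connected: true };
    case "tool_request":
      try {
        const content = runTool(msg.params.tool, msg.params.args);
        return { type: "tool_response", result: { content } };
      } catch (err) {
        // Failures are reported in the error branch rather than thrown.
        return { type: "tool_response", error: { content: String(err) } };
      }
  }
}
```
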
Managed via a Zustand store (mcpServersStore). Remote MCP tools are fetched via the standard MCP tools/list protocol, executed via tools/call, and gated behind the REMOTE_MCP permission type.
The extension can record user browser actions and convert them into reusable "shortcuts":
- Uses rrweb for session recording (DOM snapshots, mouse events)
- Captures screenshots with click position overlays
- Supports speech narration: users can speak while recording, and the narration is transcribed and used as the primary intent signal
- Workflow steps are described by `claude-haiku-4-5` (small model for low-latency step descriptions)
- Workflow summary generated by Claude with screenshot analysis
- Output: a reusable saved prompt with `<inputs>` for parameterization
- Saved prompts can be scheduled for recurring execution via `chrome.alarms` (daily, weekly, monthly, annually)
For long sessions, the extension manages message size:
- Threshold: ~25MB
- When exceeded: extracts base64 images from older messages and replaces them with placeholders
- Tracks `tokensSaved` for display in the UI
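
The compaction step can be sketched as follows. The message shapes, the placeholder text, and the `keepRecent` parameter are assumptions. Size is measured as the length of the JSON serialization, which approximates bytes because base64 is ASCII. The sketch counts removed images rather than estimating tokens saved.

```typescript
type Block =
  | { type: "text"; text: string }
  | { type: "image"; data: string }; // base64 payload

interface Message {
  role: "user" | "assistant";
  content: Block[];
}

function sizeOf(messages: Message[]): number {
  return JSON.stringify(messages).length;
}

// Walks from the oldest message forward, replacing images with a text
// placeholder until the history fits under maxBytes. The newest keepRecent
// messages are never touched. The input array is not modified.
function compact(
  messages: Message[],
  maxBytes: number,
  keepRecent = 2
): { messages: Message[]; imagesRemoved: number } {
  const out: Message[] = messages.map((m) => ({ role: m.role, content: m.content.slice() }));
  let imagesRemoved = 0;
  for (let i = 0; i < out.length - keepRecent && sizeOf(out) > maxBytes; i++) {
    out[i].content = out[i].content.map((block): Block => {
      if (block.type !== "image") return block;
      imagesRemoved++;
      return { type: "text", text: "[image removed to save space]" };
    });
  }
  return { messages: out, imagesRemoved };
}
```

With the threshold mentioned above, a call might look like `compact(history, 25 * 1024 * 1024)`.
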