
@vtanathip
Created April 26, 2026 09:31
Advance DOM

How AdvancedDomSerializer Works

  1. Entry Point

AdvancedDomSerializer.capture(page) is called once per step in the fixture, right before AI resolution:

    const context = await AdvancedDomSerializer.capture(page);
    resolved = await resolver.resolve(text, context, page);

It returns an AdvancedPageContext:

    { url, title, abbreviatedDom, activeFrames, stats }

abbreviatedDom is the serialized string sent to the AI; stats tracks frame, shadow-root, and custom-element counts. The serialized DOM string is capped at ADV_DOM_MAX_TOKENS * 4 characters (24,000 characters by default), and anything beyond that is truncated.
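A rough sketch of the returned shape and the cap, assuming the cap is a plain slice of the joined lines. The field names and the ADV_DOM_MAX_TOKENS * 4 rule come from the text above. The field types, the stats key names, and the helpers capDom and buildContext are hypothetical. The factor of 4 is presumably a rough characters-per-token heuristic.

```typescript
// Field names come from the description above; the field *types* are assumptions.
interface AdvancedPageContext {
  url: string;
  title: string;
  abbreviatedDom: string; // the serialized string sent to the AI
  activeFrames: string[]; // assumed: frame selectors seen during the walk
  stats: { frames: number; shadowRoots: number; customElements: number }; // assumed key names
}

// 6,000 tokens * 4 reproduces the documented 24,000-character default.
const ADV_DOM_MAX_TOKENS = 6000;

// Hypothetical helper: keep the first maxTokens * 4 characters and drop the rest.
function capDom(dom: string, maxTokens: number = ADV_DOM_MAX_TOKENS): string {
  const maxChars = maxTokens * 4;
  return dom.length > maxChars ? dom.slice(0, maxChars) : dom;
}

// Hypothetical assembly step: join the serialized lines, then apply the cap.
function buildContext(
  url: string,
  title: string,
  lines: string[],
  activeFrames: string[],
  stats: AdvancedPageContext["stats"]
): AdvancedPageContext {
  return { url, title, abbreviatedDom: capDom(lines.join("\n")), activeFrames, stats };
}
```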

  2. Two-Layer Architecture

The work is split between Node.js and the browser context because Playwright's API exists only in Node.js, while DOM access exists only in the browser.

| Layer | Code | Responsibility |
|---|---|---|
| Node.js | serializeFrame() | Discovers child iframes via frame.childFrames(), builds frameSelector, recurses |
| Browser | browserWalkFn() | Walks the DOM, reads element attributes, calls getBoundingClientRect(), handles shadow roots |

You can't call frame.childFrames() from inside a page.evaluate(), because it is a Playwright API. And you can't reach shadow roots or call getBoundingClientRect() from Node.js without going into the browser. So each layer does only what it can.
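One way to picture the split is as a data contract. Playwright's evaluate() hands back serialized values rather than live DOM nodes (a live reference needs evaluateHandle()). That means the browser side has to flatten what it finds into strings and counts, and the Node side stitches the per-frame results together. WalkResult, Stats, and mergeFrameResult are hypothetical names. This sketches the division of labor, not the actual code.

```typescript
// Hypothetical payload browserWalkFn could return from frame.evaluate():
// plain strings and numbers only, since evaluate() results are serialized.
interface WalkResult {
  lines: string[];
  shadowRoots: number;
  customElements: number;
}

// Node-side running totals (key names assumed).
interface Stats {
  frames: number;
  shadowRoots: number;
  customElements: number;
}

// Node side: fold one frame's browser-side result into the shared output and stats.
// Frames are counted here because only Node can see them (via frame.childFrames()).
function mergeFrameResult(result: WalkResult, out: string[], stats: Stats): void {
  out.push(...result.lines);
  stats.frames += 1;
  stats.shadowRoots += result.shadowRoots;
  stats.customElements += result.customElements;
}
```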

  3. Frame Traversal

serializeFrame() runs in Node.js. It calls frame.evaluate(browserWalkFn) to get the DOM lines for the current frame, then loops over frame.childFrames() to recurse:

    for (const child of frame.childFrames()) {
      const iframeHandle = await child.frameElement(); // assumed: the <iframe> element hosting `child`
      const childUrl = child.url();                    // assumed: the child frame's current URL
      const iframeSelector = await frame.evaluate((node) => {
        // builds "iframe#id" or "iframe[name=...]" from the element's attributes
      }, iframeHandle);

      const childFrameSelector = frameSelector
        ? `${frameSelector} >> ${iframeSelector}` // chains with " >> "
        : iframeSelector;

      out.push(`[FRAME selector="${iframeSelector}" src="${childUrl}"]`);
      await serializeFrame(child, childFrameSelector, depth + 1, out, stats);
      out.push(`[/FRAME]`);
    }

Three nested iframes produce iframe[name="AppFrame"] >> iframe#contentframe >> iframe#AppFrame. This string ends up as frameSelector in the ↳ PATH line.
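The recursion and the " >> " chaining can be exercised without a browser by swapping Playwright's Frame for a small in-memory stand-in. FakeFrame, iframeSelectorFor, and passing frameSelector straight into the per-frame walk are all assumptions. The real serializeFrame() is async and reads iframe attributes through frame.evaluate(). The id-then-name preference follows the comment in the snippet, and the bare "iframe" fallback is a guess.

```typescript
// Hypothetical in-memory stand-in for a Playwright Frame (synchronous for brevity).
interface FakeFrame {
  url: string;
  attrs: { id?: string; name?: string }; // attributes of the <iframe> hosting this frame
  interactive: string[];                 // locators the browser-side walk would find here
  children: FakeFrame[];
}

// Prefer iframe#id, then iframe[name="..."]; the bare "iframe" fallback is assumed.
function iframeSelectorFor(attrs: { id?: string; name?: string }): string {
  if (attrs.id) return `iframe#${attrs.id}`;
  if (attrs.name) return `iframe[name="${attrs.name}"]`;
  return "iframe";
}

function serializeFrame(frame: FakeFrame, frameSelector: string, out: string[]): void {
  // Stand-in for the browser layer: one ↳ PATH line per interactive element in this frame.
  for (const loc of frame.interactive) {
    out.push(frameSelector ? `↳ locator="${loc}" frameSelector="${frameSelector}"` : `↳ locator="${loc}"`);
  }
  // Node layer: recurse into child frames, extending the selector chain.
  for (const child of frame.children) {
    const iframeSelector = iframeSelectorFor(child.attrs);
    const childFrameSelector = frameSelector ? `${frameSelector} >> ${iframeSelector}` : iframeSelector;
    out.push(`[FRAME selector="${iframeSelector}" src="${child.url}"]`);
    serializeFrame(child, childFrameSelector, out);
    out.push(`[/FRAME]`);
  }
}
```

With frames nested as name="AppFrame", then id="contentframe", then id="AppFrame", the innermost ↳ PATH line carries the same three-part chain quoted in the text.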

  4. Browser-Side DOM Walk

browserWalkFn runs inside frame.evaluate(), so it has access only to the DOM, not to Playwright.

Skipped entirely: script, style, meta, noscript, link, head, iframe (iframes are handled in Node.js).

Also skipped: any element whose getBoundingClientRect() returns width=0 and height=0. Invisible or detached elements aren't useful to the AI.
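Both skip rules fit in a small predicate. In a browser it would receive el.tagName and el.getBoundingClientRect(). Here it takes plain values so it can run anywhere. shouldSkip is a hypothetical name. The lowercasing is there because tagName is upper-case for HTML elements.

```typescript
// Tags the walk never descends into; iframes are left to the Node-side frame traversal.
const SKIP_TAGS = new Set(["script", "style", "meta", "noscript", "link", "head", "iframe"]);

// The subset of a DOMRect this check needs.
interface BoxSize {
  width: number;
  height: number;
}

// Hypothetical predicate mirroring the two skip rules described above.
function shouldSkip(tagName: string, rect: BoxSize): boolean {
  if (SKIP_TAGS.has(tagName.toLowerCase())) return true;
  // Zero-size box (both dimensions zero): treated as invisible or detached.
  return rect.width === 0 && rect.height === 0;
}
```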

"Interactive" means any of:

  • Native tag: input, button, select, textarea, a
  • Custom element (hyphenated tag like my-button)
  • An ARIA role from the interactive roles set (button, textbox, searchbox, etc.)
  • A tabindex that isn't -1

Each element produces up to two lines:

  • Display line: rich and human-readable, e.g. button#submit[aria-label="Sign In"].btn-primary "Sign In". This is what the AI reads to understand the DOM tree.
  • ↳ PATH line: machine-readable and copy-paste ready, e.g. ↳ locator="button#submit" shadowHost="..." frameSelector="...". Its fields map 1:1 to ResolvedAction fields.

Only interactive elements get a ↳ PATH line; non-interactive elements (divs, spans, etc.) get only a display line.
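A sketch of the interactive check and the two line formats, working on a plain-object stand-in for a DOM element. ElInfo, bestLocator's id-then-name-then-tag preference, and the extra roles in INTERACTIVE_ROLES (link, checkbox, combobox) are assumptions; the text names only button, textbox, and searchbox. The sketch also leaves out the custom-element exclusion described under Custom Element Handling, so a custom element would still get a path line here.

```typescript
// Hypothetical plain-object stand-in for a DOM element.
interface ElInfo {
  tag: string;                   // lowercase tag name
  id?: string;
  attrs: Record<string, string>; // e.g. role, tabindex, aria-label, name
  classes: string[];
  text?: string;
}

const NATIVE_INTERACTIVE = new Set(["input", "button", "select", "textarea", "a"]);
// Assumed subset of the interactive roles set.
const INTERACTIVE_ROLES = new Set(["button", "textbox", "searchbox", "link", "checkbox", "combobox"]);

function isInteractive(el: ElInfo): boolean {
  if (NATIVE_INTERACTIVE.has(el.tag)) return true;
  if (el.tag.indexOf("-") !== -1) return true; // custom element (hyphenated tag)
  const role = el.attrs["role"];
  if (role !== undefined && INTERACTIVE_ROLES.has(role)) return true;
  const tabindex = el.attrs["tabindex"];
  return tabindex !== undefined && tabindex !== "-1";
}

// Display line: tag#id[aria-label="..."].class "text"
function displayLine(el: ElInfo): string {
  let s = el.tag + (el.id ? `#${el.id}` : "");
  const label = el.attrs["aria-label"];
  if (label !== undefined) s += `[aria-label="${label}"]`;
  for (const c of el.classes) s += `.${c}`;
  if (el.text) s += ` "${el.text}"`;
  return s;
}

// Hypothetical bestLocator: id first, then the name attribute, then the bare tag.
function bestLocator(el: ElInfo): string {
  if (el.id) return `${el.tag}#${el.id}`;
  const nameAttr = el.attrs["name"];
  if (nameAttr !== undefined) return `${el.tag}[name="${nameAttr}"]`;
  return el.tag;
}

// ↳ PATH line for interactive elements only; empty fields are omitted.
function pathLine(el: ElInfo, shadowHost: string, frameSelector: string): string | null {
  if (!isInteractive(el)) return null;
  const parts = [`locator="${bestLocator(el)}"`];
  if (shadowHost) parts.push(`shadowHost="${shadowHost}"`);
  if (frameSelector) parts.push(`frameSelector="${frameSelector}"`);
  return `↳ ${parts.join(" ")}`;
}
```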

  5. Shadow DOM Handling

walk() carries a shadowChain array down the recursion: an ordered list of host locators, from the outermost to the innermost shadow root.

When an element has a shadowRoot:

    const hostLocator = bestLocator(el); // e.g. "payment-widget#pay"
    const newChain = [...shadowChain, hostLocator]; // append this host to the chain

    lines.push(`[SHADOW-ROOT host="${hostLocator}"]`);
    for (const child of shadow.children) {
      walk(child, depth + 2, newChain); // children get the updated chain
    }
    lines.push(`[/SHADOW-ROOT]`);

When an interactive element is found deeper inside, shadowChain.join(' ') becomes the shadowHost value in its ↳ PATH line:

    ↳ locator="input[name="card"]" shadowHost="payment-widget#pay"

For nested shadow DOM (a shadow root inside another shadow root), the chain grows:

    ↳ locator="button" shadowHost="outer-widget inner-widget"

Light DOM children of a custom element continue with the parent's shadowChain, not newChain, because they aren't inside the shadow root.
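The chain bookkeeping can be isolated in a tiny tree walker. WalkNode and its fields are hypothetical stand-ins for real elements, with shadowChildren playing the role of el.shadowRoot.children. Display lines and the depth argument are dropped so that only the chain logic remains.

```typescript
// Hypothetical element stand-in: light-DOM children plus an optional open shadow root.
interface WalkNode {
  locator: string;             // what bestLocator() would return
  interactive: boolean;
  shadowChildren?: WalkNode[]; // children of the element's shadow root, if any
  children: WalkNode[];        // light-DOM children
}

function walk(el: WalkNode, shadowChain: string[], lines: string[]): void {
  if (el.interactive) {
    const parts = [`locator="${el.locator}"`];
    // Hosts, outermost first, become the space-separated shadowHost value.
    if (shadowChain.length > 0) parts.push(`shadowHost="${shadowChain.join(" ")}"`);
    lines.push(`↳ ${parts.join(" ")}`);
  }
  if (el.shadowChildren) {
    const newChain = [...shadowChain, el.locator]; // this element becomes the innermost host
    lines.push(`[SHADOW-ROOT host="${el.locator}"]`);
    for (const child of el.shadowChildren) walk(child, newChain, lines);
    lines.push(`[/SHADOW-ROOT]`);
  }
  // Light-DOM children are outside the shadow root, so they keep the parent's chain.
  for (const child of el.children) walk(child, shadowChain, lines);
}
```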

  6. Custom Element Handling

When isCustom(tag) is true (a hyphenated tag name), the element is wrapped in [CUSTOM] markers:

    lines.push(`[CUSTOM: layout-autosuggest#appAutosuggest]`);
    // ... walk shadow root and light DOM children ...
    lines.push(`[/CUSTOM]`);

Custom elements do not get a ↳ PATH line, even though they pass isInteractive(). The guard:

    if (interactive && !custom) { // custom elements excluded
      lines.push(`↳ ${pathParts.join(' ')}`);
    }

Why: custom elements are component wrappers, not native inputs. Playwright's .fill(), .click(), etc. need a real <input>, <button>, etc. to work reliably; .fill(), for example, throws unless the target is an <input>, <textarea>, or [contenteditable] element. Suppressing the ↳ PATH line forces the AI to look inside the [SHADOW-ROOT] block for the actual native interactive element. If the AI sees only [CUSTOM] with no path hint, it won't try to fill() the wrapper.
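The marker-and-guard rule reduces to a few lines. Emittable, its nativeInteractive flag (native tag, interactive role, or usable tabindex), and emitElement are hypothetical. The display line is simplified to the bare locator.

```typescript
// Hypothetical, reduced element description for this sketch.
interface Emittable {
  tag: string;                // lowercase tag name
  locator: string;            // e.g. what bestLocator() returns
  nativeInteractive: boolean; // native tag, interactive role, or tabindex != -1
}

// Hyphenated tag name means custom element.
function isCustom(tag: string): boolean {
  return tag.indexOf("-") !== -1;
}

function emitElement(el: Emittable, innerLines: string[], lines: string[]): void {
  const custom = isCustom(el.tag);
  const interactive = el.nativeInteractive || custom; // custom elements pass the interactive check too
  lines.push(custom ? `[CUSTOM: ${el.locator}]` : el.locator); // simplified display line
  if (interactive && !custom) {
    // Only native interactive elements get a path hint the AI can copy.
    lines.push(`↳ locator="${el.locator}"`);
  }
  lines.push(...innerLines); // already-serialized shadow-root and light-DOM content
  if (custom) lines.push("[/CUSTOM]");
}
```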

  7. ↳ PATH Lines

Each ↳ PATH line contains exactly the fields that map to ResolvedAction:

    ↳ locator="input[name="email"]" shadowHost="login-form" frameSelector="iframe#auth >> iframe#form"

| Field | Maps to | Used by executor |
|---|---|---|
| locator | action.locator | root.locator(action.locator) |
| shadowHost | action.shadowHost | chains .locator() per host in the space-separated list |
| frameSelector | action.frameSelector | IframeHandler.resolve(page, frameSelector) |

Why pre-computed rather than derived on the fly:

The AI sees an abbreviated DOM — it doesn't see raw HTML or full Playwright APIs. Without these pre-computed paths, the AI would have to infer the correct frameSelector chain by reading nested [FRAME] markers and mentally reconstructing the path. That inference is error-prone, especially with 3 levels of nested iframes. Pre-computing them in the serializer means the AI just copies the values directly — no reasoning about structure required.
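On the executor side, the three fields can be consumed in a fixed order: enter the frames, step through the shadow hosts, then target the element. The sketch below records the call chain instead of driving a browser. Scope mirrors the frameLocator()/locator() methods that Playwright's Page, FrameLocator, and Locator do expose. RecordingScope and resolveTarget are hypothetical, and splitting frameSelector on " >> " is an assumption about what IframeHandler.resolve() does internally.

```typescript
interface ResolvedAction {
  locator: string;
  shadowHost?: string;    // space-separated host locators, outermost first
  frameSelector?: string; // iframe selectors joined with " >> "
}

// Minimal stand-in for the Playwright methods used here.
interface Scope {
  frameLocator(selector: string): Scope;
  locator(selector: string): Scope;
}

// Records each call instead of touching a browser.
class RecordingScope implements Scope {
  constructor(readonly calls: string[] = []) {}
  frameLocator(selector: string): Scope {
    return new RecordingScope([...this.calls, `frameLocator(${selector})`]);
  }
  locator(selector: string): Scope {
    return new RecordingScope([...this.calls, `locator(${selector})`]);
  }
}

// Hypothetical resolution order: frames first, then each shadow host, then the target.
function resolveTarget(page: Scope, action: ResolvedAction): Scope {
  let root = page;
  if (action.frameSelector) {
    for (const sel of action.frameSelector.split(" >> ")) root = root.frameLocator(sel);
  }
  if (action.shadowHost) {
    for (const host of action.shadowHost.split(" ")) root = root.locator(host);
  }
  return root.locator(action.locator);
}
```

Chaining .locator() through hosts works because Playwright's CSS locators pierce open shadow roots by default; closed shadow roots are not reachable this way. Splitting shadowHost on spaces also assumes no host locator contains a space (an attribute value like [aria-label="a b"] would break it).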

  8. Concrete Output Example

A <button> inside a shadow root, inside a nested iframe, produces:

    [FRAME selector="iframe#auth" src="https://app/auth"]
      [FRAME selector="iframe#form" src="https://app/auth/form" parent="iframe#auth"]
        [CUSTOM: login-form#lf]
          [SHADOW-ROOT host="login-form#lf"]
            div.form-wrapper
              button#submit[aria-label="Sign In"] "Sign In"
              ↳ locator="button#submit" shadowHost="login-form#lf" frameSelector="iframe#auth >> iframe#form"
          [/SHADOW-ROOT]
        [/CUSTOM]
      [/FRAME]
    [/FRAME]

Note that login-form#lf gets no ↳ PATH line: it is a custom element, so the !custom guard suppresses it. The only ↳ PATH line emitted is the one on button#submit inside the shadow root. The AI reads the [CUSTOM] display line to understand the context, then finds the button#submit path hint and copies locator, shadowHost, and frameSelector directly into the returned ResolvedAction.
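In this design the AI does the copying, but a programmatic parser for ↳ PATH lines could be handy for checking that a returned ResolvedAction matches a hint verbatim. Here is a sketch with parsePathLine and FIELD_RE as hypothetical names. Because values can contain quotes (locator="input[name="email"]"), a value is taken to end only at a quote followed by the next known key or the end of the line.

```typescript
interface ResolvedAction {
  locator: string;
  shadowHost?: string;
  frameSelector?: string;
}

// A value ends only at a quote followed by the next known key or the end of the line,
// which tolerates quotes inside values such as input[name="email"].
const FIELD_RE = /(locator|shadowHost|frameSelector)="(.*?)"(?=\s+(?:locator|shadowHost|frameSelector)=|\s*$)/g;

// Hypothetical helper: turn one ↳ PATH line into ResolvedAction fields (null if no locator).
function parsePathLine(line: string): ResolvedAction | null {
  const fields: Record<string, string> = {};
  FIELD_RE.lastIndex = 0; // global regexes are stateful, so reset before each scan
  let m: RegExpExecArray | null;
  while ((m = FIELD_RE.exec(line)) !== null) fields[m[1]] = m[2];
  if (fields["locator"] === undefined) return null;
  const action: ResolvedAction = { locator: fields["locator"] };
  if (fields["shadowHost"] !== undefined) action.shadowHost = fields["shadowHost"];
  if (fields["frameSelector"] !== undefined) action.frameSelector = fields["frameSelector"];
  return action;
}
```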
