
@cedrickchee
Created June 22, 2025 05:05
Scraping public AI benchmarks/leaderboards using Dia browser `/scrape` skill

Automating Unified-Bench Data Pipeline

The Challenge

The current Unified-Bench Google Sheet data is manually updated by a human, by hand. This is tedious and slow, but the data is very accurate. Current general-purpose web AI agents (Manus.ai, Flowith, Emergent, GenSpark, etc.) all fall short: getting to 90% of Unified-Bench's requirements is not very challenging for them, but they couldn't solve the last 10%. For example, agents got stuck or failed to parse and map the madness of AI model IDs/names across sources, and some couldn't even scrape text out of images. I had to collaborate with them and get my hands dirty writing and tweaking regexes to deal with the inconsistent benchmark data. Every benchmark has its own fine print at the bottom (examples: is it high/low compute? a thinking/non-thinking model? reasoning/non-reasoning/hybrid? a 16k/32k thinking budget? pass@1 or average pass@4? etc.). Coincidentally, this can be my "soft" AGI-2027 benchmark. lol!
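To give a flavor of the model-name cleanup involved, here is a minimal sketch of the kind of regex normalization I mean. The alias table, the qualifier list, and the `normalize_model_id` function are all illustrative inventions, not Unified-Bench's actual mapping:

```python
import re

# Hypothetical alias table: canonical IDs for names that differ across leaderboards.
ALIASES = {
    "gpt4o": "gpt-4o",
    "claude-3.5-sonnet": "claude-3-5-sonnet",
}

def normalize_model_id(raw: str) -> str:
    """Map one leaderboard's model label to an illustrative canonical ID."""
    s = raw.strip().lower()
    # Drop trailing fine-print qualifiers like "(high)", "(thinking)", "(32k)".
    s = re.sub(r"\s*\((?:high|low|thinking|non-thinking|\d+k)\)\s*$", "", s)
    # Drop footnote markers such as * or daggers.
    s = re.sub(r"[*†‡]+$", "", s).strip()
    # Collapse spaces/underscores into hyphens before the alias lookup.
    s = re.sub(r"[\s_]+", "-", s)
    return ALIASES.get(s, s)
```

Each source needs its own tweaks on top of something like this, which is exactly the hands-on part the agents struggled with.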

Agentic browser to the rescue?

I got into the Dia beta release. Dia is an AI browser. This space is starting to explode, with competitors like Perplexity's Comet browser and Google's Project Mariner.

Dia seems like a better fit for my requirements. Upfront, my impression is that it targets more technical people, such as power users and developers. The interface is centered around chat and gives you more control of the browser. This is a different approach from Perplexity's Comet (which may be targeting less technically savvy users; I don't have access to it, so I don't really know).

Dia Browser

I'm going to scrape public AI benchmarks/leaderboards using Dia. You can add skills: a skill is like a prompt template you can reuse for repetitive prompts. Below is an example of a /scrape skill that scrapes HTML and transforms it into JSON data adhering to the provided schema.

(to be continued...)

<system>
You are a meticulous web-scraping assistant.
Output rules (no exceptions):
1. Wrap the entire response in a Markdown fenced code-block: ```json … ```
2. Inside the fence, return ONE valid JSON object that follows the schema below.
3. No extra text, no markdown outside the fence, no explanations.
Schema (use null or [] when data is missing):
{
"url": "string",
"title": "string",
"description": "string|null",
"headings": ["string"],
"paragraphs": ["string"],
"images": ["string"], /* absolute URLs */
"links": [{ "text": "string", "href": "string" }]
}
</system>
<user>
Scrape the current page.
• Page URL → {page.url}
• Raw HTML (truncated) → {page.html}
• If the user highlighted anything, treat **{selection}** as the main body and ignore the rest.
Steps
1. Resolve relative URLs to absolute (use page origin).
2. Collect visible <h1>–<h3> into **headings**.
3. Collect visible <p> into **paragraphs** (skip boilerplate/nav).
4. For **links**, grab anchors in main content only.
5. Return the JSON object wrapped exactly as specified above.
</user>
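Because the skill pins down the output format (one JSON object inside a ```json fence), the result can be checked mechanically downstream. A minimal sketch, assuming the response actually follows the output rules above; `parse_scrape_output` and `REQUIRED_KEYS` are names I made up for illustration:

```python
import json
import re

# The seven top-level keys from the skill's schema.
REQUIRED_KEYS = {"url", "title", "description", "headings",
                 "paragraphs", "images", "links"}

def parse_scrape_output(response: str) -> dict:
    """Extract the JSON object from a ```json fence and verify the schema keys."""
    match = re.search(r"```json\s*(\{.*\})\s*```", response, re.DOTALL)
    if not match:
        raise ValueError("no fenced JSON block found")
    data = json.loads(match.group(1))
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data
```

A check like this catches the common failure modes early: extra prose outside the fence, or a key quietly dropped from the object.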