@shauvik
Last active May 2, 2025 22:49
Vibe coding for GSoC browser automation

I spent an hour vibe coding with an AI assistant (Cursor) to automate downloading the Kotlin Foundation GSoC applications for review. I needed this to filter the applications relevant to the projects I had proposed, and to scale the review by enlisting help from other expert colleagues.

Below are my prompts, with the generated code at the end.

Prompt 1:

Write a playwright script to open the URLs in this file and download the file identified by $('div.info__link a').href

The response was a JS script that extracted the URLs from the local applications.csv file in the workspace, visited each one, and tried to find and download the linked PDFs. It also generated the package.json and gave the instructions needed to run the script.
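The generated script pulls the URL out of each row with a plain `line.split(',')` and takes the last part. That works for simple files, but it breaks if the URL column itself is quoted (the leading `"` makes `startsWith('http')` fail). Below is a minimal sketch of a quote-aware alternative. It assumes, as the generated script does, that the URL is the last column, and that no field spans multiple lines. The helper names `splitCsvLine` and `extractUrls` are hypothetical.

```javascript
// Split one CSV line into fields, honouring double-quoted fields and "" escapes.
// Assumption: no field spans multiple lines (embedded newlines are not handled).
function splitCsvLine(line) {
  const fields = [];
  let field = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        field += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        field += ch;
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ',') {
      fields.push(field);
      field = '';
    } else {
      field += ch;
    }
  }
  fields.push(field);
  return fields;
}

// Collect application URLs, assuming (as the generated script does) that the
// URL is the last column; the header row drops out because it is not a URL.
function extractUrls(csvContent) {
  return csvContent
    .split(/\r?\n/)
    .filter((line) => line.trim() !== '')
    .map((line) => splitCsvLine(line).pop().trim())
    .filter((url) => url.startsWith('http'));
}
```

For example, `extractUrls('name,url\n"Doe, Jane",https://example.org/p/1\r\nBob,"https://example.org/p/2"\n')` returns both URLs, including the quoted one. For anything more complex, such as multi-line fields, a dedicated CSV parsing library is the safer choice.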

Problem: The automation needed to authenticate with Google as me to see the applications, so I decided to have it connect to my already-running browser instance.

Prompt 2:

Can you make this script connect with my already running Chrome browser?

The script was updated to connect to a running browser instance over Chrome's remote debugging port (Playwright's chromium.connectOverCDP) and came with instructions for starting Chrome with remote debugging enabled.
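If Chrome was not started with `--remote-debugging-port=9222`, the connection attempt to `http://localhost:9222` simply fails. When remote debugging is enabled, Chrome serves a small HTTP endpoint, `/json/version`, that reports the browser version and its `webSocketDebuggerUrl`. Below is a minimal sketch of a pre-flight check against that endpoint, using only Node's `http` module. The function name `probeDevTools`, the 2-second timeout, and the `127.0.0.1` default host are illustrative assumptions.

```javascript
const http = require('http');

// Ask Chrome's DevTools HTTP endpoint whether remote debugging is listening on
// `port`. Resolves to the parsed /json/version payload (which includes
// webSocketDebuggerUrl) or to null if nothing usable answers.
function probeDevTools(port = 9222, host = '127.0.0.1') {
  return new Promise((resolve) => {
    const req = http.get(
      { host, port, path: '/json/version', agent: false, timeout: 2000 },
      (res) => {
        let body = '';
        res.setEncoding('utf8');
        res.on('data', (chunk) => { body += chunk; });
        res.on('error', () => resolve(null));
        res.on('end', () => {
          if (res.statusCode !== 200) return resolve(null);
          try {
            resolve(JSON.parse(body));
          } catch {
            resolve(null); // something other than Chrome answered on this port
          }
        });
      }
    );
    req.on('timeout', () => req.destroy(new Error('timeout')));
    req.on('error', () => resolve(null)); // e.g. ECONNREFUSED: Chrome not started with the flag
  });
}
```

Usage: call `probeDevTools()` before `connectOverCDP`. If it resolves to `null`, print a hint to restart Chrome with the flag. Otherwise `info.Browser` shows which Chrome build you are about to drive.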

Problem: The script loaded the PDF files correctly in the browser but saved them under random names that were hard to map back to each candidate.

Prompts 3-5:

Construct fileName from last part of url and add suffix .pdf to it

For each url item, also extract the following field from the page and save it in proposals.csv file 
$('a[title="Opens default mail application"]').innerText

Add the name field to the emails.csv file too identified by the following variable.
$('div.body__contributor div.ng-star-inserted').innerHTML.split('<')[0]
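Prompts 3 and 5 each boil down to a small string transformation. Below is a sketch of both as plain functions so they can be checked outside the browser. The names `fileNameFromUrl` and `nameFromInnerHtml` and the `'download'` fallback are hypothetical.

One caveat applies to the name extraction. `innerHTML` returns serialized HTML, so a character such as `&` in a name comes back as `&amp;`. Reading the text node via `innerText`/`textContent` avoids that.

```javascript
// Prompt 3: use the last non-empty path segment of the URL as the file name
// and make sure it ends in ".pdf" (query string and trailing slash ignored).
// The 'download' fallback for a bare host URL is a hypothetical default.
function fileNameFromUrl(url) {
  const segments = new URL(url).pathname.split('/').filter(Boolean);
  const last = decodeURIComponent(segments[segments.length - 1] || 'download');
  return last.toLowerCase().endsWith('.pdf') ? last : `${last}.pdf`;
}

// Prompt 5: keep only the text that precedes the first child element,
// mirroring innerHTML.split('<')[0] from the prompt, and trim whitespace.
function nameFromInnerHtml(innerHtml) {
  return innerHtml.split('<')[0].trim();
}
```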

This got me the desired output in the CSV file, but the saved PDF was actually an empty HTML file without the proposal content. The download code at that point looked like this:

// Get the href attribute
const fileUrl = await page.$eval('div.info__link a', el => el.href);

// Download the file
const response = await page.goto(fileUrl);
const fileName = path.basename(fileUrl);
const filePath = path.join(downloadsDir, fileName);

// Save the file
const buffer = await response.body();
fs.writeFileSync(filePath, buffer);
console.log(`Downloaded: ${fileName}`);

Perhaps the script was not waiting long enough for the PDF to render between the two steps, and adding a simple fixed wait might have solved this. Since fixed waits are a test anti-pattern, I wanted to avoid one. Another observation was that page.goto was opening the PDF in a new browser window, and perhaps doing it in the same window might solve the issue.
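The alternative to a fixed sleep is to wait for a condition with a deadline. Playwright already does this for you with its built-in waits, such as `page.waitForSelector`, `page.waitForResponse`, and locator auto-waiting. Below is a minimal sketch of the same idea as a generic helper, for conditions Playwright does not cover (for example, "the file on disk is non-empty"). The name `waitUntil` and its default timeout and interval are hypothetical.

```javascript
// Poll `check` until it returns a truthy value or `timeoutMs` elapses.
// Unlike a fixed sleep, this returns as soon as the condition holds and
// fails loudly (instead of silently continuing) when it never does.
async function waitUntil(check, { timeoutMs = 10000, intervalMs = 100 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const result = await check();
    if (result) return result;
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```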

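I reprompted several times, but none of the suggested solutions worked. Eventually I bit the bullet, looked up the Playwright API for requesting the PDF URL directly from the page's browser context, and got the script working. The final version fetches the file with page.context().request.get(fileUrl), which sends the browser context's cookies and therefore my logged-in session.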
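The original failure mode was an HTML page silently saved under a `.pdf` name, so a cheap guard is worth having. A well-formed PDF file starts with the ASCII bytes `%PDF-`. Below is a sketch of that check. The helper name `looksLikePdf` is hypothetical.

```javascript
// A well-formed PDF file begins with the ASCII bytes "%PDF-" (e.g. "%PDF-1.7").
// An HTML page saved under a .pdf name fails this check.
function looksLikePdf(buffer) {
  return Buffer.isBuffer(buffer) &&
    buffer.length >= 5 &&
    buffer.subarray(0, 5).toString('latin1') === '%PDF-';
}
```

In the script, you could call it right after `const buffer = await getResp.body();` and throw if it returns `false`. The existing `catch` block would then log the failing URL instead of the script writing a bogus file.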

Takeaways

  • AI is a great assistant for learning a new API
  • Providing Chrome DevTools JS selectors in the prompt was an effective way to show the AI assistant which element I wanted to work with, and it was able to convert them to Playwright API calls/selectors
  • AI was great for small code changes and refactoring (e.g., adding a field, updating logs)
  • AI will not solve every problem; you will still need to solve the remaining 10-20% yourself
Generated script (final version):

const { chromium } = require('playwright');
const fs = require('fs');
const path = require('path');

async function main() {
  // Read the CSV file
  const csvContent = fs.readFileSync('applications.csv', 'utf-8');
  const lines = csvContent.split('\n');

  // Extract URLs (assumes the URL is the last column of each row)
  const urls = lines.map(line => {
    const parts = line.split(',');
    return parts[parts.length - 1].trim();
  }).filter(url => url.startsWith('http'));

  // Connect to an existing Chrome browser
  // You'll need to start Chrome with remote debugging enabled:
  // On macOS: /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222
  const browser = await chromium.connectOverCDP('http://localhost:9222');
  // Reuse the default context so the existing Google login cookies apply
  const context = browser.contexts()[0];
  const page = await context.newPage();

  // Create downloads directory if it doesn't exist
  const downloadsDir = path.join(__dirname, 'downloads');
  fs.mkdirSync(downloadsDir, { recursive: true });

  // Create proposals.csv with a header row
  const indexFile = path.join(__dirname, 'proposals.csv');
  fs.writeFileSync(indexFile, 'proposal_id,name,email,title,file_name,url\n');

  // Visit each URL and download the file
  for (const url of urls) {
    try {
      console.log(`Visiting: ${url}`);
      await page.goto(url);

      // Wait for the link to be visible
      await page.waitForSelector('div.info__link a');

      // Get the href attribute
      const fileUrl = await page.$eval('div.info__link a', el => el.href);
      // Extract email
      const email = await page.$eval('a[title="Opens default mail application"]', el => el.innerText);
      // Extract name (the text before the first child element)
      const name = await page.$eval('div.body__contributor div.ng-star-inserted', el => el.innerHTML.split('<')[0].trim());
      // Extract title
      const title = await page.$eval('h1.body__title', el => el.innerText);

      // Extract proposal ID from URL
      const urlParts = url.split('/');
      const proposalId = urlParts[urlParts.length - 1];

      // Download the file through the context's request API, which sends the
      // context's cookies (the logged-in session)
      const safeName = name.replace(/[\\\/:*?"<>|]/g, '_'); // keep path separators etc. out of file names
      const fileName = `${safeName}-${proposalId}.pdf`;
      const filePath = path.join(downloadsDir, fileName);
      const getResp = await page.context().request.get(fileUrl);
      if (!getResp.ok()) {
        throw new Error(`HTTP ${getResp.status()} for ${fileUrl}`);
      }
      const buffer = await getResp.body();
      fs.writeFileSync(filePath, buffer);
      console.log(`Downloaded: ${fileName} and saved details for ${name}`);

      // Save to CSV
      fs.appendFileSync(indexFile, `${proposalId},"${name}",${email},"${title}","${fileName}",${url}\n`);

      // Add a small delay between requests
      await page.waitForTimeout(2000);
    } catch (error) {
      console.error(`Error processing ${url}:`, error.message);
    }
  }

  await page.close();
  await browser.close();
}

main().catch(console.error);
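The script writes CSV rows with template strings such as `"${name}"`. That produces a malformed row as soon as a name or proposal title contains a double quote. Below is a minimal sketch of RFC 4180-style escaping that could replace those template strings. The helper names `csvField` and `csvRow` are hypothetical, and quoting values with leading or trailing spaces is a conservative choice rather than an RFC requirement.

```javascript
// Quote a value for CSV (RFC 4180 style): wrap it in double quotes and double
// any embedded quotes when it contains a quote, comma, line break, or
// leading/trailing whitespace; otherwise emit it unchanged.
function csvField(value) {
  const s = String(value ?? '');
  const needsQuotes = /[",\r\n]/.test(s) || s !== s.trim();
  return needsQuotes ? `"${s.replace(/"/g, '""')}"` : s;
}

// Join escaped fields into one CSV line terminated by a newline.
function csvRow(values) {
  return values.map(csvField).join(',') + '\n';
}
```

Usage: `fs.appendFileSync(indexFile, csvRow([proposalId, name, email, title, fileName, url]));`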