Peter Hansen princePeterHansen

@princePeterHansen
princePeterHansen / books_to_scrape_full_json_like_response.json
Last active February 18, 2020 22:17
Response with JSON-like format
[
  {
    "selector": "article.product_pod",
    "get": "json",
    "extract": [
      {
        "selector": "h3",
        "get": "text",
        "as": "title"
      },

[
  {
    "selector": "article.product_pod h3",
    "get": "text"
  },
  {
    "selector": "article.product_pod h3 a",
    "get": "attribute",
    "attribute": "href"
  }
princePeterHansen / books_to_scrape_basic_response.json
Created February 18, 2020 21:06
Basic response from Books to Scrape
[
  {
    "selector": "article.product_pod h3",
    "get": "text",
    "data": [
      "A Light in the ...",
      "Tipping the Velvet",
      "Soumission",
      "Sharp Objects",
      "Sapiens: A Brief History ...",
princePeterHansen / book_element_markup.html
Created February 18, 2020 20:42
Example of book element markup
<article class="product_pod">
  <div class="image_container">
    <a href="catalogue/a-light-in-the-attic_1000/index.html"><img
        src="media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg" alt="A Light in the Attic"
        class="thumbnail" /></a>
  </div>
  <h3>
    <a href="catalogue/a-light-in-the-attic_1000/index.html" title="A Light in the Attic">A Light in the ...</a>
  </h3>
princePeterHansen / web_scraping_response_example.json
Created February 18, 2020 20:19
An example web scraping response from proxybot.io
{
  "data": [
    {
      "title": "Our Band Could Be ...",
      "price": "£57.25",
      "image": "media/cache/54/60/54607fe8945897cdcced0044103b10b6.jpg",
      "link": "catalogue/our-band-could-be-your-life-scenes-from-the-american-indie-underground-1981-1991_985/index.html"
    },
    {
      "title": "Libertarianism for Beginners",
princePeterHansen / basic_web_scraping_request.json
Last active February 18, 2020 20:13
Basic web scraping request with Proxybot.io
[
  {
    "selector": "#someId .someClass a",
    "get": "text"
  }
]
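
The rule shape used in these requests (a `selector`, a `get` mode of `text`, `attribute`, or `json`, and the optional `attribute`, `as`, and `extract` keys) can also be assembled in code rather than written by hand. The following is a minimal sketch of that idea. The helper names `textRule`, `attributeRule`, and `jsonRule` are hypothetical and are not part of any Proxybot.io client; only the key names and selectors come from the examples in this listing.

```javascript
// Hypothetical helpers that build rule objects in the shape shown above.

// A rule that reads the text of the matched element.
// The optional "as" names the field when the rule is nested under "extract".
function textRule(selector, as) {
  const rule = { selector, get: 'text' };
  if (as) rule.as = as;
  return rule;
}

// A rule that reads one attribute (for example "href") of the matched element.
function attributeRule(selector, attribute) {
  return { selector, get: 'attribute', attribute };
}

// A rule that applies a list of nested rules to each matched element.
function jsonRule(selector, extract) {
  return { selector, get: 'json', extract };
}

// Flat request: two independent rules.
const flatRules = [
  textRule('article.product_pod h3'),
  attributeRule('article.product_pod h3 a', 'href'),
];

// Nested request: one rule per book, with a named "title" field.
const nestedRules = [
  jsonRule('article.product_pod', [textRule('h3', 'title')]),
];

console.log(JSON.stringify(flatRules, null, 2));
console.log(JSON.stringify(nestedRules, null, 2));
```

Building the rules through small functions keeps the key names in one place, which may help when a request grows beyond a handful of selectors.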
[
  {
    "selector": "article.product_pod",
    "get": "json",
    "extract": [
      {
        "selector": "h3",
        "get": "text",
        "as": "title"
      },

const puppeteer = require('puppeteer');
const express = require('express');
const app = express();
const port = 3000;
app.get('/', async (req, res) => {
  const {url} = req.query;
  if(!url) {
    res.status(400).send("Bad request: 'url' param is missing!");

const puppeteer = require('puppeteer');
async function run() {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  const proxy = 'https://proxybot.io/api/v1/API_KEY?url=';
  const url = 'https://whatismyipaddress.com/';
  const pageUrl = proxy + url;
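
The preview above builds the proxied address by plain string concatenation. A target URL that carries its own query string would then mix its parameters with the proxy's `url` parameter, so one possible variation is to let the standard `URL` class encode the target. This is only a sketch: the function name `buildProxiedUrl` is hypothetical, and whether the service expects a percent-encoded `url` value is an assumption, not something the source confirms.

```javascript
// Hypothetical helper: builds the proxied URL with the target percent-encoded.
// ASSUMPTION: the service accepts an encoded "url" query parameter.
function buildProxiedUrl(apiKey, targetUrl) {
  const endpoint = new URL(`https://proxybot.io/api/v1/${encodeURIComponent(apiKey)}`);
  // searchParams.set() encodes reserved characters such as ":" "/" "?" and "&".
  endpoint.searchParams.set('url', targetUrl);
  return endpoint.toString();
}

const proxiedPageUrl = buildProxiedUrl('API_KEY', 'https://whatismyipaddress.com/');
console.log(proxiedPageUrl);
// prints "https://proxybot.io/api/v1/API_KEY?url=https%3A%2F%2Fwhatismyipaddress.com%2F"
```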
princePeterHansen / puppeteer_with_proxy.js
Last active February 28, 2025 10:49
Using a proxy in Puppeteer
const puppeteer = require('puppeteer');
async function run() {
  const browser = await puppeteer.launch({
    headless: false,
    args: [ '--proxy-server=200.73.128.156:3128' ]
  });
  const page = await browser.newPage();
  const pageUrl = 'https://whatismyipaddress.com/';
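
The preview above hard-codes the proxy address inside the `--proxy-server` launch argument. As a rough sketch, the options object could be produced by a small function so the host and port can be swapped without editing the launch call. The name `proxyLaunchOptions` is hypothetical; the `headless` and `args` fields and the `--proxy-server` flag are the ones used in the preview.

```javascript
// Hypothetical helper: builds launch options that route the browser through a proxy.
function proxyLaunchOptions(host, port, headless = false) {
  // Reject obviously invalid input before it reaches the browser.
  if (!host || !Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid proxy address: ${host}:${port}`);
  }
  return {
    headless,
    args: [`--proxy-server=${host}:${port}`],
  };
}

const options = proxyLaunchOptions('200.73.128.156', 3128);
console.log(options.args[0]); // prints "--proxy-server=200.73.128.156:3128"

// Usage with Puppeteer (not executed here, since it needs the package installed):
//   const browser = await puppeteer.launch(options);
```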