@glego
Last active February 26, 2018 12:48
Graby opendemocracy.net issue
title: //h1
author: //div[contains(@class, 'entry-meta')]//a[@rel='author']
date: //meta[@property="article:published_time"]/@content
strip: //nav
strip: //header
strip: //div[contains(@class, 'comments')]
body: //div[contains(concat(' ',normalize-space(@class),' '),' post__content ')]
test_url: https://www.opendemocracy.net/neweconomics/time-fragile-temporality-carillions-accumulation-model/
[2018-02-26 12:41:14] graby.DEBUG: Graby is ready to fetch [] []
[2018-02-26 12:41:14] graby.DEBUG: . looking for site config for opendemocracy.net in primary folder {"host":"opendemocracy.net"} []
[2018-02-26 12:41:14] graby.DEBUG: ... found site config opendemocracy.net.txt {"host":"opendemocracy.net.txt"} []
[2018-02-26 12:41:14] graby.DEBUG: Appending site config settings from global.txt [] []
[2018-02-26 12:41:14] graby.DEBUG: . looking for site config for global in primary folder {"host":"global"} []
[2018-02-26 12:41:14] graby.DEBUG: ... found site config global.txt {"host":"global.txt"} []
[2018-02-26 12:41:14] graby.DEBUG: Cached site config with key: opendemocracy.net {"key":"opendemocracy.net"} []
[2018-02-26 12:41:14] graby.DEBUG: . looking for site config for global in primary folder {"host":"global"} []
[2018-02-26 12:41:14] graby.DEBUG: ... found site config global.txt {"host":"global.txt"} []
[2018-02-26 12:41:14] graby.DEBUG: Appending site config settings from global.txt [] []
[2018-02-26 12:41:14] graby.DEBUG: Cached site config with key: global {"key":"global"} []
[2018-02-26 12:41:14] graby.DEBUG: Cached site config with key: opendemocracy.net.merged {"key":"opendemocracy.net.merged"} []
[2018-02-26 12:41:14] graby.DEBUG: Fetching url: https://www.opendemocracy.net/neweconomics/time-fragile-temporality-carillions-accumulation-model/ {"url":"https://www.opendemocracy.net/neweconomics/time-fragile-temporality-carillions-accumulation-model/"} []
[2018-02-26 12:41:14] graby.DEBUG: Trying using method "get" on url "https://www.opendemocracy.net/neweconomics/time-fragile-temporality-carillions-accumulation-model/" {"method":"get","url":"https://www.opendemocracy.net/neweconomics/time-fragile-temporality-carillions-accumulation-model/"} []
[2018-02-26 12:41:14] graby.DEBUG: Use default user-agent "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.92 Safari/535.2" for url "https://www.opendemocracy.net/neweconomics/time-fragile-temporality-carillions-accumulation-model/" {"user-agent":"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.92 Safari/535.2","url":"https://www.opendemocracy.net/neweconomics/time-fragile-temporality-carillions-accumulation-model/"} []
[2018-02-26 12:41:14] graby.DEBUG: Use default referer "http://www.google.co.uk/url?sa=t&source=web&cd=1" for url "https://www.opendemocracy.net/neweconomics/time-fragile-temporality-carillions-accumulation-model/" {"referer":"http://www.google.co.uk/url?sa=t&source=web&cd=1","url":"https://www.opendemocracy.net/neweconomics/time-fragile-temporality-carillions-accumulation-model/"} []
[2018-02-26 12:41:14] graby.DEBUG: Data fetched: [array] {"data":{"effective_url":"https://www.opendemocracy.net/neweconomics/time-fragile-temporality-carillions-accumulation-model/","body":"(only length for debug): 70713","headers":"text/html; charset=UTF-8","all_headers":{"date":"Mon, 26 Feb 2018 12:45:22 GMT","server":"Apache/2.2.22 (Debian)","x-powered-by":"PHP/5.4.39-0+deb7u2","vary":"Accept-Encoding,Cookie","cache-control":"max-age=3, must-revalidate","wp-super-cache":"Served supercache file from PHP","access-control-allow-origin":"https://opendemocracy.net","transfer-encoding":"chunked","content-type":"text/html; charset=UTF-8"},"status":200}} []
[2018-02-26 12:41:14] graby.DEBUG: Treating as UTF-8 {"encoding":"utf-8"} []
[2018-02-26 12:41:14] graby.DEBUG: Opengraph data: [array] {"ogData":{"og_site_name":"New thinking for the British economy","og_type":"article","og_title":"Out of time: the fragile temporality of Carillion’s accumulation model","og_url":"https://www.opendemocracy.net/neweconomics/time-fragile-temporality-carillions-accumulation-model/","og_description":"Look anywhere on Carillion’s website and we see metaphors for its supposed tangibility and strength, from the way it advertises its Tarmac Group heritage to its list of construction achievements which in fact precede its inception. The website projects an image of a company steeped in all things concrete and solid. However, as Carillion moves into","og_locale":"en_GB","og_updated_time":"2018-01-17T17:30:16+00:00","og_image":"https://cdn.opendemocracy.net/neweconomics/wp-content/uploads/sites/5/2018/01/25816239318_deb85adfe2_k.jpg","og_image_secure_url":"https://cdn.opendemocracy.net/neweconomics/wp-content/uploads/sites/5/2018/01/Carillion-blog-Figure-2.jpg","og_image_width":"2048","og_image_height":"1536","og_image_type":"image/jpeg"}} []
[2018-02-26 12:41:14] graby.DEBUG: Looking for site config files to see if single page link exists [] []
[2018-02-26 12:41:14] graby.DEBUG: Returning cached and merged site config for opendemocracy.net {"host":"opendemocracy.net"} []
[2018-02-26 12:41:14] graby.DEBUG: No "single_page_link" config found [] []
[2018-02-26 12:41:14] graby.DEBUG: Attempting to extract content [] []
[2018-02-26 12:41:14] graby.DEBUG: Returning cached and merged site config for opendemocracy.net {"host":"opendemocracy.net"} []
[2018-02-26 12:41:14] graby.DEBUG: . looking for site config for fingerprint.wordpress.com in primary folder {"host":"fingerprint.wordpress.com"} []
[2018-02-26 12:41:14] graby.DEBUG: ... found site config .wordpress.com.txt {"host":".wordpress.com.txt"} []
[2018-02-26 12:41:14] graby.DEBUG: Appending site config settings from global.txt [] []
[2018-02-26 12:41:14] graby.DEBUG: . looking for site config for global in primary folder {"host":"global"} []
[2018-02-26 12:41:14] graby.DEBUG: ... site config for global already loaded in this request {"host":"global"} []
[2018-02-26 12:41:14] graby.DEBUG: Cached site config with key: .wordpress.com {"key":".wordpress.com"} []
[2018-02-26 12:41:14] graby.DEBUG: . looking for site config for global in primary folder {"host":"global"} []
[2018-02-26 12:41:14] graby.DEBUG: ... site config for global already loaded in this request {"host":"global"} []
[2018-02-26 12:41:14] graby.DEBUG: Appending site config settings from global.txt [] []
[2018-02-26 12:41:14] graby.DEBUG: Cached site config with key: fingerprint.wordpress.com.merged {"key":"fingerprint.wordpress.com.merged"} []
[2018-02-26 12:41:14] graby.DEBUG: Appending site config settings from fingerprint.wordpress.com (fingerprint match) {"host":"fingerprint.wordpress.com"} []
[2018-02-26 12:41:14] graby.DEBUG: Cached site config with key: fingerprint.wordpress.com {"key":"fingerprint.wordpress.com"} []
[2018-02-26 12:41:14] graby.DEBUG: Strings replaced: 0 (find_string and/or replace_string) {"count":0} []
[2018-02-26 12:41:14] graby.DEBUG: Attempting to parse HTML with libxml {"parser":"libxml"} []
[2018-02-26 12:41:14] graby.DEBUG: Trying //h1 for title {"pattern":"//h1"} []
[2018-02-26 12:41:14] graby.DEBUG: title matched: Out of time: the fragile temporality of Carillion’s accumulation model {"title":"Out of time: the fragile temporality of Carillion’s accumulation model"} []
[2018-02-26 12:41:14] graby.DEBUG: ...XPath match: {pattern} ["pattern","//h1"] []
[2018-02-26 12:41:14] graby.DEBUG: Trying //div[contains(@class, 'entry-meta')]//a[@rel='author'] for author {"pattern":"//div[contains(@class, 'entry-meta')]//a[@rel='author']"} []
[2018-02-26 12:41:14] graby.DEBUG: Trying //meta[@property="article:published_time"]/@content for date {"pattern":"//meta[@property=\"article:published_time\"]/@content"} []
[2018-02-26 12:41:14] graby.DEBUG: date matched: 2018-01-17T17:23:47+00:00 {"date":"2018-01-17T17:23:47+00:00"} []
[2018-02-26 12:41:14] graby.DEBUG: ...XPath match: {pattern} ["pattern","//meta[@property=\"article:published_time\"]/@content"] []
[2018-02-26 12:41:14] graby.DEBUG: Trying //html[@lang]/@lang for language {"pattern":"//html[@lang]/@lang"} []
[2018-02-26 12:41:14] graby.DEBUG: Language matched: en-GB {"language":"en-GB"} []
[2018-02-26 12:41:14] graby.DEBUG: Trying //nav to strip element {"pattern":"//nav"} []
[2018-02-26 12:41:14] graby.DEBUG: Trying //header to strip element {"pattern":"//header"} []
[2018-02-26 12:41:14] graby.DEBUG: Stripping 1 elements (strip) {"length":1} []
[2018-02-26 12:41:14] graby.DEBUG: Trying //div[contains(@class, 'comments')] to strip element {"pattern":"//div[contains(@class, 'comments')]"} []
[2018-02-26 12:41:14] graby.DEBUG: Trying //*[@id='comments' or @id='respond'] to strip element {"pattern":"//*[@id='comments' or @id='respond']"} []
[2018-02-26 12:41:14] graby.DEBUG: Trying //div[contains(concat(' ',normalize-space(@class),' '),' navigation ')] to strip element {"pattern":"//div[contains(concat(' ',normalize-space(@class),' '),' navigation ')]"} []
[2018-02-26 12:41:14] graby.DEBUG: Trying sharedaddy to strip element {"string":"sharedaddy"} []
[2018-02-26 12:41:14] graby.DEBUG: Trying wpadvert to strip element {"string":"wpadvert"} []
[2018-02-26 12:41:14] graby.DEBUG: Trying commentlist to strip element {"string":"commentlist"} []
[2018-02-26 12:41:14] graby.DEBUG: Trying sociable to strip element {"string":"sociable"} []
[2018-02-26 12:41:14] graby.DEBUG: Trying related_post to strip element {"string":"related_post"} []
[2018-02-26 12:41:14] graby.DEBUG: Trying wp-socializer to strip element {"string":"wp-socializer"} []
[2018-02-26 12:41:14] graby.DEBUG: Trying addtoany to strip element {"string":"addtoany"} []
[2018-02-26 12:41:14] graby.DEBUG: Trying //div[contains(concat(' ',normalize-space(@class),' '),' post__content ')] for body (content length: 17809) {"pattern":"//div[contains(concat(' ',normalize-space(@class),' '),' post__content ')]","content_length":17809} []
[2018-02-26 12:41:14] graby.DEBUG: Trying //div[@id="content"]//div[contains(@class, 'entry-content') or contains(@class, 'entrytext') or @class='main' or @class='entry'] for body (content length: 17809) {"pattern":"//div[@id=\"content\"]//div[contains(@class, 'entry-content') or contains(@class, 'entrytext') or @class='main' or @class='entry']","content_length":17809} []
[2018-02-26 12:41:14] graby.DEBUG: Trying //div[@id='content'] for body (content length: 17809) {"pattern":"//div[@id='content']","content_length":17809} []
[2018-02-26 12:41:14] graby.DEBUG: Using Readability [] []
[2018-02-26 12:41:14] graby.DEBUG: Detected date: 2018-01-17T17:23:47+00:00 {"date":"2018-01-17T17:23:47+00:00"} []
[2018-02-26 12:41:14] graby.DEBUG: Success ? {"is_success":false} []
[2018-02-26 12:41:14] graby.DEBUG: Extract failed [] []
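
In the log above, the `body:` pattern and both WordPress fallback patterns are tried without any "XPath match" line following them, after which Graby falls back to Readability and the extract fails. That suggests the `post__content` pattern matched nothing in the fetched HTML. The sketch below is not part of Graby. It is a small, standard-library Python helper (the names `has_class_token`, `ClassTokenCounter` and `count_divs_with_class` and the sample markup are all made up for illustration) that mimics the class-token idiom `contains(concat(' ',normalize-space(@class),' '),' post__content ')`, so a saved copy of the page can be checked for a `div` carrying that exact class token.

```python
from html.parser import HTMLParser


def has_class_token(class_attr: str, token: str) -> bool:
    """Approximate contains(concat(' ', normalize-space(@class), ' '), ' token ')."""
    # Collapse runs of whitespace and trim the ends, roughly like normalize-space().
    normalized = " ".join(class_attr.split())
    return f" {token} " in f" {normalized} "


class ClassTokenCounter(HTMLParser):
    """Count <div> start tags whose class list contains an exact token."""

    def __init__(self, token: str) -> None:
        super().__init__()
        self.token = token
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        # A valueless class attribute is reported as None, so fall back to "".
        class_attr = dict(attrs).get("class") or ""
        if has_class_token(class_attr, self.token):
            self.count += 1


def count_divs_with_class(html: str, token: str) -> int:
    """Return how many <div> elements in `html` carry `token` as a whole class name."""
    parser = ClassTokenCounter(token)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical markup, NOT taken from the real opendemocracy.net page.
sample = (
    '<div class="post post__content-wide">'
    '<div class="entry  post__content ">text</div>'
    "</div>"
)
print(count_divs_with_class(sample, "post__content"))
```

With the sample markup, only the inner `div` counts: `post__content-wide` is a different token, which is exactly the distinction the token idiom makes and a plain `contains(@class, 'post__content')` would not. If the count is 0 for the real page source, the `body:` rule needs a different class name.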