Scraping Info from HTML page
Gist jmscholen/36794a6cc12a7399cd66ae8c68a63836, created July 28, 2016 15:17
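# Example of scraping information, such as listings, from a saved web page
# whose search results are each rendered in their own container element.
# Pass the path to the saved HTML file as the first argument.
require 'nokogiri'
require 'uri'
require 'json'

# Base URL for turning relative result links into absolute ones
# (placeholder: replace with the site being scraped).
website = "https://www.example.com"
scraped_info = []

abort "usage: #{$PROGRAM_NAME} FILE.html" if ARGV.empty?
doc = File.open(ARGV[0]) { |f| Nokogiri::HTML(f) }

doc.css('.result-container').each do |result|
  # Age range text such as "2 years - 5 years", taken from the .row that is the
  # second child of its parent (replaces the non-standard '.row[2]' index form)
  age_parts = result.css('.row:nth-child(2) div').text.split('-')
  # Address text expected as "street, city". These selectors (and '.leaf',
  # '.clock-o' below) assume the names are CSS classes in the page markup.
  address_parts = result.css('.map-marker .result-category-content').text.split(',')
  link = result.at_css('a[href]')

  # Store each result as a hash so it can be queried later;
  # to_s guards against missing parts instead of raising NoMethodError on nil
  scraped_info << {
    "name" => result.css('strong').text.strip,
    "details" => {
      "introduction" => "Type: #{result.css('.leaf').text.strip} | Schedule: #{result.css('.clock-o').text.strip}",
      "start_age" => age_parts[0].to_s.split,
      "end_age" => age_parts[1].to_s.split,
      "address_street_name" => address_parts[0].to_s.strip,
      "address_city_name" => address_parts[1].to_s.strip,
      "website_url" => link ? URI.join(website, link['href']).to_s : website
    }
  }
end

puts JSON.pretty_generate(scraped_info)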
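The script stores each result as a hash "so that we can query it later". A minimal sketch of such queries follows, written in plain Ruby so it runs without Nokogiri. The sample data, the helper names (first_number, in_city, accepts_age), and the assumption that the first numeric word of start_age/end_age is the age in a shared unit are all hypothetical illustrations, not part of the original gist.

```ruby
# Hypothetical sample data shaped like the hashes the scraper builds.
SAMPLE_RESULTS = [
  { "name" => "Listing A",
    "details" => { "start_age" => ["2", "years"], "end_age" => ["5", "years"],
                   "address_city_name" => "Springfield" } },
  { "name" => "Listing B",
    "details" => { "start_age" => ["6", "years"], "end_age" => ["10", "years"],
                   "address_city_name" => "Shelbyville" } },
  { "name" => "Listing C",
    "details" => { "start_age" => [], "end_age" => [],
                   "address_city_name" => "Springfield" } }
].freeze

# Returns the first all-digit word as an Integer, or nil if there is none.
# Assumption: units ("years", "months") are ignored, so mixed units would compare wrongly.
def first_number(words)
  token = words.find { |w| w.match?(/\A\d+\z/) }
  token && Integer(token, 10)
end

# Results whose city matches, ignoring case.
def in_city(results, city)
  results.select { |r| r.dig("details", "address_city_name").to_s.casecmp?(city) }
end

# Results whose parsed age range includes the given age; entries without ages are skipped.
def accepts_age(results, age)
  results.select do |r|
    lo = first_number(r.dig("details", "start_age") || [])
    hi = first_number(r.dig("details", "end_age") || [])
    lo && hi && (lo..hi).cover?(age)
  end
end

p in_city(SAMPLE_RESULTS, "springfield").map { |r| r["name"] }
# prints ["Listing A", "Listing C"]
p accepts_age(SAMPLE_RESULTS, 4).map { |r| r["name"] }
# prints ["Listing A"]
```

Keeping the ages as word arrays (as the scraper does) and parsing them only at query time avoids guessing the page's text format during scraping; if the format turns out to be stable, converting to integers inside the scraping loop would be the simpler choice.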