Created
November 19, 2014 05:54
A simple scraper that collects a summary of every event student organizations have posted to UMD's OrgSync platform, because they won't allow access to their REST API.
# Require the gems
require 'capybara/poltergeist'
require 'selenium-webdriver'
require 'json'

# Configure Poltergeist to not blow up on websites with js errors
Capybara.register_driver :poltergeist do |app|
  Capybara::Poltergeist::Driver.new(app, js_errors: false)
end

# Configure Capybara to use Poltergeist as the driver (good for headless)
Capybara.default_driver = :poltergeist

# Configure Capybara to use Selenium as the driver (good for debugging)
# Capybara.default_driver = :selenium

# Go to the URL
browser = Capybara.current_session
url = "https://orgsync.com/141/community/calendar"
browser.visit url

# Switch to list view
links = browser.all 'div.osw-events-index-view-tabs button.osw-button'
link = links[1]
link.click

# Get the containing div for events
eventlist = browser.find("div.osw-events-list")
events = eventlist.all("div.osw-events-list-item")

# Iterate through each event in the list, and open the modal when clicked
events.each do |event|
  event_hash = {}
  event.find(".osw-events-list-item-picture-container").click

  # a popup appears with the condensed information for each event
  title = event.find("a.osw-events-show-title").text
  date = event.all(".osw-events-show-section-main")[0].all("div")[0].text # @hack
  time = event.find(".osw-events-show-time").text
  location = event.find(".osw-events-show-location").text
  organization = event.find(".osw-events-show-portal-name").text

  # put it in a hash for organization
  event_hash["organization"] = organization
  event_hash["event_title"] = title
  event_hash["date"] = date
  event_hash["time"] = time
  event_hash["location"] = location

  # print the information out
  puts JSON.pretty_generate(event_hash)

  # close modal
  event.find("i.osw-popup-close-button").click
end
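The script above prints one JSON object per event, so the combined output is not a single valid JSON document. A minimal sketch of one way to collect the events into an array and emit them together is shown here. The helper names `build_event_hash` and `events_to_json` are hypothetical and not part of the original gist, and the sample values stand in for the strings the scraper would read from the page.

```ruby
require 'json'

# Hypothetical helper (not in the original gist): builds the same hash
# the scraper fills in, from strings that have already been scraped.
def build_event_hash(organization:, title:, date:, time:, location:)
  {
    "organization" => organization,
    "event_title"  => title,
    "date"         => date,
    "time"         => time,
    "location"     => location
  }
end

# Hypothetical helper: serializes all collected events as one JSON array.
def events_to_json(events)
  JSON.pretty_generate(events)
end

# Placeholder values for illustration only; in the scraper these would
# come from the event.find(...).text calls inside the loop.
collected = []
collected << build_event_hash(
  organization: "Example Club",
  title:        "Example Meeting",
  date:         "example date",
  time:         "example time",
  location:     "example room"
)

puts events_to_json(collected)
```

In the scraper loop, this would mean appending each hash to `collected` instead of calling `puts` per event, then printing once after the loop ends.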